
Word Embeddings

Once trained, we save our fine-tuned model:

model.save_pretrained(FINE_TUNED_MODEL_DIR)
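Since the same tokenizer will be needed at inference time, it is convenient to persist it alongside the model weights. A minimal sketch, assuming tokenizer is the tokenizer object used during fine-tuning:

# Save the tokenizer's vocabulary and configuration next to the model weights
tokenizer.save_pretrained(FINE_TUNED_MODEL_DIR)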

To predict whether a pair of sentences are paraphrases of one another, we load the model back as a PyTorch model. The from_tf=True parameter indicates that the saved model is a TensorFlow checkpoint; note that, at the time of writing, it does not seem possible to deserialize a TensorFlow checkpoint directly back into a TensorFlow Transformer model:

saved_model = BertForSequenceClassification.from_pretrained(
    FINE_TUNED_MODEL_DIR, from_tf=True)
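If this code runs in a fresh session, the tokenizer used during fine-tuning also has to be reloaded, and it is good practice to switch the model to evaluation mode before inference. A minimal sketch, assuming the tokenizer was saved to the same FINE_TUNED_MODEL_DIR as shown above:

from transformers import BertTokenizer

# Reload the tokenizer saved alongside the model weights
tokenizer = BertTokenizer.from_pretrained(FINE_TUNED_MODEL_DIR)

# Disable dropout so predictions are deterministic
saved_model.eval()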

We then test our saved model using the sentence pairs (sentence_0, sentence_1), which are paraphrases of each other, and (sentence_0, sentence_2), which are not:

def print_result(id1, id2, pred):
    # Report whether sentence_{id2} was predicted to be a paraphrase of sentence_{id1}
    if pred == 1:
        print(f"sentence_{id2} is a paraphrase of sentence_{id1}")
    else:
        print(f"sentence_{id2} is not a paraphrase of sentence_{id1}")

sentence_0 = "At least 12 people were killed in the battle last week."
sentence_1 = "At least 12 people lost their lives in last weeks fighting."
sentence_2 = "The fires burnt down the houses on the street."

inputs_1 = tokenizer.encode_plus(sentence_0, sentence_1,
                                 add_special_tokens=False, return_tensors="pt")
inputs_2 = tokenizer.encode_plus(sentence_0, sentence_2,
                                 add_special_tokens=False, return_tensors="pt")

pred_1 = saved_model(**inputs_1)[0].argmax().item()
pred_2 = saved_model(**inputs_2)[0].argmax().item()

print_result(0, 1, pred_1)
print_result(0, 2, pred_2)

As expected, the output of this code snippet is as follows:

sentence_1 is a paraphrase of sentence_0
sentence_2 is not a paraphrase of sentence_0
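To reuse this logic elsewhere, the tokenization and prediction steps can be wrapped in a small helper. The is_paraphrase function below is a hypothetical convenience wrapper, not part of the original example:

def is_paraphrase(sent_a, sent_b):
    # Encode the sentence pair exactly as above and return True for class 1 (paraphrase)
    inputs = tokenizer.encode_plus(sent_a, sent_b,
                                   add_special_tokens=False,
                                   return_tensors="pt")
    return saved_model(**inputs)[0].argmax().item() == 1

print(is_paraphrase(sentence_0, sentence_1))  # True, given the predictions above
print(is_paraphrase(sentence_0, sentence_2))  # False, given the predictions above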

