
We’re not using collators in our example, but they can be used together with HuggingFace’s Trainer (more on that in the "Fine-Tuning with HuggingFace" section) if you’re into training some BERT from scratch on the MLM task.
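Just to give you an idea, here is a minimal sketch of what that could look like using HuggingFace’s DataCollatorForLanguageModeling. It assumes the bert_tokenizer from our earlier examples, and the two sentences are made up for illustration only:

from transformers import DataCollatorForLanguageModeling

# the collator pads a list of tokenized examples into a batch and randomly
# selects 15% of the tokens for masking, producing "labels" for the MLM loss
mlm_collator = DataCollatorForLanguageModeling(
    tokenizer=bert_tokenizer, mlm=True, mlm_probability=0.15
)
examples = [bert_tokenizer(s) for s in ['alice follows the white rabbit',
                                        'follow the white rabbit neo']]
batch = mlm_collator(examples)
batch.keys()  # input_ids, token_type_ids, attention_mask, and the new labels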

But that’s not the only thing that BERT is trained to do…

Next Sentence Prediction (NSP)

The second pre-training task is a binary classification task: BERT was trained to predict if a second sentence is actually the next sentence in the original text or not. The purpose of this task is to give BERT the ability to understand the relationship between sentences, which can be useful for some of the tasks BERT can be fine-tuned for, like question answering.

So, BERT takes two sentences as inputs (with the special separator token [SEP] between them); a sketch of how such pairs can be built follows the list:

• 50% of the time, the second sentence is indeed the next sentence (the positive class).

• 50% of the time, the second sentence is a randomly chosen one (the negative class).
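Just to make this concrete, here is a minimal sketch of how such sentence pairs could be assembled from a list of consecutive sentences. It is an illustration under simple assumptions (the build_nsp_pairs helper below is made up for this sketch), not the actual procedure used to pre-train BERT:

import random

def build_nsp_pairs(sentences):
    # sentences: consecutive sentences from the original text, in order
    pairs = []
    for i in range(len(sentences) - 1):
        if random.random() < 0.5:
            # positive class: the actual next sentence
            pairs.append((sentences[i], sentences[i + 1], 1))
        else:
            # negative class: a randomly chosen sentence (a real implementation
            # would also make sure it isn't the actual next sentence)
            pairs.append((sentences[i], random.choice(sentences), 0))
    return pairs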

This task uses the special classifier token [CLS], taking the values of the corresponding final hidden state as features for a classifier. For example, let’s take two sentences and tokenize them:

sentence1 = 'alice follows the white rabbit'
sentence2 = 'follow the white rabbit neo'
bert_tokenizer(sentence1, sentence2, return_tensors='pt')

Output

{'input_ids': tensor([[  101,  5650,  4076,  1996,  2317, 10442,   102,
          3582,  1996,  2317, 10442,  9253,   102]]),
 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]]),
 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}
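Notice the token_type_ids: zeros for the tokens of the first sentence and ones for the tokens of the second one, with the separator token ([SEP], ID 102) showing up twice. As a quick illustration (a sketch, not the book’s code), the tokenized pair could be fed to HuggingFace’s BertForNextSentencePrediction, which puts a binary classifier on top of the final hidden state of the [CLS] token; loading 'bert-base-uncased' here is an assumption for the example:

import torch
from transformers import BertForNextSentencePrediction

# BERT with the NSP head: a binary classifier over the final hidden state
# of the [CLS] token
nsp_model = BertForNextSentencePrediction.from_pretrained('bert-base-uncased')

tokens = bert_tokenizer(sentence1, sentence2, return_tensors='pt')
with torch.no_grad():
    output = nsp_model(**tokens)
output.logits  # shape (1, 2): one logit for each class ("next" vs. "random")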
