
The BERT model may take many other arguments, and we’re using three of them to get richer outputs:

bert_model.eval()
out = bert_model(input_ids=tokens['input_ids'],
                 attention_mask=tokens['attention_mask'],
                 output_attentions=True,
                 output_hidden_states=True,
                 return_dict=True)
out.keys()

Output

odict_keys(['last_hidden_state', 'pooler_output', 'hidden_states', 'attentions'])

Let’s see what’s inside each of these four outputs:

• last_hidden_state is returned by default and is the most important output of all: it contains the final hidden states for each and every token in the input, which can be used as contextual word embeddings (we’ll peek at their shape in the short sketch below).

Figure 11.28 - Word embeddings from BERT’s last layer

Don’t forget that the first token is the special classifier token [CLS] and that there may be padding ([PAD]) and separator ([SEP]) tokens as well!
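
To make this concrete, here is a minimal sketch of how we could inspect that output. It assumes tokens came from the matching BERT tokenizer and bert_model is a bert-base model (hidden size of 768); the variable names are purely illustrative:

# last_hidden_state has shape [batch_size, sequence_length, 768]
embeddings = out['last_hidden_state']
embeddings.shape

# the first position along the sequence dimension is the [CLS] token;
# the remaining positions follow the input tokens (including any
# [SEP] and [PAD] tokens added by the tokenizer)
cls_embedding = embeddings[:, 0, :]       # shape: [batch_size, 768]
token_embeddings = embeddings[:, 1:, :]   # contextual embeddings for the rest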

