Daniel Voigt Godoy - Deep Learning with PyTorch Step-by-Step: A Beginner’s Guide


The BERT model may take many other arguments, and we’re using three of them to get richer outputs:

bert_model.eval()
out = bert_model(input_ids=tokens['input_ids'],
                 attention_mask=tokens['attention_mask'],
                 output_attentions=True,
                 output_hidden_states=True,
                 return_dict=True)
out.keys()

Output

odict_keys(['last_hidden_state', 'pooler_output', 'hidden_states',
'attentions'])

Let’s see what’s inside each of these four outputs:

• last_hidden_state is returned by default and is the most important output of all: It contains the final hidden states for each and every token in the input, which can be used as contextual word embeddings.

Figure 11.28 - Word embeddings from BERT’s last layer

Don’t forget that the first token is the special classifier token [CLS] and that there may be padding ([PAD]) and separator ([SEP]) tokens as well!
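Before extracting the embeddings, it may help to peek at the shape of each of these outputs. The sketch below assumes bert_model is Hugging Face’s BertModel loaded from "bert-base-uncased" (hidden size of 768, 12 layers), so the exact numbers are illustrative:

# Minimal sketch, assuming a "bert-base-uncased" model; the shapes
# depend on the model and on the tokenized sentence
print(out['last_hidden_state'].shape)  # (batch_size, seq_len, 768)
print(out['pooler_output'].shape)      # (batch_size, 768)
print(len(out['hidden_states']))       # 13: embedding output + 12 layers
print(len(out['attentions']))          # 12: one per layer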


last_hidden_batch = out['last_hidden_state']
last_hidden_sentence = last_hidden_batch[0]
# Removes hidden states for [PAD] tokens using the mask
mask = tokens['attention_mask'].squeeze().bool()
embeddings = last_hidden_sentence[mask]
# Removes embeddings for the first [CLS] and last [SEP] tokens
embeddings[1:-1]

Output

tensor([[ 0.0100,  0.8575, -0.5429,  ...,  0.4241, -0.2035],
        [-0.3705,  1.1001,  0.3326,  ...,  0.0656, -0.5644],
        [-0.2947,  0.5797,  0.1997,  ..., -0.3062,  0.6690],
        ...,
        [ 0.0691,  0.7393,  0.0552,  ..., -0.4896, -0.4832],
        [-0.1566,  0.6177,  0.1536,  ...,  0.0904, -0.4917],
        [ 0.7511,  0.3110, -0.3116,  ..., -0.1740, -0.2337]],
       grad_fn=<SliceBackward>)
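As a quick sanity check (a minimal sketch; the token counts are sentence-specific, so take the commented shapes as illustrative), masking and slicing should leave one 768-dimensional vector per actual word piece:

print(last_hidden_sentence.shape)  # (seq_len, 768), [PAD] tokens included
print(embeddings.shape)            # [PAD] gone, [CLS] and [SEP] still there
print(embeddings[1:-1].shape)      # word-piece embeddings only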

The flair library is doing exactly that under its hood! We can use our get_embeddings() function to get embeddings for our sentence using the wrapper for BERT from flair:

get_embeddings(bert_flair, sentence)

Output

tensor([[ 0.0100,  0.8575, -0.5429,  ...,  0.4241, -0.2035],
        [-0.3705,  1.1001,  0.3326,  ...,  0.0656, -0.5644],
        [-0.2947,  0.5797,  0.1997,  ..., -0.3062,  0.6690],
        ...,
        [ 0.0691,  0.7393,  0.0552,  ..., -0.4896, -0.4832],
        [-0.1566,  0.6177,  0.1536,  ...,  0.0904, -0.4917],
        [ 0.7511,  0.3110, -0.3116,  ..., -0.1740, -0.2337]],
       device='cuda:0')

Perfect match!
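If you would rather verify the match programmatically than eyeball the printed values, a minimal sketch (assuming both results are still in memory, and that flair returned its tensor on the GPU) could compare them directly:

import torch

manual = embeddings[1:-1].detach().cpu()
from_flair = get_embeddings(bert_flair, sentence).cpu()
print(torch.allclose(manual, from_flair, atol=1e-5))  # expected: True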

