Output

tensor([[[ 0.0004,  0.0110,  0.0037,  ..., -0.0034, -0.0086],
         [ 0.0004,  0.0110,  0.0037,  ..., -0.0034, -0.0086],
         [ 0.0004,  0.0110,  0.0037,  ..., -0.0034, -0.0086],
         ...,
         [ 0.0011, -0.0030, -0.0032,  ..., -0.0052, -0.0112],
         [ 0.0011, -0.0030, -0.0032,  ..., -0.0052, -0.0112],
         [ 0.0011, -0.0030, -0.0032,  ..., -0.0052, -0.0112]]],
       grad_fn=<EmbeddingBackward>)

Since the first tokens, up to and including the first separator, belong to the first sentence, they will all have the same (first) segment embedding values. The tokens after the first separator, up to and including the last token, will have the same (second) segment embedding values.
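
To make this concrete, here is a minimal sketch of where those segment embeddings come from, assuming a pre-trained Hugging Face BertModel loaded as bert_model and a tokenized sentence pair (the sentences and variable names are illustrative):

from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
bert_model = BertModel.from_pretrained('bert-base-uncased')

# Tokenizing a pair of sentences produces token_type_ids: zeros for the
# first sentence (up to and including the first [SEP]) and ones for the
# second sentence
tokens = tokenizer('first sentence', 'second sentence',
                   return_tensors='pt')
print(tokens['token_type_ids'])

# The segment (token type) embedding layer maps each ID to a vector, so
# every token of a given sentence gets the same embedding values
seg_embeddings = bert_model.embeddings.token_type_embeddings
input_seg_emb = seg_embeddings(tokens['token_type_ids'])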

Finally, BERT adds up all three embeddings (token, position, and segment):

input_emb = input_token_emb + input_pos_emb + input_seg_emb
input_emb

Output

tensor([[[ 0.0316, -0.0411, -0.0564,  ...,  0.0044,  0.0219],
         [-0.0615, -0.0750, -0.0107,  ...,  0.0482, -0.0277],
         [-0.0469, -0.0156, -0.0336,  ...,  0.0135,  0.0109],
         ...,
         [-0.0081, -0.0051, -0.0172,  ..., -0.0103,  0.0083],
         [-0.0425, -0.0756, -0.0414,  ..., -0.0180, -0.0060],
         [-0.0138, -0.0138, -0.0194,  ..., -0.0011, -0.0133]]],
       grad_fn=<AddBackward0>)

It will still layer normalize the embeddings and apply dropout to them, but that’s it—these are the inputs BERT uses.
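
In the Hugging Face implementation, these two final steps live inside the embedding module itself, as its LayerNorm and dropout layers. A minimal sketch, assuming the same bert_model and tokens as above:

import torch

# In evaluation mode, dropout is a no-op, so the manual computation
# below is deterministic
bert_model.eval()
emb_layer = bert_model.embeddings
final_emb = emb_layer.dropout(emb_layer.LayerNorm(input_emb))

# Illustrative sanity check: if input_emb was built from the same three
# embedding layers, this should match the module's own output
full_emb = emb_layer(input_ids=tokens['input_ids'],
                     token_type_ids=tokens['token_type_ids'])
print(torch.allclose(final_emb, full_emb))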

Now, let’s take a look at its…
