
The three embeddings are there: word, position, and segment (named token_type_embeddings). Let’s go over each of them.
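In case you're jumping in here: input_embeddings is assumed to be the embeddings module of a pre-trained BERT model, and tokens the output of its tokenizer run on some input sentence. A minimal sketch of that setup (the sentence below is just a placeholder) could look like this:

from transformers import BertModel, BertTokenizer

bert_tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
bert_model = BertModel.from_pretrained('bert-base-uncased')

# the embeddings module bundles word, position, and token type embeddings
input_embeddings = bert_model.embeddings

# tokenize a placeholder sentence into PyTorch tensors
tokens = bert_tokenizer('The quick brown fox jumps over the lazy dog',
                        return_tensors='pt')

First, the word / token embeddings: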

token_embeddings = input_embeddings.word_embeddings

token_embeddings

Output

Embedding(30522, 768, padding_idx=0)

The word / token embedding layer has 30,522 entries, the size of BERT’s vocabulary, and it has 768 hidden dimensions. As usual, embeddings will be returned for each token ID in the input:

input_token_emb = token_embeddings(tokens['input_ids'])

input_token_emb

Output

tensor([[[ 1.3630e-02, -2.6490e-02, ..., 7.1340e-03, 1.5147e-02],

...,

[-1.4521e-02, -9.9615e-03, ..., 4.6379e-03, -1.5378e-03]]],

grad_fn=<EmbeddingBackward>)
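A quick way to confirm what came back is to check the shape of the result: for a single tokenized sentence, it should hold one 768-dimensional vector per token.

# one embedding vector per token: [batch size, sequence length, 768]
input_token_emb.shape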

Since each input may have up to 512 tokens, the position embedding layer has exactly that number of entries:

position_embeddings = input_embeddings.position_embeddings

position_embeddings

Output

Embedding(512, 768)

Each sequentially numbered position, up to the total length of the input, will return its corresponding embedding:
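A minimal sketch of that lookup, assuming tokens is the tokenizer output from above (the position IDs are simply 0, 1, 2, and so on, up to the length of the input):

import torch

# sequentially numbered positions for every token in the input
seq_length = tokens['input_ids'].size(1)
position_ids = torch.arange(seq_length).unsqueeze(0)  # shape: [1, seq_length]

# one 768-dimensional embedding per position: [1, seq_length, 768]
input_pos_emb = position_embeddings(position_ids)
input_pos_emb.shape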
