
Pre-trained PyTorch Embeddings

The embedding layer in PyTorch, nn.Embedding, can be either trained like any other layer or loaded using its from_pretrained() method. Let's load the extended version of the pre-trained GloVe embeddings:

extended_embeddings = torch.as_tensor(extended_embeddings).float()
torch_embeddings = nn.Embedding.from_pretrained(extended_embeddings)

By default, the embeddings are frozen; that is, they won't be updated during model training. You can change this behavior by setting the freeze argument to False, though.
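If you'd rather fine-tune the pre-trained embeddings along with the rest of the model, a minimal sketch of the unfrozen variant would look like this (the trainable_embeddings name is just illustrative):

# same pre-trained weights, but gradients will flow into them during training
trainable_embeddings = nn.Embedding.from_pretrained(
    extended_embeddings, freeze=False
)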

Then, let’s take the first mini-batch of tokenized sentences and their labels:

token_ids, labels = next(iter(train_loader))
token_ids

Output

tensor([[  36,   63,    1,  ...,    0,    0,    0],
        [ 934,   16,   14,  ...,    0,    0,    0],
        [  57,  311,    8,  ...,  140,    3,   83],
        ...,
        [7101,   59, 1536,  ...,    0,    0,    0],
        [  43,   59, 1995,  ...,    0,    0,    0],
        [ 102,   41,  210,  ...,  685,    3,    7]])
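The runs of zeros at the end of the shorter sentences are padding token IDs. Assuming the padding token does sit at index zero in the vocabulary (an assumption here, since the vocabulary was built earlier), we could recover each sentence's actual length like this:

# count non-padding tokens per sentence (assumes padding token ID is 0)
lengths = (token_ids != 0).sum(dim=1)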

There are 32 sentences of 60 tokens each. We can use this batch of token IDs to retrieve their corresponding embeddings:

token_embeddings = torch_embeddings(token_ids)
token_embeddings.shape

Output

torch.Size([32, 60, 50])
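The resulting tensor has one 50-dimensional embedding for each of the 60 tokens in each of the 32 sentences. Since the embedding layer is just a lookup table, every output vector should match the corresponding row of the weight matrix; a quick sanity check (a sketch, not from the book) would be:

# an embedding layer performs a simple row lookup into its weight matrix
first_token_id = token_ids[0, 0]
assert torch.allclose(token_embeddings[0, 0], extended_embeddings[first_token_id])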

