
Word2Vec offers two model architectures: continuous bag-of-words (CBoW) and skip-gram (SG). We’re focusing on the former.

In the CBoW architecture, the target is the central word. In other words, we’re dealing with a multiclass classification problem where the number of classes is given by the size of the vocabulary (any word in the vocabulary can be the central word). And we’ll be using the context words, or, better yet, their corresponding embeddings (vectors), as inputs.

Figure 11.11 - Target and context words
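
To make these pairs concrete, here is a minimal sketch (the sentence, the window size of two, and the cbow_pairs helper are made up for illustration, not the book’s actual pipeline) of how a sliding window over a tokenized sentence yields pairs of context words and their central (target) word:

def cbow_pairs(tokens, window=2):
    # For each central word, take `window` words on each side as its context
    pairs = []
    for i in range(window, len(tokens) - window):
        context = tokens[i - window:i] + tokens[i + 1:i + window + 1]
        target = tokens[i]
        pairs.append((context, target))
    return pairs

tokens = ['the', 'dog', 'is', 'barking', 'at', 'the', 'mailman']
cbow_pairs(tokens)
# [(['the', 'dog', 'barking', 'at'], 'is'),
#  (['dog', 'is', 'at', 'the'], 'barking'),
#  (['is', 'barking', 'the', 'mailman'], 'at')]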

"Wait, how come we’re using the embeddings as inputs? That’s what

we’re trying to learn in the first place!"

Exactly! The embeddings are also parameters of the model, and, as such, they are randomly initialized as well. As training progresses, their weights are updated by gradient descent like any other parameter, and, in the end, we’ll have embeddings for each word in the vocabulary.
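
We can verify this directly: the embedding layer’s weight matrix is an ordinary trainable parameter. In the quick check below (a sketch with made-up sizes, not the book’s numbers), a vocabulary of ten words and four-dimensional embeddings give a randomly initialized 10x4 weight matrix that gradient descent will update:

import torch.nn as nn

embedding = nn.Embedding(num_embeddings=10, embedding_dim=4)
embedding.weight.shape          # torch.Size([10, 4]): one row per word
embedding.weight.requires_grad  # True: updated like any other parameter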

For each pair of context words and corresponding target, the model will average the embeddings of the context words and feed the result to a linear layer that will compute one logit for each word in the vocabulary. That’s it! Let’s check the corresponding code:

import torch.nn as nn

class CBOW(nn.Module):
    def __init__(self, vocab_size, embedding_size):
        super().__init__()
        # One embedding vector of size `embedding_size` per word in the vocabulary
        self.embedding = nn.Embedding(vocab_size, embedding_size)
        # Maps the averaged embedding to one logit per word in the vocabulary
        self.linear = nn.Linear(embedding_size, vocab_size)

    def forward(self, X):
        # X: (batch size, number of context words) tensor of word indices
        embeddings = self.embedding(X)
        # Average the context words' embeddings: the "bag of words"
        bow = embeddings.mean(dim=1)
        # One logit for each word in the vocabulary
        logits = self.linear(bow)
        return logits
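
As a quick sanity check (the sizes below are made up for illustration), a batch of three training points, each with four context word indices, should produce one row of vocab_size logits per point:

import torch

torch.manual_seed(42)
dummy_cbow = CBOW(vocab_size=10, embedding_size=4)
contexts = torch.as_tensor([[0, 1, 3, 4],
                            [1, 2, 4, 5],
                            [2, 3, 5, 6]])  # shape: (3, 4)
logits = dummy_cbow(contexts)
logits.shape  # torch.Size([3, 10])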
