
Chapter 7

Word2Vec

The models known as Word2Vec were first created in 2013 by a team of researchers at Google led by Tomas Mikolov [1, 2, 3]. The models are self-supervised; that is, they are supervised models that rely on the structure of natural language itself to provide labeled training data.

The two architectures for Word2Vec are as follows:

• Continuous Bag of Words (CBOW)

• Skip-gram

Figure 2: Architecture of the CBOW and Skip-gram Word2Vec models

In the CBOW architecture, the model predicts the current word given a window of surrounding words. The order of the context words does not influence the prediction (this is the bag-of-words assumption, hence the name). In the skip-gram architecture, the model predicts the surrounding context words given the current word. According to the Word2Vec website, CBOW is faster, but skip-gram does a better job at predicting infrequent words.
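Both architectures are available, for example, through the gensim library's Word2Vec class, where the sg parameter selects between them. The sketch below is illustrative only; the toy corpus and hyperparameter values are assumptions, not taken from the text:

# A minimal sketch using the gensim library (assumed installed); the toy
# corpus and hyperparameter values are illustrative, not from the text.
from gensim.models import Word2Vec

corpus = [
    ["the", "earth", "travels", "around", "the", "sun"],
    ["the", "moon", "orbits", "the", "earth"],
]

# sg=0 selects CBOW (the default); sg=1 selects skip-gram.
cbow_model = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=0)
sg_model = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=1)

print(cbow_model.wv["earth"].shape)             # (50,) -- one dense vector per word
print(sg_model.wv.most_similar("earth", topn=2))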

Figure 2 summarizes the CBOW and skip-gram architectures. To understand the inputs and outputs, consider the following example sentence:

The Earth travels around the Sun once per year.
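As a concrete illustration (a minimal sketch, not the book's own code), the (input, output) training pairs that each architecture would derive from this sentence can be enumerated in a few lines of Python, assuming a context window of size 2:

# Sketch: enumerate Word2Vec training pairs for the example sentence.
# CBOW pairs map a window of context words to the center word;
# skip-gram pairs map the center word to each individual context word.
sentence = "The Earth travels around the Sun once per year".lower().split()
window = 2

cbow_pairs = []       # ([context words], center word)
skipgram_pairs = []   # (center word, one context word)
for i, center in enumerate(sentence):
    context = [sentence[j]
               for j in range(max(0, i - window), min(len(sentence), i + window + 1))
               if j != i]
    cbow_pairs.append((context, center))
    skipgram_pairs.extend((center, c) for c in context)

print(cbow_pairs[2])        # (['the', 'earth', 'around', 'the'], 'travels')
print(skipgram_pairs[:3])   # [('the', 'earth'), ('the', 'travels'), ('earth', 'the')]

Note how the same window produces one CBOW example per position but several skip-gram examples, which is one reason CBOW trains faster.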

