
You got that right—arithmetic—really! Maybe you’ve seen this "equation" somewhere else already:

KING - MAN + WOMAN = QUEEN

Awesome, right? We’ll try this "equation" out shortly, hang in there!
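Just to give you a taste of what "trying it out" looks like in code, here is a minimal sketch using Gensim’s most_similar() method, which takes words to add (positive) and words to subtract (negative). It loads the pre-trained "word2vec-google-news-300" vectors using Gensim’s downloader (introduced in the next subsection); that model choice is only an assumption for the sketch, and any embedding containing these words would do.

import gensim.downloader as api

# Pre-trained Word2Vec vectors trained on Google News (model name taken from
# the Gensim-data list discussed below; it is a large download)
word_vectors = api.load('word2vec-google-news-300')

# KING - MAN + WOMAN: words in `positive` are added, words in `negative` are subtracted
word_vectors.most_similar(positive=['king', 'woman'], negative=['man'], topn=1)
# "queen" is expected to top the resulting list of (word, similarity) pairs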

Pre-trained Word2Vec

Word2Vec is a simple model, but it still requires a sizable amount of text data to learn meaningful embeddings. Luckily for us, someone else has already done the hard work of training these models, and we can use Gensim’s downloader to choose from a variety of pre-trained word embeddings.

For a detailed list of the available models (embeddings), please check Gensim-data’s repository [184] on GitHub.
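If you’d rather get that list programmatically, the downloader can also describe what’s available. The sketch below assumes only that Gensim is installed; it prints the model names and then loads one of them by name (the "glove-wiki-gigaword-50" name is just an example taken from that list).

import gensim.downloader as api

# api.info() returns a dictionary describing every downloadable corpus and model
info = api.info()
print(sorted(info['models'].keys()))
# e.g. 'glove-twitter-25', 'glove-wiki-gigaword-50', 'word2vec-google-news-300', ...

# Loading a model by name downloads it (only once) and returns its word vectors
word_vectors = api.load('glove-wiki-gigaword-50')
print(word_vectors['queen'][:5])  # first five dimensions of the vector for "queen"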

"Why so many embeddings? How are they different from each other?"

Good question! It turns out, using different text corpora to train a Word2Vec model produces different embeddings. On the one hand, this shouldn’t be a surprise; after all, these are different datasets and it’s expected that they will produce different results. On the other hand, if these datasets all contain sentences in the same language (English, for example), how come the embeddings are different?

The embeddings will be influenced by the kind of language used in the text: the phrasing and wording used in novels are different from those used in news articles, and radically different from those used on Twitter, for example.

"Choose your word embeddings wisely."

Grail Knight
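To make that concrete, here is a small sketch comparing the neighborhood of the same word under two of the pre-trained embeddings from the Gensim-data list, one trained on tweets and one on Wikipedia and news text. These particular models are not Word2Vec embeddings (more on that below), and the names are just examples, but the effect of the training corpus shows up all the same.

import gensim.downloader as api

# Two embeddings trained on very different kinds of text
twitter_vectors = api.load('glove-twitter-25')       # trained on tweets
wiki_vectors = api.load('glove-wiki-gigaword-50')    # trained on Wikipedia and news

# The nearest neighbors of the same word reflect how it is used in each corpus
print(twitter_vectors.most_similar('apple', topn=5))
print(wiki_vectors.most_similar('apple', topn=5))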

Moreover, not every word embedding is learned using a Word2Vec model architecture. There are many different ways of learning word embeddings, one of them being…
