
Word embeddings are based on the distributional hypothesis, which states that words that occur in similar contexts tend to have similar meanings. Hence the class of word embedding-based encodings is also known as distributed representations, which we will talk about next.

Distributed representations

Distributed representations attempt to capture the meaning of a word by considering its relations with other words in its context. The idea behind the distributional hypothesis is captured in this quote from J. R. Firth, a linguist who first proposed the idea:

"You shall know a word by the company it keeps."

How does this work? By way of example, consider the following pair of sentences:

Paris is the capital of France.
Berlin is the capital of Germany.

Even assuming no knowledge of world geography, the sentence pair implies some sort of relationship between the entities Paris, France, Berlin, and Germany that could be represented as:

"Paris" is to "France" as "Berlin" is to "Germany"

Distributed representations are based on the idea that there exists some transformation φ such that:

φ("Paris") − φ("France") ≈ φ("Berlin") − φ("Germany")

In other words, a distributed embedding space is one where words that are used in similar contexts are close to one another. Therefore, the similarity between word vectors in this space roughly corresponds to the semantic similarity between the words.
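As a concrete illustration of both the analogy arithmetic and the "similar context, nearby vectors" property, the following sketch uses the Gensim library with a pretrained GloVe model; the library and the model name ("glove-wiki-gigaword-50") are assumptions for this example, not something introduced in the chapter, and any set of pretrained static vectors would serve equally well.

import gensim.downloader as api

# Download a small set of pretrained GloVe vectors (model name is an
# assumption; the words in this dataset are lowercased).
word_vectors = api.load("glove-wiki-gigaword-50")

# Analogy arithmetic: φ("paris") − φ("france") + φ("germany") ≈ φ("berlin")
print(word_vectors.most_similar(
    positive=["paris", "germany"], negative=["france"], topn=1))

# Words used in similar contexts lie close together in the embedding space.
print(word_vectors.most_similar("important", topn=5))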

Figure 1 shows a TensorBoard visualization of the word embeddings around the word "important" in the embedding space. As you can see, the neighbors of the word tend to be closely related to, or interchangeable with, the original word. For example, "crucial" is virtually a synonym, and it is easy to see how the words "historical" or "valuable" could be substituted in certain situations:

Figure 1: Visualization of the nearest neighbors of the word "important" in a word embedding dataset, from the TensorFlow Embedding Guide (https://www.tensorflow.org/guide/embedding)

In the next section we will look at various types of distributed representations (or word embeddings).

Static embeddings

Static embeddings are the oldest type of word embedding. The embeddings are generated against a large corpus, but the number of words, though large, is finite. You can think of a static embedding as a dictionary, with words as the keys and their corresponding vectors as the values. If you need to look up the embedding of a word that was not in the original corpus, you are out of luck. In addition, a word has the same embedding regardless of how it is used, so static embeddings cannot address the problem of polysemy, that is, words with multiple meanings. We will explore this issue further when we cover non-static embeddings later in this chapter.
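To make the dictionary analogy concrete, here is a minimal sketch of how a static embedding lookup behaves, including the out-of-vocabulary failure case; the toy vectors and variable names are invented for illustration and are not taken from any real pretrained model.

import numpy as np

# A static embedding is essentially a dictionary: word -> fixed vector.
# The toy 3-dimensional vectors below are placeholders; real static
# embeddings (e.g. word2vec or GloVe) have hundreds of dimensions and
# a vocabulary of millions of words.
embedding = {
    "paris": np.array([0.21, -0.43, 0.77]),
    "berlin": np.array([0.19, -0.40, 0.80]),
}

# Lookup works for words that were in the original corpus...
print(embedding["paris"])

# ...but an out-of-vocabulary word has no entry, and the vector returned
# for a known word never changes with context, which is why static
# embeddings cannot model polysemy.
print(embedding.get("tensorflow"))  # None: out of luck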

