
Word Embeddings

Deep learning models, like other machine learning models, typically don't work directly with text; the text needs to be converted to numbers instead. The process of converting text to numbers is called vectorization. An early technique for vectorizing words was one-hot encoding, which you learned about in Chapter 1, Neural Network Foundations with TensorFlow 2.0. As you will recall, a major problem with one-hot encoding is that it treats each word as completely independent of all the others, since the similarity between any two words (measured by the dot product of the two word vectors) is always zero.
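As a quick refresher, the sketch below one-hot encodes a tiny vocabulary with tf.keras's to_categorical helper; the four-word vocabulary is made up purely for illustration and is not an example from this chapter.

from tensorflow.keras.utils import to_categorical

# Toy vocabulary, for illustration only.
vocab = ["cat", "dog", "mat", "sat"]
word_to_index = {w: i for i, w in enumerate(vocab)}

# Each word becomes a vector of length len(vocab) with a single 1
# at the word's index and 0 everywhere else.
one_hot = to_categorical(list(word_to_index.values()), num_classes=len(vocab))
print(one_hot[word_to_index["dog"]])   # [0. 1. 0. 0.]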

The dot product is an algebraic operation that operates on two vectors $a = [a_1, \ldots, a_N]$ and $b = [b_1, \ldots, b_N]$ of equal length and returns a number. It is also known as the inner product or scalar product:

$$a \cdot b = \sum_{i=1}^{N} a_i b_i = a_1 b_1 + \cdots + a_N b_N$$
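For instance, with the made-up vectors $a = [1, 2, 3]$ and $b = [4, 5, 6]$, the dot product is $a \cdot b = 1 \cdot 4 + 2 \cdot 5 + 3 \cdot 6 = 32$.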

Why is the dot product of the one-hot vectors of two words always 0? Consider two words $w_i$ and $w_j$. Assuming a vocabulary size of V, their corresponding one-hot vectors are vectors of length V that are all zeros except for a 1 at position i and position j respectively. When combined using the dot product operation, the 1 in a[i] is multiplied by the 0 in b[i], the 1 in b[j] is multiplied by the 0 in a[j], and all other elements in both vectors are 0, so the resulting dot product is also 0.
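To make this concrete, here is a minimal sketch that builds two one-hot vectors with NumPy and confirms their dot product is zero; the vocabulary size and word positions are made up for illustration.

import numpy as np

V = 6           # toy vocabulary size, chosen only for this illustration
i, j = 1, 4     # positions of two different words w_i and w_j

a = np.zeros(V)
a[i] = 1.0      # one-hot vector for w_i
b = np.zeros(V)
b[j] = 1.0      # one-hot vector for w_j

# The two vectors have no non-zero position in common, so every
# element-wise product a[k] * b[k] is zero.
print(np.dot(a, b))   # 0.0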

To overcome the limitations of one-hot encoding, the NLP community has borrowed techniques from Information Retrieval (IR) to vectorize text using the document as the context. Notable techniques are Term Frequency-Inverse Document Frequency (TF-IDF) [36], Latent Semantic Analysis (LSA) [37], and topic modeling [38]. These representations attempt to capture a document-centric idea of semantic similarity between words. Of these, one-hot and TF-IDF are relatively sparse embeddings, since vocabularies are usually quite large, and a word is unlikely to occur in more than a few documents in the corpus.
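As a quick illustration of how sparse these document-context representations are, the sketch below applies scikit-learn's TfidfVectorizer to a made-up three-document corpus; scikit-learn is not referenced in this section and is used here only as one convenient way to compute TF-IDF vectors.

from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up toy corpus, for illustration only.
corpus = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "dogs and cats make good pets",
]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus)   # sparse matrix: one row per document

print(X.shape)   # (3, vocabulary size) -- most entries are zero
print(X.nnz)     # number of non-zero entries, a small fraction of the matrix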

Development of word embedding techniques began around 2000. These techniques differ from the earlier IR-based techniques in that they use neighboring words as their context, leading to a more natural notion of semantic similarity from a human understanding perspective. Today, word embedding is a foundational technique for all kinds of NLP tasks, such as text classification, document clustering, part-of-speech tagging, named entity recognition, sentiment analysis, and many more. Word embeddings result in dense, low-dimensional vectors, and, like LSA and topic models, can be thought of as vectors of latent features for the words.
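To contrast with the sparse one-hot and TF-IDF vectors above, the following sketch shows how a tf.keras Embedding layer maps integer word indices to dense, low-dimensional vectors; the vocabulary size and embedding dimension are made-up values, and this is a minimal sketch rather than the chapter's own example.

import tensorflow as tf

vocab_size = 10000     # hypothetical number of distinct words
embedding_dim = 100    # hypothetical size of each dense word vector

# The Embedding layer is a trainable lookup table of shape
# (vocab_size, embedding_dim): each word index maps to a dense vector.
embedding = tf.keras.layers.Embedding(input_dim=vocab_size, output_dim=embedding_dim)

word_ids = tf.constant([[12, 7, 345]])   # a batch containing one 3-word sequence
vectors = embedding(word_ids)
print(vectors.shape)                     # (1, 3, 100)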
