1971 5443 0.000 0.348
1971 1377 0.000 0.348
1971 3682 0.017 0.328
1971 51 0.022 0.322
1971 857 0.000 0.318
1971 1161 0.000 0.313
1971 4971 0.000 0.313
1971 5168 0.000 0.312
1971 3099 0.000 0.311
1971 462 0.000 0.310

The code for this embedding strategy is available in neurips_papers_node2vec.py in the source code folder accompanying this chapter. Next, we will move on to look at character and subword embeddings.

Character and subword embeddings

Another evolution of the basic word embedding strategy has been to look at character and subword embeddings instead of word embeddings. Character-level embeddings were first proposed by Zhang and LeCun [17], and were found to have some key advantages over word embeddings.

First, a character vocabulary is finite and small – for example, a vocabulary for English would contain around 70 characters (26 letters, 10 digits, and the rest special characters), leading to character models that are also small and compact. Second, unlike word embeddings, which provide vectors for a large but finite set of words, there is no concept of out-of-vocabulary for character embeddings, since any word can be represented by the vocabulary. Third, character embeddings tend to be better for rare and misspelled words, because there is much less imbalance for character inputs than for word inputs.

Character embeddings tend to work better for applications that require a notion of syntactic rather than semantic similarity. However, unlike word embeddings, character embeddings tend to be task-specific and are usually generated inline within a network to support the task. For this reason, third-party character embeddings are generally not available.

Subword embeddings combine the idea of character and word embeddings by treating a word as a bag of character n-grams, that is, sequences of n consecutive characters. They were first proposed by Bojanowski et al. [18] based on research from Facebook AI Research (FAIR), and were later released as fastText embeddings. fastText embeddings are available for 157 languages, including English. The paper reported state-of-the-art performance on a number of NLP tasks.

fastText computes embeddings for character n-grams where n is between 3 and 6 characters (the default setting, which can be changed), as well as for the words themselves. For example, the character n-grams for n=3 for the word "green" would be "<gr", "gre", "ree", "een", and "en>". The beginning and end of a word are marked with the "<" and ">" characters respectively, to distinguish between short words and their n-grams, such as "<cat>" and "cat".

During lookup, you can look up a vector from the fastText embedding using the word as the key if the word exists in the embedding. However, unlike traditional word embeddings, you can still construct a fastText vector for a word that does not exist in the embedding. This is done by decomposing the word into its constituent subword n-grams as shown in the preceding example, looking up the vectors for those subwords, and then taking the average of these subword vectors. The fastText Python API [19] will do this automatically, but you will need to do it manually if you use other APIs to access the fastText word embeddings, such as gensim or NumPy.
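To make the decomposition and averaging concrete, the following is a minimal sketch in plain Python and NumPy, not the actual fastText implementation (which, among other things, also hashes n-grams into a fixed number of buckets). The char_ngrams and oov_vector functions, and the subword_vectors dictionary of pretrained n-gram vectors, are hypothetical names used only for illustration.

import numpy as np

def char_ngrams(word, n_min=3, n_max=6):
    # Wrap the word in fastText-style boundary markers, then emit every
    # character n-gram with length between n_min and n_max.
    wrapped = "<" + word + ">"
    return [wrapped[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(wrapped) - n + 1)]

print(char_ngrams("green", n_min=3, n_max=3))
# ['<gr', 'gre', 'ree', 'een', 'en>']

def oov_vector(word, subword_vectors, dim=300):
    # Approximate a vector for an out-of-vocabulary word by averaging the
    # vectors of its known character n-grams. subword_vectors is assumed to
    # be a dict mapping n-gram strings to NumPy arrays of length dim.
    vectors = [subword_vectors[g] for g in char_ngrams(word)
               if g in subword_vectors]
    return np.mean(vectors, axis=0) if vectors else np.zeros(dim)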

Next, we will look at dynamic embeddings.

Dynamic embeddings

So far, all the embeddings we have considered have been static; that is, they are deployed as a dictionary of words (and subwords) mapped to fixed-dimensional vectors. The vector corresponding to a word in these embeddings is the same regardless of whether it is being used as a noun or a verb in the sentence – consider, for example, the word "ensure" (the name of a health supplement when used as a noun, and "to make certain" when used as a verb). A static embedding also provides the same vector for polysemous words, or words with multiple meanings, such as "bank" (which can mean different things depending on whether it co-occurs with the word "money" or "river"). In both cases, the meaning of the word changes depending on clues available in its context, the sentence. Dynamic embeddings attempt to use these signals to provide different vectors for words based on their context.

Dynamic embeddings are deployed as trained networks that convert your input (typically a sequence of one-hot vectors) into a lower-dimensional, dense, fixed-size embedding by looking at the entire sequence, not just individual words. You can either preprocess your input into this dense embedding and then use it as input to your task-specific network, or wrap the embedding network and treat it like the tf.keras.layers.Embedding layer for static embeddings. Using a dynamic embedding network in this second way is usually much more expensive than generating the embedding ahead of time (the first option), or than using traditional static embeddings.
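As a rough illustration of these two deployment options, the following sketch uses a sentence encoder from TensorFlow Hub (the Universal Sentence Encoder) as a stand-in for a dynamic embedding network; this particular module, its URL, and the toy model around it are assumptions for illustration only, not the specific model discussed in this chapter.

import tensorflow as tf
import tensorflow_hub as hub

# A TF Hub module that maps an entire sentence to a fixed-size dense vector;
# used here only as a stand-in for a dynamic embedding network.
MODULE_URL = "https://tfhub.dev/google/universal-sentence-encoder/4"

sentences = ["I deposited money at the bank.",
             "We had a picnic on the river bank."]

# Option 1: precompute the dense embeddings once, then feed them
# as ordinary inputs to a task-specific network.
encoder = hub.load(MODULE_URL)
precomputed = encoder(sentences)   # a (2, 512) tensor of sentence embeddings

# Option 2: wrap the embedding network as a layer inside the model,
# analogous to tf.keras.layers.Embedding for static embeddings.
model = tf.keras.Sequential([
    hub.KerasLayer(MODULE_URL, input_shape=[], dtype=tf.string, trainable=False),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

Option 2 keeps the pipeline end to end but, as noted above, running the embedding network inside the model is usually much more expensive than precomputing the embeddings once.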
