Output
(tensor(-0.5047, device='cuda:0'), tensor(-0.5047, device='cuda:0'))
Since the classical word embeddings are context-independent, it also means that
both uses of "watch" have exactly the same values in their first 1,024
dimensions:
(token_watch1.embedding[:1024] == token_watch2.embedding[:1024]).all()
Output
tensor(True, device='cuda:0')
GloVe
GloVe embeddings are not contextual, as you already know, but they can
also be easily retrieved using WordEmbeddings from flair:
from flair.embeddings import WordEmbeddings
glove_embedding = WordEmbeddings('glove')
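By the way, flair's 'glove' model is the 100-dimensional variant (that's why the
warning box below mentions the first 100 values). If you'd like to double-check,
the embedding_length attribute should tell you:
glove_embedding.embedding_length  # should return 100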
Now, let’s retrieve the word embeddings for our sentences, but first, and this
is very important, we need to create new Sentence objects for them:
new_flair_sentences = [Sentence(s) for s in sentences]
glove_embedding.embed(new_flair_sentences)
Output
[Sentence: "The Hatter was the first to break the silence . `
What day of the month is it ? ' he said , turning to Alice :
he had taken his watch out of his pocket , and was looking at
it uneasily , shaking it every now and then , and holding it
to his ear ." [ Tokens: 58],
Sentence: "Alice thought this a very curious thing , and she
went nearer to watch them , and just as she came up to them
she heard one of them say , ` Look out now , Five ! Do n't go
splashing paint over me like that !" [ Tokens: 48]]
Never reuse a Sentence object to retrieve different word
embeddings! The embedding attribute may be partially
overwritten (depending on the number of dimensions), and
you may end up with mixed embeddings (e.g., 3,072
dimensions from ELMo, but the first 100 values are
overwritten by GloVe embeddings).
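The safe pattern, then, is a fresh batch of Sentence objects for each embedding
type. A minimal sketch (assuming elmo_embedding is the ELMoEmbeddings instance
created earlier):
# one fresh batch of Sentence objects per embedding type, no mixing
elmo_sentences = [Sentence(s) for s in sentences]
glove_sentences = [Sentence(s) for s in sentences]
elmo_embedding.embed(elmo_sentences)    # ELMo vectors only
glove_embedding.embed(glove_sentences)  # GloVe vectors only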
Since GloVe is not contextual, the word "watch" will have the same
embedding regardless of which sentence you retrieve it from:
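Here is a minimal sketch of that check; the first_token() helper below is just
an illustrative way of locating the tokens by their surface form (the exact
token indices depend on flair's tokenization):
def first_token(sentence, text):
    # return the first Token whose surface form matches `text`
    return next(t for t in sentence.tokens if t.text == text)

watch1 = first_token(new_flair_sentences[0], 'watch')
watch2 = first_token(new_flair_sentences[1], 'watch')
# GloVe is context-independent, so the two vectors should be identical,
# and the comparison should evaluate to tensor(True)
(watch1.embedding == watch2.embedding).all()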