Chapter 7
The usage pattern of instantiating a model and tokenizer from a pretrained checkpoint, optionally fine-tuning the model on a comparatively small labeled dataset, and then using it for predictions is fairly typical, and it applies to the other fine-tuning classes as well. The Transformers library provides a standardized API for working with multiple Transformer models and performing standard fine-tuning tasks on them. The code described above can be found in the file bert_paraphrase.py in the code accompanying this chapter.
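As a minimal sketch of this pattern (assuming the Hugging Face transformers library with its TensorFlow classes, and not the contents of bert_paraphrase.py itself), the flow looks roughly as follows; the checkpoint name and the toy sentence pairs are purely illustrative:

# Minimal sketch: pretrained tokenizer + model, brief fine-tuning, prediction.
# The checkpoint name and toy data are illustrative, not from bert_paraphrase.py.
import tensorflow as tf
from transformers import BertTokenizer, TFBertForSequenceClassification

# 1. Instantiate the tokenizer and model from a pretrained checkpoint.
tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
model = TFBertForSequenceClassification.from_pretrained(
    "bert-base-cased", num_labels=2)

# 2. Tokenize a comparatively small labeled dataset (sentence pairs labeled
#    as paraphrase / not paraphrase).
first = ["The company posted record profits.",
         "He walked to the store."]
second = ["Record profits were reported by the company.",
          "The cat slept all day."]
labels = tf.constant([1, 0])
encodings = tokenizer(first, second, padding=True, truncation=True,
                      return_tensors="tf")

# 3. Fine-tune with the standard Keras training loop.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=3e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"])
model.fit(dict(encodings), labels, epochs=2, batch_size=2)

# 4. Use the fine-tuned model for predictions on new pairs.
test = tokenizer("It is raining.", "Rain is falling outside.",
                 padding=True, truncation=True, return_tensors="tf")
print(tf.argmax(model(dict(test)).logits, axis=-1).numpy())

Swapping in one of the other task-specific fine-tuning classes leaves the rest of this flow essentially unchanged, which is the point of the standardized API.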
Summary
In this chapter, we have learned about the concepts behind distributional representations of words and their various implementations, starting with static word embeddings such as Word2Vec and GloVe.
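To recall what working with such static vectors looks like in practice, here is a small, self-contained sketch using gensim's downloader API; the pretrained vector set named below is one of gensim's hosted GloVe models, chosen purely for illustration:

# Querying pretrained static GloVe vectors through gensim's downloader API.
# "glove-wiki-gigaword-100" is one of gensim's hosted vector sets; any
# KeyedVectors file loaded from disk behaves the same way.
import gensim.downloader as api

glove = api.load("glove-wiki-gigaword-100")  # returns a KeyedVectors object

# Nearest neighbors in the embedding space.
print(glove.most_similar("frog", topn=3))

# The classic analogy: king - man + woman is closest to queen.
print(glove.most_similar(positive=["king", "woman"], negative=["man"], topn=1))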
We have then looked at improvements to the basic idea, such as subword embeddings, sentence embeddings that capture the context of a word within its sentence, and the use of entire language models for generating embeddings.
While language model-based embeddings achieve state-of-the-art results nowadays, there are still plenty of applications where more traditional approaches yield very good results, so it is important to know them all and understand the trade-offs.
We have also looked briefly at other interesting uses of word embeddings outside
the realm of natural language, where the distributional properties of other kinds
of sequences are leveraged to make predictions in domains such as information
retrieval and recommendation systems.
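As a toy illustration of that idea (with made-up session data, not code from this chapter), gensim's Word2Vec can be trained on sequences of item IDs exactly as if they were sentences, which is the essence of the Item2Vec approach:

# Item2Vec-style sketch: treat user sessions as "sentences" and item IDs as
# "words", then train plain Word2Vec on them (gensim 4.x parameter names).
# The sessions below are made-up toy data.
from gensim.models import Word2Vec

sessions = [
    ["item_1", "item_7", "item_3"],
    ["item_7", "item_3", "item_9"],
    ["item_2", "item_5"],
    ["item_1", "item_3", "item_9"],
]

model = Word2Vec(sentences=sessions, vector_size=32, window=3,
                 min_count=1, sg=1, epochs=50, seed=42)

# Items that co-occur in sessions end up close together in embedding space.
print(model.wv.most_similar("item_3", topn=2))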
You are now ready to use embeddings not only in your text-based neural networks, which we will look at in greater depth in the next chapter, but also in other areas of machine learning.
References
1. Mikolov, T., et al. (2013, Sep 7). Efficient Estimation of Word Representations
in Vector Space. arXiv:1301.3781v3 [cs.CL].
2. Mikolov, T., et al. (2013, Sep 17). Exploiting Similarities among Languages
for Machine Translation. arXiv:1309.4168v1 [cs.CL].
3. Mikolov, T., et al. (2013). Distributed Representations of Words and Phrases
and their Compositionality. Advances in Neural Information Processing
Systems 26 (NIPS 2013).
4. Pennington, J., Socher, R., Manning, C. (2014). GloVe: Global Vectors for Word Representation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), D14-1162.
5. Niu, F., et al. (2011, 11 Nov). HOGWILD! A Lock-Free Approach to Parallelizing Stochastic Gradient Descent. arXiv:1106.5730v2 [math.OC].
6. Levy, O., Goldberg, Y. (2014). Neural Word Embedding as Implicit Matrix Factorization. Advances in Neural Information Processing Systems 27 (NIPS 2014).
7. Mahoney, M. (2011, 1 Sep). text8 dataset. http://mattmahoney.net/dc/textdata.html.
8. Rehurek, R. (2019, 10 Apr). gensim documentation for Word2Vec model. https://radimrehurek.com/gensim/models/word2vec.html.
9. Levy, O., Goldberg, Y. (2014, 26-27 June). Linguistic Regularities in Sparse and Explicit Word Representations. Proceedings of the Eighteenth Conference on Computational Natural Language Learning (CoNLL 2014), pp. 171-180.
10. Rehurek, R. (2019, 10 Apr). gensim documentation for KeyedVectors. https://radimrehurek.com/gensim/models/keyedvectors.html.
11. Almeida, T. A., Gómez Hidalgo, J. M., and Yamakami, A. (2011). Contributions to the Study of SMS Spam Filtering: New Collection and Results. Proceedings of the 2011 ACM Symposium on Document Engineering (DocEng). http://www.dt.fee.unicamp.br/~tiago/smsspamcollection/.
12. Speer, R., Chin, J. (2016, 6 Apr). An Ensemble Method to Produce High-Quality Word Embeddings. arXiv:1604.01692v1 [cs.CL].
13. Speer, R. (2016, 25 May). ConceptNet Numberbatch: a new name for the best Word Embeddings you can download. http://blog.conceptnet.io/posts/2016/conceptnet-numberbatch-a-new-name-for-the-best-wordembeddings-you-can-download/.
14. Barkan, O., Koenigstein, N. (2016, 13-16 Sep). Item2Vec: Neural Item Embedding for Collaborative Filtering. IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP 2016).
15. Grover, A., Leskovec, J. (2016, 13-17 Aug). node2vec: Scalable Feature Learning for Networks. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2016).
16. TensorFlow 2.0 Models on TensorFlow Hub. https://tfhub.dev/s?q=tf2-preview.
17. Zhang, X., LeCun, Y. (2016, 4 Apr). Text Understanding from Scratch. arXiv:1502.01710v5 [cs.LG].