The usage pattern of instantiating a model and tokenizer from a pretrained model, optionally fine-tuning it on a comparatively small labeled dataset, and then using it for predictions is fairly typical, and it applies to the other fine-tuning classes as well. The Transformers library provides a standardized API for working with multiple Transformer models and performing standard fine-tuning tasks on them. The code described above can be found in the file bert_paraphrase.py in the code accompanying this chapter.
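As a rough illustration of this pattern, here is a minimal sketch using the Hugging Face Transformers library with TensorFlow. The checkpoint name, the toy sentence pairs, the labels, and the hyperparameters are assumptions made for the example; they are not the exact contents of bert_paraphrase.py.

```python
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

# Assumption: any BERT-style checkpoint works here; bert-base-cased is only
# an example.
model_name = "bert-base-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = TFAutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=2)

# A tiny made-up labeled dataset of sentence pairs (1 = paraphrase, 0 = not).
first = ["The quick brown fox jumped over the dog.", "He bought a new car."]
second = ["A fast brown fox leapt over the dog.", "It is sunny today."]
labels = tf.constant([1, 0])

# Tokenize the sentence pairs into a dict of TensorFlow tensors.
encodings = tokenizer(first, second, padding=True, truncation=True,
                      return_tensors="tf")

# Optional fine-tuning: the TF model is a Keras model, so compile/fit apply.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=3e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
model.fit(dict(encodings), labels, epochs=1, batch_size=2)

# Prediction: take the argmax over the classification logits.
logits = model(dict(encodings)).logits
print(tf.argmax(logits, axis=-1).numpy())
```

The same instantiate, fine-tune, and predict flow carries over to the other task-specific classes, such as question answering or token classification; only the head class and the shape of the labels change.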

Summary

In this chapter, we have learned about the concepts behind distributional representations of words and their various implementations, starting from static word embeddings such as Word2Vec and GloVe.
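For reference, static embeddings such as these can be explored in a few lines with gensim; the pretrained vector set named below is one of gensim's downloader catalog entries, chosen purely as an illustration.

```python
import gensim.downloader as api

# Load pretrained GloVe vectors (trained on Wikipedia + Gigaword) from the
# gensim downloader catalog.
word_vectors = api.load("glove-wiki-gigaword-100")

# Static embeddings: each word maps to one fixed vector, regardless of context.
print(word_vectors["king"].shape)                # (100,)
print(word_vectors.most_similar("king", topn=3))
```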

We then looked at improvements to the basic idea, such as subword embeddings, sentence embeddings that capture the context of the word in the sentence, and the use of entire language models for generating embeddings.
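As a small illustration of the language-model-based approach, the sketch below extracts contextual token embeddings from a pretrained BERT model and mean-pools them into sentence vectors. The checkpoint name and the pooling strategy are assumptions for this example, not the only possible choices.

```python
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = TFAutoModel.from_pretrained("bert-base-uncased")

sentences = ["The bank raised interest rates.",
             "They sat on the bank of the river."]
enc = tokenizer(sentences, padding=True, truncation=True, return_tensors="tf")

# last_hidden_state has shape (batch, seq_len, hidden): contextual vectors,
# so the two occurrences of "bank" receive different embeddings.
token_embeddings = model(dict(enc)).last_hidden_state

# A simple sentence embedding: mask out padding and average the token vectors.
mask = tf.cast(enc["attention_mask"], tf.float32)[:, :, tf.newaxis]
sentence_embeddings = (tf.reduce_sum(token_embeddings * mask, axis=1)
                       / tf.reduce_sum(mask, axis=1))
print(sentence_embeddings.shape)  # (2, 768)
```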

While language-model-based embeddings achieve state-of-the-art results nowadays, there are still plenty of applications where more traditional approaches yield very good results, so it is important to know them all and understand the tradeoffs.

We have also looked briefly at other interesting uses of word embeddings outside the realm of natural language, where the distributional properties of other kinds of sequences are leveraged to make predictions in domains such as information retrieval and recommendation systems.
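To make this concrete, an Item2Vec-style recommender can treat each user's purchase history as a "sentence" of item IDs and train an ordinary Word2Vec model on those sequences. The item IDs and hyperparameters below are made up for illustration, and the code assumes gensim 4.x (where the dimensionality argument is vector_size).

```python
from gensim.models import Word2Vec

# Each "sentence" is one user's sequence of purchased item IDs (toy data).
purchase_sequences = [
    ["item_12", "item_7", "item_33", "item_7"],
    ["item_7", "item_33", "item_91"],
    ["item_12", "item_91", "item_5"],
]

# Skip-gram Word2Vec over item sequences: items that co-occur in the same
# baskets end up close together in the embedding space.
model = Word2Vec(sentences=purchase_sequences, vector_size=32, window=3,
                 min_count=1, sg=1, epochs=50)

# "Recommend" items similar to one the user already interacted with.
print(model.wv.most_similar("item_7", topn=2))
```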

You are now ready to use embeddings not only in your text-based neural networks, which we will look at in greater depth in the next chapter, but also in other areas of machine learning.

