Daniel Voigt Godoy - Deep Learning with PyTorch Step-by-Step: A Beginner’s Guide (Leanpub)
"Down the Yellow Brick Rabbit Hole"Where does the phrase in the title come from? On the one hand, if it were "downthe rabbit hole," one could guess Alice’s Adventures in Wonderland. On the otherhand, if it were "the yellow brick road," one could guess The Wonderful Wizard of Oz.But it is neither (or maybe it is both?). What if, instead of trying to guess itourselves, we trained a model to classify sentences? This is a book about deeplearning, after all :-)Training models on text data is what natural language processing (NLP) is all about.The whole field is enormous, and we’ll be only scratching the surface of it in thischapter. We’ll start with the most obvious question: "how do you convert text datainto numerical data?", we’ll end up using a pre-trained model—our famous Muppetfriend, BERT—to classify sentences.Building a DatasetThere are many freely available datasets for NLP. The texts are usually alreadynicely organized into sentences that you can easily feed to a pre-trained model likeBERT. Isn’t it awesome? Well, yeah, but…"But what?"But the texts you’ll find in the real world are not nicely organized into sentences.You have to organize them yourself.So, we’ll start our NLP journey by following the steps of Alice and Dorothy, fromAlice’s Adventures in Wonderland [158] by Lewis Carroll and The Wonderful Wizard of Oz[159]by L. Frank Baum.Both texts are freely available at the Oxford Text Archive (OTA)[160]under an Attribution-NonCommercial-ShareAlike 3.0Unported (CC BY-NC-SA 3.0) license."Down the Yellow Brick Rabbit Hole" | 883
Figure 11.1 - Left: "Alice and the Baby Pig," illustration by John Tenniel, from "Alice’s Adventures in Wonderland" (1865). Right: "Dorothy meets the Cowardly Lion," illustration by W. W. Denslow, from "The Wonderful Wizard of Oz" (1900).

The direct links to both texts are alice28-1476.txt [161] (we’re naming it ALICE_URL) and wizoz10-1740.txt [162] (we’re naming it WIZARD_URL). You can download both of them to a local folder using the helper function download_text() (included in data_generation.nlp):

Data Loading

localfolder = 'texts'
download_text(ALICE_URL, localfolder)
download_text(WIZARD_URL, localfolder)

If you open these files in a text editor, you’ll see that there is a lot of information at the beginning (and some at the end) that has been added to the original text of the books for legal reasons. We need to remove these additions to the original texts:

Downloading Books

fname1 = os.path.join(localfolder, 'alice28-1476.txt')
with open(fname1, 'r') as f:
    alice = ''.join(f.readlines()[104:3704])
fname2 = os.path.join(localfolder, 'wizoz10-1740.txt')
with open(fname2, 'r') as f:
    wizard = ''.join(f.readlines()[310:5100])
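Before moving on, it’s worth checking that the slicing above really captured the body of each book rather than the legal boilerplate. The minimal sketch below uses only the alice and wizard strings defined above; the expected openings in the comments are assumptions about the downloaded files, not guaranteed output:

# Quick sanity check on the loaded texts (sketch; the expected openings
# are assumptions about the downloaded files, not guaranteed output)
print(len(alice), len(wizard))  # rough character counts of both books
print(alice[:70])               # should read like the start of Chapter I of "Alice"
print(wizard[:70])              # should read like the start of "The Wizard of Oz"

If either text still starts with licensing or header text, the line ranges used in readlines() need to be adjusted for your copies of the files.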
"Down the Yellow Brick Rabbit Hole"
Where does the phrase in the title come from? On the one hand, if it were "down
the rabbit hole," one could guess Alice’s Adventures in Wonderland. On the other
hand, if it were "the yellow brick road," one could guess The Wonderful Wizard of Oz.
But it is neither (or maybe it is both?). What if, instead of trying to guess it
ourselves, we trained a model to classify sentences? This is a book about deep
learning, after all :-)
Training models on text data is what natural language processing (NLP) is all about.
The whole field is enormous, and we’ll be only scratching the surface of it in this
chapter. We’ll start with the most obvious question: "how do you convert text data
into numerical data?", we’ll end up using a pre-trained model—our famous Muppet
friend, BERT—to classify sentences.
Building a Dataset
There are many freely available datasets for NLP. The texts are usually already
nicely organized into sentences that you can easily feed to a pre-trained model like
BERT. Isn’t it awesome? Well, yeah, but…
"But what?"
But the texts you’ll find in the real world are not nicely organized into sentences.
You have to organize them yourself.
So, we’ll start our NLP journey by following the steps of Alice and Dorothy, from
Alice’s Adventures in Wonderland [158] by Lewis Carroll and The Wonderful Wizard of Oz
[159]
by L. Frank Baum.
Both texts are freely available at the Oxford Text Archive (OTA)
[160]
under an Attribution-NonCommercial-ShareAlike 3.0
Unported (CC BY-NC-SA 3.0) license.
"Down the Yellow Brick Rabbit Hole" | 883