[215].

For a demo of GPT-2’s capabilities, please check AllenNLP’s Language Modeling Demo,[216] which uses GPT-2’s medium model (345 million parameters).

You can also check GPT-2’s documentation[217] and model card,[218] available at HuggingFace, for a quick overview of the model and its training procedure.

For a general overview of GPT-2, see this great post by Jay Alammar: "The Illustrated GPT-2 (Visualizing Transformer Language Models)."[219]

To learn more details about GPT-2’s architecture, please check "The Annotated GPT-2"[220] by Aman Arora.

There is also Andrej Karpathy’s minimalistic implementation of GPT, minGPT,[221] if you feel like trying to train a GPT model from scratch.

Let’s load the GPT-2-based text generation pipeline:

text_generator = pipeline("text-generation")

Then, let’s use the first two paragraphs from Alice’s Adventures in Wonderland as our base text:

base_text = """
Alice was beginning to get very tired of sitting by her sister on
the bank, and of having nothing to do: once or twice she had peeped
into the book her sister was reading, but it had no pictures or
conversations in it, `and what is the use of a book,' thought Alice
`without pictures or conversation?' So she was considering in her
own mind (as well as she could, for the hot day made her feel very
sleepy and stupid), whether the pleasure of making a daisy-chain
would be worth the trouble of getting up and picking the daisies,
when suddenly a White Rabbit with pink eyes ran close by her.
"""

The generator will produce a text of size max_length, including the base text, so this value has to be larger than the length of the base text.
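Since max_length includes the base text’s tokens, it may help to check how many tokens the base text actually takes up before picking a value. Here is a minimal sketch of mine (not from the book) that uses the pipeline’s own tokenizer; more recent versions of the transformers library also accept a max_new_tokens argument that counts only the newly generated tokens:

# how many tokens does the base text take up?
# (text_generator is the pipeline we just loaded)
n_base_tokens = len(text_generator.tokenizer(base_text)['input_ids'])
print(n_base_tokens)

# alternative: let the library do the math for us by specifying
# only the number of NEW tokens (recent transformers versions)
alt_result = text_generator(base_text, max_new_tokens=100)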
By default, the model in the text generation pipeline has its do_sample argument set to True to generate words by sampling from the predicted distribution instead of using greedy decoding (beam search, yet another decoding strategy, can be enabled through the num_beams argument):

text_generator.model.config.task_specific_params

Output

{'text-generation': {'do_sample': True, 'max_length': 50}}

result = text_generator(base_text, max_length=250)
print(result[0]['generated_text'])

Output

...
Alice stared at the familiar looking man in red, in a white dress,
and smiled shyly.
She saw the cat on the grass and sat down with it gently and
eagerly, with her arms up.
There was a faint, long, dark stench, the cat had its tail held at
the end by a large furry, white fur over it.
Alice glanced at it.
It was a cat, but a White Rabbit was very good at drawing this,
thinking over its many attributes, and making sure that no reds
appeared

I’ve removed the base text from the output above, so that’s generated text only. Looks decent, right? I tried it several times, and the generated text is usually consistent, even though it sometimes digresses and, on occasion, generates some really weird pieces of text.

"What is this beam search? That sounds oddly familiar."

That’s true, we briefly discussed beam search (and its alternative, greedy decoding) in Chapter 9.
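Two quick notes, both illustrated with snippets of mine rather than the book’s. First, stripping the base text from the output, as I did above, is just a matter of slicing the generated string, since the pipeline returns the prompt followed by its continuation:

# keep only the newly generated part of the output
generated_only = result[0]['generated_text'][len(base_text):]
print(generated_only)

Second, the decoding strategies themselves are selected through arguments that the pipeline forwards to the model’s generate() method. A minimal sketch, assuming the text_generator pipeline loaded above:

# greedy decoding: always pick the single most likely next token
greedy = text_generator(base_text, max_length=250, do_sample=False)

# beam search: keep the num_beams most promising partial sequences
# at each step and return the best-scoring one
beams = text_generator(base_text, max_length=250,
                       do_sample=False, num_beams=5)

# sampling: draw each next token from the predicted distribution,
# optionally reshaped by temperature or truncated by top_k / top_p
sampled = text_generator(base_text, max_length=250,
                         do_sample=True, top_k=50, temperature=0.9)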