logits.logits.argmax(dim=1)  # 'logits' is the model's output object; the actual logits live in its .logits attribute
Output
tensor([0], device='cuda:0')
BERT has spoken: The sentence "Down the Yellow Brick Rabbit Hole" is more likely
to have come from The Wonderful Wizard of Oz (the negative class of our binary
classification task).
Don’t you think that’s a lot of work to get predictions for a single sentence? I mean,
tokenizing, sending it to the device, feeding the inputs to the model, getting the
largest logit—that’s a lot, right? I think so, too.
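Condensed into one place, the whole manual procedure looks like this (a minimal
sketch, assuming the auto_tokenizer and loaded_model objects built in the
previous sections):
# Tokenize the sentence and return PyTorch tensors
tokens = auto_tokenizer(['Down the Yellow Brick Rabbit Hole'],
                        return_tensors='pt')
# Send every input tensor to the same device as the model
tokens = {k: v.to(loaded_model.device) for k, v in tokens.items()}
# Feed the inputs to the model and pick the class with the largest logit
output = loaded_model(**tokens)
predicted = output.logits.argmax(dim=1)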
Pipelines
Pipelines can handle all these steps for us; we just have to choose the appropriate
one. There are many different pipelines, one for each task, like the
TextClassificationPipeline and the TextGenerationPipeline. Let's use the
former to run our tokenizer and trained model at once:
from transformers import TextClassificationPipeline
# Pipelines expect a device index (-1 for CPU) rather than a device object
device_index = (loaded_model.device.index
                if loaded_model.device.type != 'cpu'
                else -1)
classifier = TextClassificationPipeline(model=loaded_model,
                                        tokenizer=auto_tokenizer,
                                        device=device_index)
Every pipeline takes at least two required arguments: a model and a tokenizer. We
can also send the pipeline straight to the same device as our model, but we need to
use the device index instead of a device object (-1 if it's on a CPU, 0 if it's on the
first or only GPU, 1 if it's on the second one, and so on).
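To see where those indices come from, here is a quick illustration using plain
torch.device objects (the CPU device has no index at all, which is why pipelines
use the -1 convention for it):
import torch
torch.device('cpu').index     # None, so pipelines use -1 for CPU
torch.device('cuda:0').index  # 0
torch.device('cuda:1').index  # 1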
Now we can make predictions using the original sentences:
classifier(['Down the Yellow Brick Rabbit Hole', 'Alice rules!'])
Output
[{'label': 'LABEL_0', 'score': 0.9951714277267456},
 {'label': 'LABEL_1', 'score': 0.9985325336456299}]
The model seems pretty confident that the first sentence is from The Wonderful
Wizard of Oz (negative class) and that the second sentence is from Alice's
Adventures in Wonderland (positive class).
We can make the output a bit more intuitive, though, by setting proper labels for
each of the classes using the id2label attribute of our model's configuration:
loaded_model.config.id2label = {0: 'Wizard', 1: 'Alice'}
Let's try it again:
classifier(['Down the Yellow Brick Rabbit Hole', 'Alice rules!'])
Output
[{'label': 'Wizard', 'score': 0.9951714277267456},
 {'label': 'Alice', 'score': 0.9985325336456299}]
That's much better!
More Pipelines
It's also possible to use pre-trained pipelines for typical tasks like sentiment
analysis without having to fine-tune your own model:
from transformers import pipeline
sentiment = pipeline('sentiment-analysis')
That's it! The task defines which pipeline is used. For sentiment analysis, the
pipeline above loads a TextClassificationPipeline like ours, but one that's
pre-trained to perform that task.
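Here is a quick usage sketch. The sentence is made up for illustration, and the
exact labels and scores depend on the default pre-trained model your version of
transformers downloads (at the time of writing, a DistilBERT model fine-tuned on
SST-2 that outputs POSITIVE and NEGATIVE labels):
sentiment(['I really enjoyed this book!'])
# Returns one dict per input sentence, along the lines of:
# [{'label': 'POSITIVE', 'score': ...}]  (illustrative, not actual output)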