Deep Learning with PyTorch Step-by-Step: A Beginner's Guide, by Daniel Voigt Godoy
set num_labels=1 as an argument.

If you want to learn more about the arguments the pre-trained models may take, please check the documentation on configuration: PretrainedConfig. [210] To learn more about the outputs of several pre-trained models, please check the documentation on model outputs. [211]

The ForSequenceClassification models add a single linear layer (classifier) on top of the pooled output from the underlying base model to produce the logits.

More AutoModels

If you want to quickly try different fine-tuning models without having to import their corresponding classes, you can use HuggingFace's AutoModel corresponding to a fine-tuning task:

from transformers import AutoModelForSequenceClassification
auto_cls = AutoModelForSequenceClassification.from_pretrained(
    'distilbert-base-uncased', num_labels=2
)
print(auto_cls.__class__)

Output

<class 'transformers.modeling_distilbert
.DistilBertForSequenceClassification'>

As you can see, it infers the correct model class based on the name of the model you're loading, e.g., distilbert-base-uncased.
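To see that inference at work with a different checkpoint, here is a minimal sketch; it assumes you also want to try bert-base-uncased, which is not the checkpoint used in this section:

from transformers import AutoModelForSequenceClassification

# The same Auto class resolves to a BERT-specific class when pointed
# at a BERT checkpoint (bert-base-uncased is just an example here)
bert_cls = AutoModelForSequenceClassification.from_pretrained(
    'bert-base-uncased', num_labels=2
)
print(bert_cls.__class__.__name__)  # BertForSequenceClassification

Either way, the object you get back behaves exactly like the architecture-specific class you would otherwise have imported by hand.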
We already have a model, so let's look at our dataset…

Tokenized Dataset

The training and test datasets are HF's Datasets and, finally, we'll keep them like that instead of building TensorDatasets out of them. We still have to tokenize them, though:

Data Preparation

from transformers import AutoTokenizer

auto_tokenizer = AutoTokenizer.from_pretrained(
    'distilbert-base-uncased'
)

def tokenize(row):
    return auto_tokenizer(row['sentence'],
                          truncation=True,
                          padding='max_length',
                          max_length=30)

We load a pre-trained tokenizer and build a simple function that takes one row from the dataset and tokenizes it. So far, so good, right?

IMPORTANT: The pre-trained tokenizer and the pre-trained model must have matching architectures: in our case, both are pre-trained on distilbert-base-uncased.

Next, we use the map() method of HF's Dataset to create new columns by using our tokenize() function:

Data Preparation

tokenized_train_dataset = train_dataset.map(
    tokenize, batched=True
)
tokenized_test_dataset = test_dataset.map(tokenize, batched=True)

The batched argument speeds up the tokenization, but the tokenizer must return lists instead of tensors (notice the missing return_tensors='pt' in the call to auto_tokenizer):

print(tokenized_train_dataset[0])
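The map() call keeps the original columns and simply adds the tokenizer's outputs (input_ids and attention_mask) next to them, still as plain Python lists. If you ever wanted to feed the tokenized dataset straight into a DataLoader instead of handing it over to HF's tools, a minimal sketch could ask the Dataset to return PyTorch tensors for a subset of columns (you would normally also include the label column, whatever it is named in your dataset):

# Sketch: have the HF Dataset return PyTorch tensors for these columns
tokenized_train_dataset.set_format(
    type='torch',
    columns=['input_ids', 'attention_mask']
)
print(tokenized_train_dataset[0]['input_ids'].shape)  # torch.Size([30])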
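As a quick sanity check (a sketch for illustration, not part of the chapter's pipeline), you could tokenize a single sentence and run it through the fine-tuning model loaded above to confirm you get one pair of logits back:

import torch

# Tokenize one sentence, this time returning tensors,
# since we call the model directly
batch = auto_tokenizer(train_dataset[0]['sentence'],
                       truncation=True,
                       padding='max_length',
                       max_length=30,
                       return_tensors='pt')
with torch.no_grad():
    output = auto_cls(**batch)  # the DistilBertForSequenceClassification from before
print(output.logits)  # one row with two logits, one per class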