Well, you probably don’t want to go through all this trouble (adjusting the datasets and writing a model class) to fine-tune a BERT model, right?

Say no more!

Fine-Tuning with HuggingFace

What if I told you that there is a BERT model for every task, and you just need to fine-tune it? Cool, isn’t it? Then, what if I told you that you can use a trainer to do most of the fine-tuning work for you? Amazing, right? The HuggingFace library is that good, really!
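Just to give you a taste of what’s coming, here is a minimal sketch of that workflow. The hyper-parameters are kept to a bare minimum, and train_dataset is a placeholder for a tokenized dataset (we’ll build a real one shortly):

from transformers import (DistilBertForSequenceClassification,
                          Trainer, TrainingArguments)

# A model with a task-specific head, ready for fine-tuning
model = DistilBertForSequenceClassification.from_pretrained(
    'distilbert-base-uncased', num_labels=2
)

# TrainingArguments bundles the hyper-parameters of the fine-tuning loop
training_args = TrainingArguments(output_dir='output', num_train_epochs=1)

# The Trainer handles batching, optimization, and checkpointing;
# train_dataset is a placeholder for a tokenized dataset
trainer = Trainer(
    model=model, args=training_args, train_dataset=train_dataset
)
trainer.train()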

There are BERT models available for many different tasks:

• Pre-training tasks:
  ◦ Masked language model (BertForMaskedLM)
  ◦ Next sentence prediction (BertForNextSentencePrediction)

• Typical tasks (also available as AutoModel):
  ◦ Sequence classification (BertForSequenceClassification)
  ◦ Token classification (BertForTokenClassification)
  ◦ Question answering (BertForQuestionAnswering)

• BERT (and family) specific:
  ◦ Multiple choice (BertForMultipleChoice)
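All of these classes are loaded the same way, through the from_pretrained() method. For the typical tasks, the corresponding AutoModel classes pick the right architecture from the checkpoint name alone. A minimal sketch, using 'bert-base-uncased' as an example checkpoint:

from transformers import (BertForMaskedLM,
                          AutoModelForSequenceClassification)

# Architecture-specific class for a pre-training task
mlm_model = BertForMaskedLM.from_pretrained('bert-base-uncased')

# AutoModel alternative: the architecture is inferred from the checkpoint
cls_model = AutoModelForSequenceClassification.from_pretrained(
    'bert-base-uncased', num_labels=2
)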

We’re sticking with the sequence classification task, using DistilBERT instead of regular BERT to make the fine-tuning faster: DistilBERT is roughly 40% smaller and 60% faster than BERT while retaining most of its performance.
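If you’re curious about the size difference, here is a minimal sketch that compares parameter counts using the num_parameters() method; the exact figures depend on the checkpoints, but they should come out to roughly 110 million for BERT and 66 million for DistilBERT:

from transformers import AutoModel

# Both checkpoints are downloaded on first use
bert = AutoModel.from_pretrained('bert-base-uncased')
distilbert = AutoModel.from_pretrained('distilbert-base-uncased')

# DistilBERT keeps roughly 60% of BERT's parameters
print(f'BERT:       {bert.num_parameters():,}')
print(f'DistilBERT: {distilbert.num_parameters():,}')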

Sequence Classification (or Regression)

Let’s load the pre-trained model using its corresponding class:

Model Configuration

import torch
from transformers import DistilBertForSequenceClassification

torch.manual_seed(42)
bert_cls = DistilBertForSequenceClassification.from_pretrained(
    'distilbert-base-uncased', num_labels=2
)

It comes with a warning:

Output

You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

It makes sense! Since ours is a binary classification task, the num_labels argument is two, which happens to be the default value. Unfortunately, at the time of writing, the documentation is not as explicit as it should be in this case. There is no mention of num_labels as a possible argument of the model, and it’s only referred to in the documentation of the forward() method of DistilBertForSequenceClassification (highlights are mine):

• labels (torch.LongTensor of shape (batch_size,), optional) – Labels for computing the sequence classification / regression loss. Indices should be in [0, ..., config.num_labels - 1]. If config.num_labels == 1 a regression loss is computed (Mean-Square loss), if config.num_labels > 1 a classification loss is computed (Cross-Entropy).

Some of the return values of the forward() method also include references to the num_labels argument:

• loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) – Classification (or regression if config.num_labels==1) loss.

• logits (torch.FloatTensor of shape (batch_size, config.num_labels)) – Classification (or regression if config.num_labels==1) scores (before SoftMax).

That’s right! DistilBertForSequenceClassification (or any other ForSequenceClassification model) can be used for regression too as long as you set num_labels to one.
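To make the switch concrete, here is a minimal sketch using a made-up sentence and target value: with num_labels=1, the very same class produces a single score per input and computes a mean squared error loss.

import torch
from transformers import (AutoTokenizer,
                          DistilBertForSequenceClassification)

tokenizer = AutoTokenizer.from_pretrained('distilbert-base-uncased')
inputs = tokenizer('I really liked this movie!', return_tensors='pt')

# num_labels=1 turns the classification head into a regression head
reg_model = DistilBertForSequenceClassification.from_pretrained(
    'distilbert-base-uncased', num_labels=1
)

# For regression, labels are floats instead of class indices
outputs = reg_model(**inputs, labels=torch.tensor([4.5]))
print(outputs.loss, outputs.logits)  # MSE loss and one score per input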
