Daniel Voigt Godoy - Deep Learning with PyTorch Step-by-Step: A Beginner's Guide


Model Configuration

class BERTClassifier(nn.Module):
    def __init__(self, bert_model, ff_units,
                 n_outputs, dropout=0.3):
        super().__init__()
        self.d_model = bert_model.config.dim
        self.n_outputs = n_outputs
        self.encoder = bert_model
        self.mlp = nn.Sequential(
            nn.Linear(self.d_model, ff_units),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(ff_units, n_outputs)
        )

    def encode(self, source, source_mask=None):
        states = self.encoder(
            input_ids=source, attention_mask=source_mask)[0]
        cls_state = states[:, 0]
        return cls_state

    def forward(self, X):
        source_mask = (X > 0)
        # Featurizer
        cls_state = self.encode(X, source_mask)
        # Classifier
        out = self.mlp(cls_state)
        return out

Both the encode() and forward() methods are roughly the same as before, but the classifier (mlp) now has both a hidden layer and a dropout layer.

Our model takes an instance of a pre-trained BERT model, the number of units in the hidden layer of the classifier, and the desired number of outputs (logits), corresponding to the number of existing classes. The forward() method takes a mini-batch of token IDs, encodes them using BERT (featurizer), and outputs logits (classifier).
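As a quick sanity check, here is a minimal sketch of how the classifier could be assembled and called. The checkpoint matches the tokenizer we create below, but ff_units=128, n_outputs=1 (a single logit for a binary task), and the dummy token IDs are purely illustrative assumptions:

import torch
from transformers import AutoModel

# Pre-trained DistilBERT body used as the featurizer
bert_model = AutoModel.from_pretrained('distilbert-base-uncased')
# Assumptions: 128 hidden units, a single logit for a binary task
model = BERTClassifier(bert_model, ff_units=128, n_outputs=1)

# Hypothetical mini-batch of token IDs ([CLS]=101, [SEP]=102, [PAD]=0)
dummy_ids = torch.tensor([[101, 7592, 2088, 102, 0, 0]])
logits = model(dummy_ids)
print(logits.shape)  # torch.Size([1, 1]) - one logit per sentence

Notice that only the token IDs are handed to the model; the attention mask is built inside forward() from the padding (zero) token IDs.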

"Why does the model compute the source mask itself instead of using the output from the tokenizer?"

Good catch! I know that's less than ideal, but our StepByStep class can only take a single mini-batch of inputs, and no additional information like the attention masks. Of course, we could modify our class to handle that, but HuggingFace has its own trainer (more on that soon!), so there's no point in doing so.

This is actually the last time we'll use the StepByStep class, since it requires too many adjustments to the inputs to work well with HuggingFace's tokenizers and models.

Data Preparation

To turn the sentences in our datasets into mini-batches of token IDs and labels for a binary classification task, we can create a helper function that takes an HF's Dataset, the names of the fields corresponding to the sentences and labels, and a tokenizer, and builds a TensorDataset out of them:

From HF's Dataset to Tokenized TensorDataset

def tokenize_dataset(hf_dataset, sentence_field,
                     label_field, tokenizer, **kwargs):
    sentences = hf_dataset[sentence_field]
    token_ids = tokenizer(
        sentences, return_tensors='pt', **kwargs
    )['input_ids']
    labels = torch.as_tensor(hf_dataset[label_field])
    dataset = TensorDataset(token_ids, labels)
    return dataset

First, we create a tokenizer and define the parameters we'll use while tokenizing the sentences:

Data Preparation

auto_tokenizer = AutoTokenizer.from_pretrained(
    'distilbert-base-uncased'
)
tokenizer_kwargs = dict(truncation=True,
                        padding=True,
                        max_length=30,
                        add_special_tokens=True)
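To show how these pieces fit together, the sketch below applies the helper function to a hypothetical train_dataset split with 'sentence' and 'labels' fields and wraps the result in a DataLoader; the split name, field names, and batch size are assumptions made only for illustration:

from torch.utils.data import DataLoader

# Hypothetical split and field names - adjust to your actual HF Dataset
train_tensor_dataset = tokenize_dataset(train_dataset,
                                        'sentence',
                                        'labels',
                                        auto_tokenizer,
                                        **tokenizer_kwargs)
train_loader = DataLoader(train_tensor_dataset,
                          batch_size=32, shuffle=True)

token_ids_batch, labels_batch = next(iter(train_loader))
# token_ids_batch: padded token IDs, up to max_length=30 columns
# labels_batch: the corresponding class labels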

