
We then instantiate a tokenizer and model using the wrappers from the Transformers library. The underlying model file comes from the pretrained BERT base cased model. Notice that the model class is a TensorFlow-compatible class, since we will fine-tune it from our TensorFlow 2.x code:

from transformers import BertTokenizer, TFBertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
model = TFBertForSequenceClassification.from_pretrained("bert-base-cased")
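As a quick sanity check, you can run the tokenizer on a single sentence pair to see the kind of encoding that will be fed to BERT. The snippet below is a minimal sketch: the two sentences are invented for illustration, and the exact keyword arguments accepted by encode_plus vary slightly across Transformers versions:

# Illustrative only: encode a made-up sentence pair the same way
# MRPC examples are encoded
enc = tokenizer.encode_plus(
    "The company said the deal was completed.",
    "The firm announced that the agreement was done.")
print(enc["input_ids"])       # token ids, starting with the [CLS] token
print(enc["token_type_ids"])  # 0 for the first sentence, 1 for the second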

We then load the training and validation data from the tensorflow-datasets package using its API, and create the TensorFlow datasets that will be used to fine-tune our model:

import tensorflow_datasets
from transformers import glue_convert_examples_to_features

# Load dataset via TensorFlow Datasets
data, info = tensorflow_datasets.load(
    "glue/mrpc", with_info=True)
num_train = info.splits["train"].num_examples
num_valid = info.splits["validation"].num_examples

# Prepare dataset for GLUE as a tf.data.Dataset instance
Xtrain = glue_convert_examples_to_features(
    data["train"], tokenizer, 128, "mrpc")
Xtrain = Xtrain.shuffle(128).batch(BATCH_SIZE).repeat(-1)
Xvalid = glue_convert_examples_to_features(
    data["validation"], tokenizer, 128, "mrpc")
Xvalid = Xvalid.batch(BATCH_SIZE)
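Before training, it can help to peek at one batch and confirm what glue_convert_examples_to_features produced. The following is a small sketch under the assumption that each element of the dataset is a (features, label) pair, where features is a dictionary of input tensors:

# Illustrative sanity check: inspect the shapes of a single training batch
for features, labels in Xtrain.take(1):
    print(features["input_ids"].shape)       # (BATCH_SIZE, 128)
    print(features["attention_mask"].shape)  # (BATCH_SIZE, 128)
    print(labels.shape)                       # (BATCH_SIZE,)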

We then define our loss function, optimizer, and metric, and fit the model for a few epochs of training. Since we are only fine-tuning the model, the number of epochs is just two, and the learning rate is very small:

import tensorflow as tf

opt = tf.keras.optimizers.Adam(
    learning_rate=3e-5, epsilon=1e-08)
loss = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True)
metric = tf.keras.metrics.SparseCategoricalAccuracy("accuracy")
model.compile(optimizer=opt, loss=loss, metrics=[metric])

train_steps = num_train // BATCH_SIZE
valid_steps = num_valid // BATCH_SIZE

history = model.fit(Xtrain,
    epochs=2, steps_per_epoch=train_steps,
    validation_data=Xvalid, validation_steps=valid_steps)
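Once fine-tuning finishes, the model can be used to score new sentence pairs. The snippet below is a minimal sketch (the sentence pair is invented for illustration); for MRPC, class 1 means the two sentences are paraphrases and class 0 means they are not:

# Illustrative only: classify a new sentence pair with the fine-tuned model
inputs = tokenizer.encode_plus(
    "He said the food was great.",
    "He mentioned that the meal was excellent.",
    return_tensors="tf")
outputs = model(inputs)
pred = tf.argmax(outputs[0], axis=-1).numpy()[0]
print("paraphrase" if pred == 1 else "not paraphrase")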

