$ python run_classifier.py \
--task_name=COLA|MRPC \
--do_train=true \
--do_eval=true \
--do_predict=true \
--data_dir=${CLASSIFIER_DATA} \
--vocab_file=${BERT_BASE_DIR}/vocab.txt \
--bert_config_file=${BERT_BASE_DIR}/bert_config.json \
--init_checkpoint=${BERT_BASE_DIR}/bert_model.ckpt \
--max_seq_length=128 \
--train_batch_size=8 \
--learning_rate=2e-5 \
--num_train_epochs=2.0 \
--output_dir=${TRAINED_CLASSIFIER}

To predict only using a trained network, set the --do_train and --do_eval flags to false.

Using BERT as part of your own network

Currently BERT is available on TensorFlow Hub as an estimator, but at the moment it is not fully compliant with TensorFlow 2.x, in the sense that it is not yet callable as a hub.KerasLayer. Meanwhile, Zweig demonstrates how to include BERT in a Keras/TensorFlow 1.x-based network in his blog post [35].

The more popular way to use BERT in your own TensorFlow 2.x code is via the HuggingFace Transformers library. This library provides convenience classes for various popular Transformer architectures such as BERT, as well as convenience classes for fine-tuning them on several downstream tasks. It was originally written for PyTorch, but has since been extended with convenience classes callable from TensorFlow as well. However, in order to use this library, you must have PyTorch installed as well.

The Transformers library provides the following classes:

1. A set of Transformer classes for 10 (at the time of writing) different Transformer architectures that can be instantiated from PyTorch client code. The naming convention is to append "Model" to the name of the architecture, for example, BertModel, XLNetModel, and so on. There is also a corresponding set of classes that can be instantiated from TensorFlow 2.x code; these are prefixed with "TF", for example, TFBertModel, TFXLNetModel, and so on (a short sketch after this list shows the two variants side by side).
2. Each Transformer model has a corresponding tokenizer class, which knows how to tokenize text input for these classes. So the tokenizer corresponding to the BERT model is named BertTokenizer.
3. Each Transformer class has a set of convenience classes to allow fine-tuning the Transformer model on a set of downstream tasks. For example, the convenience classes corresponding to BERT are BertForPreTraining, BertForMaskedLM, BertForNextSentencePrediction, BertForSequenceClassification, BertForMultipleChoice, BertForTokenClassification, and BertForQuestionAnswering.
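To make the naming and tokenizer conventions concrete, here is a minimal sketch, not part of the book's example, that loads the same pretrained checkpoint through both the PyTorch class and the TensorFlow 2.x class and runs one tokenized sentence through the TensorFlow variant (the sentence and the "bert-base-cased" checkpoint name are illustrative):

import tensorflow as tf
from transformers import BertModel, BertTokenizer, TFBertModel

# the tokenizer is shared by the PyTorch and TensorFlow model classes
tokenizer = BertTokenizer.from_pretrained("bert-base-cased")

# PyTorch class: architecture name + "Model"
pt_model = BertModel.from_pretrained("bert-base-cased")

# TensorFlow 2.x class: same name, prefixed with "TF"
tf_model = TFBertModel.from_pretrained("bert-base-cased")

# encode() tokenizes and maps to vocabulary IDs, adding [CLS] and [SEP]
input_ids = tf.constant([tokenizer.encode("The cat sat on the mat.")])
sequence_output = tf_model(input_ids)[0]  # (batch, seq_len, hidden_size)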
In order to install PyTorch, head over to the PyTorch site (http://pytorch.org) and find the section titled Quick Start Locally. Under it is a form where you specify some information about your platform, and the site generates the corresponding installation command. Copy the command to your terminal and run it to install PyTorch in your environment.

Once you have installed PyTorch, install the Transformers library using the following pip command. Refer to https://github.com/huggingface/transformers for additional documentation on how to use the library:

$ pip install transformers

In order to run the example, you will also need to install the tensorflow_datasets package. You can do so with pip as follows:

$ pip install tensorflow-datasets

The following code instantiates a BERT cased model and fine-tunes it with data from the MRPC dataset. The MRPC task tries to predict whether a pair of sentences are paraphrases of one another. The dataset is available from the tensorflow-datasets package. As usual, we first import the necessary libraries:

import os
import tensorflow as tf
import tensorflow_datasets
from transformers import BertTokenizer, \
    TFBertForSequenceClassification, BertForSequenceClassification, \
    glue_convert_examples_to_features

We declare a few constants that we will use later in the code:

BATCH_SIZE = 32
FINE_TUNED_MODEL_DIR = "./data/"
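To give a sense of how these imports and constants fit together, the following is a minimal sketch of the fine-tuning flow, based on the standard MRPC recipe from the Transformers documentation rather than on the book's own listing; the hyperparameters and the use of save_pretrained() with FINE_TUNED_MODEL_DIR are assumptions:

import tensorflow as tf
import tensorflow_datasets
from transformers import BertTokenizer, TFBertForSequenceClassification, \
    glue_convert_examples_to_features

BATCH_SIZE = 32
FINE_TUNED_MODEL_DIR = "./data/"

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
model = TFBertForSequenceClassification.from_pretrained("bert-base-cased")

# MRPC comes as train/validation/test splits of sentence pairs
data = tensorflow_datasets.load("glue/mrpc")

# convert raw examples into padded BERT features; returns tf.data.Datasets
train_dataset = glue_convert_examples_to_features(
    data["train"], tokenizer, max_length=128, task="mrpc")
valid_dataset = glue_convert_examples_to_features(
    data["validation"], tokenizer, max_length=128, task="mrpc")
train_dataset = train_dataset.shuffle(100).batch(BATCH_SIZE)
valid_dataset = valid_dataset.batch(BATCH_SIZE)

# fine-tune with the usual Keras compile/fit loop
optimizer = tf.keras.optimizers.Adam(learning_rate=3e-5, epsilon=1e-08)
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
metric = tf.keras.metrics.SparseCategoricalAccuracy("accuracy")
model.compile(optimizer=optimizer, loss=loss, metrics=[metric])
model.fit(train_dataset, epochs=2, validation_data=valid_dataset)

# persist the fine-tuned weights and config for later prediction
model.save_pretrained(FINE_TUNED_MODEL_DIR)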