
As with the OpenAI paper, BERT proposes configurations for using it in several supervised learning tasks, such as single- and multiple-sentence classification, question answering, and tagging.


The BERT model comes in two major flavors: BERT-base and BERT-large. BERT-base has 12 encoder layers, 768 hidden units, and 12 attention heads, with 110 million parameters in all. BERT-large has 24 encoder layers, 1,024 hidden units, and 16 attention heads, with 340 million parameters. More details can be found in the BERT GitHub repository [34].

BERT pretraining is a very expensive process and is currently only practical using Tensor Processing Units (TPUs), which are available from Google via its Colaboratory (Colab) notebooks [32] or the Google Cloud Platform [33]. However, fine-tuning BERT-base with custom datasets is usually achievable on GPU instances.

Once the BERT model is fine-tuned for your domain, the embeddings from the last four hidden layers usually produce good results for downstream tasks. Which embedding, or combination of embeddings (via summing, averaging, max-pooling, or concatenating), to use usually depends on the type of task.
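
As a rough sketch (not code from the BERT repository), and assuming the last four hidden layers have already been extracted as NumPy arrays of shape (sequence length, hidden size), the four combination strategies mentioned above look like this:

import numpy as np

# Stand-in data: pretend these are the last four hidden layers for one
# sequence of 128 tokens from BERT-base (hidden size 768).
seq_len, hidden_size = 128, 768
last_four = [np.random.rand(seq_len, hidden_size) for _ in range(4)]

stacked = np.stack(last_four, axis=0)               # shape (4, 128, 768)

summed = stacked.sum(axis=0)                        # (128, 768)
averaged = stacked.mean(axis=0)                     # (128, 768)
max_pooled = stacked.max(axis=0)                    # (128, 768)
concatenated = np.concatenate(last_four, axis=-1)   # (128, 3072)

Concatenation preserves the most information but quadruples the embedding size, which is why summing or averaging is often preferred when memory is a concern.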

In the following sections, we will look at how to work with the BERT language model for various tasks related to language model embeddings.

Using BERT as a feature extractor

The BERT project [34] provides a set of Python scripts that can be run from the command line to fine-tune BERT:

$ git clone https://github.com/google-research/bert.git

$ cd bert

We then download the appropriate BERT model we want to fine-tune. As mentioned earlier, BERT comes in two sizes: BERT-base and BERT-large. In addition, each model has a cased and an uncased version. The cased version differentiates between upper- and lowercase words, while the uncased version does not. For our example, we will use the BERT-base-uncased pretrained model. You can find the download URL for this and the other models further down the README.md page:

$ mkdir data
$ cd data
$ wget \
    https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip
$ unzip -a uncased_L-12_H-768_A-12.zip
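
With the pretrained checkpoint unzipped, the repository's extract_features.py script can be used to dump contextual embeddings for a file of input sentences. The invocation below is a sketch based on the example in the repository's README; the input and output paths are placeholders to replace with your own, and the command is assumed to be run from the bert directory:

$ export BERT_BASE_DIR=./data/uncased_L-12_H-768_A-12
$ python extract_features.py \
    --input_file=input.txt \
    --output_file=output.jsonl \
    --vocab_file=$BERT_BASE_DIR/vocab.txt \
    --bert_config_file=$BERT_BASE_DIR/bert_config.json \
    --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
    --layers=-1,-2,-3,-4 \
    --max_seq_length=128 \
    --batch_size=8

The --layers flag selects which hidden layers are written out; -1,-2,-3,-4 corresponds to the last four layers discussed above.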

