
    # (continued from the previous listing) one-hot encode each POS tag
    # sequence p, then pad all tag sequences to a fixed length
    poss_as_catints.append(tf.keras.utils.to_categorical(p,
        num_classes=target_vocab_size+1, dtype="int32"))
poss_as_catints = tf.keras.preprocessing.sequence.pad_sequences(
    poss_as_catints, maxlen=max_seqlen)

# pair each integer-encoded sentence with its one-hot encoded tag sequence
dataset = tf.data.Dataset.from_tensor_slices(
    (sents_as_ints, poss_as_catints))
idx2word_s[0], idx2word_t[0] = "PAD", "PAD"

# split into training, validation, and test datasets
dataset = dataset.shuffle(10000)
test_size = len(sents) // 3
val_size = (len(sents) - test_size) // 10
test_dataset = dataset.take(test_size)
val_dataset = dataset.skip(test_size).take(val_size)
train_dataset = dataset.skip(test_size + val_size)

# create batches
batch_size = 128
train_dataset = train_dataset.batch(batch_size)
val_dataset = val_dataset.batch(batch_size)
test_dataset = test_dataset.batch(batch_size)
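To confirm the shapes that the model described next will receive, we can peek at a single batch. This is only an illustrative check; the names batch_sents and batch_tags are placeholders, not variables from the original listing:

for batch_sents, batch_tags in train_dataset.take(1):
    print(batch_sents.shape)   # (batch_size, max_seqlen)
    print(batch_tags.shape)    # (batch_size, max_seqlen, target_vocab_size + 1)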


Next, we will define our model and instantiate it. Our model is a sequential model consisting of an embedding layer, a dropout layer, a bidirectional GRU layer, a dense layer, and a softmax activation layer. The input is a batch of integer sequences, with shape (batch_size, max_seqlen). When passed through the embedding layer, each integer in the sequence is converted to a vector of size (embedding_dim), so the shape of our tensor becomes (batch_size, max_seqlen, embedding_dim). Each of these vectors is passed to the corresponding time step of a bidirectional GRU with an output dimension of 256. Because the GRU is bidirectional, this is equivalent to running two GRUs in parallel, one over the original sequence and one over the reversed sequence, and concatenating their outputs, so the tensor that comes out of the bidirectional GRU has the dimension (batch_size, max_seqlen, 2*rnn_output_dimension). Each time step tensor of shape (batch_size, 1, 2*rnn_output_dimension) is fed into a dense layer, which converts each time step to a vector of the same size as the target vocabulary, giving a tensor of shape (batch_size, max_seqlen, output_vocab_size). Finally, a softmax activation is applied to each time step, so that every time step yields a probability distribution over the output POS tags.
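To make these shapes concrete, here is a minimal sketch of such a model using the Keras Sequential API. The names embedding_dim and rnn_output_dim, the dropout rate, and the variable source_vocab_size (the size of the word vocabulary built during preprocessing) are assumptions for illustration, and this is not necessarily the exact model definition used here; target_vocab_size and max_seqlen are the same variables as in the preprocessing code above.

# Sketch of the model described above; source_vocab_size, embedding_dim,
# and the dropout rate are assumed for illustration.
import tensorflow as tf

embedding_dim = 128      # assumed embedding size
rnn_output_dim = 256     # GRU output dimension quoted in the text

model = tf.keras.Sequential([
    # (batch_size, max_seqlen) -> (batch_size, max_seqlen, embedding_dim)
    tf.keras.layers.Embedding(source_vocab_size + 1, embedding_dim),
    # regularize the embeddings during training
    tf.keras.layers.Dropout(0.2),
    # forward + backward GRU; output: (batch_size, max_seqlen, 2*rnn_output_dim)
    tf.keras.layers.Bidirectional(
        tf.keras.layers.GRU(rnn_output_dim, return_sequences=True)),
    # project every time step onto the target (POS tag) vocabulary
    tf.keras.layers.TimeDistributed(
        tf.keras.layers.Dense(target_vocab_size + 1)),
    # per-time-step probability distribution over POS tags
    tf.keras.layers.Activation("softmax"),
])

# labels were one-hot encoded with to_categorical, so use categorical crossentropy
model.compile(loss="categorical_crossentropy", optimizer="adam",
              metrics=["accuracy"])
model.fit(train_dataset, validation_data=val_dataset, epochs=1)  # quick smoke test

Because the dense layer and softmax are wrapped to act on every time step, the model outputs one tag distribution per input token, which matches the (batch_size, max_seqlen, target_vocab_size + 1) shape of the padded labels built earlier.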
