19/19 [==============================] - 6s 291ms/step - loss: 0.0575 - accuracy: 0.9833 - masked_accuracy_fn: 0.8140 - val_loss: 0.1569 - val_accuracy: 0.9615 - val_masked_accuracy_fn: 0.5511
11/11 [==============================] - 2s 170ms/step - loss: 0.1436 - accuracy: 0.9637 - masked_accuracy_fn: 0.5786
test loss: 0.144, test accuracy: 0.963, masked test accuracy: 0.578

Here are some examples of POS tags generated for some random sentences in the test set, shown together with the POS tags in the corresponding ground truth sentences. As you can see, while the metric values are not perfect, the network seems to have learned to do POS tagging fairly well:

labeled : among/IN segments/NNS that/WDT t/NONE 1/VBP continue/NONE 2/TO to/VB operate/RB though/DT the/NN company/POS 's/NN steel/NN division/VBD continued/NONE 3/TO to/VB suffer/IN from/JJ soft/NN demand/IN for/PRP its/JJ tubular/NNS goods/VBG serving/DT the/NN oil/NN industry/CC and/JJ other/NNS
predicted: among/IN segments/NNS that/WDT t/NONE 1/NONE continue/NONE 2/TO to/VB operate/IN though/DT the/NN company/NN 's/NN steel/NN division/NONE continued/NONE 3/TO to/IN suffer/IN from/IN soft/JJ demand/NN for/IN its/JJ tubular/NNS goods/DT serving/DT the/NNP oil/NN industry/CC and/JJ other/NNS

labeled : as/IN a/DT result/NN ms/NNP ganes/NNP said/VBD 0/NONE t/NONE 2/PRP it/VBZ is/VBN believed/IN that/JJ little/CC or/DT no/NN sugar/IN from/DT the/CD 1989/NN 90/VBZ crop/VBN has/VBN been/NONE shipped/RB 1/RB yet/IN even/DT though/NN the/NN crop/VBZ year/CD is/NNS six/JJ
predicted: as/IN a/DT result/NN ms/IN ganes/NNP said/VBD 0/NONE t/NONE 2/PRP it/VBZ is/VBN believed/NONE that/DT little/NN or/DT no/NN sugar/IN from/DT the/DT 1989/CD 90/NN crop/VBZ has/VBN been/VBN shipped/VBN 1/RB yet/RB even/IN though/DT the/NN crop/NN year/NN is/JJ

labeled : in/IN the/DT interview/NN at/IN headquarters/NN yesterday/NN afternoon/NN both/DT men/NNS exuded/VBD confidence/NN and/CC seemed/VBD 1/NONE to/TO work/VB well/RB together/RB
predicted: in/IN the/DT interview/NN at/IN headquarters/NN yesterday/NN afternoon/NN both/DT men/NNS exuded/NNP confidence/NN and/CC seemed/VBD 1/NONE to/TO work/VB well/RB together/RB

labeled : all/DT came/VBD from/IN cray/NNP research/NNP
predicted: all/NNP came/VBD from/IN cray/NNP research/NNP

labeled : primerica/NNP closed/VBD at/IN 28/CD 25/NONE u/RB down/CD 50/NNS
predicted: primerica/NNP closed/VBD at/CD 28/CD 25/CD u/CD down/CD
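The masked_accuracy_fn metric in the output above computes accuracy only over real tokens, so the abundant PAD positions do not inflate the score. Here is a rough sketch of such a metric (not necessarily the exact implementation used in this chapter), assuming sparse integer labels with index 0 reserved for the PAD tag:

import tensorflow as tf

# Rough sketch of a masked accuracy metric: count a prediction as correct
# only at positions whose true label is not PAD (assumed to be index 0).
def masked_accuracy_fn(y_true, y_pred):
    y_true = tf.cast(y_true, tf.int64)
    pred_ids = tf.argmax(y_pred, axis=-1)                 # (batch, timesteps)
    mask = tf.cast(tf.not_equal(y_true, 0), tf.float32)   # 1.0 at real tokens
    matches = tf.cast(tf.equal(y_true, pred_ids), tf.float32) * mask
    return tf.reduce_sum(matches) / tf.maximum(tf.reduce_sum(mask), 1.0)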
If you would like to run this code yourself, you can find it in the code folder
for this chapter. To run it from the command line, enter the following command;
the output is written to the console:
$ python gru_pos_tagger.py
Now that we have seen some examples of three common RNN network topologies,
let us explore the most popular of them all – the seq2seq model, also known as the
recurrent encoder-decoder architecture.
Encoder-Decoder architecture – seq2seq
The many-to-many network we just saw was broadly similar to the many-to-one
network. The one important difference was that the RNN returned an output at
each time step instead of a single combined output at the end. Another noticeable
feature was that the number of input time steps was equal to the number of output
time steps. As you learn about the encoder-decoder architecture, which is the
"other," and arguably more popular, style of many-to-many network, you will
notice a further difference: in the many-to-many network we just saw, the output
is in line with the input, that is, the network does not need to wait until all of the
input has been consumed before it starts generating output. In an encoder-decoder
network, by contrast, the output is generated only after the entire input has been
consumed.
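To make this concrete, the following minimal sketch (illustrative only, with arbitrary layer sizes, not the code for this chapter) shows the Keras mechanics behind the aligned many-to-many behavior: with return_sequences=True, a GRU emits one output per input time step, while the default configuration returns only the final output:

import tensorflow as tf

x = tf.random.normal((1, 10, 8))   # (batch, time steps, input features)

# Aligned many-to-many: one output vector per input time step.
many_to_many = tf.keras.layers.GRU(16, return_sequences=True)
print(many_to_many(x).shape)       # (1, 10, 16)

# Many-to-one: only the output at the final time step is returned.
many_to_one = tf.keras.layers.GRU(16)
print(many_to_one(x).shape)        # (1, 16)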
The Encoder-Decoder architecture is also called a seq2seq model. As the name
implies, the network is composed of an encoder and a decoder, both RNN-based,
each capable of consuming and returning sequences corresponding to multiple
time steps. The biggest application of the seq2seq network has been in neural
machine translation, although it is equally applicable to problems that can be
roughly structured as translation problems, such as sentence parsing [10] and
image captioning [24]. The seq2seq model has also been used for time series
analysis [25] and question answering.
In the seq2seq model, the encoder consumes the source sequence, which is a batch
of integer sequences. The length of each sequence is the number of input time steps,
which corresponds to the maximum input sequence length (padded or truncated
as necessary). Thus the input tensor has shape (batch_size, number_of_encoder_
timesteps). This is passed into an embedding layer, which converts the integer at
each time step into an embedding vector. The output of the embedding layer is a
tensor of shape (batch_size, number_of_encoder_timesteps, encoder_
embedding_dim).
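You can verify this shape transformation directly. The following is a minimal illustrative sketch; the vocabulary size, sequence length, and embedding dimension are made-up values rather than the ones used later in this chapter:

import tensorflow as tf

# Made-up sizes, for illustration only.
batch_size = 64
number_of_encoder_timesteps = 30   # maximum (padded/truncated) source length
source_vocab_size = 5000
encoder_embedding_dim = 256

# A batch of integer source sequences: (batch_size, number_of_encoder_timesteps).
encoder_inputs = tf.random.uniform(
    (batch_size, number_of_encoder_timesteps),
    maxval=source_vocab_size, dtype=tf.int32)

embedding = tf.keras.layers.Embedding(source_vocab_size, encoder_embedding_dim)
embedded = embedding(encoder_inputs)
print(embedded.shape)   # (64, 30, 256)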