RNN topologies

We have seen examples of how MLP and CNN architectures can be composed to form more complex networks. RNNs offer yet another degree of freedom, in that they allow sequence input and output. This means that RNN cells can be arranged in different ways to build networks that are adapted to solve different types of problems. Figure 4 shows five different configurations of inputs, hidden layers, and outputs, represented by red, green, and blue boxes respectively:

Figure 4: Common RNN topologies. Image Source: Andrej Karpathy [5]

Of these, the first one (one-to-one) is not interesting from a sequence processing point of view, since it can be implemented as a simple Dense network with one input and one output.

The one-to-many case has a single input and outputs a sequence. An example of such a network might be one that can generate text tags from images [6], containing short text descriptions of different aspects of the image. Such a network would be trained with image input and labeled sequences of text representing the image tags.

The many-to-one case is the reverse; it takes a sequence of tensors as input but outputs a single tensor. An example of such a network is a sentiment analysis network [7], which takes as input a block of text such as a movie review and outputs a single sentiment value.

The many-to-many use case comes in two flavors. The first one is more popular and is better known as the seq2seq model. In this model, a sequence is read in and produces a context vector representing the input sequence, which is used to generate the output sequence.
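To make the difference between these topologies concrete, here is a minimal Keras sketch (not code from this chapter's examples) of a many-to-one model, where the RNN layer returns only its final output, and a many-to-many model, where return_sequences=True makes the layer emit an output at every time step and a TimeDistributed Dense layer produces one label per input element. The layer sizes, vocabulary size, and number of labels are arbitrary placeholders:

import tensorflow as tf

# Placeholder sizes; these are illustrative assumptions, not values from the book.
vocab_size, embed_dim, seq_len = 5000, 64, 100

# Many-to-one: the RNN returns only its final output, for example a single
# sentiment value for a whole movie review.
many_to_one = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embed_dim),
    tf.keras.layers.SimpleRNN(128),                  # return_sequences defaults to False
    tf.keras.layers.Dense(1, activation="sigmoid")
])

# Many-to-many: the RNN returns an output at every time step, and the same
# Dense layer is applied to each step to produce one label per input element.
num_labels = 20                                      # illustrative assumption
many_to_many = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embed_dim),
    tf.keras.layers.SimpleRNN(128, return_sequences=True),
    tf.keras.layers.TimeDistributed(
        tf.keras.layers.Dense(num_labels, activation="softmax"))
])

many_to_one.build(input_shape=(None, seq_len))
many_to_many.build(input_shape=(None, seq_len))
many_to_one.summary()
many_to_many.summary()

The seq2seq flavor of many-to-many needs a separate encoder and decoder rather than a single stack like this, which is one reason it is treated in more detail later in the chapter.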
The seq2seq topology has been used with great success in the field of machine translation, as well as for problems that can be reframed as machine translation problems. Real-life examples of the former can be found in [8, 9], and an example of the latter is described in [10].

The second many-to-many type has an output cell corresponding to each input cell. This kind of network is suited for use cases where there is a 1:1 correspondence between the input and output, such as time series. The major difference between this model and the seq2seq model is that the input does not have to be completely encoded before the decoding process begins.

In the next three sections, we provide examples of a one-to-many network that learns to generate text, a many-to-one network that does sentiment analysis, and a many-to-many network of the second type, which predicts part-of-speech (POS) tags for words in a sentence. Because of the popularity of the seq2seq network, we will cover it in more detail later in this chapter.

Example ‒ One-to-Many – learning to generate text

RNNs have been used extensively by the Natural Language Processing (NLP) community for various applications. One such application is building language models. A language model allows us to predict the probability of a word in a text given the previous words. Language models are important for various higher-level tasks such as machine translation, spelling correction, and so on.

The ability of a language model to predict the next word in a sequence makes it a generative model that allows us to generate text by sampling from the output probabilities of the different words in the vocabulary. The training data is a sequence of words, and the label is the word appearing at the next time step in the sequence.

For our example, we will train a character-based RNN on the text of the children's stories "Alice in Wonderland" and its sequel "Through the Looking Glass" by Lewis Carroll. We have chosen to build a character-based model because it has a smaller vocabulary and trains more quickly. The idea is the same as training and using a word-based language model, except we use characters instead of words. Once trained, the model can be used to generate some text in the same style.

The data for our example comes from the plain texts of the two novels on the Project Gutenberg website [36]. The input to the network is a sequence of 100 characters, and the corresponding output is another sequence of 100 characters, offset from the input by 1 position.
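As a rough sketch of how such training examples can be constructed (this is not the exact code used later in the chapter), the snippet below slices a stand-in string into 100-character input sequences and labels that are the same windows shifted right by one character. The variable names, the stride of one character, and the use of plain Python lists are assumptions made for illustration:

import numpy as np

# Stand-in for the cleaned, concatenated text of the two novels; downloading
# and cleaning the Project Gutenberg files is omitted here.
texts = "stand-in text for the concatenated novels " * 40

chars = sorted(set(texts))
char2idx = {c: i for i, c in enumerate(chars)}

seq_length = 100
input_seqs, label_seqs = [], []
for i in range(len(texts) - seq_length):
    # Input: characters i .. i+99; label: the same window shifted right by one.
    input_seqs.append([char2idx[c] for c in texts[i:i + seq_length]])
    label_seqs.append([char2idx[c] for c in texts[i + 1:i + seq_length + 1]])

X = np.array(input_seqs)
y = np.array(label_seqs)
print(X.shape, y.shape)   # (number of sequences, 100) for both

Each label sequence therefore contains, at every position, the character that follows the corresponding input character, which is exactly the next-character prediction task described above.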
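As described above, once trained, the model generates text by sampling from its output probabilities. A minimal sketch of such a sampling loop follows; it assumes a trained model that, for a batch of character-index sequences, returns a probability distribution over the vocabulary at every time step, which is an assumption for illustration rather than a description of the exact model built later in the chapter:

import numpy as np

def generate_text(model, seed_ids, idx2char, num_chars=400, temperature=1.0):
    # Assumes `model` returns, for a (1, seq_len) batch of character indices,
    # per-step probability distributions of shape (1, seq_len, vocab_size).
    generated = list(seed_ids)
    for _ in range(num_chars):
        x = np.array([generated[-100:]])              # most recent 100 characters
        probs = model.predict(x, verbose=0)[0, -1]    # distribution over the next character
        # Temperature below 1 makes sampling more conservative, above 1 more adventurous.
        logits = np.log(probs + 1e-9) / temperature
        probs = np.exp(logits) / np.sum(np.exp(logits))
        next_id = np.random.choice(len(probs), p=probs)
        generated.append(int(next_id))
    return "".join(idx2char[i] for i in generated)

Lower temperatures tend to produce more repetitive but more plausible text, while higher temperatures produce more varied but noisier output.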