
Recurrent Neural Networks

The topology has been used with great success in the field of machine translation, as well as for problems that can be reframed as machine translation problems. Real-life examples of the former can be found in [8, 9], and an example of the latter is described in [10].

The second many-to-many type has an output cell corresponding to each input cell. This kind of network is suited for use cases where there is a 1:1 correspondence between the input and the output, such as time series. The major difference between this model and the seq2seq model is that the input does not have to be completely encoded before the decoding process begins.
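As a rough illustration (our own sketch, not code from the book), the tf.keras snippet below wires up this second many-to-many topology; the vocabulary size, number of output labels, and layer sizes are illustrative assumptions. Setting return_sequences=True makes the RNN emit one hidden state per time step, and TimeDistributed applies the same classifier at each of those steps, giving the 1:1 correspondence between input and output:

import tensorflow as tf

vocab_size = 5000   # assumed size of the input vocabulary
num_labels = 20     # assumed number of per-step output labels
seq_len = 64        # assumed fixed sequence length

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(seq_len,)),
    tf.keras.layers.Embedding(vocab_size, 128),
    # return_sequences=True keeps one hidden state per time step, so an
    # output can be produced for every input position as it is read
    tf.keras.layers.SimpleRNN(128, return_sequences=True),
    # the same Dense classifier is applied at every time step, giving the
    # 1:1 correspondence between inputs and outputs
    tf.keras.layers.TimeDistributed(
        tf.keras.layers.Dense(num_labels, activation="softmax")),
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")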

In the next three sections, we provide examples of a one-to-many network that learns to generate text, a many-to-one network that does sentiment analysis, and a many-to-many network of the second type, which predicts part-of-speech (POS) tags for the words in a sentence. Because of the popularity of the seq2seq network, we will cover it in more detail later in this chapter.

Example – One-to-Many – learning to generate text

RNNs have been used extensively by the Natural Language Processing (NLP) community for various applications. One such application is building language models. A language model allows us to predict the probability of a word in a text given the preceding words. Language models are important for various higher-level tasks such as machine translation, spelling correction, and so on.

The ability of a language model to predict the next word in a sequence makes it a generative model: we can generate text by sampling from the output probabilities over the words in the vocabulary. The training data is a sequence of words, and the label is the word appearing at the next time step in the sequence.
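As a sketch of that sampling idea (our own illustration, not the book's generation loop), the helper below assumes a trained model whose predict call returns a probability distribution over the character vocabulary for the next position, together with hypothetical char2idx and idx2char lookup tables:

import numpy as np

def generate_text(model, seed, char2idx, idx2char, seq_len=100, length=100):
    # Assumes model.predict on a (1, seq_len) array of character ids
    # returns a (1, vocab_size) probability distribution for the next character.
    chars = list(seed)
    for _ in range(length):
        window = chars[-seq_len:]
        x = np.array([[char2idx[c] for c in window]])
        probs = model.predict(x, verbose=0)[0]
        probs = probs / probs.sum()          # guard against rounding drift
        # sample instead of taking the argmax, so repeated runs vary in wording
        next_id = np.random.choice(len(probs), p=probs)
        chars.append(idx2char[next_id])
    return "".join(chars)

A call such as generate_text(model, "alice was ", char2idx, idx2char) would then return the seed extended by 100 sampled characters.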

For our example, we will train a character-based RNN on the text of the children's story "Alice in Wonderland" and its sequel "Through the Looking Glass" by Lewis Carroll. We have chosen to build a character-based model because it has a smaller vocabulary and trains more quickly. The idea is the same as training and using a word-based language model, except that we work with characters instead of words. Once trained, the model can be used to generate text in the same style.

The data for our example comes from the plain texts of the two novels on the Project Gutenberg website [36]. The input to the network is a sequence of 100 characters, and the corresponding output is another sequence of 100 characters, offset from the input by one position.
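One plausible way to build such input/label pairs is sketched below; this is an assumed preprocessing step rather than the book's exact code, with text standing for the concatenated plain text of the two novels:

SEQ_LEN = 100

# `text` is assumed to hold the concatenated plain text of the two novels
chars = sorted(set(text))                       # character vocabulary
char2idx = {c: i for i, c in enumerate(chars)}
encoded = [char2idx[c] for c in text]

inputs, labels = [], []
for i in range(len(encoded) - SEQ_LEN):
    inputs.append(encoded[i:i + SEQ_LEN])           # characters i .. i+99
    labels.append(encoded[i + 1:i + SEQ_LEN + 1])   # same window, shifted by 1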

