Recurrent Neural Networks

We will then look at some standard RNN topologies and the kinds of problems they can be used to solve. RNNs can be adapted to different types of applications by rearranging the cells in the graph. We will see some examples of these configurations and how they are used to solve specific problems. We will also consider the sequence-to-sequence (or seq2seq) architecture, which has been used with great success in machine translation and various other fields. We will then look at what an attention mechanism is, and how it can be used to improve the performance of sequence-to-sequence architectures.

Finally, we will look at the transformer architecture, which combines ideas from CNNs, RNNs, and attention mechanisms. The transformer architecture has been used to create novel architectures such as BERT.

The basic RNN cell

Traditional multilayer perceptron neural networks make the assumption that all inputs are independent of each other. This assumption does not hold for many types of sequence data. Words in a sentence, musical notes in a composition, stock prices over time, or even molecules in a compound are all examples of sequences where an element displays a dependence on the elements that precede it. RNN cells incorporate this dependence by having a hidden state, or memory, that holds the essence of what has been seen so far. The value of the hidden state at any point in time is a function of the value of the hidden state at the previous time step and the value of the input at the current time step, that is:

h_t = φ(h_{t-1}, x_t)

Here, h_t and h_{t-1} are the values of the hidden states at times t and t-1 respectively, and x_t is the value of the input at time t. Notice that the equation is recursive, that is, h_{t-1} can be represented in terms of h_{t-2} and x_{t-1}, and so on, until the beginning of the sequence. This is how RNNs encode and incorporate information from arbitrarily long sequences.
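
To make the recurrence concrete, here is a minimal sketch in plain NumPy. The choice of tanh as the nonlinearity and the weight matrices U (input-to-hidden), W (hidden-to-hidden), and bias b are illustrative assumptions for φ, not definitions taken from the text:

```python
import numpy as np

# Illustrative dimensions (assumptions, not from the text)
input_dim, hidden_dim, seq_len = 4, 8, 5

# Randomly initialized parameters standing in for learned weights
U = np.random.randn(hidden_dim, input_dim) * 0.1   # input -> hidden
W = np.random.randn(hidden_dim, hidden_dim) * 0.1  # hidden -> hidden
b = np.zeros(hidden_dim)

x = np.random.randn(seq_len, input_dim)  # a toy input sequence
h = np.zeros(hidden_dim)                 # h_0: initial hidden state

# Unroll the recurrence h_t = phi(h_{t-1}, x_t), here phi = tanh(W h + U x + b)
for t in range(seq_len):
    h = np.tanh(W @ h + U @ x[t] + b)

print(h.shape)  # (hidden_dim,) -- the state after seeing the whole sequence
```

Note that the same U, W, and b are reused at every time step; this parameter sharing is what allows the same cell to process sequences of arbitrary length.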

We can also represent the RNN cell graphically as shown in Figure 1(a). At time t, the cell has an input x(t) and output y(t). Part of the output y(t) (represented by the hidden state h_t) is fed back into the cell for use at a later time step t+1.
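
As a small usage sketch (assuming a TensorFlow 2.x environment with tf.keras is available), a SimpleRNN layer performs this unrolling for us; return_sequences exposes the output at every step, and return_state exposes the final hidden state that was carried through the loop:

```python
import tensorflow as tf

# A SimpleRNN layer implements the same recurrence; return_sequences gives
# the output y(t) at every step, return_state also returns the final h_t.
rnn = tf.keras.layers.SimpleRNN(units=8, return_sequences=True, return_state=True)

# Batch of 2 toy sequences, each 5 steps long with 4 features per step
x = tf.random.normal((2, 5, 4))

outputs, final_state = rnn(x)
print(outputs.shape)      # (2, 5, 8) -- one output per time step
print(final_state.shape)  # (2, 8)    -- hidden state after the last step
```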
