
However, an understanding of the structure and equations is helpful when you need to build your own specialized RNN cell to overcome a specific problem. Now that we understand the flow of data forward through the RNN cell, that is, how it combines its input and hidden state to produce the output and the next hidden state (sketched in code below), let us examine the flow of gradients in the reverse direction. This process is called backpropagation through time, or BPTT.
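To make the forward flow concrete, here is a minimal NumPy sketch of a single Elman-style RNN cell step; the names and the tanh/softmax choices are illustrative assumptions, not this chapter's listings. U, W, and V play the roles of the input-to-hidden, hidden-to-hidden, and hidden-to-output weights:

import numpy as np

def rnn_cell_step(x_t, h_prev, U, W, V):
    # Combine the current input with the previous hidden state (assumed tanh activation).
    h_t = np.tanh(U @ x_t + W @ h_prev)
    # Map the new hidden state to an output distribution (assumed softmax output).
    logits = V @ h_t
    y_hat_t = np.exp(logits) / np.sum(np.exp(logits))
    return y_hat_t, h_t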

Backpropagation through time (BPTT)

Just like traditional neural networks, training RNNs involves backpropagation of gradients. The difference in this case is that, since the weights are shared across all time steps, the gradient at each output depends not only on the current time step but also on the previous ones. This process is called backpropagation through time [11]. Because the weights U, V, and W are shared across the different time steps of an RNN, BPTT sums up the gradients contributed by the various time steps. This is the key difference between traditional backpropagation and BPTT.
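In equation form (a sketch of the standard formulation, not necessarily the exact notation used elsewhere in the book), the total gradient of the loss L with respect to the shared weight W is the sum of the per-time-step gradients, and likewise for U and V:

\frac{\partial L}{\partial W} = \sum_{t} \frac{\partial L_t}{\partial W}, \qquad
\frac{\partial L}{\partial U} = \sum_{t} \frac{\partial L_t}{\partial U}, \qquad
\frac{\partial L}{\partial V} = \sum_{t} \frac{\partial L_t}{\partial V}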

Consider the RNN with five time steps shown in Figure 2. During the forward pass, the network produces predictions ŷ_t at time t that are compared with the label y_t to compute a loss L_t. During backpropagation (shown by the dotted lines), the gradients of the loss with respect to the weights U, V, and W are computed at each time step and the parameters are updated with the sum of the gradients:

Figure 2: Backpropagation through time
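The process in the figure can be sketched directly in NumPy. The listing below is a minimal illustration (not the book's code) under the same assumptions as the earlier cell sketch: tanh hidden units, a softmax output with cross-entropy loss, xs a list of input vectors, and ys a list of integer class labels. The important detail is that the backward loop accumulates dU, dW, and dV across every time step before making a single update with the summed gradients:

import numpy as np

def bptt_step(xs, ys, U, W, V, lr=0.01):
    T = len(xs)
    hs = {-1: np.zeros(W.shape[0])}
    y_hats = {}
    # Forward pass: compute hidden states and predictions at every time step.
    for t in range(T):
        hs[t] = np.tanh(U @ xs[t] + W @ hs[t - 1])
        logits = V @ hs[t]
        y_hats[t] = np.exp(logits) / np.sum(np.exp(logits))
    # Backward pass: accumulate gradients over all time steps, since U, V,
    # and W are shared -- this is the summation that defines BPTT.
    dU, dW, dV = np.zeros_like(U), np.zeros_like(W), np.zeros_like(V)
    dh_next = np.zeros(W.shape[0])
    for t in reversed(range(T)):
        dlogits = y_hats[t].copy()
        dlogits[ys[t]] -= 1.0                  # softmax + cross-entropy gradient
        dV += np.outer(dlogits, hs[t])
        dh = V.T @ dlogits + dh_next           # gradient flowing into h_t
        draw = (1.0 - hs[t] ** 2) * dh         # backprop through tanh
        dU += np.outer(draw, xs[t])
        dW += np.outer(draw, hs[t - 1])
        dh_next = W.T @ draw                   # pass gradient back to h_{t-1}
    # Update the shared weights once, with the summed gradients.
    U -= lr * dU
    W -= lr * dW
    V -= lr * dV
    return U, W, V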

The following equation shows the gradient of the loss with respect to W. We focus on this weight because it is the cause of the phenomenon known as the vanishing and exploding gradient problem.
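A sketch of the standard chain-rule expansion (using the hidden state h_t defined above; the exact notation may differ) makes the source of the problem visible. The factor ∂h_t/∂h_k is a product of Jacobians, and repeatedly multiplying these factors drives the gradient toward zero or makes it blow up as the gap between t and k grows:

\frac{\partial L_t}{\partial W} = \sum_{k=0}^{t}
\frac{\partial L_t}{\partial \hat{y}_t}
\frac{\partial \hat{y}_t}{\partial h_t}
\frac{\partial h_t}{\partial h_k}
\frac{\partial h_k}{\partial W},
\qquad
\frac{\partial h_t}{\partial h_k} = \prod_{j=k+1}^{t} \frac{\partial h_j}{\partial h_{j-1}}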

