Recurrent Neural Networks

This problem manifests as the gradients of the loss approaching either zero or infinity, making the network hard to train. To understand why this happens, consider the equation of the SimpleRNN we saw earlier; the hidden state $h_t$ is dependent on $h_{t-1}$, which in turn is dependent on $h_{t-2}$, and so on. Since the overall loss is the sum of the losses at each timestep, its gradient with respect to W is the sum of the per-timestep gradients:

$$\frac{\partial L}{\partial W} = \sum_{t} \frac{\partial L_t}{\partial W}$$
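As a minimal sketch of this dependence (assuming a SimpleRNN-style recurrence of the form $h_t = \tanh(U x_t + W h_{t-1})$ and an illustrative squared-error loss; the toy sizes and variable names below are not from the book), the following NumPy snippet unrolls the recurrence and accumulates the loss timestep by timestep:

```python
import numpy as np

rng = np.random.default_rng(42)
T, hidden_dim, input_dim = 5, 3, 2            # 5 timesteps (t = 0 .. 4), toy sizes

W = rng.normal(size=(hidden_dim, hidden_dim)) # recurrent weights (the W in dL/dW)
U = rng.normal(size=(hidden_dim, input_dim))  # input weights
V = rng.normal(size=(1, hidden_dim))          # output weights
x = rng.normal(size=(T, input_dim))           # a toy input sequence
y_true = rng.normal(size=T)                   # toy targets

h = np.zeros(hidden_dim)
total_loss = 0.0
for t in range(T):
    h = np.tanh(U @ x[t] + W @ h)             # h_t depends on h_{t-1}, hence on all earlier states
    y_hat = (V @ h).item()                    # per-timestep prediction
    total_loss += (y_hat - y_true[t]) ** 2    # L = sum_t L_t, so dL/dW = sum_t dL_t/dW
print(total_loss)
```

Because every $h_t$ is computed from $h_{t-1}$ using the same W, each per-timestep gradient $\partial L_t / \partial W$ reaches back through all earlier hidden states.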

Let us now see what happens to this gradient at timestep t=3. By the chain rule, the gradient of the loss with respect to W can be decomposed into a product of three sub-gradients. The gradient of the hidden state $h_3$ with respect to W can be further decomposed as a sum over the earlier timesteps, since $h_3$ depends on W through every previous hidden state. Finally, each gradient of $h_3$ with respect to an earlier hidden state $h_t$ can be further decomposed as the product of the gradients of each intermediate hidden state with respect to the previous one:

$$
\begin{aligned}
\frac{\partial L_3}{\partial W} &= \frac{\partial L_3}{\partial \hat{y}_3}\,\frac{\partial \hat{y}_3}{\partial h_3}\,\frac{\partial h_3}{\partial W} \\
&= \sum_{t=0}^{3} \frac{\partial L_3}{\partial \hat{y}_3}\,\frac{\partial \hat{y}_3}{\partial h_3}\,\frac{\partial h_3}{\partial h_t}\,\frac{\partial h_t}{\partial W} \\
&= \sum_{t=0}^{3} \frac{\partial L_3}{\partial \hat{y}_3}\,\frac{\partial \hat{y}_3}{\partial h_3}\left(\prod_{j=t+1}^{3} \frac{\partial h_j}{\partial h_{j-1}}\right)\frac{\partial h_t}{\partial W}
\end{aligned}
$$

Similar calculations are done to compute the gradients of the other losses $L_0$ through $L_4$ with respect to W, and these are summed up into the gradient update for W.
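To see the product term $\prod_{j=t+1}^{3} \partial h_j / \partial h_{j-1}$ in code, here is a minimal NumPy sketch under the same assumed recurrence as above (not the book's implementation): for $h_j = \tanh(U x_j + W h_{j-1})$, each per-step Jacobian is $\mathrm{diag}(1 - h_j^2)\,W$, and chaining them gives $\partial h_3 / \partial h_t$.

```python
import numpy as np

rng = np.random.default_rng(42)
T, hidden_dim, input_dim = 4, 3, 2            # timesteps t = 0 .. 3, as in the derivation
W = rng.normal(size=(hidden_dim, hidden_dim))
U = rng.normal(size=(hidden_dim, input_dim))
x = rng.normal(size=(T, input_dim))

# Forward pass: store the hidden states h_0 .. h_3.
states = []
h_prev = np.zeros(hidden_dim)
for t in range(T):
    h_t = np.tanh(U @ x[t] + W @ h_prev)
    states.append(h_t)
    h_prev = h_t

# For h_j = tanh(U x_j + W h_{j-1}), the Jacobian dh_j/dh_{j-1} is diag(1 - h_j^2) @ W.
def step_jacobian(h_j):
    return np.diag(1.0 - h_j ** 2) @ W

# dh_3/dh_t = prod_{j=t+1}^{3} dh_j/dh_{j-1} -- the product term in the equation above.
for t in range(T):
    J = np.eye(hidden_dim)
    for j in range(t + 1, T):
        J = step_jacobian(states[j]) @ J      # multiply in one more step of the chain
    print(f"t={t}  ||dh_3/dh_t|| = {np.linalg.norm(J):.4f}")
```

The norm of this Jacobian product typically shrinks (or grows) as the gap between t and 3 widens, which is exactly the behavior discussed next.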

We will not explore the math further in this book, but this WildML blog post [12] has a very good explanation of BPTT, including a more detailed derivation of the math behind the process.

Vanishing and exploding gradients

The reason BPTT is particularly sensitive to the problem of vanishing and exploding gradients comes from the product term in the final expression for the gradient of the loss with respect to W. Consider the case where the individual gradients of a hidden state with respect to the previous one are less than 1.
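As a rough numerical sketch of why this matters (illustrative numbers, not from the book), a long product of per-step factors that are all slightly below 1 collapses toward zero, while factors slightly above 1 blow up:

```python
import numpy as np

steps = 50
print(np.prod(np.full(steps, 0.9)))   # ~0.005 -> gradient contribution vanishes
print(np.prod(np.full(steps, 1.1)))   # ~117   -> gradient contribution explodes
```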
