Thinking about backpropagation and convnets

In this section we want to give an intuition behind backpropagation and convnets. For the sake of simplicity, we will focus on an example of convolution with an input X of size 3 × 3, a single filter W of size 2 × 2, no padding, stride 1, and no dilation (see Chapter 5, Advanced Convolutional Neural Networks). The generalization is left as an exercise.

The standard convolution operation is represented in Figure 15. Simply put, the convolution operation is the forward pass:

Figure 15: Forward pass for our convnet toy example

Following the intuition of Figure 15, we can now focus our attention on the backward pass for the current layer. The key assumption is that we receive a backpropagated signal $\frac{\partial L}{\partial h_{ij}}$ as input, and we need to compute $\frac{\partial L}{\partial w_{ij}}$ and $\frac{\partial L}{\partial x_{ij}}$. This computation is left as an exercise, but note that each weight in the filter contributes to each pixel in the output map; in other words, any change in a weight of the filter affects all the output pixels.
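In equation form, with zero-based indices, each pixel of the 2 × 2 output map produced by the forward pass is

$$h_{ij} = \sum_{m=0}^{1} \sum_{n=0}^{1} w_{mn}\, x_{i+m,\, j+n}, \qquad i, j \in \{0, 1\}$$

To make the backward pass concrete, the following is a minimal NumPy sketch of the toy example. The input values, the filter values, and the incoming signal dL/dh are illustrative placeholders, and the loop-based convolution favors readability over speed.

import numpy as np

# Toy example from this section: input X of size 3x3, one 2x2 filter W,
# no padding, stride 1, no dilation. The values are illustrative only.
X = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 9.]])
W = np.array([[1., 0.],
              [0., -1.]])

def conv2d_valid(A, K):
    # "Valid" cross-correlation (what deep learning frameworks call
    # convolution): slide K over A with stride 1 and no padding.
    kh, kw = K.shape
    oh, ow = A.shape[0] - kh + 1, A.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(A[i:i + kh, j:j + kw] * K)
    return out

h = conv2d_valid(X, W)            # forward pass, a 2x2 output map

# Backward pass: assume we receive the backpropagated signal dL/dh
# from the layer above (here just a placeholder of ones).
dL_dh = np.ones_like(h)

# dL/dW: every weight touches every output pixel, so its gradient sums
# over the whole output map -- a valid convolution of X with dL/dh.
dL_dW = conv2d_valid(X, dL_dh)

# dL/dX: a "full" convolution of dL/dh with the 180-degree rotated filter,
# implemented here by zero-padding dL/dh and reusing conv2d_valid.
pad = W.shape[0] - 1
dL_dX = conv2d_valid(np.pad(dL_dh, pad), W[::-1, ::-1])

print(h)
print(dL_dW)
print(dL_dX)

Note how dL/dW is itself a convolution of the input with the incoming gradient: this is the algebraic counterpart of the observation above that any change in a weight of the filter affects all the output pixels.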

Thinking about backpropagation and RNNs

As you remember from Chapter 8, Recurrent Neural Networks, the basic equation for an RNN is $s_t = \tanh(U x_t + W s_{t-1})$, the final prediction at step t is $\hat{y}_t = \mathrm{softmax}(V s_t)$, the correct value is $y_t$, and the error E is the cross-entropy. Here U, V, and W are the learnable parameters of the RNN equations. These equations can be visualized as in Figure 16, where we unroll the recurrence. The core idea is that the total error is just the sum of the errors at each time step.

If we use SGD, we need to sum the errors and the gradients at each time step for one given training example:

Figure 16: Recurrent neural network unrolled with equations

We are not going to write out all the tedious math behind all the gradients, but rather focus only on a few peculiar cases. For instance, with computations similar to the ones made in the previous chapters, it can be proven by using the chain rule that the gradient for V depends only on the values at the current time step, $s_3$, $y_3$, and $\hat{y}_3$:

$$\frac{\partial E_3}{\partial V} = \frac{\partial E_3}{\partial \hat{y}_3}\frac{\partial \hat{y}_3}{\partial V} = \frac{\partial E_3}{\partial \hat{y}_3}\frac{\partial \hat{y}_3}{\partial z_3}\frac{\partial z_3}{\partial V} = (\hat{y}_3 - y_3) \otimes s_3$$

However, $\frac{\partial E_3}{\partial W}$ has dependencies carried across time steps because, for instance, $s_3 = \tanh(U x_3 + W s_2)$ depends on $s_2$, which in turn depends on $W$ and $s_1$. As a consequence, the gradient is a bit more complicated because we need to sum up the contributions of each time step:

$$\frac{\partial E_3}{\partial W} = \sum_{k=0}^{3} \frac{\partial E_3}{\partial \hat{y}_3}\frac{\partial \hat{y}_3}{\partial s_3}\frac{\partial s_3}{\partial s_k}\frac{\partial s_k}{\partial W}$$

In order to understand the preceding equation, you can think of it as the standard backpropagation algorithm used for traditional feed-forward neural networks, except that for RNNs we additionally need to add the gradients of W across time steps. That's because we can effectively make the dependencies across time explicit by unrolling the RNN. This is the reason why backpropagation for RNNs is frequently called Backpropagation Through Time (BPTT). The intuition is shown in Figure 17, where the backpropagated signals are represented.
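To complement the equations, here is a minimal NumPy sketch of BPTT for this toy RNN. The dimensions, random initialization, inputs, and the one-hot target at step 3 are made-up illustrative values; only the error at step 3 is backpropagated, mirroring the derivation of the gradients for V and W above.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: hidden state 4, input 3, output classes 2.
H, D, C = 4, 3, 2
U = 0.1 * rng.normal(size=(H, D))   # input-to-hidden
W = 0.1 * rng.normal(size=(H, H))   # hidden-to-hidden
V = 0.1 * rng.normal(size=(C, H))   # hidden-to-output

xs = [rng.normal(size=D) for _ in range(4)]   # inputs x_0 .. x_3
y3 = np.array([1.0, 0.0])                     # one-hot target at t = 3

# Forward pass: s_t = tanh(U x_t + W s_{t-1}), y_hat_t = softmax(V s_t)
s = {-1: np.zeros(H)}
for t, x in enumerate(xs):
    s[t] = np.tanh(U @ x + W @ s[t - 1])
z3 = V @ s[3]
y_hat3 = np.exp(z3) / np.exp(z3).sum()
E3 = -np.sum(y3 * np.log(y_hat3))             # cross-entropy at t = 3

# Gradient for V: depends only on the current time step.
# dE3/dV = (y_hat3 - y3) outer s_3
dV = np.outer(y_hat3 - y3, s[3])

# Gradient for W: Backpropagation Through Time. Sum the contribution of
# every time step k = 3, 2, 1, 0 by pushing the error signal backwards.
dW = np.zeros_like(W)
delta = (V.T @ (y_hat3 - y3)) * (1 - s[3] ** 2)   # error w.r.t. pre-activation at t = 3
for k in range(3, -1, -1):
    dW += np.outer(delta, s[k - 1])               # contribution of time step k
    delta = (W.T @ delta) * (1 - s[k - 1] ** 2)   # propagate one step further back

print(E3)
print(dV)
print(dW)

The gradient for V uses only $s_3$, $y_3$, and $\hat{y}_3$, while the loop over k accumulates one term per time step into dW, which is exactly the sum in the preceding equation for $\frac{\partial E_3}{\partial W}$.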

