Chapter 15

Thinking about backpropagation and convnets

In this section we want to give an intuition behind backpropagation and convnets. For the sake of simplicity we will focus on an example of convolution with an input X of size 3 × 3, a single filter W of size 2 × 2, no padding, stride 1, and no dilation (see Chapter 5, Advanced Convolutional Neural Networks). The generalization is left as an exercise.

The standard convolution operation is represented in Figure 15. Simply put, the convolution operation is the forward pass:

Figure 15: Forward pass for our convnet toy example

Following the intuition of Figure 15, we can now focus our attention on the backward pass for the current layer. The key assumption is that we receive a backpropagated signal $\frac{\partial E}{\partial h_{ij}}$ as input, where E is the error and $h_{ij}$ is a pixel of the output map, and that we need to compute $\frac{\partial E}{\partial w_{ij}}$ and $\frac{\partial E}{\partial x_{ij}}$. This computation is left as an exercise, but please note that each weight in the filter contributes to each pixel in the output map; in other words, any change in a weight of the filter affects all the output pixels.

Thinking about backpropagation and RNNs

As you remember from Chapter 8, Recurrent Neural Networks, the basic equation for an RNN is $s_t = \tanh(Ux_t + Ws_{t-1})$, the final prediction at step t is $\hat{y}_t = \mathrm{softmax}(Vs_t)$, the correct value is $y_t$, and the error E is the cross-entropy. Here U, V, and W are the learnable parameters of the RNN equations. These equations can be visualized as in Figure 16, where we unroll the recurrence. The core idea is that the total error is just the sum of the errors at each time step.
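To make these equations concrete, here is a minimal sketch, assuming TensorFlow 2 and purely hypothetical toy dimensions and values: it unrolls the recurrence $s_t = \tanh(Ux_t + Ws_{t-1})$ for a few steps, computes $\hat{y}_t = \mathrm{softmax}(Vs_t)$, and accumulates the total error as the sum of the per-step cross-entropies.

```python
import tensorflow as tf

tf.random.set_seed(0)

# Hypothetical toy dimensions, just for illustration
input_dim, hidden_dim, output_dim, timesteps = 4, 3, 4, 4

# U, V, W are the learnable parameters of the RNN equations
U = tf.Variable(tf.random.normal([hidden_dim, input_dim], stddev=0.1))
W = tf.Variable(tf.random.normal([hidden_dim, hidden_dim], stddev=0.1))
V = tf.Variable(tf.random.normal([output_dim, hidden_dim], stddev=0.1))

# One toy training sequence: inputs x_t and one-hot correct values y_t
x = tf.random.normal([timesteps, input_dim])
y = tf.one_hot([1, 3, 0, 2], depth=output_dim)

s = tf.zeros([hidden_dim])  # initial state
total_error = 0.0
for t in range(timesteps):
    s = tf.tanh(tf.linalg.matvec(U, x[t]) + tf.linalg.matvec(W, s))   # s_t
    y_hat = tf.nn.softmax(tf.linalg.matvec(V, s))                     # ŷ_t
    total_error += -tf.reduce_sum(y[t] * tf.math.log(y_hat))          # E_t: cross-entropy at step t

print(float(total_error))  # total error E = sum of the errors at each time step
```

In a real model this loop is what a layer such as tf.keras.layers.SimpleRNN computes internally; writing it out by hand makes the dependency of every state on W explicit, which is exactly what the backward pass has to account for.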
If we use SGD, we need to sum the errors and the gradients at each time step for one given training example:

Figure 16: Recurrent neural network unrolled with equations

We are not going to write out all the tedious math behind all the gradients, but rather focus only on a few peculiar cases. For instance, with computations similar to the ones made in the previous chapters, it can be proven by using the chain rule that the gradient for V depends only on the values at the current time step, $s_3$, $y_3$, and $\hat{y}_3$:

$$\frac{\partial E_3}{\partial V} = \frac{\partial E_3}{\partial \hat{y}_3}\frac{\partial \hat{y}_3}{\partial V} = \frac{\partial E_3}{\partial \hat{y}_3}\frac{\partial \hat{y}_3}{\partial z_3}\frac{\partial z_3}{\partial V} = (\hat{y}_3 - y_3)\,s_3$$

where $z_3 = Vs_3$. However, $\frac{\partial E_3}{\partial W}$ has dependencies carried across time steps because, for instance, $s_3 = \tanh(Ux_3 + Ws_2)$ depends on $s_2$, which in turn depends on W and $s_1$. As a consequence, the gradient is a bit more complicated because we need to sum up the contributions of each time step:

$$\frac{\partial E_3}{\partial W} = \sum_{k=0}^{3} \frac{\partial E_3}{\partial \hat{y}_3}\,\frac{\partial \hat{y}_3}{\partial s_3}\,\frac{\partial s_3}{\partial s_k}\,\frac{\partial s_k}{\partial W}$$

In order to understand the preceding equation, think of the standard backpropagation algorithm used for traditional feed-forward neural networks: for RNNs we additionally need to add up the gradients of W across time steps. That's because, by unrolling the RNN, we effectively make the dependencies across time explicit. This is the reason why backpropagation for RNNs is frequently called Backpropagation Through Time (BPTT). The intuition is shown in Figure 17, where the backpropagated signals are represented.
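The chain-rule sums above can be checked mechanically with automatic differentiation. The following sketch, again assuming TensorFlow 2 and repeating the same hypothetical toy setup so that it is self-contained, asks tf.GradientTape for $\partial E_3/\partial V$ and $\partial E_3/\partial W$, where $E_3$ is the cross-entropy at the last step: the gradient with respect to V matches the closed form $(\hat{y}_3 - y_3)\,s_3$, while the gradient with respect to W accumulates one contribution per unrolled time step.

```python
import tensorflow as tf

tf.random.set_seed(0)
input_dim, hidden_dim, output_dim, timesteps = 4, 3, 4, 4

U = tf.Variable(tf.random.normal([hidden_dim, input_dim], stddev=0.1))
W = tf.Variable(tf.random.normal([hidden_dim, hidden_dim], stddev=0.1))
V = tf.Variable(tf.random.normal([output_dim, hidden_dim], stddev=0.1))
x = tf.random.normal([timesteps, input_dim])
y = tf.one_hot([1, 3, 0, 2], depth=output_dim)

with tf.GradientTape() as tape:
    s = tf.zeros([hidden_dim])
    states = []
    for t in range(timesteps):
        s = tf.tanh(tf.linalg.matvec(U, x[t]) + tf.linalg.matvec(W, s))
        states.append(s)
    # E_3: cross-entropy of the prediction at the last time step only
    y_hat_3 = tf.nn.softmax(tf.linalg.matvec(V, states[-1]))
    E3 = -tf.reduce_sum(y[-1] * tf.math.log(y_hat_3))

dV, dW = tape.gradient(E3, [V, W])

# dV matches the closed form (ŷ_3 - y_3) s_3 (an outer product in matrix form)
manual_dV = tf.einsum("i,j->ij", y_hat_3 - y[-1], states[-1])
print(float(tf.reduce_max(tf.abs(dV - manual_dV))))  # ~0 up to float error

# dW instead sums one chain-rule contribution per unrolled time step,
# because s_3 depends on s_2, which in turn depends on W and s_1, and so on
print(dW)
```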
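Finally, the computation left as an exercise in the convolution part of this section can be verified in the same mechanical way. The sketch below, assuming TensorFlow 2 with hypothetical values for the 3 × 3 input X and the 2 × 2 filter W, runs the forward convolution with stride 1, no padding, and no dilation, and then obtains $\partial E/\partial w_{ij}$ and $\partial E/\partial x_{ij}$ with tf.GradientTape for a toy error E defined as the sum of the output pixels (so that $\partial E/\partial h_{ij} = 1$ for every output pixel).

```python
import tensorflow as tf

# Hypothetical values for the 3 x 3 input X and the 2 x 2 filter W
X = tf.Variable([[1., 2., 3.],
                 [4., 5., 6.],
                 [7., 8., 9.]])
W = tf.Variable([[1., 0.],
                 [0., -1.]])

with tf.GradientTape() as tape:
    # tf.nn.conv2d expects NHWC inputs and HWIO filters
    x4 = tf.reshape(X, [1, 3, 3, 1])
    w4 = tf.reshape(W, [2, 2, 1, 1])
    h = tf.nn.conv2d(x4, w4, strides=1, padding="VALID")  # forward pass: 2 x 2 output map
    # Toy error: the sum of the output pixels, so dE/dh_ij = 1 everywhere
    E = tf.reduce_sum(h)

dW, dX = tape.gradient(E, [W, X])
print(tf.reshape(h, [2, 2]))  # the 2 x 2 output map of the forward pass
print(dW)  # every filter weight accumulates a contribution from all 4 output pixels
print(dX)  # every input pixel accumulates over the output pixels that used it
```

Printing dW shows that every filter weight receives a gradient contribution from all four output pixels, which is exactly the claim made earlier: changing any weight of the filter affects all the output pixels.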