Chapter 15

Let's see in detail how the forward and backward steps are realized. It might be useful to have a look back at Figure 5 and recall that a small change in a specific $w_{ij}$ will be propagated through the network following its topology (see Figure 5, where the edges in bold are the ones impacted by the small change in a specific weight).

Forward step

During the forward step the inputs are multiplied by the weights and then they are all summed together. Then the activation function is applied (see Figure 12). This step is repeated for each layer, one after another. The first layer takes the input features as input and produces its output. Then, each subsequent layer takes as input the output of the previous layer:

Figure 12: Forward propagation

If we look at one single layer, mathematically we have two equations:

1. The transfer equation: $z = \sum_i w_i x_i + b$, where the $x_i$ are the input values, the $w_i$ are the weights, and $b$ is the bias. In vector notation, $z = W^T X$. Note that $b$ can be absorbed into the summation by setting $w_0 = b$ and $x_0 = 1$.
2. The activation function: $y = \sigma(z)$, where $\sigma$ is the chosen activation function.

An artificial neural network consists of an input layer $I$, an output layer $O$, and any number of hidden layers $H_i$ situated between the input and the output layers. For the sake of simplicity, let's assume that there is only one hidden layer, since the results can be easily generalized.

As shown in Figure 12, the features $x_i$ from the input layer are multiplied by a set of fully connected weights $w_{ij}$ connecting the input layer to the hidden layer (see the left side of Figure 12). The weighted signals are summed together with the bias to calculate the result $z_j = \sum_i w_{ij} x_i + b_j$ (see the center of Figure 12).
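To make the two equations concrete, here is a minimal NumPy sketch of the transfer equation and the activation for a single fully connected layer; the layer sizes, the random weights, and the choice of the sigmoid as $\sigma$ are assumptions made for illustration, not code taken from the book:

```python
import numpy as np

# Minimal sketch of the forward step for one fully connected layer:
# z_j = sum_i w_ij * x_i + b_j, followed by y_j = sigma(z_j).
# The sizes (3 inputs, 2 hidden neurons) and the sigmoid are illustrative.
rng = np.random.default_rng(0)

x = np.array([0.5, -1.2, 3.0])   # input features x_i
W = rng.normal(size=(3, 2))      # fully connected weights w_ij
b = np.zeros(2)                  # biases b_j

def sigmoid(z):
    # One possible choice for the activation function sigma
    return 1.0 / (1.0 + np.exp(-z))

z = W.T @ x + b   # transfer equation (vector form of z_j = sum_i w_ij x_i + b_j)
y = sigmoid(z)    # activation function applied element-wise

print("z =", z)
print("y =", y)
```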
The result is passed through the activation function $y_j = \sigma_j(z_j)$, which then leaves the hidden layer for the output layer (see the right side of Figure 12).

Summarizing, during the forward step we need to run the following operations:

• For each neuron in a layer, multiply each input by its corresponding weight.
• Then, for each neuron in the layer, sum all the input × weight products together.
• Finally, for each neuron, apply the activation function to the result to compute the new output.

At the end of the forward step, we get a predicted vector $y_o$ from the output layer $o$, given the input vector $x$ presented at the input layer. Now the question is: how close is the predicted vector $y_o$ to the true value vector $t$?

That's where the backstep comes in.

Backstep

To understand how close the predicted vector $y_o$ is to the true value vector $t$, we need a function that measures the error at the output layer $o$. That is the loss function defined earlier in the book. There are many possible choices for the loss function. For instance, we can define the Mean Squared Error (MSE) as follows:

$E = \frac{1}{2} \sum_o (y_o - t_o)^2$

Note that $E$ is a quadratic function and, therefore, the difference grows quadratically when $t$ is far away from $y_o$, and the sign is not important. Note that this quadratic error (loss) function is not the only one we can use. Later in this chapter we will see how to deal with cross-entropy.

Now, remember that the key point is that during training we want to adjust the weights of the network in order to minimize the final error. As discussed, we can move towards a local minimum by moving in the direction opposite to the gradient of the error, $-\nabla w$. Moving in the opposite direction to the gradient is the reason why this algorithm is called gradient descent. Therefore, it is reasonable to define the equation for updating the weight $w_{ij}$ as follows:

$w_{ij} \leftarrow w_{ij} - \nabla w_{ij}$
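As a rough end-to-end illustration of the backstep ideas, the following NumPy sketch fits a single linear output neuron by gradient descent on the MSE defined above; the synthetic data, the analytic gradient for the linear case, and the explicit learning rate used as the step size are assumptions made for this sketch, not the book's code:

```python
import numpy as np

# Minimal sketch of the backstep: MSE loss E = 1/2 * sum_o (y_o - t_o)^2
# and a gradient descent update w <- w - step * dE/dw, for one linear
# output neuron y = x . w. Data, step size, and iterations are illustrative.
rng = np.random.default_rng(0)

x = rng.normal(size=(100, 3))        # 100 examples with 3 features each
true_w = np.array([1.0, -2.0, 0.5])  # weights used to generate the targets
t = x @ true_w                       # true value vector t

w = np.zeros(3)  # weights to be learned
eta = 0.1        # learning rate (step size), assumed for this sketch

for step in range(50):
    y = x @ w                        # forward step for the linear neuron
    E = 0.5 * np.sum((y - t) ** 2)   # MSE loss as defined above
    grad = x.T @ (y - t)             # dE/dw for the linear case
    w -= eta * grad / len(x)         # move opposite to the gradient
    if step % 10 == 0:
        print(f"step {step:2d}  loss {E:10.4f}")

print("learned w:", w)  # approaches true_w as the loss shrinks
```

Running the sketch, the printed loss decreases at every reported step and the learned weights approach the generating ones, which is exactly the behavior the update rule above is designed to produce.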