Let's see in detail how the forward and backward steps are realized. It might be useful to have a look back at Figure 5 and recall that a small change in a specific $w_{ij}$ will be propagated through the network following its topology (see Figure 5, where the edges in bold are the ones impacted by the small change in a specific weight).

Forward step

During the forward step the inputs are multiplied by the weights and then they are all summed together. Then the activation function is applied (see Figure 12). This step is repeated for each layer, one after another. The first layer takes the input features as input and produces its output. Then, each subsequent layer takes as input the output of the previous layer:

Figure 12: Forward propagation

If we look at one single layer, mathematically we have two equations:

1. The transfer equation: $z = \sum_i w_i x_i + b$, where $x_i$ are the input values, $w_i$ are the weights, and $b$ is the bias. In vector notation, $z = W^T X$. Note that $b$ can be absorbed into the summation by setting $w_0 = b$ and $x_0 = 1$.
2. The activation function: $y = \sigma(z)$, where $\sigma$ is the chosen activation function.

An artificial neural network consists of an input layer $I$, an output layer $O$, and any number of hidden layers $H_i$ situated between the input and the output layers. For the sake of simplicity, let's assume that there is only one hidden layer, since the results can be easily generalized.

As shown in Figure 12, the features $x_i$ from the input layer are multiplied by a set of fully-connected weights $w_{ij}$ connecting the input layer to the hidden layer (see the left side of Figure 12). The weighted signals are summed together with the bias to calculate the result $z_j = \sum_i w_{ij} x_i + b_j$ (see the center of Figure 12).
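As a minimal NumPy sketch of the two single-layer equations above (transfer and activation), applied to the input-to-hidden weights: the layer sizes, the random initialization, and the choice of a sigmoid activation are illustrative assumptions, not taken from the text.

```python
import numpy as np

def sigma(z):
    # Sigmoid, used here as one possible choice for the activation function
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative sizes (an assumption): 3 input features, 4 hidden neurons
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))      # fully-connected weights w_ij, input -> hidden
b = np.zeros(4)                  # biases b_j of the hidden layer
x = np.array([0.5, -1.0, 2.0])   # input features x_i

z = x @ W + b                    # transfer equation: z_j = sum_i w_ij * x_i + b_j
y = sigma(z)                     # activation: y_j = sigma(z_j)
print(z, y)
```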

The result is passed through the activation function $y_j = \sigma_j(z_j)$, which leaves the hidden layer for the output layer (see the right side of Figure 12).

Summarizing, during the forward step we need to run the following operations:

• For each neuron in a layer, multiply each input by its corresponding weight.
• Then, for each neuron in the layer, sum all the weighted inputs together.
• Finally, for each neuron, apply the activation function to the result to compute the new output.

At the end of the forward step, we get a predicted vector $y_o$ from the output layer $o$ given the input vector $x$ presented at the input layer. Now the question is: how close is the predicted vector $y_o$ to the true value vector $t$?

That's where the backstep comes in.

Backstep

To understand how close the predicted vector $y_o$ is to the true value vector $t$, we need a function that measures the error at the output layer $o$. That is the loss function defined earlier in the book. There are many choices for the loss function. For instance, we can define the Mean Squared Error (MSE) as follows:

$$E = \frac{1}{2} \sum_o (y_o - t_o)^2$$

Note that $E$ is a quadratic function and, therefore, the difference is quadratically larger when $t$ is far away from $y_o$, and the sign is not important. Note that this quadratic error (loss) function is not the only one that we can use. Later in this chapter we will see how to deal with cross-entropy.

Now, remember that the key point is that during the training we want to adjust the weights of the network in order to minimize the final error. As discussed, we can move towards a local minimum by moving in the opposite direction to the gradient $-\nabla w$. Moving in the opposite direction to the gradient is the reason why this algorithm is called gradient descent. Therefore, it is reasonable to define the equation for updating the weight $w_{ij}$ as follows:

$$w_{ij} \leftarrow w_{ij} - \nabla w_{ij}$$
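To make the backstep concrete, here is a minimal TensorFlow sketch that computes the MSE defined above for a single sigmoid layer and applies one gradient descent update. The tensor shapes and the learning rate `eta` are illustrative assumptions, and automatic differentiation stands in here for the gradient computation; the update rule in the text omits the learning rate, so scaling by `eta` is shown only as a common choice.

```python
import tensorflow as tf

# Toy setup (illustrative shapes, an assumption): 3 inputs, 2 output neurons
W = tf.Variable(tf.random.normal([3, 2]))
b = tf.Variable(tf.zeros([2]))
x = tf.constant([[0.5, -1.0, 2.0]])   # input vector x
t = tf.constant([[1.0, 0.0]])         # true value vector t

eta = 0.1  # learning rate (illustrative); the update rule in the text omits it

with tf.GradientTape() as tape:
    y = tf.sigmoid(tf.matmul(x, W) + b)           # forward step: predicted vector y_o
    E = 0.5 * tf.reduce_sum(tf.square(y - t))     # E = 1/2 * sum_o (y_o - t_o)^2

grad_W, grad_b = tape.gradient(E, [W, b])         # gradient of E with respect to the weights
W.assign_sub(eta * grad_W)                        # w_ij <- w_ij - eta * grad (gradient descent)
b.assign_sub(eta * grad_b)
print(float(E))
```

Repeating the forward step and this update over many inputs is exactly the gradient descent training loop described above.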

