
The Math Behind Deep Learning

The result is passed through the activation function $y_j = \sigma_j(z_j)$, and this output leaves the hidden layer and is fed to the output layer (see the right side of Figure 12).

Summarizing, during the forward step we need to run the following operations (a sketch in code follows the list):

• For each neuron in a layer, multiply each input by its corresponding weight.

• Then, for each neuron in the layer, sum all the weighted inputs together.

• Finally, for each neuron, apply the activation function to the result to compute the new output.
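To make these three operations concrete, here is a minimal NumPy sketch of a forward pass through one hidden layer. It is illustrative only: the sigmoid activation, the layer sizes, and the names W1, b1, W2, and b2 are assumptions made for this example, not definitions from the book.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    # Hidden layer: multiply inputs by weights, sum, then apply the activation
    z1 = W1 @ x + b1
    y1 = sigmoid(z1)          # y_j = sigma_j(z_j)
    # Output layer: the same pattern applied to the hidden activations
    z2 = W2 @ y1 + b2
    return sigmoid(z2)

# Example: 3 inputs, 4 hidden neurons, 2 outputs (sizes are arbitrary)
rng = np.random.default_rng(0)
x = rng.normal(size=3)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)
print(forward(x, W1, b1, W2, b2))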

At the end of the forward step, we get a predicted vector $y_o$ from the output layer $o$ given the input vector $x$ presented at the input layer. Now the question is: how close is the predicted vector $y_o$ to the true value vector $t$?

That's where the backstep comes in.

Backstep

To understand how close the predicted vector $y_o$ is to the true value vector $t$, we need a function that measures the error at the output layer $o$. That is the loss function defined earlier in the book. There are many possible choices for the loss function. For instance, we can define the Mean Squared Error (MSE) as follows:

$E = \frac{1}{2}\sum_{o}(y_o - t_o)^2$

Note that $E$ is a quadratic function and, therefore, the error grows quadratically as $t$ moves away from $y_o$, while the sign of the difference does not matter. Note that this quadratic error (loss) function is not the only one we can use. Later in this chapter we will see how to deal with cross-entropy.
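As a quick illustration, the MSE defined above takes only a couple of lines of NumPy; the predicted and true vectors below are made-up values for the example.

import numpy as np

def mse_loss(y_o, t):
    # E = 1/2 * sum_o (y_o - t_o)^2
    return 0.5 * np.sum((y_o - t) ** 2)

y_o = np.array([0.8, 0.2])   # predicted vector from the output layer
t   = np.array([1.0, 0.0])   # true value vector
print(mse_loss(y_o, t))      # 0.04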


Now, remember that the key point is that during training we want to adjust the weights of the network in order to minimize the final error. As discussed, we can move towards a local minimum by moving in the opposite direction to the gradient, $-\nabla w$. Moving in the opposite direction to the gradient is the reason why this algorithm is called gradient descent. Therefore, it is reasonable to define the equation for updating the weight $w_{ij}$ as follows:

$w_{ij} \leftarrow w_{ij} - \nabla w_{ij}$
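In practice the gradient is usually scaled by a learning rate before being subtracted; the update above leaves that scaling implicit. The following sketch makes it explicit. It is an assumption-laden illustration: the learning rate eta and the single linear neuron are chosen for this example only.

import numpy as np

def gradient_step(w, x, t, eta=0.1):
    y = w @ x                 # forward: weighted sum of the inputs
    grad = (y - t) * x        # dE/dw for E = 1/2 * (y - t)^2
    return w - eta * grad     # move in the opposite direction to the gradient

w = np.array([0.5, -0.3])
x = np.array([1.0, 2.0])
t = 1.0
for _ in range(5):
    w = gradient_step(w, x, t)
print(w @ x)                  # the prediction approaches t = 1.0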

