

For a function of multiple variables, the gradient is computed using partial derivatives. We introduce the hyperparameter η – or, in ML lingo, the learning rate – to account for how large a step should be taken in the direction opposite to the gradient.
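
As a quick illustration (the function f and the learning rate value below are arbitrary choices made here, not taken from the text), a single gradient descent step in two variables might look like this in Python:

import numpy as np

def f(w):
    # A simple two-variable function with its minimum at (0, 0).
    return w[0] ** 2 + 3 * w[1] ** 2

def gradient(w):
    # Partial derivatives of f with respect to each variable.
    return np.array([2 * w[0], 6 * w[1]])

eta = 0.1                      # learning rate (the hyperparameter)
w = np.array([1.0, -2.0])      # starting point

# Take one step in the direction opposite to the gradient.
w = w - eta * gradient(w)
print(w)                       # closer to the minimum than the starting point

Repeating this step drives w toward the minimum, with eta controlling how aggressively each step moves.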

Considering the error, E, we have the equation:

$$\nabla w_{ij} = -\eta \frac{\partial E}{\partial w_{ij}}$$

The preceding equation is simply capturing the fact that a slight change in $w_{ij}$ will impact the final error, E, as seen in Figure 13:

Figure 13: A small change in $w_{ij}$ will impact the final error, E
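
To make this concrete, the sketch below uses a single linear neuron with a squared error (both are illustrative assumptions, not the chapter's model). It perturbs one weight by a small amount to estimate $\partial E / \partial w_{ij}$ with a finite difference, and then applies the update rule above:

import numpy as np

x = np.array([0.5, -1.0])      # inputs to the neuron
w = np.array([0.2, 0.4])       # weights w_ij (illustrative values)
t = 1.0                        # target output
eta = 0.1                      # learning rate

def error(w):
    # Squared error of a single linear neuron (illustrative choice of E).
    y = np.dot(w, x)
    return 0.5 * (y - t) ** 2

# A small change in w_0 produces a small change in E (Figure 13).
h = 1e-4
dE_dw0 = (error(w + np.array([h, 0.0])) - error(w)) / h

# Weight update: delta_w = -eta * dE/dw
w[0] = w[0] - eta * dE_dw0
print(dE_dw0, error(w))        # the error decreases after the update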

Let's define the notation used throughout the equations in the remainder of this section (a short forward-pass sketch using this notation follows the list):

• $z_j$ is the input to node j in layer l
• $\delta_j$ is the activation function for node j in layer l (applied to $z_j$)
• $y_j = \delta_j(z_j)$ is the output of the activation of node j in layer l
• $w_{ij}$ is the matrix of weights connecting neuron i in layer l – 1 to neuron j in layer l
• $b_j$ is the bias for unit j in layer l
• $t_o$ is the target value for node o in the output layer
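
Assuming a sigmoid for the activation $\delta_j$ (an illustrative choice – the list above does not fix a particular activation), the notation maps onto a forward pass for layer l as follows:

import numpy as np

def sigmoid(z):
    # Illustrative activation function standing in for delta_j.
    return 1.0 / (1.0 + np.exp(-z))

y_prev = np.array([0.3, 0.7, 0.1])     # outputs y_i of layer l - 1
W = np.random.randn(3, 2)              # w_ij: weights from layer l - 1 to layer l
b = np.zeros(2)                        # b_j: biases for the units in layer l

z = y_prev @ W + b                     # z_j: input to each node j in layer l
y = sigmoid(z)                         # y_j = delta_j(z_j): output of node j in layer l
print(z, y)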

Now we need to compute the partial derivative of the error at the output layer with respect to the weights, $\frac{\partial E}{\partial w_{ij}}$. There are two different cases:

Case 1: The weight update equation for a neuron from a hidden (or input) layer to the output layer.
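
As a minimal sketch of this case, assuming a squared error and a sigmoid activation at the output node (both assumptions made here for illustration), the chain rule gives $\frac{\partial E}{\partial w_{io}} = (y_o - t_o)\,\delta'_o(z_o)\,y_i$, which translates to:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    # Derivative of the sigmoid, used for dy_o/dz_o at the output node.
    s = sigmoid(z)
    return s * (1.0 - s)

eta = 0.1
y_hidden = np.array([0.3, 0.7])        # y_i: outputs of the hidden layer
w_out = np.array([0.5, -0.2])          # w_io: hidden-to-output weights
b_out = 0.1                            # bias of the output node
t_o = 1.0                              # target for the output node

z_o = np.dot(y_hidden, w_out) + b_out  # input to the output node
y_o = sigmoid(z_o)                     # output of the output node

# Chain rule: dE/dw_io = (y_o - t_o) * d(activation)/dz_o * y_i
grad = (y_o - t_o) * sigmoid_prime(z_o) * y_hidden

# Apply the update rule from earlier in the section.
w_out = w_out - eta * grad
print(grad, w_out)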

