

The gradient of the error $E$ with respect to the weight $w_{jo}$ from node $j$ in the hidden layer to node $o$ in the output layer is therefore simply the product of three terms: the difference between the prediction $y_o$ and the true value $t_o$, the derivative $\delta'_o(z_o)$ of the output layer activation function, and the activation output $y_j$ of node $j$ in the hidden layer. For simplicity, we can also define $v_o = (y_o - t_o)\,\delta'_o(z_o)$ and get:

$$\frac{\partial E}{\partial w_{jo}} = v_o\, y_j$$
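To make this concrete, here is a minimal NumPy sketch of that formula for a single training example. The layer sizes, the variable names (`y_j`, `w_jo`, `v_o`), and the sigmoid activation used for the output layer are illustrative assumptions, not code from this chapter:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

# Hypothetical values for one training example:
# 4 hidden units (activations y_j) and 3 output units.
y_j = np.array([0.5, 0.1, 0.9, 0.3])      # hidden layer activations
w_jo = np.random.randn(4, 3) * 0.1        # weight from hidden node j to output node o
b_o = np.zeros(3)                         # output layer biases
t = np.array([1.0, 0.0, 0.0])             # true target values t_o

z_o = y_j @ w_jo + b_o                    # output layer pre-activations
y_o = sigmoid(z_o)                        # predictions

v_o = (y_o - t) * sigmoid_prime(z_o)      # v_o = (y_o - t_o) * delta'_o(z_o)
grad_w_jo = np.outer(y_j, v_o)            # dE/dw_jo = v_o * y_j, one entry per (j, o)
```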

In short, for case 1, the weight update equation for each of the hidden-output connections is:

$$w_{jo} \leftarrow w_{jo} - \eta\,\frac{\partial E}{\partial w_{jo}}$$
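In code, one such update step might look like the following small helper; the learning rate value is an arbitrary illustrative choice:

```python
def sgd_step(w, grad, eta=0.1):
    """One gradient descent step: w <- w - eta * dE/dw."""
    return w - eta * grad

# Example usage with the sketch above: w_jo = sgd_step(w_jo, grad_w_jo)
```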

If we want to explicitly compute the gradient with respect to the output layer biases, the steps to follow are similar to the ones above, with only one difference:

$$\frac{\partial z_o}{\partial b_o} = \frac{\partial}{\partial b_o}\left(\sum_j w_{jo}\,\delta_j(z_j) + b_o\right) = 1$$

So in this case, $\frac{\partial E}{\partial b_o} = v_o$.
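Continuing the earlier sketch (so `v_o` is the array computed there), the bias gradient needs no extra computation:

```python
grad_b_o = v_o   # dE/db_o = v_o, since dz_o/db_o = 1
```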

Next, we'll look at case 2.


Case 2 ‒ From hidden layer to hidden layer

In this case, we need to consider the equation for a neuron in a hidden layer (or the input layer) connected to a neuron in a hidden layer. Figure 13 showed that there is an indirect relationship between the hidden layer weight change and the output error, which makes the computation of the gradient a bit more challenging. Specifically, we consider the weight $w_{ij}$ connecting hidden layer $i$ to hidden layer $j$. Applying the definition of $E$ and differentiating, we have:

$$\frac{\partial E}{\partial w_{ij}} = \frac{\partial}{\partial w_{ij}}\left(\frac{1}{2}\sum_o (y_o - t_o)^2\right) = \sum_o (y_o - t_o)\,\frac{\partial (y_o - t_o)}{\partial w_{ij}} = \sum_o (y_o - t_o)\,\frac{\partial y_o}{\partial w_{ij}}$$
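Before completing this derivation by expanding $\frac{\partial y_o}{\partial w_{ij}}$ with the chain rule, it can help to have a numerical sanity check to compare the final expression against. The following finite-difference sketch approximates $\frac{\partial E}{\partial w_{ij}}$ for a single weight; the network shape, the sigmoid activations, and all variable names are assumptions made purely for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_error(x, w_ij, b_j, w_jo, b_o, t):
    """Forward pass through one hidden layer and the output layer, returning E."""
    y_j = sigmoid(x @ w_ij + b_j)          # hidden activations
    y_o = sigmoid(y_j @ w_jo + b_o)        # output activations
    return 0.5 * np.sum((y_o - t) ** 2)    # E = 1/2 * sum_o (y_o - t_o)^2

# Hypothetical toy network: 2 inputs, 4 hidden units, 3 outputs.
rng = np.random.default_rng(0)
x = rng.normal(size=2)
w_ij, b_j = rng.normal(size=(2, 4)) * 0.1, np.zeros(4)
w_jo, b_o = rng.normal(size=(4, 3)) * 0.1, np.zeros(3)
t = np.array([1.0, 0.0, 0.0])

# Numerical estimate of dE/dw_ij for the weight connecting input 0 to hidden node 1.
eps = 1e-6
w_plus, w_minus = w_ij.copy(), w_ij.copy()
w_plus[0, 1] += eps
w_minus[0, 1] -= eps
num_grad = (forward_error(x, w_plus, b_j, w_jo, b_o, t) -
            forward_error(x, w_minus, b_j, w_jo, b_o, t)) / (2 * eps)
print(num_grad)   # should match the analytic dE/dw_ij once the derivation is completed
```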

