
The Math Behind Deep Learning

Case 2: weight update equation for a neuron from a hidden (or input) layer to a hidden layer.

We'll begin with case 1.

Case 1 – From hidden layer to output layer

In this case, we need to consider the equation for a neuron from hidden layer j to output layer o. Applying the definition of E and differentiating, we have:

$$\frac{\partial E}{\partial w_{jo}} = \frac{\partial}{\partial w_{jo}}\,\frac{1}{2}\sum_{o}(y_o - t_o)^2 = (y_o - t_o)\,\frac{\partial (y_o - t_o)}{\partial w_{jo}}$$

Here the summation disappears because, when we take the partial derivative with respect to $w_{jo}$, the only term in the sum with a non-zero derivative is the one for output o.

Considering that differentiation is a linear operation and that $\frac{\partial t_o}{\partial w_{jo}} = 0$ – because the true value $t_o$ does not depend on $w_{jo}$ – we have:

$$\frac{\partial (y_o - t_o)}{\partial w_{jo}} = \frac{\partial y_o}{\partial w_{jo}} - 0$$

Applying the chain rule again and remembering that $y_o = \delta_o(z_o)$, we have:

$$\frac{\partial E}{\partial w_{jo}} = (y_o - t_o)\,\frac{\partial y_o}{\partial w_{jo}} = (y_o - t_o)\,\frac{\partial \delta_o(z_o)}{\partial w_{jo}} = (y_o - t_o)\,\delta'_o(z_o)\,\frac{\partial z_o}{\partial w_{jo}}$$

Remembering that $z_o = \sum_{j} w_{jo}\,\delta_j(z_j) + b_o$, we have:

$$\frac{\partial z_o}{\partial w_{jo}} = \delta_j(z_j)$$
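To make this explicit, here is the sum expanded term by term (a short worked step added for clarity, not in the original text):

$$\frac{\partial z_o}{\partial w_{jo}} = \frac{\partial}{\partial w_{jo}}\big(w_{1o}\,\delta_1(z_1) + \dots + w_{jo}\,\delta_j(z_j) + \dots + b_o\big) = \delta_j(z_j)$$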

As before, only the j-th term of the sum depends on $w_{jo}$, so all the other terms vanish under the partial derivative. By definition $\delta_j(z_j) = y_j$, so, putting everything together, we have:

$$\frac{\partial E}{\partial w_{jo}} = (y_o - t_o)\,\delta'_o(z_o)\,y_j$$
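As a concrete instance (an illustration added here, assuming a sigmoid output activation rather than the general $\delta_o$ of the text): for the sigmoid, $\delta'_o(z_o) = y_o(1 - y_o)$, so the gradient becomes:

$$\frac{\partial E}{\partial w_{jo}} = (y_o - t_o)\,y_o\,(1 - y_o)\,y_j$$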

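The derived formula is easy to sanity-check numerically. Below is a minimal NumPy sketch, not from the book, that compares the analytic gradient $(y_o - t_o)\,\delta'_o(z_o)\,y_j$ against a finite-difference approximation; the layer sizes, sigmoid activation, and variable names are illustrative assumptions.

```python
import numpy as np

# Check dE/dw_jo = (y_o - t_o) * delta'_o(z_o) * y_j for one
# hidden->output weight vector, assuming sigmoid activations.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
y_hidden = sigmoid(rng.normal(size=3))   # hidden activations y_j
w = rng.normal(size=3)                   # weights w_jo into output o
b = rng.normal()                         # bias b_o
t = 0.7                                  # target t_o

def loss(w):
    z_o = w @ y_hidden + b               # z_o = sum_j w_jo * y_j + b_o
    y_o = sigmoid(z_o)                   # y_o = delta_o(z_o)
    return 0.5 * (y_o - t) ** 2          # E = 1/2 (y_o - t_o)^2

# Analytic gradient from the derivation above (sigmoid: delta' = y(1-y)).
z_o = w @ y_hidden + b
y_o = sigmoid(z_o)
grad_analytic = (y_o - t) * y_o * (1.0 - y_o) * y_hidden

# Central finite differences for comparison.
eps = 1e-6
grad_numeric = np.array([
    (loss(w + eps * e) - loss(w - eps * e)) / (2 * eps)
    for e in np.eye(3)
])

print(grad_analytic)
print(grad_numeric)   # should agree to roughly 1e-8
```

The two printed vectors should match closely, confirming that the summation collapse and the chain-rule steps above were applied correctly.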
