
Chapter 15

Chain rule

The chain rule says that if we have a function y = g(x) and z = f(g(x)) = f(y), then the derivative of z with respect to x is given by:

$$\frac{dz}{dx} = \frac{dz}{dy} \frac{dy}{dx}$$
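As a quick sanity check, here is a minimal sketch (assuming SymPy is available; the choice of g(x) = sin(x) and f(y) = y² is purely illustrative, not from the text) that verifies the scalar chain rule symbolically:

```python
import sympy as sp

x = sp.Symbol('x')
y = sp.sin(x)            # y = g(x), an illustrative choice
z = y**2                 # z = f(g(x))

# Route 1: differentiate the composite function directly.
dz_dx_direct = sp.diff(z, x)

# Route 2: apply the chain rule explicitly.
y_sym = sp.Symbol('y')
dz_dy = sp.diff(y_sym**2, y_sym).subs(y_sym, y)   # dz/dy, evaluated at y = g(x)
dy_dx = sp.diff(y, x)                             # dy/dx

# Both routes give 2*sin(x)*cos(x).
assert sp.simplify(dz_dx_direct - dz_dy * dy_dx) == 0
```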

This can be generalized beyond the scalar case. Suppose $x \in \mathbb{R}^m$ and $y \in \mathbb{R}^n$, with $g$ mapping from $\mathbb{R}^m$ to $\mathbb{R}^n$, $f$ mapping from $\mathbb{R}^n$ to $\mathbb{R}$, and with $y = g(x)$ and $z = f(y)$; then we have:

$$\frac{\partial z}{\partial x_i} = \sum_j \frac{\partial z}{\partial y_j} \frac{\partial y_j}{\partial x_i}$$

The generalized chain rule using partial derivatives will be used as a basic tool for the backpropagation algorithm when dealing with functions of multiple variables. Stop for a second and make sure that you fully understand it.
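To see the summation at work, the following sketch (NumPy only; the functions g and f and their hand-coded Jacobians are toy examples, not taken from this chapter) computes ∂z/∂x_i via the chain rule and checks it against finite differences:

```python
import numpy as np

def g(x):
    # y = g(x), mapping R^2 -> R^3 (illustrative choice)
    return np.array([x[0] * x[1], np.sin(x[0]), x[1] ** 2])

def f(y):
    # z = f(y), mapping R^3 -> R (illustrative choice)
    return y[0] + 2.0 * y[1] * y[2]

def grad_f(y):
    # dz/dy_j, derived by hand
    return np.array([1.0, 2.0 * y[2], 2.0 * y[1]])

def jac_g(x):
    # dy_j/dx_i, the 3x2 Jacobian of g, derived by hand
    return np.array([[x[1],         x[0]],
                     [np.cos(x[0]), 0.0],
                     [0.0,          2.0 * x[1]]])

x = np.array([0.5, -1.2])

# Chain rule: dz/dx_i = sum_j (dz/dy_j) * (dy_j/dx_i)
grad_z = grad_f(g(x)) @ jac_g(x)

# Finite-difference check of the same gradient.
eps = 1e-6
for i in range(2):
    dx = np.zeros(2)
    dx[i] = eps
    numeric = (f(g(x + dx)) - f(g(x - dx))) / (2 * eps)
    assert abs(numeric - grad_z[i]) < 1e-5
```

This Jacobian-product view of the sum is exactly what backpropagation automates, layer by layer.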

A few differentiation rules

It might be useful to remind ourselves of a few additional differentiation rules that will be used later:

• Constant differentiation: $c' = 0$, with $c$ constant

• Differentiation variable: $\frac{\partial z}{\partial z} = 1$, when differentiating with respect to the differentiation variable itself

• Linear differentiation: $[af(x) + bg(x)]' = af'(x) + bg'(x)$

• Reciprocal differentiation: $\left[\frac{1}{f(x)}\right]' = -\frac{f'(x)}{f(x)^2}$

• Power differentiation: $[f(x)^n]' = n f(x)^{n-1} f'(x)$
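All of these rules can be verified symbolically; here is a minimal sketch, assuming SymPy is available:

```python
import sympy as sp

x, a, b, n = sp.symbols('x a b n')
f = sp.Function('f')(x)
g = sp.Function('g')(x)

# Linearity: [a*f(x) + b*g(x)]' = a*f'(x) + b*g'(x)
assert sp.simplify(sp.diff(a*f + b*g, x)
                   - (a*sp.diff(f, x) + b*sp.diff(g, x))) == 0

# Reciprocal rule: [1/f(x)]' = -f'(x) / f(x)**2
assert sp.simplify(sp.diff(1/f, x) - (-sp.diff(f, x) / f**2)) == 0

# Power rule for a composite: [f(x)**n]' = n * f(x)**(n-1) * f'(x)
assert sp.simplify(sp.diff(f**n, x) - n * f**(n-1) * sp.diff(f, x)) == 0
```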

Matrix operations

There are many books about matrix calculus. Here we focus on only a few basic operations used for neural networks. Let us recall that an $m \times n$ matrix can be used to represent the weights $w_{ij}$, with $1 \le i \le m$ and $1 \le j \le n$, associated with the arcs between two adjacent layers.
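As a minimal sketch (assuming NumPy; the layer sizes and the bias term are illustrative, not from the text), the forward pass through such a weight matrix is a single matrix product:

```python
import numpy as np

m, n = 4, 3                    # m units in one layer, n in the next (illustrative)
rng = np.random.default_rng(0)

W = rng.normal(size=(m, n))    # W[i, j] weighs the arc from unit i to unit j
b = np.zeros(n)                # one bias per output unit

x = rng.normal(size=m)         # activations of the previous layer
y = x @ W + b                  # pre-activations of the next layer, shape (n,)
print(y.shape)                 # -> (3,)
```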
