

The Math Behind Deep Learning

Suppose $x \in \mathbb{R}^m$ (that is, the space of real numbers with m dimensions) and f maps from $\mathbb{R}^m$ to $\mathbb{R}$; the gradient is defined as follows:

$$\nabla(f) = \left( \frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_m} \right)$$

In math, a partial derivative $\frac{\partial f}{\partial x_i}$ of a function of several variables is its derivative with respect to one of those variables, with the others held constant.
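To make the definition concrete, the following is a minimal sketch that approximates each partial derivative with a central difference, changing one coordinate at a time while the others are held constant. The helper name `numerical_gradient`, the step size `h`, and the example function `f` are assumptions chosen for illustration; they do not come from the text.

```python
def numerical_gradient(f, x, h=1e-6):
    """Approximate the gradient of f at point x with central differences."""
    grad = []
    for i in range(len(x)):
        # Perturb only coordinate i; all other coordinates stay fixed.
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


def f(x):
    # Hypothetical example function: f(x1, x2) = x1^2 + 3 * x2
    return x[0] ** 2 + 3 * x[1]


# The analytic gradient is (2 * x1, 3), so at (2, 1) it is (4, 3).
g = numerical_gradient(f, [2.0, 1.0])
print(g)
```

The printed values should be close to the analytic gradient, up to a small numerical error that depends on the step size.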

Note that it is possible to show that the gradient is a vector (a direction to move) that:

• Points in the direction of greatest increase of a function.

• Is 0 at a local maximum or local minimum. This is because, at such a point, the function cannot increase or decrease further in any direction.

The proof is left as an exercise to the interested reader. (Hint: consider Figures 2 and 3.)
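The two properties above can be checked numerically on a simple case. The sketch below assumes a hypothetical function f(x1, x2) = x1^2 + x2^2, which has its minimum at the origin. It samples 360 directions around a point and compares the direction of largest increase with the direction of the gradient. It is an illustration, not a proof.

```python
import math


def f(x1, x2):
    # Hypothetical example function with a minimum at (0, 0).
    return x1 ** 2 + x2 ** 2


def grad_f(x1, x2):
    # Analytic gradient of the example function.
    return (2 * x1, 2 * x2)


# Property 2: the gradient is zero at the minimum.
gradient_at_minimum = grad_f(0.0, 0.0)

# Property 1: away from the minimum, the gradient points towards
# the direction of greatest increase. Take a small step in each of
# 360 directions (one per degree) and keep the best one.
x1, x2 = 1.0, 2.0
step = 1e-3
best_angle, best_increase = None, -math.inf
for k in range(360):
    angle = math.radians(k)
    moved = f(x1 + step * math.cos(angle), x2 + step * math.sin(angle))
    increase = moved - f(x1, x2)
    if increase > best_increase:
        best_angle, best_increase = angle, increase

g1, g2 = grad_f(x1, x2)
gradient_angle = math.atan2(g2, g1)
print(math.degrees(best_angle), math.degrees(gradient_angle))
```

The best sampled direction should agree with the gradient direction to within the one-degree sampling resolution.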

Gradient descent

If the gradient points in the direction of greatest increase for a function, then it is

possible to move towards a local minimum for the function by simply moving in

a direction opposite to the gradient. That's the key observation used for gradient

descent algorithms, which will be used shortly. An example is provided in Figure 4:

Figure 4: Gradient descent for a function in three variables
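The idea of repeatedly moving opposite to the gradient can be sketched in a few lines. The function name `gradient_descent`, the `learning_rate` and `steps` parameters, and the example function are assumptions made for illustration; this is a minimal sketch rather than the implementation used later in the text.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Take repeated steps in the direction opposite to the gradient."""
    x = list(start)
    for _ in range(steps):
        g = grad(x)
        # Move each coordinate a small amount against the gradient.
        x = [xi - learning_rate * gi for xi, gi in zip(x, g)]
    return x


def grad_f(x):
    # Gradient of the hypothetical function
    # f(x1, x2) = (x1 - 1)^2 + (x2 + 2)^2, whose minimum is at (1, -2).
    return [2 * (x[0] - 1), 2 * (x[1] + 2)]


minimum = gradient_descent(grad_f, [5.0, 5.0])
print(minimum)
```

For this convex example, the result should be close to (1, -2). For a general function, the same procedure is only expected to approach a local minimum, and the learning rate must be small enough for the steps to converge.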
