
The Math Behind Deep Learning

Suppose x ∈ ℝᵐ (for example, the space of real numbers with m dimensions) and f maps from ℝᵐ to ℝ; the gradient is defined as follows:

∇f = (∂f/∂x₁, …, ∂f/∂xₘ)

In math, a partial derivative ∂f/∂xᵢ of a function of several variables is its derivative with respect to one of those variables, with the others held constant.
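To make the definition concrete, here is a minimal sketch in Python that approximates each partial derivative ∂f/∂xᵢ with a central finite difference; the helper name numerical_gradient and the example function f(x) = x₁² + 3x₂² are illustrative choices, not taken from the text:

import numpy as np

def numerical_gradient(f, x, eps=1e-6):
    """Approximate the gradient of f at x with central finite differences.

    Each component is (f(x + eps*e_i) - f(x - eps*e_i)) / (2*eps), i.e. the
    partial derivative with respect to x_i with the other coordinates held constant.
    """
    grad = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        step = np.zeros_like(x, dtype=float)
        step[i] = eps                      # perturb only the i-th coordinate
        grad[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad

# Example: f(x) = x1**2 + 3*x2**2, whose exact gradient is (2*x1, 6*x2)
f = lambda x: x[0] ** 2 + 3 * x[1] ** 2
print(numerical_gradient(f, np.array([1.0, 2.0])))   # approximately [ 2. 12.]

This kind of finite-difference check is a standard way to verify a hand-computed gradient numerically.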

Note that it is possible to show that the gradient is a vector (a direction to move) that:

• Points in the direction of greatest increase of a function.

• Is 0 at a local maximum or local minimum, because at such a point there is no direction in which the function increases or decreases (see the short check below).
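As a quick check of the second point (again a toy sketch; the quadratic f(x) = x₁² + 3x₂² and its hand-computed gradient are illustrative, not from the text), the gradient of this function vanishes exactly at its minimum (0, 0) and is nonzero elsewhere:

import numpy as np

# f(x) = x1**2 + 3*x2**2 has its minimum at (0, 0); its gradient is (2*x1, 6*x2).
grad_f = lambda x: np.array([2 * x[0], 6 * x[1]])

print(grad_f(np.array([0.0, 0.0])))   # [0. 0.] -> zero gradient at the minimum
print(grad_f(np.array([1.0, 2.0])))   # [ 2. 12.] -> nonzero away from the minimum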

The proof is left as an exercise for the interested reader. (Hint: consider Figures 2 and 3.)

Gradient descent

If the gradient points in the direction of greatest increase for a function, then it is possible to move towards a local minimum of the function simply by moving in the direction opposite to the gradient. That's the key observation behind gradient descent algorithms, which will be used shortly. An example is provided in Figure 4:

Figure 4: Gradient descent for a function in three variables
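The idea behind the figure can be summarized in a few lines of Python: start from some point and repeatedly step against the gradient. This is a minimal sketch under assumed names (gradient_descent, grad_f) and the same illustrative quadratic as above, not the exact implementation used later in the chapter:

import numpy as np

def gradient_descent(grad_f, x0, learning_rate=0.1, steps=100):
    """Minimize a function by repeatedly moving opposite to its gradient."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x - learning_rate * grad_f(x)   # step in the direction of steepest decrease
    return x

# Same toy function as before: f(x) = x1**2 + 3*x2**2, with gradient (2*x1, 6*x2)
grad_f = lambda x: np.array([2 * x[0], 6 * x[1]])
print(gradient_descent(grad_f, x0=[1.0, 2.0]))   # converges towards the minimum at (0, 0)

The learning_rate controls the step size: too large and the iterates can overshoot or diverge, too small and convergence becomes slow.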
