
The Math Behind Deep Learning

Chapter 8, Recurrent Neural Networks, discussed how to use LSTMs and GRUs to deal with the problem of vanishing gradients and efficiently learn long-range dependencies. In a similar way, the gradient can explode when a single term in the product of Jacobian matrices becomes large. Chapter 8, Recurrent Neural Networks, also discussed how to use gradient clipping to deal with this problem.
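Gradient clipping can also be applied directly when gradients are computed by hand in TensorFlow. The following is a minimal sketch, assuming a toy variable and loss chosen only for illustration; it uses tf.clip_by_global_norm to rescale the gradients whenever their global norm exceeds a threshold (the value 1.0 here is arbitrary):

import tensorflow as tf

# Toy variable and loss, chosen only to produce some gradients.
w = tf.Variable([[2.0, -3.0], [4.0, 1.0]])

with tf.GradientTape() as tape:
    loss = tf.reduce_sum(tf.square(w))

grads = tape.gradient(loss, [w])

# Rescale the gradients if their global norm exceeds 1.0 (illustrative threshold).
clipped_grads, global_norm = tf.clip_by_global_norm(grads, clip_norm=1.0)

optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
optimizer.apply_gradients(zip(clipped_grads, [w]))

The same effect can be obtained by passing a clipping argument such as clipnorm to a Keras optimizer, which clips the gradients before every update.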

We now come to the conclusion of this journey, and you should have a better understanding of how backpropagation works and how it is applied in neural networks, whether dense networks, CNNs, or RNNs. In the next section, we will discuss how TensorFlow computes gradients, and why this is useful for backpropagation.

A note on TensorFlow and automatic differentiation

TensorFlow can automatically calculate derivatives, a feature called Automatic Differentiation. This is achieved by using the chain rule. Every node in the computational graph (see Chapter 2, TensorFlow 1.x and 2.x) has an attached gradient operation for calculating the derivatives of its outputs with respect to its inputs. After that, the gradients with respect to the parameters are automatically computed during backpropagation.

Automatic differentiation is a very important feature because you do not need to hand-code a new variation of backpropagation for each new neural network model. This allows for quick iteration and makes it possible to run many experiments faster.
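As a concrete illustration, here is a minimal sketch of how automatic differentiation can be invoked through tf.GradientTape; the function y = x² and the input value 3.0 are arbitrary choices made only for the example:

import tensorflow as tf

# Record operations on the tape so gradients can be derived via the chain rule.
x = tf.Variable(3.0)

with tf.GradientTape() as tape:
    y = x * x  # y = x^2

# dy/dx = 2x, so the gradient evaluates to 6.0 at x = 3.0.
dy_dx = tape.gradient(y, x)
print(dy_dx.numpy())  # 6.0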

Summary

In this chapter we discussed the math behind deep learning. Put simply, a deep learning model computes a function of an input vector to produce an output. The interesting part is that we can literally have billions of parameters (weights) to be tuned. Backpropagation is a core mathematical algorithm used by deep learning for efficiently training artificial neural networks following a gradient descent approach that exploits the chain rule. The algorithm is based on two steps repeated alternately: the forward step and the backstep.

During the forward step, inputs are propagated through the network in order to predict outputs. These predictions can differ from the true values provided to assess the quality of the network. In other words, there is an error, and our goal is to minimize it. This is where the backstep plays a role, by adjusting the weights of the network to minimize the error.
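The two steps can be sketched as a single training iteration in TensorFlow; the tiny model, data, and learning rate below are illustrative assumptions, not part of the chapter:

import tensorflow as tf

# Illustrative toy data and a one-layer model (hypothetical values for the sketch).
x = tf.constant([[1.0, 2.0]])
y_true = tf.constant([[1.0]])

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
loss_fn = tf.keras.losses.MeanSquaredError()
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

# Forward step: propagate the input through the network and measure the error.
with tf.GradientTape() as tape:
    y_pred = model(x)
    loss = loss_fn(y_true, y_pred)

# Backstep: compute gradients via the chain rule and adjust the weights
# in the direction that reduces the error.
grads = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(grads, model.trainable_variables))

Repeating these two steps over many batches of data is exactly the gradient descent training loop described above.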
