Advanced Deep Learning with Keras

Introducing Advanced Deep Learning with Keras

GD is performed iteratively. At each step, y will get closer to its minimum value.

At x = 0.5, dy/dx = 0, and GD has found the absolute minimum value of y = -1.25.

The gradient recommends no further change in x.
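The iterative update described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the book's code; it assumes the quadratic y = x² − x − 1, which is consistent with the minimum y = -1.25 at x = 0.5 quoted in the text.

```python
# Minimal gradient descent sketch (assumed function y = x**2 - x - 1,
# whose minimum y = -1.25 at x = 0.5 matches the values in the text).
def y(x):
    return x**2 - x - 1

def dy_dx(x):
    return 2 * x - 1

x = 2.0    # arbitrary starting point
lr = 0.1   # learning rate (epsilon)
for _ in range(100):
    x -= lr * dy_dx(x)  # step opposite the gradient

print(round(x, 3), round(y(x), 3))  # converges to x = 0.5, y = -1.25
```

At the minimum, dy/dx = 0, so the update x -= lr * dy_dx(x) leaves x unchanged — exactly the "no further change" condition above.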

The choice of learning rate is crucial. A large value of ε may not find the minimum
value since the search will just swing back and forth around the minimum value.
On the other hand, too small a value of ε may take a significant number of iterations
before the minimum is found. In the case of multiple minima, the search might get
stuck in a local minimum.
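Both failure modes can be seen numerically. The sketch below reuses the same assumed quadratic y = x² − x − 1 (gradient dy/dx = 2x − 1); the function and the specific learning rates are illustrative assumptions, not the book's.

```python
# Effect of the learning rate on the assumed quadratic y = x**2 - x - 1.
def step(x, lr, n=50):
    """Run n gradient descent steps from x with learning rate lr."""
    for _ in range(n):
        x -= lr * (2 * x - 1)  # dy/dx = 2x - 1
    return x

print(step(2.0, 0.1))    # moderate lr: converges to ~0.5
print(step(2.0, 1.0))    # too large: swings between 2.0 and -1.0 forever
print(step(2.0, 0.001))  # too small: after 50 steps still far from 0.5
```

With lr = 1.0 each step overshoots the minimum by exactly the distance it started from, so the search oscillates indefinitely; with lr = 0.001 each step moves only 0.2% of the way, so convergence is very slow.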

Figure 1.3.7: Gradient descent is similar to walking downhill on the function curve until

the lowest point is reached. In this plot, the global minimum is at x = 0.5.

An example of multiple minima can be seen in Figure 1.3.8. If for some reason the

search started at the left side of the plot and the learning rate is very small, there

is a high probability that GD will find x = -1.51 as the minimum value of y. GD

will not find the global minimum at x = 1.66. A sufficiently large learning rate
will enable the gradient descent to overcome the hill at x = 0.0. In deep learning
practice, it is normally recommended to start with a bigger learning rate (for example,
0.1 to 0.001) and gradually decrease it as the loss gets closer to the minimum.
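One simple way to realize "start big, then decrease" is an exponential decay schedule with a floor. This is a generic sketch under assumed parameters (initial rate 0.1, decay factor 0.9, floor 0.001), not a recipe prescribed by the book.

```python
# Sketch of a simple learning-rate decay schedule: start at a large
# rate and decay it toward a floor (all parameters are assumptions).
def decayed_lr(epoch, initial_lr=0.1, decay=0.9, floor=0.001):
    return max(initial_lr * decay**epoch, floor)

for epoch in (0, 10, 50):
    print(epoch, decayed_lr(epoch))
# epoch 0  -> 0.1      (large steps early in training)
# epoch 50 -> 0.001    (clamped at the floor near the minimum)
```

Early on, the large rate helps the search jump over small hills between local minima; later, the small rate lets it settle precisely into the minimum it has found.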

