
Sure, different values of b produce different cross-section loss curves for w. And those curves will depend on the shape of the loss surface (more on that later, in the "Learning Rate" section).

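To make this concrete, here is a minimal sketch of how such cross-section curves could be computed: fix b at a few different values, sweep w over a grid, and evaluate the MSE loss for each pair. It assumes the synthetic dataset built in the data generation step (true_b = 1, true_w = 2, Gaussian noise); the grid and the fixed values of b are illustrative.

import numpy as np

# Synthetic data, assuming the setup from the data generation step
np.random.seed(42)
x = np.random.rand(100, 1)
y = 1 + 2 * x + .1 * np.random.randn(100, 1)

def mse(b, w):
    # Mean squared error of the predictions yhat = b + w * x
    yhat = b + w * x
    return ((yhat - y) ** 2).mean()

# One cross-section loss curve for w per fixed value of b
w_range = np.linspace(-2, 4, 101)  # illustrative grid
for fixed_b in [0.52, 1.0, 2.0]:   # illustrative fixed values of b
    losses = np.array([mse(fixed_b, w) for w in w_range])
    print(f"b = {fixed_b}: lowest loss at w = {w_range[losses.argmin()]:.2f}")
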
OK, so far, so good… What about the other cross-section? Let's cut it horizontally now, making w = -0.16 (the value from w_range that is closest to our initial random value for w, -0.1382). The resulting plot is shown below:

Figure 0.6 - Horizontal cross-section; parameter w is fixed

Now, if we keep w constant (at -0.16), the loss, seen from the perspective of parameter b, can be minimized if b gets increased (up to some value close to 2).
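
The horizontal cross-section can be reproduced the same way, now fixing w and sweeping b. A quick sketch, reusing mse(), x, and y from the snippet above, with an illustrative b_range:

# Horizontal cross-section: keep w fixed at -0.16 and let b vary
b_range = np.linspace(-2, 4, 101)  # illustrative grid
losses = np.array([mse(b, -0.16) for b in b_range])
print(f"With w = -0.16, the loss is lowest around b = {b_range[losses.argmin()]:.2f}")
# The minimizing b should land close to 2, matching the description above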

In general, the purpose of this cross-section is to get the effect on the loss of changing a single parameter, while keeping everything else constant. This is, in a nutshell, a gradient :-)
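
In code, the "effect on the loss of changing a single parameter, while keeping everything else constant" can be approximated with finite differences. A rough sketch, again reusing mse() from above, with the initial random values rounded to b ≈ 0.49 and w ≈ -0.14:

# Nudge one parameter at a time and see how much the loss moves
b0, w0, eps = 0.49, -0.14, 1e-6
dloss_db = (mse(b0 + eps, w0) - mse(b0 - eps, w0)) / (2 * eps)  # effect of b alone
dloss_dw = (mse(b0, w0 + eps) - mse(b0, w0 - eps)) / (2 * eps)  # effect of w alone
print(dloss_db, dloss_dw)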

Now I have a question for you: Which of the two dashed curves, red (w changes, b is constant) or black (b changes, w is constant), yields the larger changes in loss when we modify the changing parameter?

The answer is coming right up in the next section!

Step 3 - Compute the Gradients

A gradient is a partial derivative. Why partial? Because one computes it with respect to a single parameter.
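
For the simple linear regression used here, with predictions given by b + w * x and the MSE loss, those two partial derivatives have a closed form and can be computed directly. A minimal sketch, reusing x and y from the first snippet (the b0 and w0 values are illustrative, close to the random initialization):

# Partial derivatives of the MSE loss at the current parameter values
b0, w0 = 0.49, -0.14
error = (b0 + w0 * x) - y        # prediction error for each data point
b_grad = 2 * error.mean()        # d(MSE)/db: average of 2 * error
w_grad = 2 * (x * error).mean()  # d(MSE)/dw: average of 2 * x * error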

