
Or is there? Let’s compare the distributions of the computed gradients over the whole training loop for both methods.

Figure E.9 - Distributions of gradients during training

Well, that’s a big difference! On the left plot, the gradients were computed as usual and only clipped before the parameter update to prevent the compounding effect that led to the explosion of the gradients. On the right plot, no gradients are ever above the clip value (in absolute terms).
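To make the comparison concrete, here is a minimal sketch of the first approach: gradients are computed as usual and clipped only once, right after backward() and right before the update. The train_step function and its model, loss_fn, and optimizer arguments are placeholders for illustration, not the chapter's actual training code.

import torch.nn as nn

# A minimal sketch: gradients are computed as usual and clipped
# (element-wise) to [-clip_value, clip_value] just before the update.
def train_step(model, loss_fn, optimizer, x, y, clip_value=1.0):
    model.train()
    yhat = model(x)
    loss = loss_fn(yhat, y)
    loss.backward()
    nn.utils.clip_grad_value_(model.parameters(), clip_value)
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()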

Keep in mind that, even though the choice of clipping method does not seem to have an impact on the overall loss of our simple model, this won’t hold true for recurrent neural networks, and you should use hooks for clipping gradients in that case.
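One way of doing this in PyTorch is to register a hook on each parameter with register_hook(), so every gradient is clamped the moment it is computed during backpropagation and no stored gradient ever exceeds the clip value, which is what the right plot shows. The set_clip_backprop helper below is a hypothetical name for a sketch of that idea, not the chapter's actual implementation.

import torch

# A minimal sketch: each hook fires while backpropagation is still
# running, clamping the gradient as soon as it is computed.
def set_clip_backprop(model, clip_value=1.0):
    handles = []
    for p in model.parameters():
        if p.requires_grad:
            handles.append(
                p.register_hook(lambda grad: torch.clamp(grad, -clip_value, clip_value))
            )
    # Keep the handles so the hooks can be removed later with handle.remove()
    return handles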

Recap

This extra chapter was much shorter than the others, and its purpose was to illustrate some simple techniques to take back control of gradients gone wild. Therefore, we’re skipping the "Putting It All Together" section this time. We used two simple datasets, together with two simple models, to show the signs of both vanishing and exploding gradients. The former issue was addressed with different initialization schemes and, optionally, batch normalization, while the latter was addressed by clipping the gradients in different ways. This is what we’ve covered:

• visualizing the vanishing gradients problem in deeper models

• using a function to initialize the weights of a model (see the sketch after this list)

• visualizing the effect of initialization schemes on the gradients

• realizing that batch normalization can compensate for bad initializations

• understanding the exploding gradients problem

• using gradient clipping to address the exploding gradients problem
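As a reminder of the second item above, a weight-initialization function usually follows the pattern sketched below and is applied to every submodule with model.apply(). The Kaiming scheme shown here is just one possible choice, not necessarily the one used for every model in the chapter.

import torch.nn as nn

# A minimal sketch of a weight-initialization function: it is called
# once per submodule and only touches the layer types it recognizes.
def weights_init(m):
    if isinstance(m, nn.Linear):
        nn.init.kaiming_uniform_(m.weight, nonlinearity='relu')
        if m.bias is not None:
            nn.init.zeros_(m.bias)

# model.apply(weights_init) would then (re)initialize the whole model in place.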

