Daniel Voigt Godoy - Deep Learning with PyTorch Step-by-Step: A Beginner's Guide


Clipping with Hooks

First, we reset the parameters once again:

Model Configuration

torch.manual_seed(42)
with torch.no_grad():
    model.apply(weights_init)

Next, we use set_clip_backprop() to clip the gradients during backpropagation using hooks:

Model Training

sbs_reg_clip_hook = StepByStep(model, loss_fn, optimizer)
sbs_reg_clip_hook.set_loaders(train_loader)
sbs_reg_clip_hook.set_clip_backprop(1.0)
sbs_reg_clip_hook.capture_gradients(['fc1'])
sbs_reg_clip_hook.train(10)
sbs_reg_clip_hook.remove_clip()
sbs_reg_clip_hook.remove_hooks()

fig = sbs_reg_clip_hook.plot_losses()

Figure E.8 - Losses: clipping by value with hooks

The loss is, once again, well behaved. At first sight, there doesn't seem to be any difference…
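Under the hood, clipping during backpropagation boils down to registering backward hooks on the model's parameters. The helper below is only a minimal sketch of that idea in plain PyTorch; attach_clip_hooks is an illustrative name, not the actual StepByStep implementation:

import torch

def attach_clip_hooks(model, clip_value=1.0):
    # register a backward hook on every trainable parameter so its gradient
    # is clamped to [-clip_value, clip_value] the moment it is computed
    handles = []
    for p in model.parameters():
        if p.requires_grad:
            handles.append(
                p.register_hook(lambda grad, c=clip_value: torch.clamp(grad, -c, c))
            )
    return handles  # keep the handles to remove the hooks later

# handles = attach_clip_hooks(model, clip_value=1.0)
# ... training ...
# for h in handles:
#     h.remove()  # the counterpart of remove_clip() / remove_hooks()

Since each hook fires as soon as the corresponding gradient is computed, and replaces it with its clamped version, the gradients that end up stored can never exceed the clip value.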

Or is there? Let's compare the distributions of the computed gradients over the whole training loop for both methods.

Figure E.9 - Distributions of gradients during training

Well, that's a big difference! In the left plot, the gradients were computed as usual and only clipped before the parameter update, to prevent the compounding effect that led to the explosion of the gradients. In the right plot, no gradients are ever above the clip value (in absolute terms). A short sketch contrasting where each kind of clipping happens in a training step follows the recap below.

Keep in mind that, even though the choice of clipping method does not seem to have an impact on the overall loss of our simple model, this won't hold true for recurrent neural networks; you should use hooks for clipping gradients in that case.

Recap

This extra chapter was much shorter than the others, and its purpose was to illustrate some simple techniques to take back control of gradients gone wild. Therefore, we're skipping the "Putting It All Together" section this time. We used two simple datasets, together with two simple models, to show the signs of both vanishing and exploding gradients. The former issue was addressed with different initialization schemes and, optionally, batch normalization, while the latter was addressed by clipping the gradients in different ways. This is what we've covered:

• visualizing the vanishing gradients problem in deeper models
• using a function to initialize the weights of a model
• visualizing the effect of initialization schemes on the gradients
• realizing that batch normalization can compensate for bad initializations
• understanding the exploding gradients problem
• using gradient clipping to address the exploding gradients problem
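As promised, here is a minimal sketch contrasting the two clipping strategies within a single training step, assuming plain PyTorch; the function names and arguments are illustrative, not part of the StepByStep class:

from torch import nn

def train_step_clip_after(model, loss_fn, optimizer, x, y, clip_value=1.0):
    # clipping AFTER backpropagation (left plot): gradients are fully
    # accumulated first and only clamped right before the update
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    nn.utils.clip_grad_value_(model.parameters(), clip_value)
    optimizer.step()
    return loss.item()

def train_step_clip_during(model, loss_fn, optimizer, x, y):
    # clipping DURING backpropagation (right plot): assumes backward hooks
    # like the ones sketched earlier are already registered, so every
    # gradient is clamped the moment it is computed and no extra call is
    # needed between backward() and step()
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    return loss.item()

In the first version, gradients can still grow very large during backward() and are only tamed at the very end; in the second, the hooks guarantee that no stored gradient ever exceeds the clip value, which is what the right plot of Figure E.9 shows.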

