
Moreover, let’s use a ten times higher learning rate; after all, we’re in full control of the gradients now:

Model Configuration

torch.manual_seed(42)
with torch.no_grad():
    model.apply(weights_init)

optimizer = optim.SGD(model.parameters(), lr=0.1)

Before training it, let’s use set_clip_grad_value() to make sure no gradients are ever above 1.0 in absolute value:

Model Training

sbs_reg_clip = StepByStep(model, loss_fn, optimizer)
sbs_reg_clip.set_loaders(train_loader)
sbs_reg_clip.set_clip_grad_value(1.0)
sbs_reg_clip.capture_gradients(['fc1'])
sbs_reg_clip.train(10)
sbs_reg_clip.remove_clip()
sbs_reg_clip.remove_hooks()

fig = sbs_reg_clip.plot_losses()

Figure E.7 - Losses: clipping by value

No more exploding gradients, it seems. The loss is being minimized even after choosing a much higher learning rate to train the model.
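Under the hood, clipping by value simply clamps each gradient element after the backward pass and before the optimizer step. The set_clip_grad_value() method presumably arranges for something along these lines; here is a minimal sketch in plain PyTorch (not the StepByStep implementation), assuming model, loss_fn, optimizer, and train_loader as defined above:

from torch.nn.utils import clip_grad_value_

for epoch in range(10):
    for x_batch, y_batch in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x_batch), y_batch)
        loss.backward()
        # clamp every gradient element to the range [-1.0, 1.0]
        clip_grad_value_(model.parameters(), clip_value=1.0)
        # the update below only ever sees the clipped gradients
        optimizer.step()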

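The capture_gradients(['fc1']) call in the listing above logs the gradients computed for fc1 at every mini-batch update, which is what the _gradients dictionary accessed below contains. Here is a hedged sketch of how such logging can be done with tensor hooks; the captured dictionary, the logging_hook function, and the direct model.fc1 access are all illustrative, not the StepByStep internals:

# illustrative only: store a copy of fc1's weight gradient at every update
captured = {'fc1': {'weight': []}}

def logging_hook(grad):
    # called during backward(); keep a copy and leave the gradient unchanged
    captured['fc1']['weight'].append(grad.detach().cpu().numpy().copy())

handle = model.fc1.weight.register_hook(logging_hook)
# ... training happens here, one entry appended per mini-batch update ...
handle.remove()  # stop capturing once training is done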
What about taking a look at the average gradients once again (there are 320 updates now, so we’re looking at the extremes only):

avg_grad = np.array(sbs_reg_clip._gradients['fc1']['weight']).mean(axis=(1, 2))
avg_grad.min(), avg_grad.max()

Output

(-24.69288555463155, 14.385948762893676)

"How come these (absolute) values are much larger than our clipping value?"

These are the computed gradients; that is, before clipping. Left unchecked, these gradients would have caused large updates, which, in turn, would have resulted in even larger gradients, and so on and so forth. Explosion, basically. But these values were all clipped before being used in the parameter update, so all went well with the model training.

It is possible to take a more aggressive approach and clip the gradients at the origin using the backward hooks we discussed before.
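To see what "clipped before being used" means in practice, here is a tiny, self-contained check; the gradient values are made up to mimic the extremes above:

import torch
import torch.nn as nn
from torch.nn.utils import clip_grad_value_

# a single toy parameter with a hand-crafted, partly out-of-range gradient
p = nn.Parameter(torch.zeros(3))
p.grad = torch.tensor([-24.7, 0.5, 14.4])

# same kind of clipping used during training: clamp to [-1.0, 1.0] in place
clip_grad_value_([p], clip_value=1.0)
print(p.grad)  # tensor([-1.0000,  0.5000,  1.0000])

However large the computed gradient was, the value the optimizer actually uses never leaves the clipping range.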

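Clipping at the origin could look roughly like the sketch below: a hook attached to each parameter clamps its gradient the moment it is computed during backprop, instead of after backward() has finished. This is an illustrative version, not necessarily how StepByStep implements it:

import torch

# clamp each parameter's gradient to [-1.0, 1.0] as soon as it is computed
handles = [
    p.register_hook(lambda grad: torch.clamp(grad, -1.0, 1.0))
    for p in model.parameters()
    if p.requires_grad
]

# ... train as usual; no explicit clipping call between backward() and step() ...

for h in handles:
    h.remove()  # detach the hooks once training is done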