Figure E.5 - Gradients: before and after clipping by value

Backward Hooks

As we saw in Chapter 6, the register_hook() method registers a backward hook to a tensor for a given parameter. The hook function takes a gradient as input and may return a modified, or clipped, gradient. The hook function will be called every time a gradient with respect to that tensor is computed, meaning it can clip gradients during backpropagation, unlike the other methods.

The code below attaches hooks to all parameters of the model, thus performing gradient clipping on the fly:

def clip_backprop(model, clip_value):
    # attach a hook to every parameter that requires gradients, so each
    # gradient is clamped to [-clip_value, clip_value] as soon as it is
    # computed during backpropagation
    handles = []
    for p in model.parameters():
        if p.requires_grad:
            func = lambda grad: torch.clamp(grad, -clip_value, clip_value)
            handle = p.register_hook(func)
            handles.append(handle)
    return handles

Do not forget that you should remove the hooks using handle.remove() after you're done with them.
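Just to give you an idea of how this looks in practice, here is a minimal usage sketch; the model, data, and clipping value below are made up for illustration only:

import torch
import torch.nn as nn

# illustrative model and data (placeholders, not from the book)
model = nn.Sequential(nn.Linear(10, 10), nn.ReLU(), nn.Linear(10, 1))
handles = clip_backprop(model, clip_value=1.0)

loss_fn = nn.MSELoss()
x, y = torch.randn(16, 10), torch.randn(16, 1)
loss = loss_fn(model(x), y)
loss.backward()  # each gradient is clamped by the hooks as it is computed

# remove the hooks once you're done with them
for handle in handles:
    handle.remove()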

Norm Clipping (or Gradient Scaling)

While value clipping was an element-wise operation, norm clipping computes the norm for all gradients together, as if they were concatenated into a single vector. If that norm exceeds the clipping value, the gradients are scaled down so that their combined norm matches it.
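As a rough sketch of the idea, PyTorch's built-in clip_grad_norm_() performs exactly this kind of norm clipping; the model, data, and maximum norm below are placeholders for illustration:

import torch
import torch.nn as nn

# illustrative model and data (placeholders, not from the book)
model = nn.Sequential(nn.Linear(10, 10), nn.ReLU(), nn.Linear(10, 1))
loss_fn = nn.MSELoss()
x, y = torch.randn(16, 10), torch.randn(16, 1)

loss = loss_fn(model(x), y)
loss.backward()
# scales ALL gradients down if their combined norm exceeds max_norm
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)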