Daniel Voigt Godoy - Deep Learning with PyTorch Step-by-Step: A Beginner’s Guide (Leanpub)


(and only if) the norm exceeds the clipping value, the gradients are scaled down to match the desired norm; otherwise they keep their values. We can use PyTorch’s nn.utils.clip_grad_norm_() to scale gradients in-place:

parm.grad = fake_grads.clone()

# Gradient Norm Clipping
nn.utils.clip_grad_norm_(parm, max_norm=1.0, norm_type=2)

fake_grads.norm(), parm.grad.view(-1,), parm.grad.norm()

Output

(tensor(2.6249), tensor([0.9524, 0.3048]), tensor(1.0000))

The norm of our fake gradients was 2.6249, and we’re clipping the norm at 1.0, so the gradients get scaled down by a factor of 0.3810.

Clipping the norm preserves the direction of the gradient vector.

Figure E.6 - Gradients: before and after clipping by norm

"A couple of questions … first, which one is better?"

On the one hand, norm clipping maintains the balance between the updates of all parameters, since it only scales the norm and preserves the direction. On the other hand, value clipping is faster, and the fact that it slightly changes the direction of the gradient vector does not seem to have any harmful impact on performance. So you’re probably OK using value clipping.

"Second, which clip value should I use?"

That’s trickier to answer: the clip value is a hyper-parameter that can be fine-tuned like any other. You can start with a value like ten and work your way down if the gradients keep exploding.
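Before moving on, a quick side-by-side run may make the difference between the two approaches more concrete. This is a minimal sketch, not code from the book; it assumes the fake gradients used above were roughly [2.5, 0.8] (consistent with the norm of 2.6249 and the clipped values in the output), and the names p_value and p_norm are made up for the example:

import torch
import torch.nn as nn

fake_grads = torch.tensor([2.5, 0.8])  # norm ~ 2.6249, as in the output above

# Value clipping: each element is clamped to [-1.0, 1.0] independently
p_value = nn.Parameter(torch.zeros(2))
p_value.grad = fake_grads.clone()
nn.utils.clip_grad_value_(p_value, clip_value=1.0)

# Norm clipping: the whole vector is rescaled so its norm is at most 1.0
p_norm = nn.Parameter(torch.zeros(2))
p_norm.grad = fake_grads.clone()
nn.utils.clip_grad_norm_(p_norm, max_norm=1.0, norm_type=2)

print(p_value.grad)  # tensor([1.0000, 0.8000]) -> direction changes
print(p_norm.grad)   # tensor([0.9524, 0.3048]) -> same direction, norm 1.0

Value clipping squashes only the offending element, so the clipped vector points somewhere else; norm clipping shrinks every element by the same factor (here, 0.3810), so the direction is preserved.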

"Finally, how do I actually do it in practice?"

Glad you asked! We’re creating some more methods in our StepByStep class to handle both kinds of clipping, and modifying the _make_train_step_fn() method to account for them. Gradient clipping must happen after gradients are computed (loss.backward()) and before updating the parameters (optimizer.step()):

StepByStep Method

setattr(StepByStep, 'clipping', None)

def set_clip_grad_value(self, clip_value):
    self.clipping = lambda: nn.utils.clip_grad_value_(
        self.model.parameters(), clip_value=clip_value
    )

def set_clip_grad_norm(self, max_norm, norm_type=2):
    self.clipping = lambda: nn.utils.clip_grad_norm_(
        self.model.parameters(), max_norm, norm_type
    )

def remove_clip(self):
    self.clipping = None

def _make_train_step_fn(self):
    # This method does not need ARGS... it can refer to
    # the attributes: self.model, self.loss_fn, and self.optimizer

    # Builds function that performs a step in the train loop
    def perform_train_step_fn(x, y):
        # Sets model to TRAIN mode
        self.model.train()

        # Step 1 - Computes model's predicted output - forward pass
        yhat = self.model(x)
        # Step 2 - Computes the loss
        loss = self.loss_fn(yhat, y)
        # Step 3 - Computes gradients
        loss.backward()
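The listing is cut short by the page break right after Step 3. Purely to illustrate the rule stated above, that clipping goes after loss.backward() and before optimizer.step(), here is a hedged sketch of how the inner function could continue by invoking the stored clipping callback; the book's actual continuation may differ in its details:

        # (sketch, not necessarily the book's exact code)
        # Clips gradients in-place, but only if a clipping function was
        # configured via set_clip_grad_value() or set_clip_grad_norm()
        if callable(self.clipping):
            self.clipping()

        # Step 4 - Updates parameters using gradients
        self.optimizer.step()
        self.optimizer.zero_grad()

        # Returns the loss
        return loss.item()

    # Returns the function that will be called inside the train loop
    return perform_train_step_fn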

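As a usage note, once these methods are attached to the class, turning clipping on or off is a one-liner. The calls below are hypothetical; they assume a StepByStep instance named sbs, configured with a model, loss function, and optimizer as in the earlier chapters:

sbs.set_clip_grad_value(10.0)  # element-wise clipping at +/- 10, a starting point you can work down from
# or
sbs.set_clip_grad_norm(1.0)    # rescales gradients so their norm is at most 1.0
# ... train the model as usual ...
sbs.remove_clip()              # disables clipping again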
