
Adam

So, choosing the Adam optimizer is an easy and straightforward way to tackle your learning rate needs. Let’s take a closer look at PyTorch’s Adam optimizer and its arguments:

• params: the model’s parameters
• lr: the learning rate, default value 1e-3
• betas: a tuple containing beta1 and beta2 for the EWMAs
• eps: the epsilon value (1e-8) added to the denominator

The four arguments above should be clear by now. But there are two others we haven’t talked about yet:

• weight_decay: L2 penalty
• amsgrad: whether the AMSGrad variant should be used
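
To make the argument list concrete, here is a minimal sketch (not from the original text) of an Adam optimizer with every argument spelled out at its PyTorch default value; the tiny linear model is just a placeholder:

import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 1)  # placeholder model, only for illustration

# Adam with all six arguments set to their default values
optimizer = optim.Adam(
    model.parameters(),
    lr=1e-3,             # learning rate
    betas=(0.9, 0.999),  # beta1 and beta2 for the two EWMAs
    eps=1e-8,            # epsilon added to the denominator
    weight_decay=0,      # L2 penalty (disabled by default)
    amsgrad=False,       # whether to use the AMSGrad variant
)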

The first of the two new arguments, weight_decay, introduces a regularization term (L2 penalty) to the model’s weights. As with every regularization procedure, it aims to prevent overfitting by penalizing weights with large values. The term weight decay comes from the fact that the regularization actually increases the gradients by adding the weight value multiplied by the weight decay argument.

"If it increases the gradients, how come it is called weight decay?"

In the parameter update, the gradient is multiplied by the learning rate and subtracted from the weight’s previous value. So, in effect, adding a penalty to the value of the gradients makes the weights smaller. The smaller the weights, the smaller the penalty, thus making further reductions even smaller; in other words, the weights are decaying.
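
To see the decaying effect in numbers, here is a toy sketch (not from the book) that applies a plain gradient descent update, with the loss gradient set to zero so that only the penalty term acts on the weight; Adam’s adaptive scaling is left out to keep the arithmetic visible:

# Plain gradient descent with weight decay, loss gradient assumed zero
w, lr, wd = 1.0, 0.1, 0.5
for step in range(3):
    grad = 0.0            # pretend the gradient of the loss is zero
    grad = grad + wd * w  # weight decay increases the gradient...
    w = w - lr * grad     # ...but the update shrinks the weight
    print(f'step {step}: w = {w:.4f}')
# prints w = 0.9500, then 0.9025, then 0.8574: the weight is decaying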

The second argument, amsgrad, makes the optimizer compatible with a variant of the same name. In a nutshell, it modifies the formula used to compute adapted gradients, ditching the bias correction and using the peak value of the EWMA of squared gradients instead.
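
A simplified, scalar sketch of that modification, following the description above rather than PyTorch’s internal implementation, could look like this (the function name and state dictionary are made up for illustration):

# Simplified AMSGrad-style adapted gradient for a single parameter
def adapted_gradient(grad, state, beta1=0.9, beta2=0.999, eps=1e-8):
    # EWMAs of gradients and of squared gradients
    state['m'] = beta1 * state['m'] + (1 - beta1) * grad
    state['v'] = beta2 * state['v'] + (1 - beta2) * grad ** 2
    # AMSGrad: keep the peak value of the EWMA of squared gradients...
    state['v_max'] = max(state['v_max'], state['v'])
    # ...and use it in the denominator, without bias correction
    return state['m'] / (state['v_max'] ** 0.5 + eps)

state = {'m': 0.0, 'v': 0.0, 'v_max': 0.0}
print(adapted_gradient(2.0, state))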

For now, we’re sticking with the first four arguments, which are already well known to us.
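
A minimal sketch of such a call, using only those four arguments, might look like the one below (the model and the learning rate value are placeholders, not the book’s):

import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 1)  # placeholder model, only for illustration
optimizer = optim.Adam(model.parameters(), lr=3e-4,
                       betas=(0.9, 0.999), eps=1e-8)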
