Daniel Voigt Godoy - Deep Learning with PyTorch Step-by-Step: A Beginner's Guide


LRFinder

The function we’ve implemented above is fairly basic. For an implementation with more bells and whistles, check this Python package: torch_lr_finder.[102] I am illustrating its usage here, which is quite similar to what we’ve done above, but please refer to the documentation for more details.

!pip install --quiet torch-lr-finder

from torch_lr_finder import LRFinder

Instead of calling a function directly, we need to create an instance of LRFinder first, using the typical model configuration objects (model, optimizer, loss function, and the device). Then, we can take the range_test() method for a spin, providing familiar arguments to it: a data loader, the upper range for the learning rate, and the number of iterations. The reset() method restores the original states of both model and optimizer.

torch.manual_seed(11)
new_model = CNN2(n_feature=5, p=0.3)
multi_loss_fn = nn.CrossEntropyLoss(reduction='mean')
new_optimizer = optim.Adam(new_model.parameters(), lr=3e-4)
device = 'cuda' if torch.cuda.is_available() else 'cpu'

lr_finder = LRFinder(
    new_model, new_optimizer, multi_loss_fn, device=device
)
lr_finder.range_test(train_loader, end_lr=1e-1, num_iter=100)
lr_finder.plot(log_lr=True)
lr_finder.reset()
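Besides eyeballing the plot, we can also extract a numeric suggestion from the finder. The snippet below is a minimal sketch, assuming that lr_finder.history is a dictionary holding the learning rates tried by range_test() and the corresponding (smoothed) losses, as the package keeps internally; it simply picks the learning rate at which the loss falls most steeply.

import numpy as np

lrs = np.array(lr_finder.history['lr'])       # learning rates tried by range_test()
losses = np.array(lr_finder.history['loss'])  # corresponding (smoothed) losses

# slope of the loss w.r.t. the log of the learning rate;
# the point of steepest decline is a common heuristic for a "good" LR
grads = np.gradient(losses, np.log10(lrs))
suggested_lr = lrs[np.argmin(grads)]
print(f'Suggested LR: {suggested_lr:.2e}')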

Not quite a "U" shape, but we can still tell that something in the ballpark of 1e-2 is a good starting point.

Adaptive Learning Rate

That’s what the Adam optimizer is actually doing for us—it starts with the learning rate provided as an argument, but it adapts the learning rate(s) as it goes, tweaking it in a different way for each parameter in the model. Or does it?

Truth be told, Adam does not adapt the learning rate—it really adapts the gradients. But, since the parameter update is given by the multiplication of both terms, the learning rate and the gradient, this is a distinction without a difference.

Adam combines the characteristics of two other optimizers: SGD (with momentum) and RMSProp. Like the former, it uses a moving average of gradients instead of the gradients themselves (that’s the first moment, in statistics jargon); like the latter, it scales the gradients using a moving average of squared gradients (that’s the second moment, or uncentered variance, in statistics jargon).

But this is not a simple average. It is a moving average. And it is not just any moving average. It is an exponentially weighted moving average (EWMA).

Before diving into EWMAs, though, we need to briefly go over simple moving averages.

Moving Average (MA)

To compute the moving average of a given feature x over a certain number of periods, we just have to average the values observed over that many time steps (from an initial value observed periods-1 steps ago all the way up to the current value):

Equation 6.1 - Simple moving average

\mathrm{MA}_{periods}(x_t) = \frac{1}{periods} \sum_{i=0}^{periods-1} x_{t-i}

But, instead of averaging the values themselves, let’s compute the average age of the values. The current value has an age equal to one unit of time, while the oldest value has an age equal to periods units of time.
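As a quick, illustrative sketch of Equation 6.1 (this snippet is not from the book), we can compute a simple moving average, and the average age of the values it uses, with NumPy:

import numpy as np

def simple_moving_average(x, periods):
    # average of the last `periods` values: x[t-periods+1], ..., x[t]
    return np.mean(x[-periods:])

x = np.array([1.0, 3.0, 5.0, 7.0, 9.0])
print(simple_moving_average(x, periods=3))  # (5 + 7 + 9) / 3 = 7.0

# average age of the values entering the average:
# the current value has age 1, the oldest has age `periods`
ages = np.arange(1, 3 + 1)
print(ages.mean())  # (1 + 2 + 3) / 3 = 2.0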

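And, to make the description of Adam above a bit more concrete, here is a rough sketch (with made-up variable names, not the book’s or PyTorch’s implementation) of the update Adam performs for a single parameter, using the two EWMAs just mentioned; notice that it is the adapted gradient, not the learning rate, that changes from step to step:

import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # EWMA of gradients (first moment) and of squared gradients (second moment)
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # bias corrections compensate for the zero-initialized moving averages
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # the parameter update is still learning rate times (adapted) gradient
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

PyTorch’s optim.Adam does all of this (and more, such as weight decay) for us; we only provide the initial learning rate.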
