    ax.set_xlabel('Learning Rate')
    ax.set_ylabel('Loss')
    fig.tight_layout()
    return tracking, fig

setattr(StepByStep, 'lr_range_test', lr_range_test)

Since the technique is supposed to be applied to an untrained model, we create a new model (and optimizer) here:

Model Configuration

torch.manual_seed(13)
new_model = CNN2(n_feature=5, p=0.3)
multi_loss_fn = nn.CrossEntropyLoss(reduction='mean')
new_optimizer = optim.Adam(new_model.parameters(), lr=3e-4)

Next, we create an instance of StepByStep and call the new method using the training data loader, the upper range for the learning rate (end_lr), and how many iterations we'd like it to try:

Learning Rate Range Test

sbs_new = StepByStep(new_model, multi_loss_fn, new_optimizer)
tracking, fig = sbs_new.lr_range_test(
    train_loader, end_lr=1e-1, num_iter=100
)

Figure 6.14 - Learning rate finder

There we go: a "U"-shaped curve. Apparently, the Karpathy Constant (3e-4) is too low for our model. The descending part of the curve is the region we should aim for: something around 0.01.
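Instead of eyeballing the plot, we could also pick a learning rate programmatically. A common heuristic (used by some learning rate finder implementations) is to take the point where the loss is falling the fastest. The snippet below is a minimal sketch of that idea; it assumes tracking is a dictionary with 'lr' and 'loss' lists, one entry per iteration:

import numpy as np

# Loss curve produced by the range test (the dictionary structure
# of `tracking` is an assumption here)
lrs = np.array(tracking['lr'])
losses = np.array(tracking['loss'])

# Derivative of the loss with respect to the log of the learning
# rate; the most negative value marks the steepest descent
steepest = np.gradient(losses, np.log10(lrs)).argmin()
print(f'Suggested learning rate: {lrs[steepest]:.4f}')

Treat the suggestion as a starting point only; if the losses are noisy, they may need smoothing before the gradient is taken.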
This means we could have used a higher learning rate, like 0.005, to train our
model. But this also means we need to recreate the optimizer and update it in
sbs_new. First, let’s create a method for setting its optimizer:
StepByStep Method
def set_optimizer(self, optimizer):
    self.optimizer = optimizer

setattr(StepByStep, 'set_optimizer', set_optimizer)
Then, we create and set the new optimizer and train the model as usual:
Updating LR and Model Training

new_optimizer = optim.Adam(new_model.parameters(), lr=0.005)
sbs_new.set_optimizer(new_optimizer)
sbs_new.set_loaders(train_loader, val_loader)
sbs_new.train(10)
If you try it out, you’ll find that the training loss actually goes down a bit faster (and
that the model might be overfitting).
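To see this for yourself, plot the loss curves after training. The snippet below assumes that StepByStep stores the per-epoch losses in the losses and val_losses attributes, as in earlier chapters; a widening gap between the two curves is the overfitting hint mentioned above:

import matplotlib.pyplot as plt

# Training vs. validation losses after training with the new
# learning rate (`losses` and `val_losses` are assumed to be the
# lists populated by StepByStep.train())
fig, ax = plt.subplots(1, 1, figsize=(6, 4))
ax.plot(sbs_new.losses, label='Training Loss')
ax.plot(sbs_new.val_losses, label='Validation Loss')
ax.set_xlabel('Epochs')
ax.set_ylabel('Loss')
ax.legend()
fig.tight_layout()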
DISCLAIMER: The learning rate finder is surely not magic!
Sometimes you won't get the "U"-shaped curve: maybe the initial
learning rate (as defined in your optimizer) is already too high, or
maybe the end_lr is too low. Even if you do get one, it does not
necessarily mean that the mid-point of the descending part will give
you the fastest learning rate for your model.
"OK, if I manage to choose a good learning rate from the start, am I
done with it?"
Sorry, but NO! Well, it depends: it might be fine for simpler (but real, not toy)
problems. The issue is that, for larger models, the loss surface (remember that,
from Chapter 0?) becomes very messy, and a learning rate that works well at the
start of training may be too high at a later stage. This means that the learning
rate needs to change or adapt.
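PyTorch handles this with learning rate schedulers, which we'll get to later in this chapter. Just to illustrate the idea, here is a minimal sketch using the StepLR scheduler, which multiplies the learning rate by gamma every step_size epochs (the values below are arbitrary choices):

from torch import optim

optimizer = optim.Adam(new_model.parameters(), lr=0.005)
# Halves the learning rate every four epochs
scheduler = optim.lr_scheduler.StepLR(
    optimizer, step_size=4, gamma=0.5
)

for epoch in range(10):
    # ... the forward pass, loss computation, backward pass, and
    # optimizer.step() for every mini-batch would go here ...
    scheduler.step()  # updates the learning rate once per epoch
    print(epoch, scheduler.get_last_lr())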