Figure 6.31 - Losses

Evaluation

print(StepByStep.loader_apply(train_loader, sbs_cnn3.correct).sum(axis=0),
      StepByStep.loader_apply(val_loader, sbs_cnn3.correct).sum(axis=0))

Output

tensor([2511, 2520]) tensor([336, 372])

Looking good! Lower losses, 99.64% training accuracy, and 90.32% validation accuracy.
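Just to make those percentages explicit: each tensor above holds the number of correct predictions followed by the total number of data points, so the accuracies come from a simple division. The snippet below is only an illustrative check with the values copied from the output; it is not part of the StepByStep class.

import torch

# [correct, total] pairs copied from the output above (illustrative only)
train_counts = torch.tensor([2511, 2520])
val_counts = torch.tensor([336, 372])

train_acc = (train_counts[0] / train_counts[1]).item()  # 0.9964... -> 99.64%
val_acc = (val_counts[0] / val_counts[1]).item()        # 0.9032... -> 90.32%
print(f'{train_acc:.2%} {val_acc:.2%}')

Output

99.64% 90.32%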
Recap

In this chapter, we’ve introduced dropout layers for regularization and focused on the inner workings of different optimizers and the role of the learning rate in the process. This is what we’ve covered:

• computing channel statistics using a temporary data loader to build a Normalize() transform
• using Normalize() to standardize an image dataset
• understanding how convolutions over multiple channels work
• building a fancier model with two typical convolutional blocks and dropout layers
• understanding how the dropout probability generates a distribution of
outputs
• observing the effect of train and eval modes in dropout layers
• visualizing the regularizing effect of dropout layers
• using the learning rate range test to find an interval of learning rate candidates
• computing bias-corrected exponentially weighted moving averages of both gradients and squared gradients to implement adaptive learning rates like the Adam optimizer (sketched in code after this list)
• capturing gradients using register_hook() on tensors of learnable parameters (sketched in code after this list)
• capturing parameters using the previously implemented attach_hooks()
method
• visualizing the path taken by different optimizers for updating parameters
• understanding how momentum is computed and its effect on the parameter
update
• (re)discovering the clever look-ahead trick implemented by Nesterov’s
momentum
• learning about different types of schedulers: epoch, validation loss, and mini-batch
• including learning rate schedulers in the training loop (sketched in code after this list)
• visualizing the impact of a scheduler on the path taken for updating
parameters
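To make the recap a bit more concrete, here are three short, self-contained sketches. First, the bias-corrected EWMAs: the snippet below mirrors the Adam-style computation of first and second moments, but the variable names and the made-up gradient values are illustrative only, not the chapter’s implementation.

import torch

beta1, beta2, eps = 0.9, 0.999, 1e-8    # common Adam defaults
m = torch.zeros(1)                      # EWMA of gradients (first moment)
v = torch.zeros(1)                      # EWMA of squared gradients (second moment)

fake_grads = [torch.tensor([0.5]), torch.tensor([0.3]), torch.tensor([0.4])]
for t, grad in enumerate(fake_grads, start=1):
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)        # bias correction for the first moment
    v_hat = v / (1 - beta2 ** t)        # bias correction for the second moment
    adapted_grad = m_hat / (v_hat.sqrt() + eps)  # what multiplies the learning rate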
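Second, capturing gradients with register_hook(): the hook is registered directly on each parameter tensor and receives that tensor’s gradient during the backward pass. The model and the dictionary below are placeholders, not the StepByStep code.

import torch
import torch.nn as nn

model = nn.Linear(2, 1)       # placeholder model
captured = {}

def make_hook(name):
    def hook(grad):           # called with the gradient w.r.t. the parameter
        captured.setdefault(name, []).append(grad.detach().clone())
    return hook

handles = [p.register_hook(make_hook(name))
           for name, p in model.named_parameters()]

loss = model(torch.randn(8, 2)).mean()
loss.backward()               # fills captured with one gradient per parameter
for h in handles:
    h.remove()                # stop capturing once we are done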
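Third, an epoch scheduler inside the training loop: its step() is called once per epoch, right after the optimizer’s. The tiny loop below uses dummy data and a StepLR scheduler purely for illustration.

import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(2, 1)
optimizer = optim.SGD(model.parameters(), lr=0.1)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.5)

x, y = torch.randn(8, 2), torch.randn(8, 1)
for epoch in range(4):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
    scheduler.step()          # epoch schedulers step once per epoch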
Congratulations! You have just learned about the tools commonly used for training
deep learning models: adaptive learning rates, momentum, and learning rate
schedulers. Far from being an exhaustive lesson on this topic, this chapter has
given you a good understanding of the basic building blocks. You have also learned
how dropout can be used to reduce overfitting and, consequently, improve
generalization.
In the next chapter, we’ll learn about transfer learning to leverage the power of
pre-trained models, and we’ll go over some key components of popular
architectures, like 1x1 convolutions, batch normalization layers, and residual
connections.
[94] https://github.com/dvgodoy/PyTorchStepByStep/blob/master/Chapter06.ipynb
[95] https://colab.research.google.com/github/dvgodoy/PyTorchStepByStep/blob/master/Chapter06.ipynb