Daniel Voigt Godoy - Deep Learning with PyTorch Step-by-Step: A Beginner’s Guide


Figure 6.31 - Losses

Evaluation

print(StepByStep.loader_apply(train_loader, sbs_cnn3.correct).sum(axis=0),
      StepByStep.loader_apply(val_loader, sbs_cnn3.correct).sum(axis=0))

Output

tensor([2511, 2520]) tensor([336, 372])

Looking good! Lower losses, 99.64% training accuracy (2,511 correct out of 2,520 images), and 90.32% validation accuracy (336 correct out of 372 images).

Recap

In this chapter, we’ve introduced dropout layers for regularization and focused on the inner workings of different optimizers and the role of the learning rate in the process. This is what we’ve covered:

• computing channel statistics using a temporary data loader to build a Normalize() transform (see the first sketch after this list)
• using Normalize() to standardize an image dataset
• understanding how convolutions over multiple channels work
• building a fancier model with two typical convolutional blocks and dropout layers


• understanding how the dropout probability generates a distribution of outputs
• observing the effect of train and eval modes in dropout layers
• visualizing the regularizing effect of dropout layers
• using the learning rate range test to find an interval of learning rate candidates (sketched below)
• computing bias-corrected exponentially weighted moving averages of both gradients and squared gradients to implement adaptive learning rates like the Adam optimizer (sketched below)
• capturing gradients using register_hook() on tensors of learnable parameters (sketched below)
• capturing parameters using the previously implemented attach_hooks() method
• visualizing the path taken by different optimizers for updating parameters
• understanding how momentum is computed and its effect on the parameter update
• (re)discovering the clever look-ahead trick implemented by Nesterov’s momentum (sketched below)
• learning about different types of schedulers: epoch, validation loss, and mini-batch
• including learning rate schedulers in the training loop (sketched below)
• visualizing the impact of a scheduler on the path taken for updating parameters
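
To make a few of these bullets more concrete, the sketches below are minimal, illustrative snippets rather than the exact implementations developed throughout the chapter (or the corresponding StepByStep methods). First, computing per-channel statistics with a temporary data loader and using them to build a Normalize() transform; temp_dataset is a placeholder for an unnormalized image dataset (tensors already produced by ToTensor()), and the per-channel standard deviation is approximated here by averaging per-image standard deviations:

import torch
from torch.utils.data import DataLoader
from torchvision import transforms

temp_loader = DataLoader(temp_dataset, batch_size=128)  # temp_dataset is a placeholder

n_images = 0
sum_means = torch.zeros(3)   # assuming three channels (RGB)
sum_stds = torch.zeros(3)
for images, _ in temp_loader:
    # (N, C, H, W) -> (N, C, H*W) so we can take per-image, per-channel statistics
    flat = images.view(images.size(0), images.size(1), -1)
    sum_means += flat.mean(dim=2).sum(dim=0)
    sum_stds += flat.std(dim=2).sum(dim=0)   # dataset std approximated by
    n_images += images.size(0)               # averaging per-image stds

channel_means = sum_means / n_images
channel_stds = sum_stds / n_images

# build the transform from the computed statistics
normalizer = transforms.Normalize(mean=channel_means.tolist(),
                                  std=channel_stds.tolist())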
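
Next, a bare-bones version of the learning rate range test: grow the learning rate exponentially over a fixed number of mini-batches while recording the loss; plotting losses against learning rates on a log scale exposes the interval where the loss falls fastest. Here, model, loss_fn, and train_loader are assumed to exist, and in a real test you would restore the model and optimizer states afterwards:

import torch.optim as optim

start_lr, end_lr, num_iter = 1e-5, 1e-1, 100
optimizer = optim.SGD(model.parameters(), lr=start_lr)
factor = (end_lr / start_lr) ** (1 / num_iter)   # multiplicative step per iteration

lrs, losses = [], []
batch_iter = iter(train_loader)
for _ in range(num_iter):
    try:
        x_batch, y_batch = next(batch_iter)
    except StopIteration:                 # restart the loader if it runs out
        batch_iter = iter(train_loader)
        x_batch, y_batch = next(batch_iter)
    optimizer.zero_grad()
    loss = loss_fn(model(x_batch), y_batch)
    loss.backward()
    optimizer.step()
    lrs.append(optimizer.param_groups[0]['lr'])
    losses.append(loss.item())
    optimizer.param_groups[0]['lr'] *= factor   # exponentially increase the LR

# plot losses vs. lrs (log-x) and pick the steepest descending region as candidates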
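
The Adam-style update itself boils down to two bias-corrected EWMAs, one of the gradients and one of the squared gradients. The function below sketches the update rule for a single parameter tensor; it is not PyTorch’s optim.Adam implementation:

import torch

def manual_adam_step(param, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    state['step'] += 1
    # EWMAs of the gradient (m) and of the squared gradient (v)
    state['m'] = beta1 * state['m'] + (1 - beta1) * grad
    state['v'] = beta2 * state['v'] + (1 - beta2) * grad ** 2
    # bias correction compensates for the zero initialization of m and v
    m_hat = state['m'] / (1 - beta1 ** state['step'])
    v_hat = state['v'] / (1 - beta2 ** state['step'])
    # adaptive learning rate: each parameter gets its own effective step size
    return param - lr * m_hat / (v_hat.sqrt() + eps)

# toy usage with a made-up parameter tensor and gradient
p = torch.tensor([1.0, -2.0])
g = torch.tensor([0.3, -0.1])
state = {'step': 0, 'm': torch.zeros_like(p), 'v': torch.zeros_like(p)}
p = manual_adam_step(p, g, state)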
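
Capturing gradients with tensor hooks can be as simple as the sketch below, where model stands for any existing module (such as the CNN built in this chapter) and captured_grads is just a plain dictionary:

captured_grads = {}

def make_hook(name):
    def hook(grad):
        # a tensor hook receives the gradient w.r.t. that tensor during backward()
        captured_grads.setdefault(name, []).append(grad.detach().clone())
    return hook

handles = [p.register_hook(make_hook(name))
           for name, p in model.named_parameters()]

# ... run loss.backward() as usual; gradients pile up in captured_grads ...
# when done, remove the hooks: [h.remove() for h in handles]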
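
One common way of writing Nesterov’s look-ahead trick, side by side with regular momentum, using plain Python functions over a single parameter (PyTorch’s SGD(nesterov=True) implements a reformulated version of the same idea):

def momentum_step(param, velocity, grad_fn, lr=0.1, beta=0.9):
    # regular momentum: accumulate gradients, then take a step
    velocity = beta * velocity + grad_fn(param)
    return param - lr * velocity, velocity

def nesterov_step(param, velocity, grad_fn, lr=0.1, beta=0.9):
    # look-ahead: evaluate the gradient at the point momentum alone would
    # take us to, and use THAT gradient to build the new velocity
    lookahead = param - lr * beta * velocity
    velocity = beta * velocity + grad_fn(lookahead)
    return param - lr * velocity, velocity

# toy usage: minimizing f(w) = w**2, whose gradient is 2w
grad_fn = lambda w: 2 * w
w, v = 10.0, 0.0
for _ in range(10):
    w, v = nesterov_step(w, v, grad_fn)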
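
Finally, a sketch of where each type of scheduler gets stepped inside a training loop; in practice you would pick a single scheduler, and model and train_loader are, once again, assumed to exist:

import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import StepLR, ReduceLROnPlateau, CyclicLR

optimizer = optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

scheduler = StepLR(optimizer, step_size=10, gamma=0.5)        # epoch scheduler
# scheduler = ReduceLROnPlateau(optimizer, patience=5)        # validation loss scheduler
# scheduler = CyclicLR(optimizer, base_lr=1e-4, max_lr=1e-2)  # mini-batch scheduler

n_epochs = 20
for epoch in range(n_epochs):
    model.train()
    for x_batch, y_batch in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x_batch), y_batch)
        loss.backward()
        optimizer.step()
        # a mini-batch scheduler (e.g., CyclicLR) would be stepped here:
        # scheduler.step()
    # an epoch scheduler (e.g., StepLR) is stepped once per epoch
    scheduler.step()
    # a validation loss scheduler (ReduceLROnPlateau) would take the
    # validation loss instead: scheduler.step(val_loss)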

Congratulations! You have just learned about the tools commonly used for training deep learning models: adaptive learning rates, momentum, and learning rate schedulers. Far from being an exhaustive lesson on this topic, this chapter has given you a good understanding of the basic building blocks. You have also learned how dropout can be used to reduce overfitting and, consequently, improve generalization.

In the next chapter, we’ll learn about transfer learning to leverage the power of pre-trained models, and we’ll go over some key components of popular architectures, like 1x1 convolutions, batch normalization layers, and residual connections.

[94] https://github.com/dvgodoy/PyTorchStepByStep/blob/master/Chapter06.ipynb

[95] https://colab.research.google.com/github/dvgodoy/PyTorchStepByStep/blob/master/Chapter06.ipynb

