
Adaptive moment estimation (Adam) uses adaptive learning rates, computing a learning rate for each parameter. Yes, you read it right: Each parameter has a learning rate to call its own! If you dig into the state_dict() of an Adam optimizer, you'll find tensors shaped like the parameters of every layer in your model that Adam will use to compute the corresponding learning rates. True story!
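You can peek at that state yourself. The snippet below is a minimal sketch (not from the book) using a throwaway linear model; Adam only populates its state after the first call to step(), and the exp_avg / exp_avg_sq tensors it keeps per parameter mirror the parameters' shapes:

import torch
import torch.nn as nn
import torch.optim as optim

# throwaway model, just to have some parameters to optimize
dummy_model = nn.Linear(3, 2)
dummy_optimizer = optim.Adam(dummy_model.parameters(), lr=3e-4)

# Adam only fills in its state after the first update
dummy_loss = dummy_model(torch.randn(8, 3)).sum()
dummy_loss.backward()
dummy_optimizer.step()

# one entry per parameter; exp_avg and exp_avg_sq match each parameter's shape
for idx, state in dummy_optimizer.state_dict()['state'].items():
    print(idx, state['exp_avg'].shape, state['exp_avg_sq'].shape)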

Adam is known to achieve good results fast and is likely a safe choice of optimizer. We'll get back to its inner workings in a later section.

Learning Rate

Another thing we need to keep in mind is that 0.1 won't cut it as a learning rate anymore. Remember what happens when the learning rate is too high? The loss doesn't go down or, even worse, goes up! We need to go lower, much lower, than that. For this example, let's use 3e-4, the "Karpathy's Constant." [100] Even though it was meant as a joke, it still is in the right order of magnitude, so let's give it a try.

Model Configuration

torch.manual_seed(13)
model_cnn2 = CNN2(n_feature=5, p=0.3)
multi_loss_fn = nn.CrossEntropyLoss(reduction='mean')
optimizer_cnn2 = optim.Adam(model_cnn2.parameters(), lr=3e-4)

We have everything in place to start the model training.

Model Training

Once again, we use our StepByStep class to handle model training for us.

Model Training

sbs_cnn2 = StepByStep(model_cnn2, multi_loss_fn, optimizer_cnn2)
sbs_cnn2.set_loaders(train_loader, val_loader)
sbs_cnn2.train(10)

You should expect training to take a while since this model is more complex than the previous one.
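If you are curious about just how much more complex, a quick sketch (assuming model_cnn2 from the configuration above) counts its trainable parameters with plain PyTorch:

# total number of trainable parameters in the model
n_params = sum(p.numel() for p in model_cnn2.parameters() if p.requires_grad)
print(f'model_cnn2 has {n_params} trainable parameters')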
