
Figure 10.15 - Losses (Transformer model)

"Why is the validation loss so much better than the training loss?"

This phenomenon may happen for a variety of reasons, from having an easier validation set to being a "side effect" of regularization (e.g., dropout) in our current model. The regularization makes it harder for the model to learn or, in other words, it yields higher losses. In our Transformer model, there are many dropout layers, so it gets increasingly difficult for the model to learn.

Let’s observe this effect by using the same mini-batch to compute the loss using the trained model in both train and eval modes:

torch.manual_seed(11)
x, y = next(iter(train_loader))
device = sbs_seq_transf.device

# Training
model_transf.train()
loss(model_transf(x.to(device)), y.to(device))

Output

tensor(0.0158, device='cuda:0', grad_fn=<MseLossBackward>)

# Validation
model_transf.eval()
loss(model_transf(x.to(device)), y.to(device))


Output

tensor(0.0091, device='cuda:0')

See the difference? The loss is roughly two times larger in training mode. You can also set dropout to zero and retrain the model to verify that both loss curves get much closer to each other (by the way, the overall loss level gets better without dropout, but that’s just because our sequence-to-sequence problem is actually quite simple).
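
To see why the training losses run higher, remember that dropout is only active in train mode: it randomly zeroes activations and scales the surviving ones by 1/(1-p), while in eval mode it does nothing at all. The snippet below is a minimal sketch of both points; the zero_dropout helper is our own illustration (it is not a function from the book), and it assumes the model is an ordinary nn.Module containing nn.Dropout layers:

import torch
import torch.nn as nn

torch.manual_seed(11)

# Dropout behaves differently in train and eval modes
drop = nn.Dropout(p=0.5)
points = torch.ones(6)

drop.train()
print(drop(points))  # roughly half the values zeroed, survivors scaled to 2.0

drop.eval()
print(drop(points))  # identity: all ones

# Hypothetical helper: set the drop probability of every dropout layer to zero
def zero_dropout(model):
    for module in model.modules():
        if isinstance(module, nn.Dropout):
            module.p = 0.0
    return model

# To reproduce the "dropout set to zero" experiment, apply it to a newly
# created (untrained) instance of the model and then train that instance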

Visualizing Predictions

Let’s plot the predicted coordinates and connect them using dashed lines, while using solid lines to connect the actual coordinates, just like before:

fig = sequence_pred(sbs_seq_transf, full_test, test_directions)

Figure 10.16 - Predictions

Looking good, right?
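
In case you are wondering what goes into a plot like that, here is a hypothetical sketch of the plotting logic, using made-up data, just for illustration (it is not the book's sequence_pred helper): the first two corners are the source sequence, and the last two corners come either from the actual data (solid line) or from the model's predictions (dashed line).

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(23)

def plot_square(ax, source, actual, predicted):
    # Connect source + actual corners with a solid line, and
    # source + predicted corners with a dashed line
    full_actual = np.concatenate([source, actual])
    full_pred = np.concatenate([source, predicted])
    ax.plot(full_actual[:, 0], full_actual[:, 1], 'k-o', label='actual')
    ax.plot(full_pred[:, 0], full_pred[:, 1], 'r--x', label='predicted')
    ax.set_xlabel('x0')
    ax.set_ylabel('x1')
    ax.legend()

# Dummy data: two source corners, two actual corners, and a noisy prediction
source = np.array([[-1., -1.], [-1., 1.]])
actual = np.array([[1., 1.], [1., -1.]])
predicted = actual + np.random.randn(2, 2) * 0.05

fig, ax = plt.subplots(figsize=(4, 4))
plot_square(ax, source, actual, predicted)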

The PyTorch Transformer

So far we’ve been using our own classes to build encoder and decoder "layers" and assemble them all into a Transformer. We don’t have to do it like that, though. PyTorch implements a full-fledged Transformer class of its own: nn.Transformer. There are some differences between PyTorch’s implementation and our own:

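Before going over those differences, here is a minimal sketch of instantiating and calling nn.Transformer. The hyperparameter values below are purely illustrative (they are not the ones used in the book), and notice that, by default, the class expects inputs shaped as (sequence length, batch size, number of features):

import torch
import torch.nn as nn

torch.manual_seed(42)

# Illustrative hyperparameters only
torch_transf = nn.Transformer(
    d_model=16, nhead=2,
    num_encoder_layers=2, num_decoder_layers=2,
    dim_feedforward=64, dropout=0.1
)

# Default input shape: (seq_len, batch_size, d_model)
src = torch.randn(2, 8, 16)  # source sequence (e.g., two points, already projected)
tgt = torch.randn(2, 8, 16)  # (shifted) target sequence
tgt_mask = torch_transf.generate_square_subsequent_mask(2)  # causal mask

out = torch_transf(src, tgt, tgt_mask=tgt_mask)
print(out.shape)  # torch.Size([2, 8, 16])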
