torch.manual_seed(44)
dropping_model.train()

output_train = dropping_model(spaced_points)
output_train

Output

tensor([0.0000, 0.4000, 0.0000, 0.8000, 0.0000, 1.2000, 1.4000,
        1.6000, 1.8000, 0.0000, 2.2000])

There are many things to notice here:

• The model is in train mode (very important, hold on to this!).

• Since this model does not have any weights, it becomes clear that dropout drops inputs, not weights.

• It dropped four elements only!

• The remaining elements have different values now!

"What’s going on here?"

First, dropping is probabilistic, so each input had a 50% chance of being dropped. In our tiny example, by chance, only four out of the eleven inputs were actually dropped (hold on to this thought too!).

Figure 6.6 - Applying dropout

Second, the remaining elements need to be proportionally adjusted by a factor of 1/(1-p), where p is the probability of dropping. In our example, p = 0.5, so the remaining elements are multiplied by a factor of two.
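The snippet above relies on spaced_points and dropping_model, which are defined earlier in the chapter and not shown here. If you want to run it in isolation, here is a minimal sketch of a setup consistent with the output above; the linspace values and the Sequential-wrapped nn.Dropout are inferred from the printed tensor, not copied verbatim from the book. It also checks the two claims just made: survivors are scaled by 1/(1-p) = 2, and, over many draws, about half of the inputs get dropped.

import torch
import torch.nn as nn

# Inferred setup: eleven evenly spaced points and a "model" that is nothing
# but a single dropout layer with a 50% drop probability
spaced_points = torch.linspace(.1, 1.1, 11)
dropping_model = nn.Sequential(nn.Dropout(p=0.5))

torch.manual_seed(44)
dropping_model.train()  # dropout only kicks in while in train mode
output_train = dropping_model(spaced_points)

# Surviving elements are the inputs multiplied by 1/(1-p) = 2;
# dropped elements are exactly zero
mask = output_train != 0
print(torch.allclose(output_train[mask], 2 * spaced_points[mask]))  # True
print((~mask).sum())  # how many elements were dropped in this particular draw

# Over many draws, the fraction of dropped elements approaches p = 0.5
drops = torch.stack([dropping_model(spaced_points) == 0 for _ in range(10000)])
print(drops.float().mean())  # roughly 0.5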

output_train / spaced_points

Output

tensor([0., 2., 0., 2., 0., 2., 2., 2., 2., 0., 2.])

"Why?"

This adjustment has the purpose of preserving (or at least trying to) the overall level of the outputs in the particular layer that’s "suffering" the dropout. So, let’s imagine that these inputs (after dropping) will feed a linear layer and, for educational purposes, that all their weights are equal to one (and bias equals zero). As you already know, a linear layer will multiply these weights by the (dropped) inputs and sum them up:

F.linear(output_train, weight=torch.ones(11), bias=torch.tensor(0))

Output

tensor(9.4000)

The sum is 9.4. It would have been half of this (4.7) without the adjusting factor.

"OK, so what? Why do I need to preserve the level of the outputs anyway?"

Because there is no dropping in evaluation mode! We’ve talked about it briefly in the past: dropout is random in nature, so it would produce slightly (or maybe not so slightly) different predictions for the same inputs. You don’t want that; that’s bad business. So, let’s set our model to eval mode (and that’s why I chose to make it a model instead of using functional dropout) and see what happens there:

dropping_model.eval()
output_eval = dropping_model(spaced_points)
output_eval
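In evaluation mode, nn.Dropout is a pass-through: nothing is dropped and no 1/(1-p) scaling is applied, so the same input produces the same output every time. A quick check, continuing from the snippet above and assuming the inferred setup from the earlier sketch:

# output_eval should match the input exactly: no dropping, no scaling
print(torch.equal(output_eval, spaced_points))                  # True

# and it stays the same call after call, unlike in train mode,
# where every forward pass draws a new random mask
print(torch.equal(dropping_model(spaced_points), output_eval))  # True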

