Daniel Voigt Godoy - Deep Learning with PyTorch Step-by-Step: A Beginner's Guide (Leanpub)


Output

tensor([0.1000, 0.2000, 0.3000, 0.4000, 0.5000, 0.6000, 0.7000,
        0.8000, 0.9000, 1.0000, 1.1000])

Pretty boring, right? This isn't doing anything!

Finally, an actual difference in behavior between train and eval modes! It was long overdue!

The inputs are just passing through. What's the implication of this? Well, the linear layer that receives these values is still multiplying them by the weights and summing them up:

F.linear(output_eval, weight=torch.ones(11), bias=torch.tensor(0))

Output

tensor(6.6000)

This is the sum of all inputs (because all the weights were set to one and no input was dropped). If there was no adjusting factor, the outputs in evaluation and training modes would be substantially different, simply because there would be more terms to add up in evaluation mode.

"I am still not convinced … without adjusting, the output would be 4.7, which is closer to 6.6 than the adjusted 9.4 … what is up?"

This happened because dropping is probabilistic, and only four out of the eleven elements were actually dropped (that was the thought I asked you to hold on to): the seven surviving inputs add up to 4.7 and, multiplied by the adjusting factor of two (1/(1-p) with p=0.5), they yield the adjusted 9.4. The factor adjusts for the average number of dropped elements. We set the probability to 50%, so, on average, five elements will be dropped. By the way, if you change the seed to 45 and re-run the code, it will actually drop half of the inputs, and the adjusted output will be 6.4 instead of 9.4.

Instead of setting a different random seed and manually checking which value it produces, let's generate 1,000 scenarios and compute the sum of the adjusted dropped outputs to get their distribution:


torch.manual_seed(17)
p = 0.5
distrib_outputs = torch.tensor([
    F.linear(F.dropout(spaced_points, p=p),
             weight=torch.ones(11), bias=torch.tensor(0))
    for _ in range(1000)
])

Figure 6.7 - Distribution of outputs

The figure above shows us that, for that set of inputs, the output of our simple linear layer with dropout will not be exactly 6.6 anymore, but something between 0 and 12. The mean value for all scenarios is quite close to 6.6, though.
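
We can check the centering directly. The snippet below is a quick sketch of ours (not from the book); it simply summarizes the distrib_outputs tensor generated above:

# mean and standard deviation of the 1,000 adjusted outputs
# the mean should land quite close to 6.6, the no-dropout output
print(distrib_outputs.mean(), distrib_outputs.std())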

Dropout not only drops some inputs but, due to its probabilistic nature, produces a distribution of outputs.

In other words, the model needs to learn how to handle a distribution of values that is centered at the value the output would have if there was no dropout.

Moreover, the choice of the dropout probability determines how spread out the outputs will be.
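
To see this effect in practice, here is a small experiment (our sketch, not from the book): it repeats the simulation above for a few different probabilities and compares the spread of the adjusted outputs. We redefine spaced_points here to keep the snippet self-contained; it is assumed to match the eleven evenly spaced values shown earlier.

import torch
import torch.nn.functional as F

torch.manual_seed(17)
spaced_points = torch.linspace(.1, 1.1, 11)  # assumed to match the book's inputs

for p in [0.1, 0.5, 0.9]:
    outputs = torch.tensor([
        F.linear(F.dropout(spaced_points, p=p),
                 weight=torch.ones(11), bias=torch.tensor(0))
        for _ in range(1000)
    ])
    # the mean stays close to 6.6 (the no-dropout output),
    # while the standard deviation grows with p
    print(p, outputs.mean().item(), outputs.std().item())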

