Output

tensor([0.1000, 0.2000, 0.3000, 0.4000, 0.5000, 0.6000, 0.7000,
        0.8000, 0.9000, 1.0000, 1.1000])

Pretty boring, right? This isn’t doing anything!

Finally, an actual difference in behavior between train and eval modes! It was long overdue!
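For reference, here is a minimal sketch of how an output like the one above could be produced. The names spaced_points and dropping_model are assumptions standing in for the setup snippets not shown here:

import torch
import torch.nn as nn

# hypothetical stand-ins for the setup code (names assumed)
spaced_points = torch.linspace(0.1, 1.1, 11)  # the 11 inputs shown above
dropping_model = nn.Sequential(nn.Dropout(p=0.5))

dropping_model.eval()  # evaluation mode: dropout becomes a no-op
output_eval = dropping_model(spaced_points)
print(output_eval)     # identical to spaced_points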

The inputs are just passing through. What's the implication of this? Well, the linear layer that receives these values is still multiplying them by its weights and summing them up:

# with unit weights and zero bias, F.linear simply sums the inputs
F.linear(output_eval, weight=torch.ones(11), bias=torch.tensor(0))

Output

tensor(6.6000)

This is the sum of all inputs (because all the weights were set to one and no input was dropped). If there were no adjusting factor, the outputs in evaluation and training modes would be substantially different, simply because there would be more terms to add up in evaluation mode.

"I am still not convinced … without adjusting the output would be

4.7, which is closer to 6.6 than the adjusted 9.4 … what is up?"

This happened because dropping is probabilistic, and only four of the eleven elements were actually dropped (that was the thought I asked you to hold on to). The factor adjusts for the average number of dropped elements. We set the probability to 50%, so, on average, half of the elements (5.5 of the 11) will be dropped. By the way, if you change the seed to 45 and re-run the code, it will actually drop half of the inputs, and the adjusted output will be 6.4 instead of 9.4.
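As a quick sanity check on those numbers (4.7 and 9.4 come straight from the quote above, and 1 / (1 - p) is the factor dropout applies to the kept values):

p = 0.5
unadjusted = 4.7                 # sum of the seven surviving inputs
adjusted = unadjusted / (1 - p)  # dropout scales kept values by 1 / (1 - p)
print(adjusted)                  # 9.4
# over many runs, the adjusted sum averages out to the full sum, 6.6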

Instead of setting a different random seed and manually checking which value it produces, let's generate 1,000 scenarios and compute the sum of the adjusted dropped outputs to get their distribution:
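(A minimal sketch of one way to do this, assuming the same spaced_points and dropping_model names as above; the seed is arbitrary.)

import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(17)  # arbitrary seed, assumed
spaced_points = torch.linspace(0.1, 1.1, 11)
dropping_model = nn.Sequential(nn.Dropout(p=0.5))
dropping_model.train()  # training mode: dropout is active

# 1,000 training-mode passes; each sum uses the same unit-weight trick
sums = torch.stack([
    F.linear(dropping_model(spaced_points),
             weight=torch.ones(11), bias=torch.tensor(0.0))
    for _ in range(1000)
])
print(sums.mean(), sums.std())  # the mean should hover around 6.6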
