Figure 6.8 - Output distribution for dropout probabilities

On the left, if there is barely any dropout (p=0.10), the sum of adjusted outputs is tightly distributed around the mean value. For more typical dropout probabilities (like 30% or 50%), the distribution may take some more extreme values.

If we go to extremes, like a dropout probability of 90%, the distribution gets a bit degenerate, I would say: it is pretty much all over the place (and there are a lot of scenarios where everything gets dropped, hence the tall bar at zero).

The variance of the distribution of outputs grows with the dropout probability.

A higher dropout probability makes it harder for your model to learn; that's what regularization does.
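
A quick way to see this behavior for yourself is to apply dropout to a fixed set of values many times and look at the spread of the summed (adjusted) outputs. The snippet below is a minimal sketch, assuming eleven evenly spaced points as a stand-in for a layer's output; the variable name spaced_points and the number of draws are illustrative choices, not the code behind Figure 6.8.

import torch
import torch.nn as nn

torch.manual_seed(42)

# Eleven evenly spaced values standing in for a layer's output
spaced_points = torch.linspace(.1, 1.1, 11)

for p in [0.1, 0.3, 0.5, 0.9]:
    dropout = nn.Dropout(p=p)
    dropout.train()  # dropout only kicks in while in train mode
    # Sum of the (already scaled) outputs, over many random draws
    sums = torch.stack([dropout(spaced_points).sum() for _ in range(10000)])
    print(f"p={p:.1f} | mean of sums: {sums.mean().item():.4f}"
          f" | std of sums: {sums.std().item():.4f}")

The means stay roughly the same (that is the point of the 1/(1-p) adjustment), while the standard deviation grows with the dropout probability, matching the distributions in Figure 6.8.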

"Can I use dropout with the convolutional layers?"

Two-Dimensional Dropout

Yes, you can, but not that dropout. There is a specific dropout to be used with convolutional layers: nn.Dropout2d. Its dropout procedure is a bit different, though: instead of dropping individual inputs (which would be pixel values in a given channel), it drops entire channels/filters. So, if a convolutional layer produces ten filters, a two-dimensional dropout with a probability of 50% would drop five filters (on average), while the remaining filters would have all their pixel values left untouched.
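
To make the channel-wise behavior concrete, here is a minimal sketch: it feeds a fake convolutional output with ten channels through nn.Dropout2d and counts how many channels were entirely zeroed out. The tensor of ones, the seed, and the variable names are arbitrary choices for illustration.

import torch
import torch.nn as nn

torch.manual_seed(17)

# Fake output of a convolutional layer: one image, ten channels, 5x5 pixels each
fake_conv_output = torch.ones(1, 10, 5, 5)

dropout2d = nn.Dropout2d(p=.5)
dropout2d.train()  # dropout only kicks in while in train mode

dropped = dropout2d(fake_conv_output)

# Each channel is either zeroed out entirely or kept (and scaled by 1/(1-p))
n_dropped = (dropped.sum(dim=[2, 3]) == 0).sum().item()
print(f"Channels dropped: {n_dropped} out of {dropped.size(1)}")
print(dropped[0, :, 0, 0])  # one value per channel: either 0. or 2. (= 1/(1-p))

Running it a few times with different seeds, the number of dropped channels hovers around five, but the pixel values inside any kept channel are never individually zeroed.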

"Why does it drop entire channels instead of dropping pixels?"

Randomly dropping pixels doesn't do much for regularization because adjacent pixels are strongly correlated; that is, they have quite similar values. You can think of it this way: if there are some dead pixels randomly spread in an image, the image is still easily recognizable, since the neighboring pixels carry nearly the same information.
