
Figure 10.10 - Layer norm vs batch norm

In Chapter 7 we learned that the size of the mini-batch strongly impacts the running statistics of batch normalization. We also learned that batch norm's oscillating statistics may introduce a regularizing effect.

None of this happens with layer normalization: it steadily delivers data points with zero mean and unit standard deviation regardless of our choice of mini-batch size or anything else. Let's see it in action!
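Before turning to our encoded features, here is a minimal standalone sketch (ours, not the book's code) that makes the contrast concrete: it normalizes the same eight fake data points inside a small and a large mini-batch and checks whether the results change.

import torch
import torch.nn as nn

torch.manual_seed(42)
points = torch.randn(32, 256) * 3 + 5  # fake features with mean 5 and std 3

batch_norm = nn.BatchNorm1d(256)
layer_norm = nn.LayerNorm(256)

# the same eight points, normalized inside a small and a large mini-batch
bn_small, bn_full = batch_norm(points[:8]), batch_norm(points)
ln_small, ln_full = layer_norm(points[:8]), layer_norm(points)

print(torch.allclose(bn_small, bn_full[:8]))  # False - batch norm uses statistics of the whole mini-batch
print(torch.allclose(ln_small, ln_full[:8]))  # True - layer norm looks at one point at a time

Batch norm's output for a given point depends on which other points happen to share the mini-batch; layer norm computes its statistics over each point's own 256 features, so the mini-batch composition is irrelevant.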

First, we're visualizing the distribution of the positionally-encoded features that we generated.
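The sketch below (not the book's plotting helpers) assumes dummy_enc is the positionally-encoded tensor created earlier in the chapter; flattening it and drawing a histogram of its values produces a plot along the lines of Figure 10.11.

import matplotlib.pyplot as plt

values = dummy_enc.detach().flatten()
print(values.min(), values.max())  # the text says the values span roughly -50 to 50
print(values.var())                # roughly the dimensionality (256), according to the text

plt.hist(values.numpy(), bins=50)
plt.xlabel('feature value')
plt.ylabel('count')
plt.title('Distribution of feature values')
plt.show()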

Figure 10.11 - Distribution of feature values

The actual range of values is much larger than the figure suggests (like -50 to 50), and the variance is approximately the same as the dimensionality (256) as a result of the addition of positional encoding. Let's apply layer normalization to it:

# layer normalization over the feature (last) dimension, which has 256 units
layer_normalizer = nn.LayerNorm(256)
dummy_normed = layer_normalizer(dummy_enc)
dummy_normed
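If the normalization worked, every data point should now have zero mean and unit standard deviation over its 256 features, no matter how many points are in the mini-batch. A quick check, assuming the features sit in the last dimension of dummy_normed:

# mean and standard deviation over the feature dimension (the last one),
# that is, one statistic per data point
print(dummy_normed.mean(axis=-1))  # approximately zero everywhere
print(dummy_normed.std(axis=-1))   # approximately one everywhere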
