
networks) inside a "layer" is now a "sub-layer."

The figure above represents an encoder-decoder architecture with two "layers" each. But we're not stopping there: we're stacking six "layers"! It would be somewhat hard to draw a diagram for it, so we're simplifying it a bit.

Figure 10.6 - Stacked "layers"

On the one hand, we could simply draw both stacks of "layers" and abstract away their inner operations. That's diagram (a) in the figure above. On the other hand, since all "layers" are identical, we can keep representing the inner operations and just hint at the stack by adding "Nx 'Layers'" next to it. That's diagram (b) in the figure above, and it will be our representation of choice from now on.
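In code, stacking identical "layers" boils down to deep-copying a prototype "layer" N times and chaining the copies, so the output of one becomes the input of the next. The sketch below assumes each "layer" is a module that takes the source sequence and an optional mask; the names clone_layers and StackedEncoder are illustrative, not a reference implementation.

```python
import copy
import torch.nn as nn

def clone_layers(layer, n_layers):
    # Deep-copies a prototype "layer" so every copy has its own weights
    return nn.ModuleList([copy.deepcopy(layer) for _ in range(n_layers)])

class StackedEncoder(nn.Module):
    def __init__(self, layer, n_layers=6):
        super().__init__()
        # Six identical "layers", as hinted at by "Nx 'Layers'" in diagram (b)
        self.layers = clone_layers(layer, n_layers)

    def forward(self, x, mask=None):
        # The output of one "layer" becomes the input of the next
        for layer in self.layers:
            x = layer(x, mask)
        return x
```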

By the way, that’s exactly how a Transformer is built!

"Cool! Is this a Transformer already then?"

Not yet, no. We need to work further on the "sub-layers" to transform (ahem!) the architecture above into a real Transformer.

Wrapping "Sub-Layers"

As our model grows deeper, with many stacked "layers," we're going to run into familiar issues, like the vanishing gradients problem. In computer vision models, this issue was successfully addressed by the addition of other components, like batch normalization and residual connections. Moreover, we know that …
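To give a sense of where this is heading, here is a minimal sketch of wrapping a "sub-layer" with a residual connection and normalization in PyTorch. It assumes a "norm-first" arrangement and uses layer normalization (the usual choice for Transformers) rather than batch normalization; the class name SubLayerWrapper, the dropout, and the callable sublayer argument are illustrative.

```python
import torch.nn as nn

class SubLayerWrapper(nn.Module):
    # Illustrative wrapper: a residual (skip) connection around a "sub-layer",
    # plus layer normalization and dropout (layer norm playing the role that
    # batch normalization plays in computer vision models)
    def __init__(self, d_model, dropout=0.1):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.drop = nn.Dropout(dropout)

    def forward(self, x, sublayer):
        # "sublayer" is a callable, e.g. the attention mechanism or the
        # feed-forward network inside a "layer"; the input is added back
        # to the sub-layer's output (the residual connection)
        return x + self.drop(sublayer(self.norm(x)))
```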

