Daniel Voigt Godoy - Deep Learning with PyTorch Step-by-Step A Beginner’s Guide-leanpub

The Transformer

Let’s start with the diagram, which is nothing more than an encoder and a decoder side-by-side (we’re sticking with norm-first "sub-layer" wrappers).

Figure 10.13 - The Transformer (norm-first)

The Transformer is still an encoder-decoder architecture like the one we developed in the previous chapter, so it should be no surprise that we can use our former EncoderDecoderSelfAttn class as a parent class and add two extra components to it:

• A projection layer to map our original features (n_features) to the dimensionality of both encoder and decoder (d_model).
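
To make the projection concrete, here is a minimal sketch of what such a layer could look like, assuming it is a plain linear layer. The sizes (n_features=2, d_model=6), the batch shape, and the variable names proj and source_seq are illustrative assumptions, not values taken from the text.

```python
import torch
import torch.nn as nn

torch.manual_seed(42)

# Illustrative sizes only (assumptions, not from the text)
n_features, d_model = 2, 6

# Projection layer: maps the n_features of each data point
# to the d_model dimensions used by encoder and decoder
proj = nn.Linear(n_features, d_model)

# A mini-batch of 3 sequences, each 4 points long, n_features per point
source_seq = torch.randn(3, 4, n_features)

# nn.Linear acts on the last dimension only, so the batch and
# sequence-length dimensions are left untouched
projected = proj(source_seq)
print(projected.shape)
```

Since the linear layer only transforms the last dimension, the same projection can be applied to sequences of any length.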
