
Let’s create an encoder and feed it a source sequence:

torch.manual_seed(11)
encself = EncoderSelfAttn(n_heads=3, d_model=2,
                          ff_units=10, n_features=2)
query = source_seq
encoder_states = encself(query)
encoder_states

Output

tensor([[[-0.0498, 0.2193],
         [-0.0642, 0.2258]]], grad_fn=<AddBackward0>)
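If you are running this snippet on its own, it assumes source_seq, the source sequence built earlier in the chapter. A minimal, hypothetical stand-in (the chapter's actual values may differ; any tensor with one batch, two data points, and two coordinates per point will run) could be:

# hypothetical stand-in for the chapter's source sequence
# shape (N=1, L=2, F=2): one batch, two points, two coordinates each
source_seq = torch.tensor([[[-1., -1.], [-1., 1.]]])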

It produced a sequence of states that will be the input of the (cross-)attention mechanism used by the decoder. Business as usual.

Cross-Attention

Cross-attention was the first mechanism we discussed: the decoder provided a "query" (Q), which served not only as input but also got concatenated to the resulting context vector. That won't be the case anymore! Instead of being concatenated to the query, the context vector will go through a feed-forward network in the decoder to generate the predicted coordinates.
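To make that concrete, here is a minimal sketch of cross-attention followed by a feed-forward network. The class name, arguments, and layer choices are assumptions for illustration, not the book's actual decoder: the decoder's inputs produce the queries, the encoder states supply the keys and values, and the resulting context vector goes straight through a feed-forward network to produce the predicted coordinates, with no concatenation step.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossAttnDecoderSketch(nn.Module):
    # Hypothetical sketch: queries come from the decoder,
    # keys and values come from the encoder states.
    def __init__(self, d_model, ff_units, n_features):
        super().__init__()
        self.W_q = nn.Linear(d_model, d_model)
        self.W_k = nn.Linear(d_model, d_model)
        self.W_v = nn.Linear(d_model, d_model)
        # feed-forward network replaces the old concatenation step
        self.ffn = nn.Sequential(nn.Linear(d_model, ff_units),
                                 nn.ReLU(),
                                 nn.Linear(ff_units, n_features))

    def forward(self, dec_inputs, encoder_states):
        q = self.W_q(dec_inputs)           # queries from the decoder inputs
        k = self.W_k(encoder_states)       # keys from the encoder states
        v = self.W_v(encoder_states)       # values from the encoder states
        # scaled dot-product attention over the source positions
        scores = torch.bmm(q, k.transpose(1, 2)) / (q.size(-1) ** 0.5)
        alphas = F.softmax(scores, dim=-1)
        context = torch.bmm(alphas, v)     # context vector, not concatenated
        return self.ffn(context)           # predicted coordinates

With the encoder_states produced above (shape (1, 2, 2)) and decoder inputs of the same shape, this sketch returns predictions of shape (1, 2, 2): one pair of coordinates per target position.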

The figure below illustrates the current state of the architecture: self-attention as encoder, cross-attention on top of it, and the modifications to the decoder.

