
Figure 9.24 - Encoder with self-attention

Let’s focus on the self-attention mechanism on the left, used to generate the "hidden state" (h₀₀, h₀₁) corresponding to the first data point in the source sequence (x₀₀, x₀₁), and see how it works in great detail:

• The transformed coordinates of the first data point are used as "query" (Q).

• This "query" (Q) will be paired, independently, with each of the two "keys" (K), one of them being a different transformation of the same coordinates (x₀₀, x₀₁), the other being a transformation of the second data point (x₁₀, x₁₁).

• The pairing above will result in two attention scores (alphas, α₀ and α₁) that, multiplied by their corresponding "values" (V), are added up to become the context vector:

context vector₀ = α₀ V₀ + α₁ V₁

Equation 9.11 - Context vector for first input (x₀)

• Next, the context vector goes through the feed-forward network, and the first "hidden state" is born! The whole sequence of steps is sketched in code right below.
