
dummy_points = torch.randn(16, 2, 4)  # N, L, F: 16 sequences, length 2, 4 features
mha = MultiHeadedAttention(n_heads=2, d_model=4, dropout=0.0)
mha.init_keys(dummy_points)
out = mha(dummy_points)  # N, L, D: one output per "query"
out.shape

Output

torch.Size([16, 2, 4])

Since we’re using the data points as "keys," "values," and "queries," this is a self-attention mechanism.
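
If you want to double-check the idea, PyTorch ships a multi-headed attention layer of its own, torch.nn.MultiheadAttention. The snippet below is a minimal sketch using that built-in layer instead of our MultiHeadedAttention class: passing the same tensor as "query," "key," and "value" is exactly what makes it self-attention. Mind that the built-in layer has its own internal projections and initialization, so its outputs won’t match our class's numbers.

import torch
import torch.nn as nn

torch.manual_seed(42)
dummy_points = torch.randn(16, 2, 4)  # N, L, F

# built-in layer; using the same tensor as "query," "key,"
# and "value" is what makes this SELF-attention
torch_mha = nn.MultiheadAttention(
    embed_dim=4, num_heads=2, dropout=0.0, batch_first=True
)
out, alphas = torch_mha(dummy_points, dummy_points, dummy_points)
out.shape  # torch.Size([16, 2, 4]), same N, L, D as before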

The figure below depicts a multi-headed attention mechanism with its two heads, blue (left) and green (right), and the first data point being used as a "query" to generate the first "hidden state" (h₀).

Figure 10.4 - Self-(narrow)attention mechanism (both heads)

To help you out (especially if you’re seeing it in black and white), I’ve labeled the two heads in the figure.
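
In case you’re wondering what each head is actually computing, the snippet below is a minimal sketch of narrow attention for a single sequence, skipping the learned linear projections that the full class applies before splitting. The head_output helper is hypothetical, and the blue and green chunks stand in for the figure's two heads.

import torch
import torch.nn.functional as F

torch.manual_seed(42)
x = torch.randn(1, 2, 4)          # one sequence: L=2, d_model=4
blue, green = x.chunk(2, dim=-1)  # narrow attention: each head gets a 2-feature slice

def head_output(chunk):
    # scaled dot-product self-attention over one head's slice:
    # softmax(Q @ K.T / sqrt(d_k)) @ V, with Q = K = V = chunk
    d_k = chunk.size(-1)
    scores = chunk @ chunk.transpose(-2, -1) / (d_k ** 0.5)
    alphas = F.softmax(scores, dim=-1)
    return alphas @ chunk

# concatenating the two heads' outputs rebuilds the d_model dimension
hidden = torch.cat([head_output(blue), head_output(green)], dim=-1)
h0 = hidden[:, 0]  # the first "hidden state," generated by the first "query"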

