Daniel Voigt Godoy - Deep Learning with PyTorch Step-by-Step: A Beginner's Guide (Leanpub)


Visualizing Predictions

Let's plot the predicted coordinates and connect them using dashed lines, while using solid lines to connect the actual coordinates, just like before:

fig = sequence_pred(sbs_seq_selfattnpe, full_test, test_directions)

Figure 9.50 - Predicting the last two corners

Awesome, it looks like positional encoding is working well indeed: the predicted coordinates are quite close to the actual ones for the most part.
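The sequence_pred() helper above comes from the book's own plotting utilities. As a rough idea of what such a plot involves, here is a minimal, hypothetical Matplotlib sketch; the plot_prediction() function, its source/predicted/target arguments, and the model_preds name in the usage comment are assumptions for illustration, not the actual implementation:

import numpy as np
import matplotlib.pyplot as plt

def plot_prediction(source, predicted, target, ax):
    # Solid line connecting the actual coordinates (the source corners
    # followed by the actual last two corners)
    actual = np.concatenate([source, target], axis=0)
    ax.plot(actual[:, 0], actual[:, 1], 'b-o', label='Actual')
    # Dashed line connecting the last source corner to the predicted corners
    pred = np.concatenate([source[-1:], predicted], axis=0)
    ax.plot(pred[:, 0], pred[:, 1], 'r--x', label='Predicted')
    ax.set_xlabel('x')
    ax.set_ylabel('y')
    ax.legend()
    return ax

# Hypothetical usage: the first two corners are the source sequence, the last
# two are the targets, and `model_preds` would hold the model's predictions
# fig, ax = plt.subplots()
# plot_prediction(full_test[0, :2], model_preds[0], full_test[0, 2:], ax)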


Visualizing Attention

Now, let's check what the model is paying attention to for the first two sequences in the training set. Unlike last time, though, there are three heads and three attention mechanisms to visualize now.

We're starting with the three heads of the self-attention mechanism of the encoder. There are two data points in our source sequence, so each attention head has a two-by-two matrix of attention scores.

Figure 9.51 - Encoder's self-attention scores for its three heads

It seems that, in Attention Head #3, each data point is dividing its attention between itself and the other data point. In the other attention heads, though, the data points are paying attention to a single data point, either itself or the other one. Of course, these are just two data points used for visualization: the attention scores are different for each source sequence.
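If you would like to inspect the scores yourself, here is a minimal sketch of how per-head attention scores could be drawn as heat maps. The plot_attention_heads() function is hypothetical, and the `alphas` attribute referenced in the usage comment is an assumption about where the attention class stores its scores:

import matplotlib.pyplot as plt

def plot_attention_heads(alphas, tokens=('corner 1', 'corner 2')):
    # `alphas` is assumed to be a tensor of shape (n_heads, L, L) holding the
    # attention scores computed for a single source sequence
    n_heads = alphas.shape[0]
    fig, axs = plt.subplots(1, n_heads, figsize=(4 * n_heads, 4))
    for h, ax in enumerate(axs):
        scores = alphas[h].detach().cpu().numpy()
        ax.imshow(scores, vmin=0, vmax=1, cmap='Blues')
        ax.set_xticks(range(len(tokens)))
        ax.set_xticklabels(tokens)
        ax.set_yticks(range(len(tokens)))
        ax.set_yticklabels(tokens)
        ax.set_title(f'Attention Head #{h + 1}')
        # Annotate each cell with its attention score
        for i in range(scores.shape[0]):
            for j in range(scores.shape[1]):
                ax.text(j, i, f'{scores[i, j]:.2f}', ha='center', va='center')
    fig.tight_layout()
    return fig

# Hypothetical usage, assuming the encoder's self-attention layer keeps its
# scores in an `alphas` attribute (an assumption, not the book's actual API):
# alphas = sbs_seq_selfattnpe.model.encoder.self_attn_heads.alphas[0]
# fig = plot_attention_heads(alphas)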
