Deep Learning with PyTorch Step-by-Step: A Beginner's Guide, by Daniel Voigt Godoy (Leanpub)


"Why would I want to force it to do that?"Padding comes to mind—you likely don’t want to pay attention to stuffed datapoints in a sequence, right? Let’s try out an example. Pretend we have a sourcesequence with one real and one padded data point, and that it went through anencoder to generate the corresponding "keys":source_seq = torch.tensor([[[-1., 1.], [0., 0.]]])# pretend there's an encoder here...keys = torch.tensor([[[-.38, .44], [.85, -.05]]])query = torch.tensor([[[-1., 1.]]])The source mask should be False for every padded data point,and its shape should be (N, 1, L), where L is the length of thesource sequence.source_mask = (source_seq != 0).all(axis=2).unsqueeze(1)source_mask # N, 1, LOutputtensor([[[ True, False]]])The mask will make the attention score equal to zero for the padded data points.If we use the "keys" we’ve just made up to initialize an instance of the attentionmechanism and call it using the source mask above, we’ll see the following result:torch.manual_seed(11)attnh = Attention(2)attnh.init_keys(keys)context = attnh(query, mask=source_mask)attnh.alphasOutputtensor([[[1., 0.]]])Attention | 727


The attention score of the second data point, as expected, was set to zero, leaving the whole attention on the first data point.
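The Attention class handles this masking internally. To see the mechanics in isolation, here is a minimal, standalone sketch of a common way to implement it (the raw scores below are made up, and this is not necessarily the exact code inside the Attention class): padded positions have their scores replaced by negative infinity, so the softmax assigns them zero attention.

import torch
import torch.nn.functional as F

# made-up raw alignment scores for one query over a source sequence of length two
scores = torch.tensor([[[.5, .3]]])            # N, 1, L
source_mask = torch.tensor([[[True, False]]])  # N, 1, L

# padded positions get a score of minus infinity...
masked_scores = scores.masked_fill(~source_mask, float('-inf'))
# ...so the softmax assigns them zero attention weight
F.softmax(masked_scores, dim=-1)

Output

tensor([[[1., 0.]]])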

Decoder

We also need to make some small adjustments to the decoder:

Decoder + Attention

class DecoderAttn(nn.Module):
    def __init__(self, n_features, hidden_dim):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.n_features = n_features
        self.hidden = None
        self.basic_rnn = nn.GRU(self.n_features,
                                self.hidden_dim,
                                batch_first=True)
        # attention mechanism; the regression layer takes the
        # concatenated context vector and hidden state (2 * hidden_dim)
        self.attn = Attention(self.hidden_dim)
        self.regression = nn.Linear(2 * self.hidden_dim,
                                    self.n_features)

    def init_hidden(self, hidden_seq):
        # the output of the encoder is N, L, H
        # and init_keys expects batch-first as well
        self.attn.init_keys(hidden_seq)
        hidden_final = hidden_seq[:, -1:]
        self.hidden = hidden_final.permute(1, 0, 2)  # L, N, H

    def forward(self, X, mask=None):
        # X is N, 1, F
        batch_first_output, self.hidden = \
            self.basic_rnn(X, self.hidden)
        query = batch_first_output[:, -1:]
        # Attention: the context vector, concatenated with the query
        # (the decoder's hidden state), feeds the regression layer
        context = self.attn(query, mask=mask)
        concatenated = torch.cat([context, query],
                                 axis=-1)
        out = self.regression(concatenated)

        # N, 1, F
        return out.view(-1, 1, self.n_features)

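Notice that the regression layer now takes 2 * hidden_dim inputs: the context vector produced by the attention mechanism is concatenated with the decoder's own hidden state (the "query") before the final projection.

To see how the pieces fit together, here is a minimal sketch of a single decoding step. It assumes the Encoder class from earlier in the chapter (any module returning the full sequence of batch-first hidden states of size hidden_dim would do) and reuses source_seq and source_mask from the masking example above:

torch.manual_seed(21)
encoder = Encoder(n_features=2, hidden_dim=2)
decoder_attn = DecoderAttn(n_features=2, hidden_dim=2)

# the encoder's hidden states become the attention "keys"
hidden_seq = encoder(source_seq)              # N, L, H
decoder_attn.init_hidden(hidden_seq)

# the last data point of the source sequence kicks off the decoding
inputs = source_seq[:, -1:]
out = decoder_attn(inputs, mask=source_mask)  # N, 1, F

In a loop, out would be fed back as the next input (or replaced by the actual target while using teacher forcing), just like in the encoder-decoder architecture without attention.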
