
encdec = EncoderDecoder(encoder, decoder_attn,
                        input_len=2, target_len=2,
                        teacher_forcing_prob=0.0)
encdec(full_seq)

Output

tensor([[[-0.3555, -0.1220],
         [-0.2641, -0.2521]]], grad_fn=<CopySlices>)

We could use it to train a model already, but we would miss something interesting: visualizing the attention scores. To visualize them, we need to store them first. The easiest way to do so is to create a new class that inherits from EncoderDecoder and then override the init_outputs() and store_output() methods:

Encoder + Decoder + Attention

 1 class EncoderDecoderAttn(EncoderDecoder):
 2     def __init__(self, encoder, decoder, input_len, target_len,
 3                  teacher_forcing_prob=0.5):
 4         super().__init__(encoder, decoder, input_len, target_len,
 5                          teacher_forcing_prob)
 6         self.alphas = None
 7
 8     def init_outputs(self, batch_size):
 9         device = next(self.parameters()).device
10         # N, L (target), F
11         self.outputs = torch.zeros(batch_size,
12                                    self.target_len,
13                                    self.encoder.n_features).to(device)
14         # N, L (target), L (source)
15         self.alphas = torch.zeros(batch_size,
16                                   self.target_len,
17                                   self.input_len).to(device)
18
19     def store_output(self, i, out):
20         # Stores the output and the attention scores of the current step
21         self.outputs[:, i:i+1, :] = out
22         self.alphas[:, i:i+1, :] = self.decoder.attn.alphas


The attention scores are stored in the alphas attribute of the attention model, which, in turn, is the decoder's attn attribute. For each step in the target sequence generation, the corresponding scores are copied to the alphas attribute of the EncoderDecoderAttn model (line 22).
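The copy in line 22 is needed because the attention mechanism itself only keeps the scores of its latest step. A minimal sketch to check that, assuming the encdec model from the start of this section has just processed full_seq (the shape in the comment is an assumption: one "query" attending over a source sequence of length two):

decoder_attn.attn.alphas  # scores of the last step only; assumed shape (1, 1, 2)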

IMPORTANT: Pay attention (pun very much intended!) to the shape of the alphas attribute: (N, L_target, L_source). For each one of the N sequences in a mini-batch, there is a matrix where each "query" (Q) coming from the target sequence (a row in this matrix) has as many attention scores as there are "keys" (K) in the source sequence (the columns in this matrix).
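A quick way of checking that shape in practice is to build an EncoderDecoderAttn model out of the encoder and decoder_attn we already have and run full_seq through it. This is just a sketch (the encdec_attn name is ours, and the expected size assumes the same single-sequence full_seq used above):

encdec_attn = EncoderDecoderAttn(encoder, decoder_attn,
                                 input_len=2, target_len=2,
                                 teacher_forcing_prob=0.0)
encdec_attn(full_seq)
# N=1 sequence, L_target=2 rows, L_source=2 columns
encdec_attn.alphas.shape  # expected: torch.Size([1, 2, 2])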

We'll visualize these matrices shortly. Moreover, a proper understanding of how attention scores are organized in the alphas attribute will make it much easier to understand the next section: "Self-Attention."

Model Configuration & Training

We just have to replace the original classes for both the decoder and the model with their attention counterparts, and we're good to go:

Model Configuration

1 torch.manual_seed(17)
2 encoder = Encoder(n_features=2, hidden_dim=2)
3 decoder_attn = DecoderAttn(n_features=2, hidden_dim=2)
4 model = EncoderDecoderAttn(encoder, decoder_attn,
5                            input_len=2, target_len=2,
6                            teacher_forcing_prob=0.5)
7 loss = nn.MSELoss()
8 optimizer = optim.Adam(model.parameters(), lr=0.01)

Model Training

1 sbs_seq_attn = StepByStep(model, loss, optimizer)
2 sbs_seq_attn.set_loaders(train_loader, test_loader)
3 sbs_seq_attn.train(100)
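Once training finishes, retrieving the stored scores is straightforward: run a mini-batch through the trained model and read its alphas attribute. The sketch below is only an illustration (it assumes, as in the previous sections, that the test loader yields the full source-plus-target sequences as inputs); plotting each matrix as a heat map is one simple way of looking at it:

import matplotlib.pyplot as plt

model.eval()
device = next(model.parameters()).device
seqs, _ = next(iter(test_loader))
with torch.no_grad():
    model(seqs.to(device))

# one (L_target, L_source) matrix of attention scores per sequence
alphas = model.alphas.cpu().numpy()
plt.imshow(alphas[0], cmap='gray')  # first sequence in the mini-batch
plt.xlabel('Source Sequence')
plt.ylabel('Target Sequence')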

