Daniel Voigt Godoy - Deep Learning with PyTorch Step-by-Step: A Beginner's Guide


Output

tensor([[[ 0.0832, -0.0356],
         [ 0.3105, -0.5263]]], grad_fn=<PermuteBackward>)

keys = hidden_seq  # N, L, H
keys

Output

tensor([[[ 0.0832, -0.0356],
         [ 0.3105, -0.5263]]], grad_fn=<PermuteBackward>)

The encoder-decoder dynamics stay exactly the same: We still use the encoder's final hidden state as the decoder's initial hidden state (even though we're sending the whole sequence to the decoder, it still uses the last hidden state only), and we still use the last element of the source sequence as input to the first step of the decoder:

torch.manual_seed(21)
decoder = Decoder(n_features=2, hidden_dim=2)
decoder.init_hidden(hidden_seq)
inputs = source_seq[:, -1:]
out = decoder(inputs)

The first "query" (Q) is the decoder's hidden state (remember, hidden states are always sequence-first, so we're permuting it to batch-first):

query = decoder.hidden.permute(1, 0, 2)  # N, 1, H
query

Output

tensor([[[ 0.3913, -0.6853]]], grad_fn=<PermuteBackward>)
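To make that permutation concrete, here is a minimal, self-contained sketch (using a made-up tensor instead of the Decoder class above) showing how a sequence-first hidden state of shape (1, N, H) becomes a batch-first "query" of shape (N, 1, H):

import torch

# made-up hidden state: sequence-first, (L=1, N=3, H=2)
hidden = torch.zeros(1, 3, 2)
# batch-first "query": (N=3, 1, H=2)
query = hidden.permute(1, 0, 2)
print(hidden.shape, query.shape)  # torch.Size([1, 3, 2]) torch.Size([3, 1, 2])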

OK, we have the "keys" and a "query," so let's pretend we can compute attention scores (alphas) using them:

def calc_alphas(ks, q):
    N, L, H = ks.size()
    alphas = torch.ones(N, 1, L).float() * 1/L
    return alphas

alphas = calc_alphas(keys, query)
alphas

Output

tensor([[[0.5000, 0.5000]]])

We had to make sure alphas had the right shape (N, 1, L) so that, when multiplied by the "values" with shape (N, L, H), it results in a weighted sum of the alignment vectors with shape (N, 1, H). We can use batch matrix multiplication (torch.bmm()) for that:

Equation 9.2 - Shapes for batch matrix multiplication

(N, 1, L) x (N, L, H) -> (N, 1, H)

In other words, we can simply ignore the first dimension, and PyTorch will go over all the elements in the mini-batch for us:

# N, 1, L x N, L, H -> 1, L x L, H -> 1, H
context_vector = torch.bmm(alphas, values)
context_vector

Output

tensor([[[ 0.1968, -0.2809]]], grad_fn=<BmmBackward0>)

"Why are you spending so much time on matrix multiplication, of all things?"

Although it seems a fairly basic topic, getting the shapes and dimensions right is of the utmost importance.
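To see the shape bookkeeping of Equation 9.2 in isolation, here is a minimal, self-contained sketch; the values of N, L, and H are made up for illustration and the tensors are random stand-ins for the alphas and the encoder's hidden states:

import torch

N, L, H = 4, 5, 3                    # made-up batch size, source length, hidden size
alphas = torch.rand(N, 1, L)         # stand-in for the attention scores
values = torch.rand(N, L, H)         # stand-in for the encoder's hidden states
context = torch.bmm(alphas, values)  # (N, 1, L) x (N, L, H) -> (N, 1, H)
print(context.shape)                 # torch.Size([4, 1, 3])

torch.matmul() would give the same result here, since it also treats the leading dimension as a batch dimension, but torch.bmm() makes the batched nature of the multiplication explicit.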

