Figure 9.3 - Sequence dataset

The corners show the order in which they were drawn. In the first square, the
drawing started at the top-right corner (corresponding to the blue C corner) and
followed a clockwise direction (corresponding to the CDAB sequence). The source
sequence for that square would include corners C and D (1 and 2), while the target
sequence would include corners A and B (3 and 4), in that order.

In order to output a sequence, we need a more complex architecture; we need an…

Encoder-Decoder Architecture

The encoder-decoder is a combination of two models: the encoder and the
decoder.

Encoder

The encoder's goal is to generate a representation of the source
sequence; that is, to encode it.

"Wait, we've done that already, right?"

Absolutely! That's what the recurrent layers did: They generated a final hidden
state that was a representation of the input sequence. Now you know why I
insisted so much on this idea and repeated it over and over again in Chapter 8 :-)
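To make the source and target split concrete, here is a minimal sketch of how one
such sequence could be arranged as tensors. The corner coordinates, and the split
right down the middle of a four-corner sequence, are only illustrative assumptions
(the actual dataset uses generated, noisy squares):

import torch

# One drawn square as a sequence of four (x, y) corners, in drawing order:
# hypothetical "perfect" coordinates, clockwise starting at the top-right corner C
full_seq = torch.tensor([[ 1.0,  1.0],   # C (drawn 1st)
                         [ 1.0, -1.0],   # D (drawn 2nd)
                         [-1.0, -1.0],   # A (drawn 3rd)
                         [-1.0,  1.0]])  # B (drawn 4th)

# The first two corners make up the source sequence, the last two the target
source_seq = full_seq[:2].unsqueeze(0)  # shape (N=1, L=2, F=2)
target_seq = full_seq[2:].unsqueeze(0)  # shape (N=1, L=2, F=2)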
The figure below should look familiar: It is a typical recurrent neural network that
we're using to encode the source sequence.

Figure 9.4 - Encoder

The encoder model is a slim version of our models from Chapter 8: It simply
returns a sequence of hidden states.

Encoder

class Encoder(nn.Module):
    def __init__(self, n_features, hidden_dim):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.n_features = n_features
        self.hidden = None
        self.basic_rnn = nn.GRU(self.n_features,
                                self.hidden_dim,
                                batch_first=True)

    def forward(self, X):
        rnn_out, self.hidden = self.basic_rnn(X)

        return rnn_out  # N, L, F

"Don't we need only the final hidden state?"

That's correct. We'll be using the final hidden state only … for now.

In the "Attention" section, we'll be using all hidden states, and
that's why we're implementing the encoder like this.

Let's go over a simple example of encoding: We start with a sequence of
coordinates and pass it through the encoder.
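As a rough sketch of that encoding pass, assuming the Encoder class above and the
usual torch imports, we can feed it the source sequence of a single square; the
seed and the hidden dimension of two are arbitrary choices for illustration:

torch.manual_seed(21)

# n_features=2 because each point has two coordinates; hidden_dim=2 is arbitrary
encoder = Encoder(n_features=2, hidden_dim=2)

# Source sequence: the first two corners of one square, shaped (N=1, L=2, F=2)
source_seq = torch.tensor([[[ 1.0,  1.0],
                            [ 1.0, -1.0]]])

hidden_seq = encoder(source_seq)   # one hidden state per point: (1, 2, 2)
final_hidden = hidden_seq[:, -1:]  # last hidden state: (1, 1, 2)
# encoder.hidden holds the same final state, but in (num_layers, N, H) shape

That final hidden state is the encoder's representation of the source sequence,
which is what the decoder will work from.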