import copy
import numpy as np
import torch
import torch.optim as optim
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, Dataset, random_split, \
TensorDataset
from data_generation.square_sequences import generate_sequences
from stepbystep.v4 import StepByStep
Sequence-to-Sequence
Sequence-to-sequence problems are more complex than those we handled in the
last chapter. There are two sequences now: the source and the target. We use the
former to predict the latter, and they may even have different lengths.
A typical example of a sequence-to-sequence problem is translation: A sentence
goes in (a sequence of words in English), and another sentence comes out (a
sequence of words in French). This problem can be tackled using an encoder-decoder
architecture, first described in the "Sequence to Sequence Learning with
Neural Networks" [138] paper by Sutskever, I., et al.
Translating languages is obviously a difficult task, so we'll fall back on a much
simpler problem to illustrate how the encoder-decoder architecture works.
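In a nutshell, the encoder maps the source sequence into a hidden state (a representation of what came in), and the decoder unrolls that state into the target sequence. Here is a minimal sketch of the idea using two GRUs; the shapes, and the choice of feeding the decoder the last point of the source sequence, are our own illustrative assumptions, not yet the model we'll actually build in this chapter:

# A minimal sketch of the encoder-decoder idea (illustrative
# assumptions only; the real model is developed later on)
n_features, hidden_dim = 2, 2
torch.manual_seed(21)
encoder_rnn = nn.GRU(n_features, hidden_dim, batch_first=True)
decoder_rnn = nn.GRU(n_features, hidden_dim, batch_first=True)

source_seq = torch.randn(1, 2, n_features)    # shape: (N, L, F)
_, hidden = encoder_rnn(source_seq)           # hidden state summarizes the source

dec_input = source_seq[:, -1:]                # assumption: start from the last source point
out, hidden = decoder_rnn(dec_input, hidden)  # first element of the target sequence

The decoder can then keep going, feeding its own output back in as the next input, until the target sequence is complete.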
Data Generation
We’ll keep drawing the same squares as before, but this time we’ll draw the first
two corners ourselves (the source sequence) and ask our model to predict the
next two corners (the target sequence). As with every sequence-related problem,
the order is important, so it is not enough to get the corners' coordinates right;
they should follow the same direction (clockwise or counterclockwise).
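To make the split concrete, here is a quick sketch using a "perfect" square; the corner coordinates and labels below are made up for illustration:

# Hypothetical corner coordinates for a "perfect" square
A, B, C, D = [-1, -1], [-1, 1], [1, 1], [1, -1]
clockwise = torch.tensor([A, B, C, D]).float()  # one of the eight orderings
source_seq = clockwise[:2]   # first two corners: we draw these ourselves
target_seq = clockwise[2:]   # last two corners: the model must predict these

Reversing the order (A, D, C, B) would yield the counterclockwise version of the same square, which counts as a different sequence.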
Figure 9.1 - Drawing first two corners, starting at A and moving toward either D or B

Since there are four corners to start from and two directions to follow, there are
effectively eight possible sequences (solid colors indicate the corners in the source
sequence; semi-transparent colors, the target sequence).

Figure 9.2 - Possible sequences of corners

Since the desired output of our model is a sequence of coordinates (x₀, x₁), we're
dealing with a regression problem now. Therefore, we'll be using a typical mean
squared error loss to compare the predicted and actual coordinates for the two
points in the target sequence.

Let's generate 256 random noisy squares:

Data Generation

points, directions = generate_sequences(n=256, seed=13)

And then let's visualize the first five squares:

fig = plot_data(points, directions, n_rows=1)
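As mentioned above, the loss is a plain mean squared error over coordinates. A minimal sketch of the comparison, using made-up predicted and actual coordinates for the two target corners:

loss_fn = nn.MSELoss()
# made-up tensors with shape (N, L, F): one square, two corners, two coordinates
predicted_seq = torch.tensor([[[.9, .9], [.9, -.9]]])
actual_seq = torch.tensor([[[1., 1.], [1., -1.]]])
loss = loss_fn(predicted_seq, actual_seq)  # averages squared errors over all values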