Daniel Voigt Godoy - Deep Learning with PyTorch Step-by-Step A Beginner’s Guide-leanpub
Model Configuration & Training

We can use our data loader that outputs packed sequences (train_var_loader) to feed our SquareModelPacked model and train it in the usual way:

Model Configuration

torch.manual_seed(21)
model = SquareModelPacked(n_features=2, hidden_dim=2, n_outputs=1)
loss = nn.BCEWithLogitsLoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

Model Training

sbs_packed = StepByStep(model, loss, optimizer)
sbs_packed.set_loaders(train_var_loader)
sbs_packed.train(100)

fig = sbs_packed.plot_losses()

Figure 8.29 - Losses—SquareModelPacked

StepByStep.loader_apply(train_var_loader, sbs_packed.correct)
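The SquareModelPacked class is defined earlier in the chapter. As a reminder of how a model consumes a PackedSequence, here is a hypothetical minimal sketch along the same lines; the class name (PackedClassifier) and the random input sequences are illustrative, not from the book:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_sequence

class PackedClassifier(nn.Module):
    def __init__(self, n_features=2, hidden_dim=2, n_outputs=1):
        super().__init__()
        # a bidirectional LSTM accepts a PackedSequence directly;
        # it doubles the hidden dimension at the output
        self.rnn = nn.LSTM(n_features, hidden_dim, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_dim, n_outputs)

    def forward(self, packed):
        # hidden has shape (num_directions, batch, hidden_dim)
        _, (hidden, _) = self.rnn(packed)
        # concatenate the final hidden states of both directions
        combined = torch.cat([hidden[0], hidden[1]], dim=1)
        return self.classifier(combined)

torch.manual_seed(21)
model = PackedClassifier()
# three variable-length sequences, each step with n_features=2
seqs = [torch.randn(4, 2), torch.randn(3, 2), torch.randn(2, 2)]
packed = pack_sequence(seqs, enforce_sorted=False)
logits = model(packed)
print(logits.shape)  # one logit per sequence: (3, 1)
```

Because pack_sequence(…, enforce_sorted=False) sorts and unsorts internally, the loader can hand the model sequences of any lengths in any order.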
Output
tensor([[66, 66],
[62, 62]])
1D Convolutions
In Chapter 5, we learned about convolutions, their kernels and filters, and how to
perform a convolution by repeatedly applying a filter to a moving region over the
image. Those were 2D convolutions, though, meaning that the filter was moving in
two dimensions, both along the width (left to right) and the height (top to bottom)
of the image.
Guess what 1D convolutions do? They move the filter in one dimension, from left
to right. The filter works like a moving window, performing a weighted sum of the
values in the region it has moved over. Let’s use a sequence of temperature values
over thirteen days as an example:
temperatures = np.array([5, 11, 15, 6, 5, 3, 3, 0, 0, 3, 4, 2, 1])
Figure 8.30 - Moving window over series of temperatures
Then, let’s use a window (filter) of size five, like in the figure above. In its first step,
the window is over days one to five. In the next step, since it can only move to the
right, it will be over days two to six. By the way, the size of our movement to the
right is, once again, known as the stride.
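Before bringing in PyTorch, the moving window can be sketched directly in NumPy: since every weight is the same (0.2), each step of the window is just the mean of five consecutive days, and a stride of one moves the window one day at a time.

```python
import numpy as np

temperatures = np.array([5, 11, 15, 6, 5, 3, 3, 0, 0, 3, 4, 2, 1])
size = 5  # window (filter) size

# slide the window with a stride of one and average each region
windows = np.array([temperatures[i:i + size].mean()
                    for i in range(len(temperatures) - size + 1)])
print(windows)  # first value: (5+11+15+6+5)/5 = 8.4
```

Thirteen days and a window of five yield nine steps (13 - 5 + 1), which is exactly how many values a stride-one convolution will produce below.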
Now, let’s assign the same value (0.2) for every weight in our filter and use
PyTorch’s F.conv1d() to convolve the filter with our sequence (don’t mind the
shape just yet; we’ll get back to it in the next section):
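A minimal sketch of that call, assuming the shapes discussed in the next section (conv1d expects the input as (batch, channels, length) and the filter as (out_channels, in_channels, kernel_size)):

```python
import torch
import torch.nn.functional as F

temperatures = torch.tensor([5, 11, 15, 6, 5, 3, 3, 0, 0, 3, 4, 2, 1],
                            dtype=torch.float)
# one sequence, one channel, thirteen time steps
seq = temperatures.reshape(1, 1, -1)
# a single filter of size five, every weight set to 0.2
weight = torch.full((1, 1, 5), 0.2)
out = F.conv1d(seq, weight)
print(out.shape)  # (1, 1, 9): nine steps for a stride of one
```

With identical weights of 0.2, the convolution computes a five-day moving average of the temperatures.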