Like the GRU, the LSTM presents four distinct groups of sequences corresponding
to the different starting corners. Moreover, it is able to classify most sequences
correctly after seeing only three points.
Variable-Length Sequences
So far, we’ve been working with full, regular sequences of four data points each,
and that’s nice. But what do you do if you get variable-length sequences, like the
ones below:
x0 = points[0] # 4 data points
x1 = points[1][2:] # 2 data points
x2 = points[2][1:] # 3 data points
x0.shape, x1.shape, x2.shape
Output
((4, 2), (2, 2), (3, 2))
The answer: You pad them!
"Could you please remind me again what padding is?"
Sure! Padding means stuffing with zeros. We’ve seen padding in Chapter 5 already:
We used it to stuff an image with zeros around it in order to preserve its original
size after being convolved.
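If you'd like to see that size-preserving effect in code before the refresher below, here is a minimal sketch (the layer configuration and image size are made up for illustration, not taken from Chapter 5):

import torch
import torch.nn as nn

# Without padding, a 3x3 convolution shrinks a 10x10 image to 8x8
conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3)
image = torch.randn(1, 1, 10, 10)  # single-channel, 10x10 image
conv(image).shape                  # torch.Size([1, 1, 8, 8])

# Stuffing one row/column of zeros around each side (padding=1)
# preserves the original size
conv_pad = nn.Conv2d(in_channels=1, out_channels=1,
                     kernel_size=3, padding=1)
conv_pad(image).shape              # torch.Size([1, 1, 10, 10])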
Padding in Computer Vision

Padding an image simply means adding zeros around it. An image is worth a
thousand words in this case.

By adding columns and rows of zeros around it, we expand the input image such
that the gray region starts centered in the actual top-left corner of the
input image. This simple trick can be used to preserve the original size of
the image.

[Figure: Padding]

Now, we'll stuff sequences with zeros so they all have matching sizes. Simple
enough, right?

"OK, it is simple, but why are we doing it?"

We need to pad the sequences because we cannot create a tensor out of a list
of elements with different sizes:

all_seqs = [x0, x1, x2]
torch.as_tensor(all_seqs)
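Running that last cell raises an error precisely because the list is "ragged."
As a preview of how the stuffing is actually done, here is a minimal sketch
using PyTorch's pad_sequence() utility; it assumes x0, x1, and x2 are the
Numpy arrays created above, and the names seq_tensors and padded are ours,
for illustration only:

import torch
import torch.nn.utils.rnn as rnn_utils

# pad_sequence() expects a list of tensors, so we convert the arrays first
seq_tensors = [torch.as_tensor(s).float() for s in [x0, x1, x2]]
# Shorter sequences are stuffed with zeros up to the longest length (four)
padded = rnn_utils.pad_sequence(seq_tensors, batch_first=True)
padded.shape  # torch.Size([3, 4, 2])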