Let’s take a look at the RNN’s arguments:

• input_size: It is the number of features in each data point of the sequence.
• hidden_size: It is the number of hidden dimensions you want to use.
• bias: Just like any other layer, it includes the bias in the equations.
• nonlinearity: By default, it uses the hyperbolic tangent, but you can change it
  to ReLU if you want.

The four arguments above are exactly the same as those in the RNN cell. So, we can
easily create a full-fledged RNN like that:

n_features = 2
hidden_dim = 2

torch.manual_seed(19)
rnn = nn.RNN(input_size=n_features, hidden_size=hidden_dim)
rnn.state_dict()

Output

OrderedDict([('weight_ih_l0', tensor([[ 0.6627, -0.4245],
                                      [ 0.5373,  0.2294]])),
             ('weight_hh_l0', tensor([[-0.4015, -0.5385],
                                      [-0.1956, -0.6835]])),
             ('bias_ih_l0', tensor([0.4954, 0.6533])),
             ('bias_hh_l0', tensor([-0.3565, -0.2904]))])

Since the seed is exactly the same, you’ll notice that the weights and biases have
exactly the same values as our former RNN cell. The only difference is in the
parameters' names: Now they all have an _l0 suffix to indicate they belong to the
first "layer."

"What do you mean by layer? Isn’t the RNN itself a layer?"

Yes, the RNN itself can be a layer in our model. But it may have its own internal
layers! You can configure those with the following four extra arguments:

• num_layers: The RNN we’ve been using so far has one layer (the default value),
but if you use more than one, you’ll be creating a stacked RNN, which we’ll see
in its own section.
• bidirectional: So far, our RNNs have been handling sequences in the left-to-right
direction (the default), but if you set this argument to True, you’ll be
creating a bidirectional RNN, which we’ll also see in its own section.
• dropout: This introduces an RNN’s own dropout layer between its internal
layers, so it only makes sense if you’re using a stacked RNN.
And I saved the best (actually, the worst) for last:
• batch_first: The documentation says, "if True, then the input and output tensors
are provided as (batch, seq, feature)," which makes you think that you only need
to set it to True and it will turn everything into your nice and familiar tensors
where different batches are concatenated together as its first dimension—and
you’d be sorely mistaken.
"Why? What’s wrong with that?"
The problem is, you need to read the documentation very literally: Only the input
and output tensors are going to be batch first; the hidden state will never be batch
first. This behavior may bring complications you need to be aware of.
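
Just to illustrate the point, here is a minimal sketch (the batch below is random
data with made-up dimensions, used only to inspect shapes): even with batch_first
set to True, the returned hidden state keeps its sequence-first-style layout.

import torch
import torch.nn as nn

n_features = 2
hidden_dim = 2

torch.manual_seed(19)
# same RNN as before, but now expecting batch-first inputs
rnn_batch_first = nn.RNN(input_size=n_features,
                         hidden_size=hidden_dim,
                         batch_first=True)

# a batch of three sequences, each with four data points of two features: (N, L, F)
batch = torch.randn(3, 4, n_features)

out, hidden = rnn_batch_first(batch)

print(out.shape)     # torch.Size([3, 4, 2]) -> (N, L, H): output IS batch first
print(hidden.shape)  # torch.Size([1, 3, 2]) -> (num_layers, N, H): hidden is NOT

The output follows the batch-first layout, as promised by the documentation, but
the hidden state does not, and that is exactly the complication to watch out for.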
Shapes
Before going through an example, let’s take a look at the expected inputs and
outputs of our RNN:
• Inputs:
◦ The input tensor containing the sequence you want to run through the
RNN:
▪ The default shape is sequence-first; that is, (sequence length, batch
size, number of features), which we’re abbreviating to (L, N, F).
▪ But if you choose batch_first, it will flip the first two dimensions, and
then it will expect an (N, L, F) shape, which is what you’re likely getting
from a data loader.
▪ By the way, the input can also be a packed sequence—we’ll get back to
that in a later section.