Considering this, the not "unrolled" ("rolled" doesn't sound right!) representation is a better characterization of the internal structure of an RNN.

Let's dive deeper into the internals of an RNN cell and look at it at the neuron level:

Figure 8.7 - RNN cell at neuron level

Since one can choose the number of hidden dimensions, I chose two dimensions, simply because I want to be able to easily visualize the results. Hence, two blue neurons are transforming the hidden state.

The number of red neurons transforming the data point will necessarily be the same as the chosen number of hidden dimensions since both transformed outputs need to be added together. But this doesn't mean the data points must have the same number of dimensions.

Coincidentally, our data points have two coordinates, but even if we had 25 dimensions, these 25 features would still be mapped into two dimensions by the two red neurons.

The only operation left is the activation function, most likely the hyperbolic tangent, which will produce the updated hidden state.

"Why hyperbolic tangent? Isn't ReLU a better activation function?"

The hyperbolic tangent has a "competitive advantage" here since it maps the feature space to clearly defined boundaries: the interval (-1, 1). This guarantees that, at
every step of the sequence, the hidden state is always within these boundaries.
Given that we have only one linear layer with which to transform the hidden
state, regardless of which step of the sequence it is being used in, it is definitely
convenient to have its values within a predictable range. We’ll get back to this in
the "Journey of a Hidden State" section.
Now, let’s see how an RNN cell works in code. We’ll create one using PyTorch’s
own nn.RNNCell and disassemble it into its components to manually reproduce all
the steps involved in updating the hidden state. To create a cell, we need to tell it
the input_size (number of features in our data points) and the hidden_size (the
size of the vector representing the hidden state). It is also possible to tell it not to
add biases, and to use a ReLU instead of TanH, but we’re sticking to the defaults.
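For the record, opting out of those defaults only requires the corresponding constructor arguments, since nn.RNNCell also accepts bias and nonlinearity. A bias-free, ReLU-based cell would look like the sketch below (for illustration only; we won't use it):

relu_cell = nn.RNNCell(
    input_size=2, hidden_size=2, bias=False, nonlinearity='relu'
)

Sticking to the defaults, though, our cell is created like this: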
n_features = 2
hidden_dim = 2
torch.manual_seed(19)
rnn_cell = nn.RNNCell(input_size=n_features, hidden_size=hidden_dim)
rnn_state = rnn_cell.state_dict()
rnn_state
Output
OrderedDict([('weight_ih', tensor([[ 0.6627, -0.4245],
                                   [ 0.5373,  0.2294]])),
             ('weight_hh', tensor([[-0.4015, -0.5385],
                                   [-0.1956, -0.6835]])),
             ('bias_ih', tensor([0.4954, 0.6533])),
             ('bias_hh', tensor([-0.3565, -0.2904]))])
The weight_ih and bias_ih (i stands for inputs—the data) tensors correspond to
the red neurons in Figure 8.7. The weight_hh and bias_hh (h stands for hidden)
tensors, to the blue neurons. We can use these weights to create two linear layers:
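Just to sketch where this is going, here is a minimal, self-contained example of that disassembly: two linear layers loaded with the cell's weights, used to reproduce a single update of the hidden state by hand. The layer names, the made-up data point, and the all-zeros initial hidden state are illustrative choices, not anything prescribed by the cell itself:

import torch
import torch.nn as nn

torch.manual_seed(19)
# same cell as above; the seed reproduces the weights shown in the output
rnn_cell = nn.RNNCell(input_size=2, hidden_size=2)
rnn_state = rnn_cell.state_dict()

# two linear layers mirroring the cell's internal transformations
linear_input = nn.Linear(2, 2)   # red neurons: weight_ih / bias_ih
linear_hidden = nn.Linear(2, 2)  # blue neurons: weight_hh / bias_hh

# load the cell's weights and biases into the corresponding layers
with torch.no_grad():
    linear_input.weight = nn.Parameter(rnn_state['weight_ih'])
    linear_input.bias = nn.Parameter(rnn_state['bias_ih'])
    linear_hidden.weight = nn.Parameter(rnn_state['weight_hh'])
    linear_hidden.bias = nn.Parameter(rnn_state['bias_hh'])

# a made-up data point and an all-zeros initial hidden state
x = torch.tensor([[1.0, 0.5]])      # shape: (1, n_features)
initial_hidden = torch.zeros(1, 2)  # shape: (1, hidden_dim)

# manual update: tanh(W_ih x + b_ih + W_hh h + b_hh)
updated_hidden = torch.tanh(linear_input(x) + linear_hidden(initial_hidden))

print(updated_hidden)
print(rnn_cell(x, initial_hidden))

Both prints should show the same values, confirming that the cell boils down to those two linear layers followed by a hyperbolic tangent.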