Output

tensor([[0.3924, 0.8146]], grad_fn=<TanhBackward>)

That's the updated hidden state!

Now, let's take a quick sanity check, feeding the same input to the original RNN cell:

rnn_cell(X[0:1])

Output

tensor([[0.3924, 0.8146]], grad_fn=<TanhBackward>)

Great, the values match.

We can also visualize this sequence of operations, assuming that every hidden state "lives" in a feature space delimited by the boundaries given by the hyperbolic tangent. So, the initial hidden state (0, 0) sits at the center of this feature space, depicted in the left-most plot in the figure below:

Figure 8.8 - Evolution of the hidden state

The transformed hidden state (the output of linear_hidden()) is depicted in the second plot: it went through an affine transformation. The point in the center corresponds to t_h. In the third plot, we can see the effect of adding t_x (the output of linear_input()): the whole feature space was translated to the right and up. And then, in the right-most plot, the hyperbolic tangent works its magic and brings the whole feature space back to the (-1, 1) range. That was the first step in the journey of a hidden state. We'll do it once again, using the full sequence, after training a model.
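If you want to reproduce this sequence of operations on your own, the sketch below puts the pieces together: it creates a fresh RNN cell, copies its weights into two linear layers standing in for linear_hidden() and linear_input(), and performs the three steps depicted in Figure 8.8. The sequence X here is randomly generated for illustration only, so the numbers will not match the ones above; what matters is that the manual computation and the cell itself agree.

import torch
import torch.nn as nn

torch.manual_seed(19)
n_features, hidden_dim = 2, 2

# A fresh cell, plus two linear layers loaded with the cell's own
# weights, mirroring the roles of linear_input() and linear_hidden()
rnn_cell = nn.RNNCell(input_size=n_features, hidden_size=hidden_dim)
linear_input = nn.Linear(n_features, hidden_dim)
linear_hidden = nn.Linear(hidden_dim, hidden_dim)
with torch.no_grad():
    linear_input.weight.copy_(rnn_cell.weight_ih)
    linear_input.bias.copy_(rnn_cell.bias_ih)
    linear_hidden.weight.copy_(rnn_cell.weight_hh)
    linear_hidden.bias.copy_(rnn_cell.bias_hh)

X = torch.rand(4, n_features)               # stand-in sequence of four points
initial_hidden = torch.zeros(1, hidden_dim)

th = linear_hidden(initial_hidden)   # affine transformation of the hidden state
tx = linear_input(X[0:1])            # affine transformation of the data point
updated_hidden = torch.tanh(th + tx) # squashes everything back into (-1, 1)

print(updated_hidden)
print(rnn_cell(X[0:1]))  # the cell itself should produce the same values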
I guess it is time to feed the full sequence to the RNN cell, right? You may be tempted to do it like this:

# WRONG!
rnn_cell(X)

Output

tensor([[ 0.3924,  0.8146],
        [ 0.7864,  0.5266],
        [-0.0047, -0.2897],
        [-0.6817,  0.1109]], grad_fn=<TanhBackward>)

This is wrong! Remember, the RNN cell has two inputs: one hidden state and one data point.

"Where is the hidden state, then?"

That's exactly the problem! If not provided, it defaults to the zeros corresponding to the initial hidden state. So, the call above is not processing four steps of a sequence, but rather processing the first step of what it is assuming to be four sequences.

To effectively use the RNN cell in a sequence, we need to loop over the data points and provide the updated hidden state at each step:

hidden = torch.zeros(1, hidden_dim)
for i in range(X.shape[0]):
    out = rnn_cell(X[i:i+1], hidden)
    print(out)
    hidden = out

Output

tensor([[0.3924, 0.8146]], grad_fn=<TanhBackward>)
tensor([[ 0.4347, -0.0481]], grad_fn=<TanhBackward>)
tensor([[-0.1521, -0.3367]], grad_fn=<TanhBackward>)
tensor([[-0.5297, 0.3551]], grad_fn=<TanhBackward>)
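If you want to convince yourself that the "wrong" call really treats each row of X as the first step of an independent sequence, the short sketch below compares it against four separate single-step calls, each starting from the zero hidden state. As before, X is randomly generated here purely for illustration.

import torch
import torch.nn as nn

torch.manual_seed(19)
n_features, hidden_dim = 2, 2
rnn_cell = nn.RNNCell(input_size=n_features, hidden_size=hidden_dim)
X = torch.rand(4, n_features)   # stand-in for the four-point sequence

# Calling the cell on the full tensor at once (the "wrong" way)...
batch_of_first_steps = rnn_cell(X)

# ...matches running the FIRST step of four independent sequences,
# each one starting from the zero (initial) hidden state
zero_hidden = torch.zeros(1, hidden_dim)
stacked = torch.cat([rnn_cell(X[i:i+1], zero_hidden)
                     for i in range(X.shape[0])])

print(torch.allclose(batch_of_first_steps, stacked))  # True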