Daniel Voigt Godoy - Deep Learning with PyTorch Step-by-Step: A Beginner’s Guide (Leanpub)
Figure 8.14 - Final hidden states for eight sequences of the "perfect" square

For clockwise movement, the final hidden states are situated in the upper-left region, while counterclockwise movement brings the final hidden states to the lower-right corner. The decision boundary, as expected from a logistic regression, is a straight line. The point closest to the decision boundary, that is, the one the model is least confident about, corresponds to the sequence starting at the B corner (green) and moving clockwise (+).

"What about the other hidden states for the actual sequences?"

Let’s visualize them as well. In the figure below, clockwise sequences are represented by blue points and counterclockwise sequences by red points.

Figure 8.15 - Sequence of hidden states

We can see that the model already achieves some separation after "seeing" two data points (corners), corresponding to "Hidden State #1." After "seeing" the third corner, most of the sequences are already correctly classified and, after observing all corners, it gets every noisy square right.

"Can we pick one sequence and observe its hidden state from its initial to its final values?"

Sure we can!

The Journey of a Hidden State

Let’s use the ABCD sequence of the "perfect" square for this. The initial hidden state is (0, 0) by default, and it is colored black. Every time a new data point (corner) is about to be used in the computation, the affected hidden state is colored accordingly (gray, green, blue, and red, in order).

The figure below tracks the progress of the hidden state over every operation performed inside the RNN. The first column has the hidden state that’s an input to the RNN cell at a given step; the second column has the transformed hidden state; the third, the translated hidden state (obtained by adding the transformed input); and the last, the activated hidden state.

There are four rows, one for each data point (corner) in our sequence. The initial hidden state of each row is the activated state of the previous row, so it starts at the initial hidden state of the whole sequence, (0, 0), and, after processing the gray, green, blue, and red corners, ends at the final hidden state, the red dot close to (-1, 1) in the last plot.
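The straightness of that decision boundary follows directly from classifying with a single linear layer (a logistic regression) on top of the two-dimensional final hidden state. Here is a minimal sketch of that idea, using made-up weights rather than the trained model’s:

```python
import torch
import torch.nn as nn

# Hypothetical classifier: one linear layer (logistic regression)
# on top of a two-dimensional final hidden state
classifier = nn.Linear(2, 1)
with torch.no_grad():
    classifier.weight.copy_(torch.tensor([[0.5, -1.0]]))  # made-up weights
    classifier.bias.fill_(0.25)                           # made-up bias

w = classifier.weight.detach()[0]
b = classifier.bias.detach()[0]
# The decision boundary is the set of hidden states with a logit of zero:
#   w0*h0 + w1*h1 + b = 0  =>  h1 = -(w0/w1)*h0 - b/w1, a straight line
slope = (-w[0] / w[1]).item()
intercept = (-b / w[1]).item()

# Any hidden state lying on that line yields a logit of zero
h = torch.tensor([[1.0, slope * 1.0 + intercept]])
logit = classifier(h).item()
print(logit)  # 0.0
```

Hidden states on one side of this line get classified as clockwise, and those on the other side as counterclockwise; the closer a point is to the line, the less confident the classification.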
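The four operations tracked in the figure can be reproduced directly from an RNN cell’s weights. The sketch below uses an untrained `nn.RNNCell` and a made-up data point, not the actual trained cell, but the decomposition into transformed, translated, and activated hidden states is the same:

```python
import torch
import torch.nn as nn

torch.manual_seed(19)
cell = nn.RNNCell(input_size=2, hidden_size=2)  # untrained, for illustration

x = torch.tensor([[1.0, 1.0]])  # a made-up data point (corner)
hidden = torch.zeros(1, 2)      # the default initial hidden state, (0, 0)

state = cell.state_dict()
# 1) transformed hidden state: linear transformation of the previous hidden state
transformed = hidden @ state['weight_hh'].T + state['bias_hh']
# 2) transformed input: linear transformation of the data point
transformed_input = x @ state['weight_ih'].T + state['bias_ih']
# 3) translated hidden state: transformed hidden state plus transformed input
translated = transformed + transformed_input
# 4) activated hidden state: the TanH activation produces the new hidden state
activated = torch.tanh(translated)

# The step-by-step result matches the cell's own output
print(torch.allclose(activated, cell(x, hidden)))  # True
```

Feeding each corner in turn, and reusing each step’s activated state as the next step’s input hidden state, traces exactly the journey shown in the four rows of the figure.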