sequence so far, and a data point from the sequence (like the coordinates of one of the corners from a given square).

3. The two inputs are used to produce a new hidden state (h_0 for the first data point), representing the updated state of the sequence now that a new point was presented to it.

4. The new hidden state is both the output of the current step and one of the inputs of the next step.

5. If there is yet another data point in the sequence, it goes back to Step #2; if not, the last hidden state (h_1 in the figure above) is also the final hidden state (h_f) of the whole RNN.

Since the final hidden state is a representation of the full sequence, that's what we're going to use as features for our classifier.

In a way, that's not so different from the way we used CNNs: there, we'd run the pixels through multiple convolutional blocks (convolutional layer + activation + pooling) and flatten them into a vector at the end to use as features for a classifier. Here, we run a sequence of data points through RNN cells and use the final hidden state (also a vector) as features for a classifier.

There is a fundamental difference between CNNs and RNNs, though: while there are several different convolutional layers, each learning its own filters, the RNN cell is one and the same. In this sense, the "unrolled" representation is misleading: it definitely looks like each input is being fed to a different RNN cell, but that's not the case.

There is only one cell, which will learn a particular set of weights and biases, and which will transform the inputs exactly the same way in every step of the sequence. Don't worry if this doesn't completely make sense to you just yet; I promise it will become more clear soon, especially in the "Journey of a Hidden State" section.
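To see this single-cell looping in action, here is a minimal sketch (the sizes, seed, and variable names below are illustrative choices, not the book's own example): it feeds a short sequence of two-dimensional points, one at a time, to a single RNN cell and keeps only the final hidden state as the feature vector.

import torch
import torch.nn as nn

torch.manual_seed(19)
# ONE cell, reused at every step of the sequence
rnn_cell = nn.RNNCell(input_size=2, hidden_size=2)

X = torch.randn(1, 4, 2)    # one sequence of four 2D points (like corners)
hidden = torch.zeros(1, 2)  # initial hidden state

for i in range(X.size(1)):
    # the new hidden state is the output of this step AND an input to the next
    hidden = rnn_cell(X[:, i, :], hidden)

final_hidden = hidden       # h_f, the feature vector for the classifier
print(final_hidden.shape)   # torch.Size([1, 2])

Notice that rnn_cell is created only once: every point in the sequence runs through the very same weights and biases, exactly as described above.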
RNN Cell

Let's take a look at some of the internals of an RNN cell:

Figure 8.6 - Internals of an RNN cell

On the left, we have a single RNN cell. It has three main components:
• A linear layer to transform the hidden state (in blue)
• A linear layer to transform the data point from the sequence (in red)
• An activation function, usually the hyperbolic tangent (TanH), which is applied to the sum of both transformed inputs

We can also represent them as equations:

Equation 8.1 - RNN

t_h = W_{hh} h_{t-1} + b_{hh}
t_x = W_{ih} x_t + b_{ih}
h_t = \tanh(t_h + t_x)

I chose to split the equation into smaller colored parts to highlight the fact that these are simple linear layers producing both a transformed hidden state (t_h) and a transformed data point (t_x). The updated hidden state (h_t) is both the output of this particular cell and one of the inputs of the "next" cell.

But there is no other cell, really; it is just the same cell over and over again, as depicted on the right side of the figure above. So, in the second step of the sequence, the updated hidden state will run through the very same linear layer the initial hidden state ran through. The same goes for the second data point.
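We can check that the cell really is just these two linear layers plus a TanH. The sketch below (variable names and dimensions are mine, chosen for illustration) copies the cell's own weights into two nn.Linear layers and reproduces its output:

import torch
import torch.nn as nn

torch.manual_seed(19)
n_features, hidden_dim = 2, 2
rnn_cell = nn.RNNCell(input_size=n_features, hidden_size=hidden_dim)

# the two linear layers from Figure 8.6: blue (hidden) and red (input)
linear_hidden = nn.Linear(hidden_dim, hidden_dim)  # produces t_h
linear_input = nn.Linear(n_features, hidden_dim)   # produces t_x

# load the cell's weights so both computations match exactly
with torch.no_grad():
    linear_hidden.weight.copy_(rnn_cell.weight_hh)
    linear_hidden.bias.copy_(rnn_cell.bias_hh)
    linear_input.weight.copy_(rnn_cell.weight_ih)
    linear_input.bias.copy_(rnn_cell.bias_ih)

x = torch.randn(1, n_features)    # a data point from the sequence
h = torch.zeros(1, hidden_dim)    # initial hidden state

t_h = linear_hidden(h)            # transformed hidden state
t_x = linear_input(x)             # transformed data point
h_t = torch.tanh(t_h + t_x)       # Equation 8.1

print(torch.allclose(h_t, rnn_cell(x, h)))  # True

Since the two computations match, it makes no difference whether we think of the cell as one black box or as two linear layers feeding a TanH; they are the same thing.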