Daniel Voigt Godoy - Deep Learning with PyTorch Step-by-Step: A Beginner's Guide
linear_input = nn.Linear(n_features, hidden_dim)
linear_hidden = nn.Linear(hidden_dim, hidden_dim)

with torch.no_grad():
    # copy the RNN cell's weights and biases into the equivalent linear layers
    linear_input.weight = nn.Parameter(rnn_state['weight_ih'])
    linear_input.bias = nn.Parameter(rnn_state['bias_ih'])
    linear_hidden.weight = nn.Parameter(rnn_state['weight_hh'])
    linear_hidden.bias = nn.Parameter(rnn_state['bias_hh'])
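The rnn_state dictionary above is assumed to hold the parameters of an RNN cell configured earlier. If you are reproducing this from scratch, here is a minimal sketch of one way such a state dictionary could be obtained, assuming two features and a hidden dimension of two (the seed, and therefore the exact weight values, is arbitrary here):

import torch
import torch.nn as nn

n_features, hidden_dim = 2, 2    # assumed to match the layers above
torch.manual_seed(19)            # arbitrary seed for this sketch
rnn_cell = nn.RNNCell(input_size=n_features, hidden_size=hidden_dim)
rnn_state = rnn_cell.state_dict()
# keys: 'weight_ih', 'weight_hh', 'bias_ih', 'bias_hh'
# weight_ih has shape (hidden_dim, n_features), so it can be assigned
# directly to linear_input.weight; the same goes for the other entries

With matching shapes, the assignments inside the torch.no_grad() block above make the two linear layers exact replicas of the cell's transformations.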
Now, let's work our way through the mechanics of the RNN cell! It all starts with the initial hidden state representing the empty sequence:

initial_hidden = torch.zeros(1, hidden_dim)
initial_hidden

Output

tensor([[0., 0.]])

Then, we use the two blue neurons, the linear_hidden layer, to transform the hidden state:

th = linear_hidden(initial_hidden)
th

Output

tensor([[-0.3565, -0.2904]], grad_fn=<AddmmBackward>)

Cool! Now, let's take a look at a sequence of data points from our dataset:

X = torch.as_tensor(points[0]).float()
X

Output

tensor([[ 1.0349,  0.9661],
        [ 0.8055, -0.9169],
        [-0.8251, -0.9499],
        [-0.8670,  0.9342]])

As expected, four data points, two coordinates each. The first data point, [1.0349, 0.9661], corresponding to the top-right corner of the square, is going to be transformed by the linear_input layer (the two red neurons):

tx = linear_input(X[0:1])
tx

Output

tensor([[0.7712, 1.4310]], grad_fn=<AddmmBackward>)

There we go: We got both t_x and t_h. Let's add them together:

adding = th + tx
adding

Output

tensor([[0.4146, 1.1405]], grad_fn=<AddBackward0>)

The effect of adding t_x is similar to the effect of adding the bias: It is effectively translating the transformed hidden state to the right (by 0.7712) and up (by 1.4310).

Finally, the hyperbolic tangent activation function "compresses" the feature space back into the (-1, 1) interval:

torch.tanh(adding)

Output

tensor([[0.3924, 0.8146]], grad_fn=<TanhBackward>)
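That single step covered only the first corner. To process the whole sequence, the same three operations (transform the hidden state, transform the data point, add them, and activate) are repeated for every data point, with each output fed back as the next hidden state. A minimal sketch of that loop, reusing the linear_hidden and linear_input layers defined above (the printed values depend on the actual weights):

hidden = initial_hidden
for i in range(X.shape[0]):
    th = linear_hidden(hidden)      # transformed hidden state
    tx = linear_input(X[i:i+1])     # transformed data point
    hidden = torch.tanh(th + tx)    # updated hidden state
hidden    # final hidden state after seeing all four corners

The hidden state produced after the last corner is the one that summarizes the sequence as a whole.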
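Assuming rnn_cell is the nn.RNNCell whose state dictionary was copied into the two linear layers (as in the earlier sketch), the hand-made loop should reproduce its output step by step. A quick sanity check along those lines:

hidden = initial_hidden
cell_hidden = initial_hidden
for i in range(X.shape[0]):
    hidden = torch.tanh(linear_hidden(hidden) + linear_input(X[i:i+1]))
    cell_hidden = rnn_cell(X[i:i+1], cell_hidden)   # same update, computed by PyTorch
print(torch.allclose(hidden, cell_hidden))          # expected to print True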