Model Configuration

class SquareModel(nn.Module):
    def __init__(self, n_features, hidden_dim, n_outputs):
        super(SquareModel, self).__init__()
        self.hidden_dim = hidden_dim
        self.n_features = n_features
        self.n_outputs = n_outputs
        self.hidden = None
        # Simple RNN
        self.basic_rnn = nn.RNN(self.n_features,
                                self.hidden_dim,
                                batch_first=True)
        # Classifier to produce as many logits as outputs
        self.classifier = nn.Linear(self.hidden_dim,
                                    self.n_outputs)

    def forward(self, X):
        # X is batch first (N, L, F)
        # output is (N, L, H)
        # final hidden state is (1, N, H)
        batch_first_output, self.hidden = self.basic_rnn(X)

        # only the last item in the sequence; integer indexing
        # drops the sequence dimension, so the shape is (N, H)
        last_output = batch_first_output[:, -1]
        # classifier will output (N, n_outputs)
        out = self.classifier(last_output)

        # final output is (N, n_outputs)
        return out.view(-1, self.n_outputs)

"Why are we taking the last output instead of the final hidden state? Aren't they the same?"

They are the same in most cases, yes, but they are different if you're using bidirectional RNNs. By taking the last output, we ensure the code works for all sorts of RNNs: simple, stacked, and bidirectional. Besides, we want to avoid handling the hidden state anyway, because it's always in sequence-first shape.
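To see this equivalence, and where it breaks, here is a minimal sketch (the seed and tensor sizes are arbitrary, chosen just for illustration):

import torch
import torch.nn as nn

torch.manual_seed(19)
x = torch.randn(3, 4, 2)  # batch-first: (N=3, L=4, F=2)

# Unidirectional, single-layer RNN: the last output matches
# the final hidden state
rnn = nn.RNN(input_size=2, hidden_size=2, batch_first=True)
out, hidden = rnn(x)  # out: (N, L, H); hidden: (1, N, H), sequence-first
print(torch.allclose(out[:, -1], hidden[0]))   # True

# Bidirectional RNN: they no longer match, because the backward half
# of the final hidden state corresponds to the FIRST element of the
# sequence, while the backward half of out[:, -1] corresponds to the last
rnn_bi = nn.RNN(input_size=2, hidden_size=2, batch_first=True,
                bidirectional=True)
out_bi, hidden_bi = rnn_bi(x)  # out_bi: (N, L, 2*H); hidden_bi: (2, N, H)
stacked = torch.cat([hidden_bi[0], hidden_bi[1]], dim=1)  # (N, 2*H)
print(torch.allclose(out_bi[:, -1], stacked))  # False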
In the next chapter, we'll be using the full output, that is, the full sequence of hidden states, for encoder-decoder models.

Next, we create an instance of the model, the corresponding loss function for a binary classification problem, and an optimizer:

Model Configuration

torch.manual_seed(21)
model = SquareModel(n_features=2, hidden_dim=2, n_outputs=1)
loss = nn.BCEWithLogitsLoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

Model Training

Then, we train our SquareModel over 100 epochs, as usual, visualize the losses, and evaluate its accuracy on the test data:

Model Training

sbs_rnn = StepByStep(model, loss, optimizer)
sbs_rnn.set_loaders(train_loader, test_loader)
sbs_rnn.train(100)

fig = sbs_rnn.plot_losses()

Figure 8.12 - Losses—SquareModel
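The accuracy check is a one-liner in the same pattern used in previous chapters, assuming the correct() method and the loader_apply() static method built into the StepByStep class earlier in the book:

StepByStep.loader_apply(test_loader, sbs_rnn.correct)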