Daniel Voigt Godoy - Deep Learning with PyTorch Step-by-Step: A Beginner's Guide


Model Configuration

class SquareModel(nn.Module):
    def __init__(self, n_features, hidden_dim, n_outputs):
        super(SquareModel, self).__init__()
        self.hidden_dim = hidden_dim
        self.n_features = n_features
        self.n_outputs = n_outputs
        self.hidden = None
        # Simple RNN
        self.basic_rnn = nn.RNN(self.n_features,
                                self.hidden_dim,
                                batch_first=True)
        # Classifier to produce as many logits as outputs
        self.classifier = nn.Linear(self.hidden_dim,
                                    self.n_outputs)

    def forward(self, X):
        # X is batch first (N, L, F)
        # output is (N, L, H)
        # final hidden state is (1, N, H)
        batch_first_output, self.hidden = self.basic_rnn(X)

        # only the last item in the sequence (N, H)
        last_output = batch_first_output[:, -1]
        # classifier will output (N, n_outputs)
        out = self.classifier(last_output)

        # final output is (N, n_outputs)
        return out.view(-1, self.n_outputs)

"Why are we taking the last output instead of the final hidden state? Aren't they the same?"

They are the same in most cases, yes, but they are different if you're using bidirectional RNNs. By using the last output, we're ensuring that the code will work for all sorts of RNNs: simple, stacked, and bidirectional. Besides, we want to avoid handling the hidden state anyway, because it's always in sequence-first shape.
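To see why the last output is the safer choice, here is a minimal sketch (not from the book) that runs a toy batch through a bidirectional nn.RNN and compares its last output with its final hidden state; the batch size, sequence length, and seed are made up for illustration:

import torch
import torch.nn as nn

torch.manual_seed(21)
x = torch.randn(3, 4, 2)  # toy batch: N=3 sequences, L=4 steps, F=2 features

rnn = nn.RNN(input_size=2, hidden_size=2, bidirectional=True, batch_first=True)
output, hidden = rnn(x)
print(output.shape)  # (N, L, 2*H) = (3, 4, 4): both directions concatenated
print(hidden.shape)  # (2, N, H) = (2, 3, 2): one final state per direction

last_output = output[:, -1]  # (N, 2*H)
# the forward direction's final state matches the forward half of the last output
print(torch.allclose(last_output[:, :2], hidden[0]))  # True
# the reverse direction's final state corresponds to the FIRST step of the output,
# so it does not, in general, match the reverse half of the last output
print(torch.allclose(last_output[:, 2:], hidden[1]))  # False

Slicing the last output therefore keeps the classifier code unchanged even if basic_rnn is later replaced by a stacked or bidirectional recurrent layer.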

In the next chapter, we'll be using the full output, that is, the full sequence of hidden states, for encoder-decoder models.

Next, we create an instance of the model, the corresponding loss function for a binary classification problem, and an optimizer:

Model Configuration

torch.manual_seed(21)
model = SquareModel(n_features=2, hidden_dim=2, n_outputs=1)
loss = nn.BCEWithLogitsLoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

Model Training

Then, we train our SquareModel over 100 epochs, as usual, visualize the losses, and evaluate its accuracy on the test data:

Model Training

sbs_rnn = StepByStep(model, loss, optimizer)
sbs_rnn.set_loaders(train_loader, test_loader)
sbs_rnn.train(100)

fig = sbs_rnn.plot_losses()

Figure 8.12 - Losses—SquareModel
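As a rough illustration of the accuracy check mentioned above, the loop below evaluates the trained model directly on the test loader (the book's StepByStep class may well provide its own helper for this); it reuses model and test_loader from the listings above and assumes the labels are floats of shape (N, 1), matching what BCEWithLogitsLoss expects:

model.eval()
n_correct, n_total = 0, 0
with torch.no_grad():
    for x_batch, y_batch in test_loader:
        logits = model(x_batch)
        # the model returns logits, so apply a sigmoid and threshold at 0.5
        preds = (torch.sigmoid(logits) >= 0.5).float()
        n_correct += (preds == y_batch).sum().item()
        n_total += y_batch.numel()
print(f'Test accuracy: {n_correct / n_total:.2%}')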

