Daniel Voigt Godoy - Deep Learning with PyTorch Step-by-Step: A Beginner's Guide (Leanpub)


Output

tensor([[-5.4936e-02, -8.3816e-05]], grad_fn=<MulBackward0>)

The LSTM cell must return both states, hidden and cell, in that order, as a tuple:

(h_prime, c_prime)

Output

(tensor([[-5.4936e-02, -8.3816e-05]], grad_fn=<MulBackward0>),
 tensor([[-0.1340, -0.0004]], grad_fn=<AddBackward0>))

That’s it! It wasn’t that bad, right? The formulation of the LSTM may seem scary at first sight, especially if you bump into a huge sequence of equations using all weights and biases at once, but it doesn’t have to be that way.

Finally, let’s run a quick sanity check, feeding the same input to the original LSTM cell:

lstm_cell(first_corner)

Output

(tensor([[-5.4936e-02, -8.3816e-05]], grad_fn=<MulBackward0>),
 tensor([[-0.1340, -0.0004]], grad_fn=<AddBackward0>))

And we’re done with cells. I guess you know what comes next…

LSTM Layer

The nn.LSTM layer takes care of handling the hidden and cell states for us, no matter how long the input sequence is. We’ve been through this once with the RNN layer and then again with the GRU layer. The arguments, inputs, and outputs of the LSTM are almost exactly the same as those of the GRU, except for the fact that, as you already know, LSTMs return two states (hidden and cell) with the same shape, instead of one. By the way, you can create stacked LSTMs and bidirectional LSTMs too.
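To make that concrete, here is a minimal sketch of the nn.LSTM layer in isolation. The input tensor below is made up for illustration (a batch of one sequence with four two-feature points), not the book's actual dataset, and the seed is arbitrary:

import torch
import torch.nn as nn

torch.manual_seed(19)

# Made-up batch: N=1 sequence, L=4 steps, F=2 features (batch_first convention)
x = torch.randn(1, 4, 2)

# Same arguments as nn.GRU / nn.RNN; hidden_size=2 mirrors the cell examples
lstm = nn.LSTM(input_size=2, hidden_size=2, batch_first=True)

# No manual loop over the sequence and no manual state handling:
# the layer returns the output for every step plus BOTH final states
output, (hidden, cell) = lstm(x)
print(output.shape)  # (N, L, H) -> torch.Size([1, 4, 2])
print(hidden.shape)  # (1, N, H) -> torch.Size([1, 1, 2])
print(cell.shape)    # (1, N, H) -> torch.Size([1, 1, 2])

# Stacked and bidirectional variants are just extra arguments
stacked_bi = nn.LSTM(input_size=2, hidden_size=2, num_layers=2,
                     bidirectional=True, batch_first=True)

The only difference from calling a GRU layer is unpacking a tuple of two states instead of a single hidden state.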

So, let’s go straight to creating a model using a long short-term memory.

Square Model III — The Sorcerer

This model is pretty much the same as the original "Square Model," except for two differences: its recurrent neural network is not a plain RNN anymore, but an LSTM, and the LSTM produces two states as output instead of one. Everything else stays exactly the same.

Model Configuration

class SquareModelLSTM(nn.Module):
    def __init__(self, n_features, hidden_dim, n_outputs):
        super(SquareModelLSTM, self).__init__()
        self.hidden_dim = hidden_dim
        self.n_features = n_features
        self.n_outputs = n_outputs
        self.hidden = None
        self.cell = None
        # Simple LSTM
        self.basic_rnn = nn.LSTM(self.n_features,
                                 self.hidden_dim,
                                 batch_first=True)
        # Classifier to produce as many logits as outputs
        self.classifier = nn.Linear(self.hidden_dim,
                                    self.n_outputs)

    def forward(self, X):
        # X is batch first (N, L, F)
        # output is (N, L, H)
        # final hidden state is (1, N, H)
        # final cell state is (1, N, H)
        batch_first_output, (self.hidden, self.cell) = \
            self.basic_rnn(X)

        # only last item in sequence (N, 1, H)
        last_output = batch_first_output[:, -1]
        # classifier will output (N, 1, n_outputs)
        out = self.classifier(last_output)
        # final output is (N, n_outputs)
        return out.view(-1, self.n_outputs)
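As a quick shape check, the model can be instantiated and fed a dummy batch. The batch below is made up for illustration (the book uses its own synthetic "square sequence" dataset), and the seed is arbitrary:

torch.manual_seed(21)
model = SquareModelLSTM(n_features=2, hidden_dim=2, n_outputs=1)

# Made-up batch: N=16 sequences, L=4 corners, F=2 coordinates each
dummy_batch = torch.randn(16, 4, 2)
logits = model(dummy_batch)
print(logits.shape)  # torch.Size([16, 1]) -> one logit per sequence

From here, training follows the same recipe as the previous square models, for instance pairing the logits with nn.BCEWithLogitsLoss() and an optimizer such as Adam.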

