Data Preparation
# imports used in this section (train_var_data was built earlier in the chapter)
import torch
import torch.nn as nn
import torch.nn.utils.rnn as rnn_utils
from torch.utils.data import DataLoader

def pack_collate(batch):
    X = [item[0] for item in batch]
    y = [item[1] for item in batch]
    # packs the variable-length sequences (no sorting required)
    X_pack = rnn_utils.pack_sequence(X, enforce_sorted=False)

    return X_pack, torch.as_tensor(y).view(-1, 1)

train_var_loader = DataLoader(train_var_data,
                              batch_size=16,
                              shuffle=True,
                              collate_fn=pack_collate)
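To see what the collate function produces, here is a minimal sketch (the sequences and labels are made up for illustration) that builds a batch by hand and passes it through the pack_collate function defined above:

# three made-up sequences of 2D points, with lengths four, three, and two
seqs = [torch.randn(4, 2), torch.randn(3, 2), torch.randn(2, 2)]
labels = [0, 1, 0]
batch = list(zip(seqs, labels))

X_pack, y = pack_collate(batch)
print(X_pack.batch_sizes)  # tensor([3, 3, 2, 1]): sequences alive at each step
print(y.shape)             # torch.Size([3, 1])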
There Can Be Only ONE … Model
We’ve developed many models throughout this chapter, varying both the type of
recurrent layer (RNN, GRU, or LSTM) and the type of sequence (packed or not).
The model below, though, is able to handle all of these configurations:
• Its rnn_layer argument allows you to use whichever recurrent layer you
prefer.
• The **kwargs argument allows you to further configure the recurrent layer
(using num_layers and bidirectional arguments, for example).
• The output dimension of the recurrent layer is automatically computed to
build a matching linear layer.
• If the input is a packed sequence, it handles the unpacking and uses fancy
indexing to retrieve the actual last output of each sequence (see the short
sketch after this list).
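The fancy indexing mentioned in the last bullet point is worth a quick refresher. Here is a minimal sketch with made-up numbers: a padded batch of three sequences with lengths three, two, and one, from which we pick each sequence’s last valid step, exactly as the model’s forward() method does:

padded = torch.arange(18.).view(3, 3, 2)   # N=3, L=3, H=2
seq_sizes = torch.tensor([3, 2, 1])        # actual lengths before padding

seq_slice = torch.arange(seq_sizes.size(0))  # tensor([0, 1, 2])
last_output = padded[seq_slice, seq_sizes - 1]
# row 0 -> step 2, row 1 -> step 1, row 2 -> step 0
print(last_output)
# tensor([[ 4.,  5.],
#         [ 8.,  9.],
#         [12., 13.]])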
Model Configuration
class SquareModelOne(nn.Module):
    def __init__(self, n_features, hidden_dim, n_outputs,
                 rnn_layer=nn.LSTM, **kwargs):
        super(SquareModelOne, self).__init__()
        self.hidden_dim = hidden_dim
        self.n_features = n_features
        self.n_outputs = n_outputs
        self.hidden = None
        self.cell = None
        self.basic_rnn = rnn_layer(self.n_features,
                                   self.hidden_dim,
                                   batch_first=True, **kwargs)
        # a bidirectional layer doubles the output dimension
        output_dim = (self.basic_rnn.bidirectional + 1) * \
                     self.hidden_dim
        # classifier to produce as many logits as outputs
        self.classifier = nn.Linear(output_dim, self.n_outputs)

    def forward(self, X):
        is_packed = isinstance(X, nn.utils.rnn.PackedSequence)
        # since batch_first=True, there is no need to permute X,
        # whether it is a packed sequence or a regular tensor
        rnn_out, self.hidden = self.basic_rnn(X)
        if isinstance(self.basic_rnn, nn.LSTM):
            self.hidden, self.cell = self.hidden

        if is_packed:
            # unpacks the output (N, L, H) and gets the sequence sizes
            batch_first_output, seq_sizes = \
                rnn_utils.pad_packed_sequence(rnn_out,
                                              batch_first=True)
            seq_slice = torch.arange(seq_sizes.size(0))
        else:
            batch_first_output = rnn_out
            seq_sizes = 0  # so seq_sizes - 1 points at the last step
            seq_slice = slice(None, None, None)  # same as ':'

        # only the last item of each sequence (N, H)
        last_output = batch_first_output[seq_slice, seq_sizes - 1]

        # classifier will output (N, n_outputs)
        out = self.classifier(last_output)

        # final output is (N, n_outputs)
        return out.view(-1, self.n_outputs)

Model Configuration & Training

The model below uses a bidirectional LSTM and already achieves 100% accuracy
on the training set. Feel free to experiment with different recurrent layers, the
number of layers, single or bidirectional, as well as with switching between fixed-
and variable-length sequences.
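A minimal sketch of such a configuration follows. The bidirectional LSTM matches the description above, but the remaining choices (hidden dimension, learning rate, number of epochs) are illustrative, and a plain training loop stands in for the StepByStep class used throughout the book:

torch.manual_seed(21)
model = SquareModelOne(n_features=2, hidden_dim=2, n_outputs=1,
                       rnn_layer=nn.LSTM, num_layers=1,
                       bidirectional=True)
loss_fn = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

for epoch in range(100):
    for X_batch, y_batch in train_var_loader:
        optimizer.zero_grad()
        # X_batch is a PackedSequence; the model handles it directly
        loss = loss_fn(model(X_batch), y_batch.float())
        loss.backward()
        optimizer.step()

The labels are cast to float because BCEWithLogitsLoss expects float targets, while the model’s raw logits are passed in without a sigmoid, since that loss applies it internally.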