Daniel Voigt Godoy - Deep Learning with PyTorch Step-by-Step A Beginner’s Guide-leanpub
Figure 10.20 - Sample image—label "2"

"But this is a classification problem, not a sequence-to-sequence one—why are we using a Transformer then?"

Well, we're not using the full Transformer architecture, only its encoder. In Chapter 8, we used recurrent neural networks to generate a final hidden state that we used as the input for classification. Similarly, the encoder generates a sequence of "hidden states" (the memory, in Transformer lingo), and we're using one "hidden state" as the input for classification again.

"Which one? The last 'hidden state'?"

No, not the last one, but a special one. We'll prepend a special classifier token [CLS] to our sequence and use its corresponding "hidden state" as input to a classifier. The figure below illustrates the idea.

Figure 10.21 - Hidden states and the special classifier token [CLS]

But I'm jumping the gun here—we'll get back to that in the "Special Classifier Token" section.
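As a rough sketch of the idea (the class and parameter names here are illustrative, not the book's actual model), prepending a learnable [CLS] embedding to each sequence and classifying from its corresponding encoder output could look like this:

```python
import torch
import torch.nn as nn

class EncoderClassifierSketch(nn.Module):
    # Illustrative only: a tiny encoder-based classifier that prepends
    # a learnable [CLS] token and classifies from its "hidden state"
    def __init__(self, d_model=64, n_heads=4, n_classes=3):
        super().__init__()
        self.cls_token = nn.Parameter(torch.zeros(1, 1, d_model))
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)
        self.classifier = nn.Linear(d_model, n_classes)

    def forward(self, x):
        # x: (N, L, d_model)
        cls = self.cls_token.expand(x.size(0), -1, -1)  # (N, 1, d_model)
        x = torch.cat([cls, x], dim=1)                  # (N, L+1, d_model)
        states = self.encoder(x)                        # the "memory"
        # Classifies using only the [CLS] token's "hidden state"
        return self.classifier(states[:, 0])

model = EncoderClassifierSketch()
logits = model(torch.randn(16, 10, 64))
print(logits.shape)  # torch.Size([16, 3])
```

Notice that only the first element of the encoder's output (the one corresponding to [CLS]) reaches the classifier, even though attention lets that element gather information from the whole sequence.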
The data preparation step is exactly the same one we used in Chapter 5:
Data Preparation
import torch
from torch.utils.data import Dataset
from torchvision.transforms import Compose, Normalize

class TransformedTensorDataset(Dataset):
    def __init__(self, x, y, transform=None):
        self.x = x
        self.y = y
        self.transform = transform

    def __getitem__(self, index):
        x = self.x[index]
        if self.transform:
            x = self.transform(x)

        return x, self.y[index]

    def __len__(self):
        return len(self.x)

# Builds tensors from numpy arrays BEFORE split
# Modifies the scale of pixel values from [0, 255] to [0, 1]
x_tensor = torch.as_tensor(images / 255).float()
y_tensor = torch.as_tensor(labels).long()

# Uses index_splitter to generate indices for training and
# validation sets
train_idx, val_idx = index_splitter(len(x_tensor), [80, 20])
# Uses indices to perform the split
x_train_tensor = x_tensor[train_idx]
y_train_tensor = y_tensor[train_idx]
x_val_tensor = x_tensor[val_idx]
y_val_tensor = y_tensor[val_idx]

# We're not doing any data augmentation now
train_composer = Compose([Normalize(mean=(.5,), std=(.5,))])
val_composer = Compose([Normalize(mean=(.5,), std=(.5,))])

# Uses custom dataset to apply composed transforms to each set
train_dataset = TransformedTensorDataset(
    x_train_tensor, y_train_tensor, transform=train_composer)
val_dataset = TransformedTensorDataset(
    x_val_tensor, y_val_tensor, transform=val_composer)
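The index_splitter helper is one of the utility functions developed earlier in the book. A minimal sketch of what such a function could look like (assuming it shuffles the indices and splits them in proportion to the given values; the seed argument is an assumption for reproducibility):

```python
import torch

def index_splitter(n, splits, seed=13):
    # Illustrative sketch: shuffles indices 0..n-1 and splits them
    # proportionally to the values in `splits` (e.g. [80, 20])
    gen = torch.Generator().manual_seed(seed)
    idx = torch.randperm(n, generator=gen)
    splits = torch.tensor(splits, dtype=torch.float)
    sizes = (splits / splits.sum() * n).long()
    sizes[-1] = n - sizes[:-1].sum()  # makes sure the sizes add up to n
    return list(torch.split(idx, sizes.tolist()))

train_idx, val_idx = index_splitter(100, [80, 20])
print(len(train_idx), len(val_idx))  # 80 20
```

Splitting by index (rather than slicing the tensors directly) means the same split can be applied consistently to both features and labels, as done in the listing above.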