Daniel Voigt Godoy - Deep Learning with PyTorch Step-by-Step: A Beginner's Guide - Leanpub
40
41 # Builds a weighted random sampler to handle imbalanced classes
42 sampler = make_balanced_sampler(y_train_tensor)
43
44 # Uses sampler in the training set to get a balanced data loader
45 train_loader = DataLoader(
46     dataset=train_dataset, batch_size=16, sampler=sampler)
47 val_loader = DataLoader(dataset=val_dataset, batch_size=16)

Patches

There are different ways of breaking up an image into patches. The most straightforward one is simply rearranging the pixels, so let's start with that one.

Rearranging

TensorFlow has a utility function called tf.image.extract_patches() that does the job, and we're implementing a simplified version of this function in PyTorch with tensor.unfold() (using only a kernel size and a stride, but no padding or anything else):

# Adapted from https://discuss.pytorch.org/t/tf-extract-image-patches-in-pytorch/43837
def extract_image_patches(x, kernel_size, stride=1):
    # Batch size
    n = x.size(0)
    # Extract patches
    patches = x.unfold(2, kernel_size, stride)
    patches = patches.unfold(3, kernel_size, stride)
    patches = patches.permute(0, 2, 3, 1, 4, 5).contiguous()
    return patches.view(n, patches.shape[1], patches.shape[2], -1)

It works as if we were applying a convolution to the image. Each patch is actually a receptive field (the region the filter moves over to convolve) but, instead of convolving the region, we're just taking it as it is. The kernel size is the patch size, and the number of patches depends on the stride: the smaller the stride, the more patches. If the stride matches the kernel size, we're effectively breaking the image up into non-overlapping patches, so let's do that:
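Before we do, a quick standalone sketch of the stride's effect on the patch count. This toy example is not from the book; it assumes a 12x12 single-channel image, which is the size implied by the patch shapes shown in this section:

```python
import torch

# Toy example (not the chapter's actual image): a 12x12 single-channel image
img = torch.randn(1, 1, 12, 12)  # N, C, H, W
kernel_size = 4

for stride in [4, 2, 1]:
    # Unfold over height (dim 2) and width (dim 3)
    patches = img.unfold(2, kernel_size, stride) \
                 .unfold(3, kernel_size, stride)
    # Patches per spatial dimension: (12 - kernel_size) // stride + 1
    n_patches = patches.shape[2] * patches.shape[3]
    print(f'stride={stride}: {n_patches} patches')

# stride=4: 9 patches
# stride=2: 25 patches
# stride=1: 81 patches
```

With the stride equal to the kernel size, the patches tile the image without overlapping, which is exactly the case we use next.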
kernel_size = 4
patches = extract_image_patches(
    img, kernel_size, stride=kernel_size
)
patches.shape
Output
torch.Size([1, 3, 3, 16])
Since the kernel size is four, each patch has 16 pixels, and there are nine patches in total. Even though each patch is a tensor of 16 elements, if we plotted them as four-by-four images instead, they would look like this.
Figure 10.22 - Sample image—split into patches
It is very easy to see how the image was broken up in the figure above. In reality,
though, the Transformer needs a sequence of flattened patches. Let’s reshape
them:
seq_patches = patches.view(-1, patches.size(-1))
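As a cross-check (not in the original text), PyTorch's built-in nn.Unfold produces the same flattened sequence when the stride matches the kernel size. The sketch below assumes the 12x12 single-channel image implied by the shapes in this section:

```python
import torch
import torch.nn as nn

# Assumed stand-in for the chapter's image: 12x12, single channel
img = torch.randn(1, 1, 12, 12)  # N, C, H, W
kernel_size = 4

unfold = nn.Unfold(kernel_size=kernel_size, stride=kernel_size)
# nn.Unfold returns (N, C*k*k, L), one flattened patch per column;
# transposing and squeezing gives a sequence of patches instead
seq = unfold(img).transpose(1, 2).squeeze(0)
print(seq.shape)  # torch.Size([9, 16])
```

The ordering matches seq_patches as well: patches run left to right, top to bottom over the grid, and each patch is flattened channel-first.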