Deep Learning with PyTorch Step-by-Step: A Beginner's Guide, by Daniel Voigt Godoy (Leanpub)


# Builds a weighted random sampler to handle imbalanced classes
sampler = make_balanced_sampler(y_train_tensor)

# Uses sampler in the training set to get a balanced data loader
train_loader = DataLoader(
    dataset=train_dataset, batch_size=16, sampler=sampler)
val_loader = DataLoader(dataset=val_dataset, batch_size=16)
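The make_balanced_sampler() helper was built earlier in the book. In case you
need a stand-in, here is a minimal sketch of such a helper, built on PyTorch's
WeightedRandomSampler and assuming the labels are 0-based integer class indices
(the book's actual implementation may differ in its details):

import torch
from torch.utils.data import WeightedRandomSampler

def make_balanced_sampler(y):
    # Counts how many samples belong to each class
    _, counts = y.unique(return_counts=True)
    # Weights are inversely proportional to class frequency,
    # so minority classes get drawn more often
    weights = 1.0 / counts.float()
    sample_weights = weights[y.squeeze().long()]
    # Draws (with replacement) as many samples as the dataset has
    return WeightedRandomSampler(
        weights=sample_weights,
        num_samples=len(sample_weights),
        replacement=True,
    )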

Patches

There are different ways of breaking up an image into patches. The most
straightforward one is simply rearranging the pixels, so let's start with that
one.

Rearranging

TensorFlow has a utility function called tf.image.extract_patches() that does
the job, and we're implementing a simplified version of it in PyTorch with
tensor.unfold() (using only a kernel size and a stride, but no padding or
anything else):

# Adapted from https://discuss.pytorch.org/t/tf-extract-image-
# patches-in-pytorch/43837
def extract_image_patches(x, kernel_size, stride=1):
    # Slides a window over height (dim 2) and width (dim 3),
    # adding one extra dimension per unfold
    patches = x.unfold(2, kernel_size, stride)
    patches = patches.unfold(3, kernel_size, stride)
    # Moves channels next to the patch dimensions, then flattens
    # each patch into a single vector
    patches = patches.permute(0, 2, 3, 1, 4, 5).contiguous()
    return patches.view(
        x.size(0), patches.shape[1], patches.shape[2], -1
    )

It works as if we were applying a convolution to the image. Each patch is
actually a receptive field (the region the filter moves over to convolve) but,
instead of convolving the region, we're just taking it as it is. The kernel
size is the patch size, and the number of patches depends on the stride: the
smaller the stride, the more patches. If the stride matches the kernel size,
we're effectively breaking up the image into non-overlapping patches, so let's
do that:

kernel_size = 4
patches = extract_image_patches(
    img, kernel_size, stride=kernel_size
)
patches.shape

Output

torch.Size([1, 3, 3, 16])
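The output also pins down the input: a last dimension of 16 with 4x4 patches
means img has a single channel, and a 3x3 grid of non-overlapping 4x4 patches
means it is 12 pixels on each side. If you don't have the book's img at hand,
any tensor of shape (1, 1, 12, 12) reproduces these shapes (a quick sketch
using a hypothetical dummy tensor):

import torch

# Hypothetical stand-in for the book's img: one 12x12 grayscale image
dummy = torch.arange(144.0).view(1, 1, 12, 12)
extract_image_patches(dummy, kernel_size=4, stride=4).shape
# torch.Size([1, 3, 3, 16])
extract_image_patches(dummy, kernel_size=4, stride=2).shape
# torch.Size([1, 5, 5, 16]) -- smaller stride, more (overlapping) patches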

Since the kernel size is four, each patch has 16 pixels, and there are nine
patches in total. Even though each patch is a tensor of 16 elements, if we plot
them as if they were four-by-four images instead, it would look like this.

Figure 10.22 - Sample image—split into patches
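If you want to reproduce a plot like the one in the figure, a minimal
Matplotlib sketch (assuming the patches tensor computed above) could look like
this:

import matplotlib.pyplot as plt

fig, axs = plt.subplots(3, 3, figsize=(4, 4))
for i in range(3):
    for j in range(3):
        # Each patch is a flat 16-element tensor; reshape to 4x4 to plot
        axs[i][j].imshow(patches[0, i, j].view(4, 4), cmap='gray')
        axs[i][j].axis('off')
plt.show()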

It is very easy to see how the image was broken up in Figure 10.22. In
reality, though, the Transformer needs a sequence of flattened patches. Let's
reshape them:

seq_patches = patches.view(-1, patches.size(-1))
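Since patches has shape [1, 3, 3, 16], collapsing the leading dimensions
yields nine rows of 16 elements each, so seq_patches ends up with shape
[9, 16]: a sequence of nine flattened patches. By the way, PyTorch also ships
a built-in, nn.functional.unfold(), that extracts the same sliding blocks in
one call; a sketch of the equivalent computation (it returns patches as
columns, hence the transpose):

import torch.nn.functional as F

# unfold returns [N, C * kernel_size^2, num_patches]
cols = F.unfold(img, kernel_size=4, stride=4)  # [1, 16, 9]
alt_seq = cols.transpose(1, 2).squeeze(0)      # [9, 16]
# alt_seq should match seq_patches row for row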

