Daniel Voigt Godoy - Deep Learning with PyTorch Step-by-Step: A Beginner's Guide (Leanpub)

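The helper make_balanced_sampler() used in the listing below is defined earlier in the book and not shown in this excerpt. A minimal sketch of what such a helper might look like (an assumption, not the book's exact code; it presumes integer class labels 0..K-1) weighs each sample by the inverse frequency of its class:

```python
import torch
from torch.utils.data import WeightedRandomSampler

# Assumed sketch of a make_balanced_sampler()-style helper:
# rare classes get larger weights, so they are drawn more often
def make_balanced_sampler(y):
    # Counts how many samples belong to each class
    classes, counts = y.unique(return_counts=True)
    weights = 1.0 / counts.float()
    # One weight per *sample*, looked up from its class label
    sample_weights = weights[y.squeeze().long()]
    return WeightedRandomSampler(
        weights=sample_weights,
        num_samples=len(sample_weights),
        replacement=True,
    )

y = torch.tensor([0, 0, 0, 0, 1, 1])  # imbalanced dummy labels
sampler = make_balanced_sampler(y)
```

Passing such a sampler to a DataLoader makes each mini-batch roughly class-balanced on average, at the cost of drawing some samples with replacement.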

# Builds a weighted random sampler to handle imbalanced classes
sampler = make_balanced_sampler(y_train_tensor)

# Uses sampler in the training set to get a balanced data loader
train_loader = DataLoader(
    dataset=train_dataset, batch_size=16, sampler=sampler)
val_loader = DataLoader(dataset=val_dataset, batch_size=16)

Patches

There are different ways of breaking up an image into patches. The most straightforward one is simply rearranging the pixels, so let's start with that one.

Rearranging

TensorFlow has a utility function called tf.image.extract_patches() that does the job, and we're implementing a simplified version of this function in PyTorch with tensor.unfold() (using only a kernel size and a stride, but no padding or anything else):

# Adapted from https://discuss.pytorch.org/t/tf-extract-image-patches-in-pytorch/43837
def extract_image_patches(x, kernel_size, stride=1):
    # Extract patches
    patches = x.unfold(2, kernel_size, stride)
    patches = patches.unfold(3, kernel_size, stride)
    patches = patches.permute(0, 2, 3, 1, 4, 5).contiguous()
    return patches.view(x.size(0), patches.shape[1], patches.shape[2], -1)

It works as if we were applying a convolution to the image. Each patch is actually a receptive field (the region the filter is moving over to convolve), but, instead of convolving the region, we're just taking it as it is. The kernel size is the patch size, and the number of patches depends on the stride—the smaller the stride, the more patches. If the stride matches the kernel size, we're effectively breaking up the image into non-overlapping patches, so let's do that:

Vision Transformer | 849

kernel_size = 4
patches = extract_image_patches(img, kernel_size, stride=kernel_size)
patches.shape

Output

torch.Size([1, 3, 3, 16])

Since the kernel size is four, each patch has 16 pixels, and there are nine patches in total. Even though each patch is a tensor of 16 elements, if we plot them as if they were four-by-four images instead, it would look like this.

Figure 10.22 - Sample image—split into patches

It is very easy to see how the image was broken up in the figure above. In reality, though, the Transformer needs a sequence of flattened patches. Let's reshape them:

seq_patches = patches.view(-1, patches.size(-1))

850 | Chapter 10: Transform and Roll Out
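The shapes above can be sanity-checked with a self-contained sketch. Since the chapter's img tensor isn't defined in this excerpt, a dummy single-channel 12x12 image is assumed here (it produces the same shapes as the output shown):

```python
import torch

# Simplified version of TF's tf.image.extract_patches() (from the text),
# with the batch size taken from the input tensor itself
def extract_image_patches(x, kernel_size, stride=1):
    patches = x.unfold(2, kernel_size, stride)        # slides over height
    patches = patches.unfold(3, kernel_size, stride)  # slides over width
    patches = patches.permute(0, 2, 3, 1, 4, 5).contiguous()
    return patches.view(x.size(0), patches.shape[1], patches.shape[2], -1)

# Dummy single-channel 12x12 image (an assumption standing in for img)
img = torch.arange(144.).view(1, 1, 12, 12)
patches = extract_image_patches(img, kernel_size=4, stride=4)
print(patches.shape)      # torch.Size([1, 3, 3, 16]): a 3x3 grid of patches

# Flattening the grid into a sequence of patches, as the Transformer expects
seq_patches = patches.view(-1, patches.size(-1))
print(seq_patches.shape)  # torch.Size([9, 16]): nine 16-pixel patches
```

Because the stride equals the kernel size, the first patch is exactly the top-left four-by-four block of the image, flattened row by row.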

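As an aside, PyTorch also ships a built-in module, torch.nn.Unfold, that extracts sliding blocks directly. A sketch of the same non-overlapping patching using it (again assuming a dummy single-channel 12x12 image, since the chapter's img isn't in this excerpt):

```python
import torch
import torch.nn as nn

# Dummy single-channel 12x12 image (an assumption, not the chapter's img)
img = torch.arange(144.).view(1, 1, 12, 12)

# nn.Unfold extracts sliding blocks; with stride == kernel_size the
# blocks are non-overlapping patches, one flattened patch per column
unfold = nn.Unfold(kernel_size=4, stride=4)
patches = unfold(img)                             # shape (1, 16, 9)
seq_patches = patches.transpose(1, 2).squeeze(0)  # shape (9, 16)
print(seq_patches.shape)
```

The output has one column per patch (N, C*k*k, L), so a transpose turns it into the (patches, pixels) sequence layout used in the text.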
