
The patch embedding is not the original receptive field anymore, but rather the convolved receptive field. After convolving the image, given a kernel size and a stride, the patches get flattened, so we end up with the same sequence of nine patches:

torch.manual_seed(13)
# args: img_size, patch_size, in_channels, embed_dim
patch_embed = PatchEmbed(
    img.size(-1), kernel_size, 1, kernel_size**2
)
embedded = patch_embed(img)
embedded.shape

Output

torch.Size([1, 9, 16])

Figure 10.24 - Sample image—split into a sequence of patch embeddings
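The PatchEmbed class itself is not reproduced in this excerpt; below is a minimal sketch of what such a module looks like, assuming the usual Conv2d-then-flatten implementation (the book's own version is adapted from the timm library). The constructor arguments match the call above, and the argument names are illustrative:

import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    # Splits an image into non-overlapping patches and linearly
    # projects each one, using a Conv2d whose stride equals its
    # kernel size (one patch per convolution window)
    def __init__(self, img_size, patch_size, in_channels, embed_dim):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        self.proj = nn.Conv2d(
            in_channels, embed_dim,
            kernel_size=patch_size, stride=patch_size
        )

    def forward(self, x):
        # (N, C, H, W) -> (N, embed_dim, H/patch, W/patch)
        x = self.proj(x)
        # -> (N, embed_dim, num_patches) -> (N, num_patches, embed_dim)
        return x.flatten(2).transpose(1, 2)

With img_size=12, patch_size=4, in_channels=1, and embed_dim=16, the forward pass produces the (1, 9, 16) shape shown in the output above.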

But the patches are linear projections of the original pixel values now. We're projecting each 16-pixel patch into a 16-dimensional feature space: we're not changing the number of dimensions, but using different linear combinations of the original dimensions.
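To see that the convolution really is a linear projection of each flattened patch, we can compare it against an nn.Linear layer carrying the very same weights. This is a quick sketch, assuming a hypothetical 12x12 single-channel image standing in for the one in this example:

import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(13)
img = torch.rand(1, 1, 12, 12)  # hypothetical stand-in image
kernel_size = 4

# The convolution used by the patch embedding
conv = nn.Conv2d(1, 16, kernel_size=kernel_size, stride=kernel_size)

# A linear layer sharing the same weights, reshaped to (16, 16)
linear = nn.Linear(kernel_size**2, kernel_size**2)
with torch.no_grad():
    linear.weight.copy_(conv.weight.view(16, -1))
    linear.bias.copy_(conv.bias)

# Conv path: (1, 16, 3, 3) -> (1, 9, 16)
out_conv = conv(img).flatten(2).transpose(1, 2)

# Linear path: unfold flattens each 4x4 patch into 16 values
patches = F.unfold(img, kernel_size=kernel_size, stride=kernel_size)
out_linear = linear(patches.transpose(1, 2))  # (1, 9, 16)

print(torch.allclose(out_conv, out_linear, atol=1e-6))  # True

Both paths apply the same 16-by-16 weight matrix to every flattened patch, which is exactly what "a linear projection of the original pixel values" means here.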

The original image had 144 pixels and was split into nine patches of 16 pixels each. Each patch embedding still has 16 dimensions, so, overall, each image is still represented by 144 values. This is by no means a coincidence: the idea behind this choice of values is to preserve the dimensionality of the inputs.
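We can check this dimensionality-preservation claim directly. Assuming img has shape (1, 1, 12, 12), as in this example, both tensors hold the same total number of values:

print(img.numel(), embedded.numel())    # 144 144
print(img.numel() == embedded.numel())  # True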

