22.02.2024 Views

Daniel Voigt Godoy - Deep Learning with PyTorch Step-by-Step A Beginner’s Guide-leanpub

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

Output

tensor([[22., 23., 11., 24., 7., 1., 13., 13., 13.]])

Dimensions

We’ve performed convolutions, padding, and pooling in two dimensions because

we’re handling images. But there are one- and three-dimensional versions of some

of them as well:

• nn.Conv1d and F.conv1d(); nn.Conv3d and F.conv3d()

• nn.ConstandPad1d and nn.ConstandPad3d

• nn.ReplicationPad1d and nn.ReplicationPad3d

• nn.ReflectionPad1d

• nn.MaxPool1d and F.max_pool1d(); nn.MaxPool3d and F.max_pool3d()

• nn.AvgPool1d and F.avg_pool1d(); nn.AvgPool3d and F.avg_pool3d()

We will not venture into the third dimension in this book, but we’ll get back to onedimension

operations later.

"Aren’t color images three-dimensional since they have three

channels?"

Well, yes, but we will still be applying two-dimensional convolutions to them. We’ll

go through a detailed example using a three-channel image in the next chapter.

Typical Architecture

A typical architecture uses a sequence of one or more typical convolutional

blocks, with each block consisting of three operations:

1. Convolution

2. Activation function

3. Pooling

As images go through these operations, they will shrink in size. After three of these

blocks (assuming kernel size of two for pooling), for instance, an image will be

Dimensions | 367

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!