Daniel Voigt Godoy - Deep Learning with PyTorch Step-by-Step: A Beginner's Guide (Leanpub)
Figure 5.15 - LeNet-5 architecture. Source: generated using Alexander Lenail's NN-SVG and adapted by the author. For more details, see LeCun, Y., et al. (1998). "Gradient-based learning applied to document recognition," Proceedings of the IEEE, 86(11), 2278–2324. [92]

Do you see anything familiar? The typical convolutional blocks are already there (to some extent): convolutions (C layers), activation functions (not shown), and subsampling (S layers). There are some differences, though:

• Back then, the subsampling was more complex than today's max pooling, but the general idea still holds.
• The activation function, a sigmoid at the time, was applied after the subsampling instead of before it, as is typical today.
• The F6 and Output layers were connected by something called Gaussian connections, which is more complex than the typical activation function one would use today.
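To make the first difference concrete, here is a quick sketch (not from the original text) comparing the two pooling styles on a tiny tensor: LeNet's subsampling averaged each patch (with a learned scale and bias), so parameter-free average pooling is its closest modern analogue, while max pooling keeps only the strongest activation.

```python
import torch
import torch.nn as nn

# A tiny single-channel 2x2 "image" to compare the two pooling styles
x = torch.tensor([[[[1., 2.],
                    [3., 4.]]]])

# Closest modern analogue of LeNet's subsampling: average each 2x2 patch
avg = nn.AvgPool2d(kernel_size=2)(x)  # (1 + 2 + 3 + 4) / 4 = 2.5

# Today's default choice: keep only the strongest activation in each patch
mx = nn.MaxPool2d(kernel_size=2)(x)   # max(1, 2, 3, 4) = 4.0
```

Both reduce a 2x2 patch to a single value, halving the spatial dimensions; they differ only in how that value is computed.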
Adapted to today's standards, LeNet-5 could be implemented like this:
lenet = nn.Sequential()
# Featurizer
# Block 1: 1@28x28 -> 6@28x28 -> 6@14x14
lenet.add_module('C1',
nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5, padding=2)
)
lenet.add_module('func1', nn.ReLU())
lenet.add_module('S2', nn.MaxPool2d(kernel_size=2))
# Block 2: 6@14x14 -> 16@10x10 -> 16@5x5
lenet.add_module('C3',
nn.Conv2d(in_channels=6, out_channels=16, kernel_size=5)
)
lenet.add_module('func2', nn.ReLU())
lenet.add_module('S4', nn.MaxPool2d(kernel_size=2))
# Block 3: 16@5x5 -> 120@1x1
lenet.add_module('C5',
nn.Conv2d(in_channels=16, out_channels=120, kernel_size=5)
)
# module names must be unique; reusing 'func2' here would replace the
# earlier module instead of appending a new one
lenet.add_module('func3', nn.ReLU())
# Flattening
lenet.add_module('flatten', nn.Flatten())
# Classification
# Hidden Layer
lenet.add_module('F6', nn.Linear(in_features=120, out_features=84))
lenet.add_module('func4', nn.ReLU())
# Output Layer
lenet.add_module('OUTPUT',
nn.Linear(in_features=84, out_features=10)
)
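As a quick sanity check (not in the original text), we can push a dummy image through the model and count its learnable parameters. The model is rebuilt compactly below so the snippet runs on its own; it is the same sequence of layers as above, just without named modules.

```python
import torch
import torch.nn as nn

# Compact rebuild of the LeNet-5 variant above, so this snippet stands alone
lenet = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(6, 16, kernel_size=5), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 120, kernel_size=5), nn.ReLU(),
    nn.Flatten(),
    nn.Linear(120, 84), nn.ReLU(),
    nn.Linear(84, 10),
)

# One dummy grayscale 28x28 image: (batch, channels, height, width)
logits = lenet(torch.randn(1, 1, 28, 28))
print(logits.shape)  # torch.Size([1, 10]) - one logit per digit class

# Pooling and flattening have no parameters; each convolution contributes
# out_channels * (in_channels * 5 * 5 + 1) weights and biases,
# e.g. C1: 6 * (1 * 25 + 1) = 156
total = sum(p.numel() for p in lenet.parameters())
print(total)  # 61706
```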
LeNet-5 used three convolutional blocks, although the last one does not include max pooling, because the convolution already produces a single pixel. Regarding the number of channels, they increase as the image size decreases:
• input image: single-channel 28x28 pixels
• first block: produces six-channel 14x14 pixels
• second block: produces 16-channel 5x5 pixels
• third block: produces 120-channel 1x1 pixels
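The progression above can be verified by tracing a dummy image through each named module and printing the resulting shape; this sketch (not in the original text) rebuilds the model with the same layer names so it runs on its own.

```python
import torch
import torch.nn as nn

# The model above, rebuilt with the same layer names so the snippet stands alone
lenet = nn.Sequential()
lenet.add_module('C1', nn.Conv2d(1, 6, kernel_size=5, padding=2))
lenet.add_module('func1', nn.ReLU())
lenet.add_module('S2', nn.MaxPool2d(kernel_size=2))
lenet.add_module('C3', nn.Conv2d(6, 16, kernel_size=5))
lenet.add_module('func2', nn.ReLU())
lenet.add_module('S4', nn.MaxPool2d(kernel_size=2))
lenet.add_module('C5', nn.Conv2d(16, 120, kernel_size=5))
lenet.add_module('func3', nn.ReLU())
lenet.add_module('flatten', nn.Flatten())
lenet.add_module('F6', nn.Linear(120, 84))
lenet.add_module('func4', nn.ReLU())
lenet.add_module('OUTPUT', nn.Linear(84, 10))

# Push a dummy image through, recording the shape after each named module
x = torch.randn(1, 1, 28, 28)
shapes = {}
for name, layer in lenet.named_children():
    x = layer(x)
    shapes[name] = tuple(x.shape)
    print(f'{name:8s}{shapes[name]}')
```

The output confirms the bullets: S2 yields (1, 6, 14, 14), S4 yields (1, 16, 5, 5), C5 yields (1, 120, 1, 1), and OUTPUT yields (1, 10).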