Daniel Voigt Godoy - Deep Learning with PyTorch Step-by-Step: A Beginner's Guide
ILSVRC-2012
The 2012 edition [111] of the ILSVRC is probably the most popular of them all. Its winner, the architecture dubbed AlexNet, represented a milestone for image classification, sharply reducing classification error. The training data had 1.2 million images belonging to 1,000 categories (it is actually a subset of the ImageNet dataset).
AlexNet (SuperVision Team)
This architecture was developed by the SuperVision team, composed of Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton from the University of Toronto (now you know why it's called AlexNet). Here is their model's description:
Our model is a large, deep convolutional neural network trained on raw RGB pixel values. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three globally-connected layers with a final 1000-way softmax. It was trained on two NVIDIA GPUs for about a week. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of convolutional nets. To reduce overfitting in the globally-connected layers we employed hidden-unit "dropout", a recently-developed regularization method that proved to be very effective.
Source: Results (ILSVRC2012) [112]
You should be able to recognize all the elements in the description: five typical convolutional blocks (convolution, activation function, and max pooling) corresponding to the "featurizer" part of the model, three hidden (linear) layers combined with dropout layers corresponding to the "classifier" part of the model, and the softmax output layer typical of multiclass classification problems.
It is pretty much the fancier model from Chapter 6, but on steroids! We'll be using AlexNet to demonstrate how to use a pre-trained model. If you're interested in learning more about AlexNet, the paper is called "ImageNet Classification with Deep Convolutional Neural Networks." [113]
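The architecture in the description above can be sketched in plain PyTorch. This is an illustrative mock-up, not the original implementation: the layer sizes follow the widely used torchvision variant of AlexNet, and the names (featurizer, classifier) are our own labels for the two halves of the model.

```python
import torch
import torch.nn as nn

# "Featurizer": five convolutional blocks, some followed by max pooling
featurizer = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2), nn.ReLU(),   # conv block 1
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(64, 192, kernel_size=5, padding=2), nn.ReLU(),            # conv block 2
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(192, 384, kernel_size=3, padding=1), nn.ReLU(),           # conv block 3
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),           # conv block 4
    nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(),           # conv block 5
    nn.MaxPool2d(kernel_size=3, stride=2),
)
# "Classifier": three linear layers combined with dropout
classifier = nn.Sequential(
    nn.Flatten(),
    nn.Dropout(0.5), nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),
    nn.Dropout(0.5), nn.Linear(4096, 4096), nn.ReLU(),
    nn.Linear(4096, 1000),  # 1000-way output; softmax is folded into the loss
)
model = nn.Sequential(featurizer, classifier)

dummy = torch.randn(1, 3, 224, 224)        # one fake 224x224 RGB image
print(model(dummy).shape)                  # torch.Size([1, 1000])
print(sum(p.numel() for p in model.parameters()))  # 61100840
```

The parameter count, 61,100,840, lines up with the "60 million parameters" quoted in the description.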
ILSVRC-2014
The 2014 edition [114] gave rise to two now-household names when it comes to architectures for computer vision problems: VGG and Inception. The training data
had 1.2 million images belonging to 1,000 categories, just like the 2012 edition.
VGG
The architecture developed by Karen Simonyan and Andrew Zisserman from the
Oxford Visual Geometry Group (VGG) is pretty much an even larger or, better yet,
deeper model than AlexNet (and now you know the origin of yet another
architecture name). Their goal is made crystal clear in their model’s description:
…we explore the effect of the convolutional network (ConvNet) depth on its
accuracy.
Source: Results (ILSVRC2014) [115]
VGG models are massive, so we're not paying much attention to them here. If you
want to learn more about them, the paper is called "Very Deep Convolutional
Networks for Large-Scale Image Recognition." [116]
Inception (GoogLeNet Team)
The Inception architecture is probably the one with the best meme of all: "We need
to go deeper." The authors, Christian Szegedy et al., like the VGG team, wanted to
train a deeper model. But they came up with a clever way of doing it (highlights are
mine):
Additional dimension reduction layers based on embedding learning
intuition allow us to increase both the depth and the width of the network
significantly without incurring significant computational overhead.
Source: Results (ILSVRC2014) [117]
If you want to learn more about it, the paper is called "Going Deeper with
Convolutions." [118]
"What are these dimension-reduction layers?"
No worries, we’ll get back to them in the "Inception Modules" section.
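As a small preview, a dimension-reduction layer boils down to a 1x1 convolution. The sketch below, with made-up channel counts, shows how one shrinks the number of channels before a more expensive convolution, which is what lets the network grow deeper and wider without a matching growth in computation.

```python
import torch
import torch.nn as nn

# A 1x1 convolution acts like a per-pixel linear layer across channels:
# it reduces 192 channels to 32 while leaving height and width untouched
reducer = nn.Conv2d(in_channels=192, out_channels=32, kernel_size=1)

fmaps = torch.randn(1, 192, 28, 28)   # dummy batch of 192-channel feature maps
reduced = reducer(fmaps)
print(reduced.shape)  # torch.Size([1, 32, 28, 28])

# A 3x3 convolution applied after the reduction needs far fewer weights
direct = nn.Conv2d(192, 256, kernel_size=3, padding=1)
after_reduction = nn.Conv2d(32, 256, kernel_size=3, padding=1)
print(sum(p.numel() for p in direct.parameters()))           # 442624
print(sum(p.numel() for p in after_reduction.parameters()))  # 73984
```

Even counting the reducer's own 6,176 weights, the reduced path is roughly six times cheaper than applying the 3x3 convolution directly.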
ILSVRC-2015
The 2015 edition [119] popularized residual connections in its aptly named
architecture: Res(idual) Net(work). The training data used in the competition
502 | Chapter 7: Transfer Learning