remained unchanged.
ResNet (MSRA Team)
The trick developed by Kaiming He et al. was to add residual connections, or shortcuts, to a very deep architecture.
We train neural networks with depth of over 150 layers. We propose a "deep residual learning" framework that eases the optimization and convergence of extremely deep networks.
Source: Results (ILSVRC2015) [120]
In a nutshell, it allows the network to more easily learn the identity function. We'll get back to it in the "Residual Connections" section later in this chapter. If you want to learn more about it, the paper is called "Deep Residual Learning for Image Recognition." [121]
By the way, Kaiming He also has an initialization scheme named after him (sometimes referred to as "He initialization," sometimes as "Kaiming initialization"), and we'll learn about initialization schemes in the next chapter.
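Just to give you a taste of how simple the idea is, here is a minimal sketch of a residual block. The MinimalResidualBlock name and its two-convolution structure are illustrative assumptions of mine, not the implementation we'll build in the "Residual Connections" section: the shortcut simply adds the block's input back to the output of its layers.

import torch
import torch.nn as nn

class MinimalResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # two convolutions that preserve the number of channels
        # and the spatial dimensions
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        identity = x                    # the shortcut keeps the input as is
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        out = out + identity            # residual connection / shortcut
        return self.relu(out)

dummy = torch.randn(1, 64, 32, 32)      # a batch of one 64-channel "image"
block = MinimalResidualBlock(64)
print(block(dummy).shape)               # torch.Size([1, 64, 32, 32])

If the two convolutions end up producing outputs close to zero, the block's output is (roughly) its input, that is, the identity function, which is why very deep stacks of such blocks are easier to optimize.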
Imagenette
If you are looking for a smaller, more manageable dataset that's ImageNet-like, Imagenette is for you! Developed by Jeremy Howard from fast.ai, it is a subset of ten easily classified classes from ImageNet.
You can find it here: https://github.com/fastai/imagenette.
Comparing Architectures
Now that you're familiar with some of the popular architectures (many of which are readily available as Torchvision models), let's compare their performance (Top-1 accuracy %), number of operations in a single forward pass (billions), and sizes (in millions of parameters). The figure below is very illustrative in this sense.
Figure 7.1 - Comparing architectures (size proportional to number of parameters)
Source: Data for accuracy and GFLOPs estimates obtained from this report [122]; number of parameters (proportional to the size of the circles) obtained from Torchvision's models. For a more detailed analysis, see Canziani, A., Culurciello, E., Paszke, A., "An Analysis of Deep Neural Network Models for Practical Applications" [123] (2017).
See how massive the VGG models are, both in size and in the number of operations required to deliver a single prediction? On the other hand, check out the positions of Inception-V3 and ResNet-50 in the plot: they give you more bang for your buck. The former has slightly higher performance, and the latter is slightly faster.
These are the models you're likely to use for transfer learning: Inception and ResNet.
On the bottom left, there is AlexNet. It was miles ahead of anything else in 2012, but it is not competitive at all anymore.
"If AlexNet is not competitive, why are you using it to illustrate transfer learning?"
A fair point indeed. The reason is that its architectural elements are already familiar to you, which makes it easier for me to explain how we're modifying it to fit our purposes.
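If you'd like to get a feel for the parameter counts behind the circle sizes in the figure, a quick way is to instantiate the corresponding Torchvision models and count their parameters. The count_parameters helper below is just an illustrative name of mine, and the models are built with random weights since only the shapes matter; accuracy and GFLOPs, on the other hand, come from benchmark data and profiling and aren't reproduced here.

from torchvision import models

# number of parameters, in millions
def count_parameters(model):
    return sum(p.numel() for p in model.parameters()) / 1e6

builders = [('AlexNet', models.alexnet),
            ('VGG-16', models.vgg16),
            ('ResNet-50', models.resnet50),
            ('Inception-V3', models.inception_v3)]

for name, builder in builders:
    model = builder()  # randomly initialized weights are enough for counting
    print(f'{name}: {count_parameters(model):.1f}M parameters')

The VGG model should stand out immediately, with a parameter count several times larger than either ResNet-50's or Inception-V3's.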