Daniel Voigt Godoy - Deep Learning with PyTorch Step-by-Step: A Beginner's Guide (Leanpub)


ILSVRC-2012

The 2012 edition [111] of the ILSVRC is probably the most popular of them all. Its winner, the architecture dubbed AlexNet, represented a milestone for image classification, sharply reducing classification error. The training data had 1.2 million images belonging to 1,000 categories (it is actually a subset of the ImageNet dataset).

AlexNet (SuperVision Team)

This architecture was developed by the SuperVision team, composed of Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton from the University of Toronto (now you know why it's called AlexNet). Here is their model's description:

Our model is a large, deep convolutional neural network trained on raw RGB pixel values. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three globally-connected layers with a final 1000-way softmax. It was trained on two NVIDIA GPUs for about a week. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of convolutional nets. To reduce overfitting in the globally-connected layers we employed hidden-unit "dropout", a recently-developed regularization method that proved to be very effective.

Source: Results (ILSVRC2012) [112]

You should be able to recognize all the elements in the description: five typical convolutional blocks (convolution, activation function, and max pooling) corresponding to the "featurizer" part of the model, three hidden (linear) layers combined with dropout layers corresponding to the "classifier" part of the model, and the softmax output layer typical of multiclass classification problems. It is pretty much the fancier model from Chapter 6, but on steroids! We'll be using AlexNet to demonstrate how to use a pre-trained model. If you're interested in learning more about AlexNet, the paper is called "ImageNet Classification with Deep Convolutional Neural Networks."
[113]

ILSVRC-2014

The 2014 edition [114] gave rise to two now household names when it comes to architectures for computer vision problems: VGG and Inception. The training data had 1.2 million images belonging to 1,000 categories, just like the 2012 edition.

VGG

The architecture developed by Karen Simonyan and Andrew Zisserman from the Oxford Vision Geometry Group (VGG) is pretty much an even larger or, better yet, deeper model than AlexNet (and now you know the origin of yet another architecture name). Their goal is made crystal clear in their model's description:

…we explore the effect of the convolutional network (ConvNet) depth on its accuracy.

Source: Results (ILSVRC2014) [115]

VGG models are massive, so we're not paying much attention to them here. If you want to learn more about them, the paper is called "Very Deep Convolutional Networks for Large-Scale Image Recognition." [116]

Inception (GoogLeNet Team)

The Inception architecture is probably the one with the best meme of all: "We need to go deeper." The authors, Christian Szegedy et al., like the VGG team, wanted to train a deeper model. But they came up with a clever way of doing it (highlights are mine):

Additional dimension reduction layers based on embedding learning intuition allow us to increase both the depth and the width of the network significantly without incurring significant computational overhead.

Source: Results (ILSVRC2014) [117]

If you want to learn more about it, the paper is called "Going Deeper with Convolutions." [118]

"What are these dimension-reduction layers?"

No worries, we'll get back to it in the "Inception Modules" section.
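As a tiny preview (my own sketch, not code from the competition entry): the dimension reduction is done with 1x1 convolutions, which act as a per-pixel linear layer across channels, shrinking the channel dimension while leaving the spatial dimensions untouched:

```python
import torch
import torch.nn as nn

# A 1x1 convolution mixes channels at each pixel without touching the
# spatial size: here it reduces 192 channels down to 32, so any following
# (expensive) 3x3 or 5x5 convolution operates on far fewer channels
reduce = nn.Conv2d(in_channels=192, out_channels=32, kernel_size=1)

x = torch.randn(1, 192, 28, 28)  # a batch of one 28x28 feature map
print(reduce(x).shape)  # torch.Size([1, 32, 28, 28])
```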

ILSVRC-2015

The 2015 edition [119] popularized residual connections in its aptly named architecture: Res(idual) Net(work). The training data used in the competition

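A residual connection simply adds a block's input back to its output, so the layers only have to learn a residual on top of the identity. Here is a minimal sketch of the idea (my own illustration, not ResNet's exact block, which also uses batch normalization):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Minimal residual block: output = relu(conv(relu(conv(x))) + x)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        return self.relu(out + x)  # the skip connection adds the input back

x = torch.randn(1, 16, 8, 8)
print(ResidualBlock(16)(x).shape)  # torch.Size([1, 16, 8, 8])
```

Because the skip connection is just an addition, the block's output keeps the same shape as its input, which is what allows these blocks to be stacked very deep.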
