Daniel Voigt Godoy - Deep Learning with PyTorch Step-by-Step A Beginner’s Guide-leanpub


ILSVRC-2012

The 2012 edition [111] of the ILSVRC is probably the most popular of them all. Its winner, the architecture dubbed AlexNet, represented a milestone for image classification, sharply reducing classification error. The training data had 1.2 million images belonging to 1,000 categories (it is actually a subset of the ImageNet dataset).

AlexNet (SuperVision Team)

This architecture was developed by the SuperVision team, composed of Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton from the University of Toronto (now you know why it’s called AlexNet). Here is their model’s description:

Our model is a large, deep convolutional neural network trained on raw RGB pixel values. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max pooling layers, and three globally-connected layers with a final 1000-way softmax. It was trained on two NVIDIA GPUs for about a week. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of convolutional nets. To reduce overfitting in the globally-connected layers we employed hidden-unit "dropout", a recently-developed regularization method that proved to be very effective.

Source: Results (ILSVRC2012) [112]
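The "60 million parameters" figure from the quote can be sanity-checked by hand. The sketch below is a rough tally based on the layer shapes reported in the paper, assuming the original two-GPU split (so conv2, conv4, and conv5 each see only half of the preceding layer's feature maps):

```python
# Rough parameter tally for AlexNet, using layer shapes from the paper.
# Assumption: the original two-GPU split, so conv2, conv4, and conv5 only
# see half (48, 192, 192) of the preceding layer's feature maps.

def conv_params(kh, kw, c_in, c_out):
    # kh * kw * c_in weights per output channel, plus one bias per channel
    return kh * kw * c_in * c_out + c_out

def linear_params(n_in, n_out):
    # full weight matrix plus one bias per output unit
    return n_in * n_out + n_out

layers = {
    "conv1": conv_params(11, 11, 3, 96),
    "conv2": conv_params(5, 5, 48, 256),      # split: 48 of 96 input maps
    "conv3": conv_params(3, 3, 256, 384),
    "conv4": conv_params(3, 3, 192, 384),     # split: 192 of 384 input maps
    "conv5": conv_params(3, 3, 192, 256),     # split: 192 of 384 input maps
    "fc6": linear_params(6 * 6 * 256, 4096),  # flattened 6x6x256 feature maps
    "fc7": linear_params(4096, 4096),
    "fc8": linear_params(4096, 1000),         # 1000-way softmax output
}

total = sum(layers.values())
print(f"total parameters: {total:,}")  # roughly 61 million
```

Notice that the vast majority of those parameters sit in the three globally-connected layers, which is exactly why the quote singles them out for dropout.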

You should be able to recognize all the elements in the description: five typical convolutional blocks (convolution, activation function, and max pooling) corresponding to the "featurizer" part of the model, three hidden (linear) layers combined with dropout layers corresponding to the "classifier" part of the model, and the softmax output layer typical of multiclass classification problems.

It is pretty much the fancier model from Chapter 6, but on steroids! We’ll be using AlexNet to demonstrate how to use a pre-trained model. If you’re interested in learning more about AlexNet, the paper is called "ImageNet Classification with Deep Convolutional Neural Networks." [113]

ILSVRC-2014

The 2014 edition [114] gave rise to two now household names when it comes to architectures for computer vision problems: VGG and Inception. The training data

