Model Configuration

optimizer_model = optim.Adam(model.parameters(), lr=3e-4)
sbs_incep = StepByStep(model, inception_loss, optimizer_model)

"Wait, aren’t we pre-processing the dataset this time?"

Unfortunately, no. The preprocessed_dataset() function cannot handle multiple outputs. Instead of making the process convoluted in order to handle the peculiarities of the Inception model, I am sticking with the simpler (yet slower) way of training the last layer while it is still attached to the rest of the model.

The Inception model is also different from the others in its expected input size: 299 instead of 224. So, we need to recreate the data loaders accordingly:

Data Preparation

normalizer = Normalize(mean=[0.485, 0.456, 0.406],
                       std=[0.229, 0.224, 0.225])

composer = Compose([Resize(299),
                    ToTensor(),
                    normalizer])

train_data = ImageFolder(root='rps', transform=composer)
val_data = ImageFolder(root='rps-test-set', transform=composer)
# Builds a loader of each set
train_loader = DataLoader(
    train_data, batch_size=16, shuffle=True)
val_loader = DataLoader(val_data, batch_size=16)

We’re ready, so let’s train our model for a single epoch and evaluate the result:

Model Training

sbs_incep.set_loaders(train_loader, val_loader)
sbs_incep.train(1)

StepByStep.loader_apply(val_loader, sbs_incep.correct)
Output

tensor([[108, 124],
        [116, 124],
        [108, 124]])
It achieved an accuracy of 89.25% on the validation set. Not bad!
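Each row of the output above pairs the number of correct predictions with the number of images for one class, so we can double-check that figure by summing over the classes. A minimal sketch, reusing the loader_apply() call from above:

correct_per_class = StepByStep.loader_apply(val_loader, sbs_incep.correct)
# Sums over the three classes: 332 correct out of 372 images
correct, total = correct_per_class.sum(dim=0)
print(f'{(correct / total).item():.2%}')  # 89.25%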
There is more to the Inception model than auxiliary classifiers, though. Let’s check
out some of its other architectural elements.
1x1 Convolutions
This particular architectural element is not exactly new, but it is a somewhat special
case of an already known element. So far, the smallest kernel used in a
convolutional layer had a size of three-by-three. These kernels performed an
element-wise multiplication, and then they added up the resulting elements to
produce a single value for each region to which they were applied. So far, nothing
new.
The idea of a kernel of size one-by-one is somewhat counterintuitive at first. For a
single channel, this kernel is only scaling the values of its input and nothing else.
That seems hardly useful.
But everything changes if you have multiple channels! Remember the three-channel
convolutions from Chapter 6? A filter has as many channels as its input.
This means that each channel will be scaled independently and the results will be
added up, resulting in one channel as output (per filter).
A 1x1 convolution can be used to reduce the number of
channels; that is, it may work as a dimension-reduction layer.
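To make this concrete, here is a minimal sketch (the dummy tensor and variable names are made up for illustration) of a single 1x1 filter collapsing a three-channel input into one channel. Each output value is just a weighted sum of the three input channels at that same location, plus a bias, which we can verify by hand:

import torch
import torch.nn as nn

# One filter with a 1x1 kernel over a three-channel input
conv_1x1 = nn.Conv2d(in_channels=3, out_channels=1, kernel_size=1)

dummy_image = torch.rand(1, 3, 28, 28)  # N, C, H, W
output = conv_1x1(dummy_image)
print(output.shape)  # torch.Size([1, 1, 28, 28])

# Reproducing it manually: one weight per channel, then sum and add the bias
w = conv_1x1.weight.view(1, 3, 1, 1)
b = conv_1x1.bias
manual = (dummy_image * w).sum(dim=1, keepdim=True) + b
print(torch.allclose(output, manual, atol=1e-6))  # True

Notice that the spatial dimensions are untouched; only the number of channels changes (from three to one, in this sketch).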
An image is worth a thousand words, so let’s visualize this.