Output

Downloading: "https://download.pytorch.org/models/alexnet-owt-4df8aa71.pth" to ./pretrained/alexnet-owt-4df8aa71.pth

From now on, it works as if we had saved a model to disk. To load the model’s state dictionary, we can use its load_state_dict() method:

Loading Model

alex.load_state_dict(state_dict)

Output

<All keys matched successfully>
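Putting the pieces together, here is a minimal, self-contained sketch of the whole loading flow. It assumes torchvision’s alexnet() constructor and torch.hub’s load_state_dict_from_url(); the download call itself is not shown in this excerpt, so treat this as one plausible way of producing the state_dict above:

from torch.hub import load_state_dict_from_url
from torchvision.models import alexnet

# Build the AlexNet architecture with randomly initialized weights
alex = alexnet()

# Download the pretrained weights (cached in ./pretrained on later calls)
url = 'https://download.pytorch.org/models/alexnet-owt-4df8aa71.pth'
state_dict = load_state_dict_from_url(url, model_dir='./pretrained')

# Load the pretrained weights into the model
alex.load_state_dict(state_dict)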
There we go! We have a fully trained AlexNet to play with! Now what?

Model Freezing

In most cases, you don’t want to continue training the whole model. I mean, in theory, you could pick it up where it was left off by the original authors and resume training using your own dataset. That’s a lot of work, and you’d need a lot of data to make any kind of meaningful progress. There must be a better way! Of course, there is: We can freeze the model.

Freezing the model means it won’t learn anymore; that is, its parameters / weights will not be updated anymore.

What best characterizes a tensor representing a learnable parameter? It requires gradients. So, if we’d like to make them stop learning, we need to change exactly that:

Helper Function #6 — Model freezing

def freeze_model(model):
    for parameter in model.parameters():
        parameter.requires_grad = False

freeze_model(alex)

The function above loops over all the parameters of a given model and freezes them.

"If the model is frozen, how am I supposed to train it for my own purpose?"

Excellent question! We have to unfreeze a small part of the model or, better yet, replace a small part of the model. We’ll be replacing the…

Top of the Model

The "top" of the model is loosely defined as the last layer(s) of the model, usually belonging to its classifier part. The featurizer part is usually left untouched, since we’re trying to leverage the model’s ability to generate features for us. Let’s inspect AlexNet’s classifier once again:

print(alex.classifier)

Output

Sequential(
  (0): Dropout(p=0.5, inplace=False)
  (1): Linear(in_features=9216, out_features=4096, bias=True)
  (2): ReLU(inplace=True)
  (3): Dropout(p=0.5, inplace=False)
  (4): Linear(in_features=4096, out_features=4096, bias=True)
  (5): ReLU(inplace=True)
  (6): Linear(in_features=4096, out_features=1000, bias=True)
)

It has two hidden layers and one output layer. The output layer produces 1,000 logits, one for each class in the ILSVRC challenge. But, unless you are playing with the dataset used for the challenge, you’d have your own classes to compute logits for.

In our Rock Paper Scissors dataset, we have three classes, so we need to replace the output layer accordingly:
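The replacement listing itself falls outside this excerpt; a minimal sketch of what it could look like, assuming we simply swap the last Linear layer (index 6 of the classifier) for a fresh one with three outputs:

import torch.nn as nn

# Replace the 1,000-logit output layer with a new, randomly initialized
# three-logit layer; fresh layers require gradients by default, so this
# becomes the only part of the model that will learn
alex.classifier[6] = nn.Linear(4096, 3)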
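As a quick sanity check, here is a short sketch confirming that, after freezing the model and replacing its top, only the new output layer has trainable parameters:

# List the parameters that still require gradients
trainable = [name for name, parameter in alex.named_parameters()
             if parameter.requires_grad]
print(trainable)  # expected: ['classifier.6.weight', 'classifier.6.bias']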