Deep Learning with PyTorch Step-by-Step: A Beginner's Guide, by Daniel Voigt Godoy (Leanpub)


Model Training

We have everything set to train the "top" layer of our modified version of AlexNet:

sbs_alex = StepByStep(alex, multi_loss_fn, optimizer_alex)
sbs_alex.set_loaders(train_loader, val_loader)
sbs_alex.train(1)

You probably noticed it took several seconds (and a lot more if you're running on a CPU) to run the code above, even though it is training for one single epoch.

"How come? Most of the model is frozen; there is only one measly layer to train…"

You're right, there is only one measly layer to compute gradients for and to update parameters for, but the forward pass still uses the whole model. So, every single image (out of 2,520 in our training set) will have its features computed using more than 61 million parameters! No wonder it is taking some time! By the way, only 12,291 parameters are trainable.

If you're thinking "there must be a better way…," you're absolutely right: that's the topic of the next section.

But, first, let's see how effective transfer learning is by evaluating our model after having trained it over one epoch only:

StepByStep.loader_apply(val_loader, sbs_alex.correct)

Output

tensor([[111, 124],
        [124, 124],
        [124, 124]])

That's 96.51% accuracy in the validation set (it is 99.33% for the training set, in case you're wondering). Even if it is taking some time to train, these results are pretty good!
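If you'd like to check those parameter counts yourself, here is a quick sketch (not part of the book's original listings) that sums the elements of the model's parameter tensors, assuming alex is the modified AlexNet with every layer frozen except the replaced "top" layer:

# total number of parameters vs. those that still require gradients
total_params = sum(p.numel() for p in alex.parameters())
trainable_params = sum(p.numel() for p in alex.parameters() if p.requires_grad)
print(f'total: {total_params:,}  trainable: {trainable_params:,}')

The 12,291 trainable parameters are consistent with a single linear layer mapping 4,096 features to three classes: 4,096 x 3 weights plus 3 biases.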

Generating a Dataset of Features

We've just realized that most of the time it takes to train the last layer of our model over one single epoch was spent in the forward pass. Now, imagine if we wanted to train it over ten epochs: Not only would the model spend most of its time performing the forward pass, but, even worse, it would perform the same operations ten times over.

Since all layers but the last are frozen, the output of the second-to-last layer is always the same.

That's assuming you're not doing data augmentation, of course.

That's a huge waste of your time, energy, and money (if you're paying for cloud computing).

"What can we do about it?"

Well, since the frozen layers are simply generating features that will be the input of the trainable layers, why not treat the frozen layers as such? We could do it in four easy steps:

• Keep only the frozen layers in the model.
• Run the whole dataset through it and collect its outputs as a dataset of features.
• Train a separate model (that corresponds to the "top" of the original model) using the dataset of features.
• Attach the trained model to the top of the frozen layers.

This way, we're effectively splitting the feature extraction and actual training phases, thus avoiding the overhead of generating features over and over again for every single forward pass.

To keep only the frozen layers, we need to get rid of the "top" of the original model. But, since we also want to attach our new layer to the whole model after training, it is a better idea to simply replace the "top" layer with an identity layer instead of removing it entirely:
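As a rough sketch of those four steps (this is not the book's original listing; it assumes alex is the modified AlexNet from before, that its "top" layer lives at alex.classifier[6], and that train_loader is the original loader of images; top_model is a made-up name for illustration):

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Step 1: swap the "top" layer for an identity, so the model now
# outputs the 4,096 features produced by the second-to-last layer
alex.classifier[6] = nn.Identity()

# Step 2: run every image through the frozen layers once and keep
# the outputs as a dataset of features (device handling omitted)
features, labels = [], []
alex.eval()
with torch.no_grad():
    for x, y in train_loader:
        features.append(alex(x))
        labels.append(y)
feature_dataset = TensorDataset(torch.cat(features), torch.cat(labels))
feature_loader = DataLoader(feature_dataset, batch_size=16, shuffle=True)

# Step 3: train a small, separate "top" model on the feature dataset
top_model = nn.Linear(4096, 3)

# Step 4: once trained, attach it back on top of the frozen layers
# alex.classifier[6] = top_model

The trade-off is that the feature dataset is only valid as long as the frozen layers (and any deterministic preprocessing) stay the same, which is why this shortcut doesn't mix well with data augmentation.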

