
If, by any chance, you ended up with something like the weird plot below, don't worry just yet!

Figure 2.7 - Weird results on TensorBoard :P

Remember, I said writing the data of multiple runs into the same folder was bad? This is why… Since we're writing data to the folder runs/simple_linear_regression, if we do not change the name of the folder (or erase the data there) before running the code a second time, TensorBoard gets somewhat confused, as you can guess from its output:

• Found more than one graph event per run (because we ran add_graph() more than once)

• Found more than one "run metadata" event with tag step1 (because we ran add_scalars() more than once)

If you are using a local installation, you can see those messages in the terminal window or Anaconda prompt you used to run tensorboard --logdir=runs.
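One simple way to avoid this clutter is to give each run its own subfolder under runs/. A minimal sketch, appending a timestamp to the folder name (the naming scheme here is an illustration of mine, not the book's):

    from datetime import datetime
    from torch.utils.tensorboard import SummaryWriter

    # Unique folder per run, e.g., runs/simple_linear_regression_20240222_153045
    suffix = datetime.now().strftime('%Y%m%d_%H%M%S')
    writer = SummaryWriter(f'runs/simple_linear_regression_{suffix}')

This way, every execution writes its events to a fresh folder, and TensorBoard shows each one as a separate run instead of mixing their events together.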

So, you finished training your model, you inspected the TensorBoard plots, and you're happy with the losses you got.

Congratulations! Your job is done; you successfully trained your model!

There is only one more thing you need to know, and that is how to handle…

Saving and Loading Models

Training a model successfully is great, no doubt about that, but not all models will train quickly, and training may get interrupted (a computer crash, a timeout after 12 hours of continuous GPU usage on Google Colab, etc.). It would be a pity to have to start over, right?

So, it is important to be able to checkpoint or save our model, that is, save it to disk, in case we'd like to restart training later or deploy it as an application to make predictions.

Model State

To checkpoint a model, we basically have to save its state to a file so that it can be loaded back later; nothing special, actually.

What defines the state of a model?

• model.state_dict(): kinda obvious, right?

• optimizer.state_dict(): remember, optimizers have a state_dict() as well

• losses: after all, you should keep track of their evolution

• epoch: it is just a number, so why not? :-)

• anything else you'd like to have restored later

Saving

Now, we wrap everything into a Python dictionary and use torch.save() to dump it all into a file. Easy peasy! We have just saved our model to a file named model_checkpoint.pth.

Notebook Cell 2.4 - Saving checkpoint

    checkpoint = {'epoch': n_epochs,
                  'model_state_dict': model.state_dict(),
                  'optimizer_state_dict': optimizer.state_dict(),
                  'loss': losses,
                  'val_loss': val_losses}

    torch.save(checkpoint, 'model_checkpoint.pth')
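Loading the checkpoint back is the mirror image of saving it. A minimal sketch, assuming model and optimizer have already been created with the same classes and hyperparameters as before:

    # Load the dictionary back from disk
    checkpoint = torch.load('model_checkpoint.pth')

    # Restore model and optimizer states in place
    model.load_state_dict(checkpoint['model_state_dict'])
    optimizer.load_state_dict(checkpoint['optimizer_state_dict'])

    # Recover the bookkeeping variables as well
    saved_epoch = checkpoint['epoch']
    losses = checkpoint['loss']
    val_losses = checkpoint['val_loss']

With the states restored, training can be resumed from where it stopped, or the model can be put to work making predictions.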

