Daniel Voigt Godoy - Deep Learning with PyTorch Step-by-Step: A Beginner's Guide


Run - Model Training V2

%run -i model_training/v2.py

"Wow! What happened here?!" It seems like a lot changed. Let's take a closer look, step by step:

• We added an inner loop to handle the mini-batches produced by the DataLoader (line 12; see the sketch further below).
• We sent only one mini-batch to the device, as opposed to sending the whole training set (lines 16 and 17).

For larger datasets, loading data on demand (into a CPU tensor) inside Dataset's __getitem__() method and then sending all data points that belong to the same mini-batch at once to your GPU (device) is the way to go to make the best use of your graphics card's RAM.

Moreover, if you have many GPUs to train your model on, it is best to keep your dataset "device agnostic" and assign the batches to different GPUs during training.

• We performed a train_step_fn() on a mini-batch (line 21) and appended the corresponding loss to a list (line 22).
• After going through all mini-batches, that is, at the end of an epoch, we calculated the total loss for the epoch, which is the average loss over all mini-batches, appending the result to a list (lines 26 and 28).

After another two updates, our current state of development is:

• Data Preparation V1
• Model Configuration V1
• Model Training V2
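The model_training/v2.py script itself is not reproduced on this page, so here is a minimal sketch of what its mini-batch inner loop might look like. It assumes the names used earlier in the chapter (device, train_loader, train_step_fn) and a typical choice of 1,000 epochs; the line numbers cited above refer to the actual script, not to this sketch.

import numpy as np

n_epochs = 1000

losses = []

for epoch in range(n_epochs):
    # Inner loop: iterates over the mini-batches produced by the DataLoader
    mini_batch_losses = []
    for x_batch, y_batch in train_loader:
        # The dataset "lives" in the CPU, so each mini-batch must be
        # sent to the same device as the model before the train step
        x_batch = x_batch.to(device)
        y_batch = y_batch.to(device)

        # Performs one training step on this mini-batch and records its loss
        mini_batch_loss = train_step_fn(x_batch, y_batch)
        mini_batch_losses.append(mini_batch_loss)

    # The epoch loss is the average loss over all mini-batches
    loss = np.mean(mini_batch_losses)
    losses.append(loss)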


Not so bad, right? So, it is time to check if our code still works well:

# Checks model's parameters
print(model.state_dict())

Output

OrderedDict([('0.weight', tensor([[1.9684]], device='cuda:0')),
             ('0.bias', tensor([1.0235], device='cuda:0'))])

Did you get slightly different values? Try running the whole pipeline again:

Full Pipeline

%run -i data_preparation/v1.py
%run -i model_configuration/v1.py
%run -i model_training/v2.py

Since the DataLoader draws random samples, executing other code between the last two steps of the pipeline may interfere with the reproducibility of the results.

Anyway, as long as your results are within 0.01 of mine for both weight and bias, your code is working fine :-)

Did you notice it is taking longer to train now? Can you guess why?

ANSWER: The training time is longer now because the inner loop is executed five times for each epoch (in our example, since we are using a mini-batch size of 16 and we have 80 training data points in total, we execute the inner loop 80 / 16 = 5 times). So, across the 1,000 epochs, we are calling train_step_fn() a total of 5,000 times now! No wonder it's taking longer!
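As a quick sanity check, that arithmetic works out as below; you can also get the number of mini-batches per epoch directly from the loader with len(train_loader).

batch_size = 16
n_points = 80
n_epochs = 1000

batches_per_epoch = n_points // batch_size   # 5 mini-batches per epoch
total_calls = batches_per_epoch * n_epochs   # 5,000 calls to train_step_fn()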

Mini-Batch Inner Loop

From now on, it is very unlikely that you'll ever use (full) batch gradient descent again, either in this book or in real life :-) So, it makes sense to, once again, organize a piece of code that will be used repeatedly into a function of its own: the mini-batch inner loop.
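One possible way to organize it, sketched below, is a helper that takes a device, a data loader, and a step function, loops over the mini-batches, and returns the average loss; the name and signature here are illustrative rather than a definitive version.

import numpy as np

def mini_batch(device, data_loader, step_fn):
    # Runs one epoch's worth of mini-batch steps and returns the epoch loss
    mini_batch_losses = []
    for x_batch, y_batch in data_loader:
        # Sends the current mini-batch to the same device as the model
        x_batch = x_batch.to(device)
        y_batch = y_batch.to(device)

        # Performs one step on this mini-batch and records its loss
        mini_batch_loss = step_fn(x_batch, y_batch)
        mini_batch_losses.append(mini_batch_loss)

    # The epoch loss is the average over all mini-batch losses
    return np.mean(mini_batch_losses)

With such a helper, each epoch in the training loop reduces to a single call, for example: loss = mini_batch(device, train_loader, train_step_fn).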
