Run - Model Training V2

%run -i model_training/v2.py

"Wow! What happened here?!" It seems like a lot changed. Let's take a closer look, step by step:

• We added an inner loop to handle the mini-batches produced by the DataLoader (line 12).
• We sent only one mini-batch to the device, as opposed to sending the whole training set (lines 16 and 17).

For larger datasets, loading data on demand (into a CPU tensor) inside the Dataset's __getitem__() method, and then sending all data points that belong to the same mini-batch at once to your GPU (device), is the way to go to make the best use of your graphics card's RAM.

Moreover, if you have many GPUs to train your model on, it is best to keep your dataset "device agnostic" and assign the batches to different GPUs during training.

• We performed a train_step_fn() on a mini-batch (line 21) and appended the corresponding loss to a list (line 22).
• After going through all mini-batches, that is, at the end of an epoch, we calculated the total loss for the epoch, which is the average loss over all mini-batches, appending the result to a list (lines 26 and 28).

After another two updates, our current state of development is:

• Data Preparation V1
• Model Configuration V1
• Model Training V2
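The full listing of model_training/v2.py is not reproduced here (the line numbers in the bullets above refer to the book's listing), but a minimal sketch of the mini-batch inner loop it implements could look like this, assuming the train_loader, train_step_fn, and device defined by the previous data preparation and model configuration scripts:

# Sketch of model_training/v2.py's inner loop
# (a reconstruction, not the book's exact listing)
import numpy as np

n_epochs = 1000

losses = []
for epoch in range(n_epochs):
    # Inner loop: one iteration per mini-batch produced by the DataLoader
    mini_batch_losses = []
    for x_batch, y_batch in train_loader:
        # The dataset lives in the CPU, so we send one mini-batch
        # at a time to the device where the model lives
        x_batch = x_batch.to(device)
        y_batch = y_batch.to(device)
        # Performs one training step on the mini-batch and keeps its loss
        mini_batch_loss = train_step_fn(x_batch, y_batch)
        mini_batch_losses.append(mini_batch_loss)
    # The epoch loss is the average loss over all mini-batches
    loss = np.mean(mini_batch_losses)
    losses.append(loss)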
Not so bad, right? So, it is time to check if our code still works well:
# Checks model's parameters
print(model.state_dict())
Output
OrderedDict([('0.weight', tensor([[1.9684]], device='cuda:0')),
('0.bias', tensor([1.0235], device='cuda:0'))])
Did you get slightly different values? Try running the whole
pipeline again:
Full Pipeline
%run -i data_preparation/v1.py
%run -i model_configuration/v1.py
%run -i model_training/v2.py
Since the DataLoader draws random samples, executing other
code between the last two steps of the pipeline may interfere
with the reproducibility of the results.
Anyway, as long as your results differ from mine by less than 0.01
for both weight and bias, your code is working fine :-)
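If you want the shuffling itself to be reproducible regardless of what other code runs in between, one option (a hypothetical snippet, not part of the book's pipeline) is to hand the DataLoader its own seeded generator, train_data being whatever Dataset the data preparation script built:

# Hypothetical: a dedicated generator isolates the DataLoader's
# shuffling from any other code that consumes the global random state
import torch
from torch.utils.data import DataLoader

g = torch.Generator()
g.manual_seed(42)
train_loader = DataLoader(train_data, batch_size=16, shuffle=True, generator=g)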
Did you notice it is taking longer to train now? Can you guess
why?
ANSWER: The training time is longer now because the inner loop is executed five
times for each epoch (in our example, since we are using a mini-batch of size 16 and
we have 80 training data points in total, we execute the inner loop 80 / 16 = 5
times). So we are now calling train_step_fn() a total of 5,000 times!
No wonder it’s taking longer!
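Spelled out as a quick sanity check (the 1,000 epochs follow from 5,000 total calls at five calls per epoch):

# Counting the calls to train_step_fn()
n_points, batch_size, n_epochs = 80, 16, 1000
steps_per_epoch = n_points // batch_size  # 80 / 16 = 5 mini-batches per epoch
total_steps = steps_per_epoch * n_epochs  # 5 * 1,000 = 5,000 training steps
print(steps_per_epoch, total_steps)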
Mini-Batch Inner Loop
From now on, it is very unlikely that you'll ever use (full) batch gradient descent
again, either in this book or in real life :-) So, it makes sense to, once again, organize a
piece of code that's going to be used repeatedly into its own function: the mini-batch inner loop.
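Such a helper could look like the sketch below (a minimal version under the same assumptions as before, with train_loader, train_step_fn, and device already defined; the name mini_batch is just a suggestive choice here):

# A sketch of a reusable mini-batch inner loop
import numpy as np

def mini_batch(device, data_loader, step_fn):
    # Runs step_fn on every mini-batch drawn from data_loader and
    # returns the average loss over them, i.e., the epoch loss
    mini_batch_losses = []
    for x_batch, y_batch in data_loader:
        x_batch = x_batch.to(device)
        y_batch = y_batch.to(device)
        mini_batch_losses.append(step_fn(x_batch, y_batch))
    return np.mean(mini_batch_losses)

The training loop then collapses into a single call per epoch:

losses = []
for epoch in range(n_epochs):
    losses.append(mini_batch(device, train_loader, train_step_fn))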