but if we want to get serious about all this, we must use mini-batch gradient descent. Thus, we need mini-batches. Thus, we need to slice our dataset accordingly. Do you want to do it manually?! Me neither!

So we use PyTorch's DataLoader class for this job. We tell it which dataset to use (the one we just built in the previous section), the desired mini-batch size, and if we'd like to shuffle it or not. That's it!

IMPORTANT: in the absolute majority of cases, you should set shuffle=True for your training set to improve the performance of gradient descent. There are a few exceptions, though, like time series problems, where shuffling actually leads to data leakage. So, always ask yourself: "Do I have a reason NOT to shuffle the data?"

"What about the validation and test sets?" There is no need to shuffle them since we are not computing gradients with them.

There is more to a DataLoader than meets the eye: it is also possible to use it together with a sampler to fetch mini-batches that compensate for imbalanced classes, for instance. Too much to handle right now, but we will eventually get there.

Our loader will behave like an iterator, so we can loop over it and fetch a different mini-batch every time.

"How do I choose my mini-batch size?"

It is typical to use powers of two for mini-batch sizes, like 16, 32, 64, or 128, and 32 seems to be the choice of most people, Yann LeCun [56] included.

Some more complex models may use even larger sizes, although sizes are usually constrained by hardware limitations (i.e., how many data points actually fit into memory).

In our example, we have only 80 training points, so I chose a mini-batch size of 16 to conveniently split the training set into five mini-batches.
Notebook Cell 2.3 - Building a data loader for our training data

train_loader = DataLoader(
    dataset=train_data,
    batch_size=16,
    shuffle=True,
)

To retrieve a mini-batch, one can simply run the command below: it will return a list containing two tensors, one for the features, another one for the labels:

next(iter(train_loader))

Output

[tensor([[0.1196],
         [0.1395],
         ...
         [0.8155],
         [0.5979]]),
 tensor([[1.3214],
         [1.3051],
         ...
         [2.6606],
         [2.0407]])]

"Why not use a list instead?"

If you call list(train_loader), you'll get, as a result, a list of five elements; that is, all five mini-batches. Then you could take the first element of that list to obtain a single mini-batch as in the example above. It would defeat the purpose of using the iterable provided by the DataLoader; that is, to iterate over the elements (mini-batches, in that case) one at a time.

To learn more about it, check RealPython's material on iterables [57] and iterators [58].
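In practice, you rarely need next(iter(...)) or list(...) at all: since the loader is an iterable, a training loop simply iterates over it directly. Here is a minimal sketch, assuming the train_loader built in Notebook Cell 2.3 above:

for x_batch, y_batch in train_loader:
    # each iteration yields one mini-batch of (features, labels);
    # with 80 training points and batch_size=16, this loop runs
    # five times per epoch
    print(x_batch.shape, y_batch.shape)  # torch.Size([16, 1]) torch.Size([16, 1])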
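The aside above mentioned that a DataLoader can also take a sampler to compensate for imbalanced classes. That topic is deferred to a later chapter, but here is a minimal sketch of the idea, using a hypothetical, deliberately imbalanced set of binary labels (fake_labels) just for illustration. Note that, when a sampler is passed, shuffle must be left unset, since the two options are mutually exclusive:

import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# hypothetical toy data: six points of class 0, two points of class 1
fake_x = torch.randn(8, 1)
fake_labels = torch.tensor([0, 0, 0, 0, 0, 0, 1, 1])
fake_dataset = TensorDataset(fake_x, fake_labels)

# weight each point by the inverse frequency of its class, so the
# minority class gets drawn more often
class_counts = torch.bincount(fake_labels)            # tensor([6, 2])
sample_weights = 1.0 / class_counts[fake_labels].float()

sampler = WeightedRandomSampler(
    weights=sample_weights,
    num_samples=len(sample_weights),
    replacement=True,
)

# shuffle is NOT set here; sampler and shuffle cannot be used together
balanced_loader = DataLoader(dataset=fake_dataset, batch_size=4, sampler=sampler)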