Daniel Voigt Godoy - Deep Learning with PyTorch Step-by-Step: A Beginner's Guide


but if we want to get serious about all this, we must use mini-batch gradient descent. Thus, we need mini-batches. Thus, we need to slice our dataset accordingly. Do you want to do it manually?! Me neither!

So we use PyTorch's DataLoader class for this job. We tell it which dataset to use (the one we just built in the previous section), the desired mini-batch size, and whether we'd like to shuffle it or not. That's it!

IMPORTANT: in the vast majority of cases, you should set shuffle=True for your training set to improve the performance of gradient descent. There are a few exceptions, though, like time series problems, where shuffling actually leads to data leakage. So, always ask yourself: "Do I have a reason NOT to shuffle the data?"

"What about the validation and test sets?" There is no need to shuffle them, since we are not computing gradients with them.

There is more to a DataLoader than meets the eye: it is also possible to use it together with a sampler to fetch mini-batches that compensate for imbalanced classes, for instance. Too much to handle right now, but we will eventually get there.

Our loader will behave like an iterator, so we can loop over it and fetch a different mini-batch every time.

"How do I choose my mini-batch size?"

It is typical to use powers of two for mini-batch sizes, like 16, 32, 64, or 128, and 32 seems to be the choice of most people, Yann LeCun [56] included.

Some more-complex models may use even larger sizes, although sizes are usually constrained by hardware limitations (i.e., how many data points actually fit into memory).

In our example, we have only 80 training points, so I chose a mini-batch size of 16 to conveniently split the training set into five mini-batches.
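For completeness, the DataLoader class used in the next cell lives in torch.utils.data; the import below is shown only as a reminder, since the chapter's setup code (not part of this excerpt) may already include it.

from torch.utils.data import DataLoader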

Notebook Cell 2.3 - Building a data loader for our training data

train_loader = DataLoader(
    dataset=train_data,
    batch_size=16,
    shuffle=True,
)

To retrieve a mini-batch, one can simply run the command below: it will return a list containing two tensors, one for the features, another one for the labels:

next(iter(train_loader))

Output

[tensor([[0.1196],
         [0.1395],
         ...
         [0.8155],
         [0.5979]]),
 tensor([[1.3214],
         [1.3051],
         ...
         [2.6606],
         [2.0407]])]

"Why not use a list instead?"

If you call list(train_loader), you'll get, as a result, a list of five elements; that is, all five mini-batches. Then you could take the first element of that list to obtain a single mini-batch, as in the example above. But that would defeat the purpose of using the iterable provided by the DataLoader, which is to iterate over the elements (mini-batches, in that case) one at a time.

To learn more about it, check RealPython's material on iterables [57] and iterators [58].
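To make the "loop over it" behavior described above concrete, here is a minimal sketch of iterating over the loader built in Notebook Cell 2.3; with 80 training points and a mini-batch size of 16, the loop below runs for five iterations:

# A sketch only: fetch one mini-batch at a time from the loader
print(len(train_loader))  # 5 -> number of mini-batches (80 points / 16 per batch)

for x_batch, y_batch in train_loader:
    # each mini-batch is a pair of tensors: features and labels
    print(x_batch.shape, y_batch.shape)  # torch.Size([16, 1]) torch.Size([16, 1])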


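Finally, tying back to the question of shuffling: the sketch below shows how loaders for the validation and test sets could be built with shuffle=False. The val_data and test_data datasets are hypothetical here, assumed to be created the same way as train_data was in the previous section.

# A sketch only: validation/test loaders do not need shuffling
val_loader = DataLoader(
    dataset=val_data,    # hypothetical validation dataset
    batch_size=16,
    shuffle=False,       # no gradients computed here, so no need to shuffle
)
test_loader = DataLoader(
    dataset=test_data,   # hypothetical test dataset
    batch_size=16,
    shuffle=False,
)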