Daniel Voigt Godoy - Deep Learning with PyTorch Step-by-Step: A Beginner's Guide (Leanpub)



Output

<torch.utils.data.dataset.Subset at 0x7fc6e7944290>

Each subset contains the corresponding indices as an attribute:

train_idx.indices

Output

[118, 170, ..., 10, 161]
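For reference, here is a minimal sketch of one way such index Subsets can be produced: applying random_split to a tensor of indices instead of to the dataset itself. The 240/60 sizes below are inferred from the mini-batch counts discussed further down; the exact splitting code is an assumption for illustration.

import torch
from torch.utils.data import random_split

# Splits the *indices* (not the data) into two Subset objects
# (sizes inferred from the 240/60 train/validation counts below)
train_idx, val_idx = random_split(torch.arange(300), [240, 60])
train_idx.indices[:3]  # the drawn indices live in the .indices attribute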

Next, each Subset object is used as an argument to the corresponding sampler:

train_sampler = SubsetRandomSampler(train_idx)
val_sampler = SubsetRandomSampler(val_idx)
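Under the hood, a SubsetRandomSampler yields only the indices it was given, in a freshly shuffled order on every pass. A toy illustration (the indices here are made up):

from torch.utils.data import SubsetRandomSampler

# The sampler is just an iterable over its indices, reshuffled each time
sampler = SubsetRandomSampler([0, 2, 5, 7])
list(sampler)  # e.g., [5, 0, 7, 2]; a different order on the next pass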

So, we can use a single dataset from which to load the data since the split is controlled by the samplers. But we still need two data loaders, each using its corresponding sampler:

# Builds a loader of each set
train_loader = DataLoader(
    dataset=dataset, batch_size=16, sampler=train_sampler
)
val_loader = DataLoader(
    dataset=dataset, batch_size=16, sampler=val_sampler
)

If you’re using a sampler, you cannot set shuffle=True.
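A quick way to sanity-check the loaders is to fetch a single mini-batch from each one. This is a sketch, assuming the dataset returns (features, label) pairs:

# Grabs one mini-batch from the training loader; the first dimension
# of the features should match the batch size of 16
x_batch, y_batch = next(iter(train_loader))
x_batch.shape  # the leading dimension should be 16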

We can also check if the loaders are returning the correct number of mini-batches:

len(iter(train_loader)), len(iter(val_loader))

Output

(15, 4)

There are 15 mini-batches in the training loader (15 mini-batches * 16 batch size = 240 data points), and four mini-batches in the validation loader (4 mini-batches * 16 batch size = 64 data points). In the validation set, the last mini-batch will have only 12 points, since there are only 60 points in total.

OK, cool, this means we don't need two (split) datasets anymore; we only need two samplers. Right? Well, it depends.

Data Augmentation Transforms

No, I did not change topics :-) The reason why we may still need two split datasets is exactly that: data augmentation. In general, we want to apply data augmentation to the training data only (yes, there is test-data augmentation too, but that's a different matter). Data augmentation is accomplished by composing transforms, which will be applied to all points in the dataset. See the problem?

If we need some data points to be augmented, but not others, the easiest way to accomplish this is to create two composers and use them in two different datasets. We can still use the indices, though:

# Uses indices to perform the split
x_train_tensor = x_tensor[train_idx]
y_train_tensor = y_tensor[train_idx]
x_val_tensor = x_tensor[val_idx]
y_val_tensor = y_tensor[val_idx]

Then, here come the two composers: the train_composer() augments the data, and then scales it (min-max); the val_composer() only scales the data (min-max).
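The composers themselves are not shown in this excerpt, so here is a hedged sketch of what they could look like using torchvision's Compose. The particular transforms, and the TransformedTensorDataset helper that wraps each split, are assumptions for illustration rather than the book's exact code:

from torch.utils.data import Dataset
from torchvision.transforms import Compose, Normalize, RandomHorizontalFlip

class TransformedTensorDataset(Dataset):
    # A tensor-backed dataset that applies a transform when an item is fetched
    def __init__(self, x, y, transform=None):
        self.x = x
        self.y = y
        self.transform = transform

    def __getitem__(self, index):
        x = self.x[index]
        if self.transform is not None:
            x = self.transform(x)
        return x, self.y[index]

    def __len__(self):
        return len(self.x)

# Training composer augments, then scales; validation composer only scales
# (Normalize(0.5, 0.5) maps inputs from [0, 1] to [-1, 1])
train_composer = Compose([
    RandomHorizontalFlip(p=0.5),
    Normalize(mean=(0.5,), std=(0.5,)),
])
val_composer = Compose([
    Normalize(mean=(0.5,), std=(0.5,)),
])

# Each composer goes into its own dataset, built from the split tensors above
train_dataset = TransformedTensorDataset(
    x_train_tensor, y_train_tensor, transform=train_composer
)
val_dataset = TransformedTensorDataset(
    x_val_tensor, y_val_tensor, transform=val_composer
)

Since the transform runs inside __getitem__, every epoch sees a freshly augmented version of each training image, while the validation images remain deterministic.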
