22.02.2024

Daniel Voigt Godoy - Deep Learning with PyTorch Step-by-Step A Beginner’s Guide-leanpub


• weights: A sequence of weights like the one we have just computed.
• num_samples: How many samples are going to be drawn from the dataset.
◦ A typical value is the length of the sequence of weights, as you’re likely sampling from the whole training set.
• replacement: If True (the default value), it draws samples with replacement.
◦ If num_samples equals the length of the dataset (that is, if the whole training set is used), it makes sense to draw samples with replacement to effectively compensate for the imbalance.
◦ It only makes sense to set it to False if num_samples is smaller than the length of the dataset.
• generator: Optional; it takes a (pseudo) random number Generator that will be used for drawing the samples.
◦ To ensure reproducibility, we need to create and assign a generator (which has its own seed) to the sampler, since the manual seed we’ve already set is not enough.
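To see how these arguments interact, here is a minimal sketch on a hypothetical imbalanced label tensor (900 samples of class 0, 100 of class 1). It assumes the weights are computed as the inverse of each class count and then assigned to every sample according to its label; the names y, class_counts, and class_weights and the seed 42 are illustrative only. Drawing len(sample_weights) samples with replacement should bring the share of the minority class close to 50%. With replacement=False and the same num_samples, every index is drawn exactly once, so the sampler only returns a permutation and the imbalance stays as it was.

```python
import torch
from torch.utils.data import WeightedRandomSampler

# Hypothetical imbalanced labels: 900 samples of class 0, 100 of class 1
y = torch.cat([torch.zeros(900, dtype=torch.long), torch.ones(100, dtype=torch.long)])

# Assumption: weight of a class = 1 / (number of samples in that class),
# and every sample receives the weight of its own class
class_counts = torch.bincount(y)              # tensor([900, 100])
class_weights = 1.0 / class_counts.float()
sample_weights = class_weights[y]             # one weight per sample

generator = torch.Generator()
generator.manual_seed(42)                     # illustrative seed

# Whole training set, drawn with replacement: the minority class gets oversampled
sampler = WeightedRandomSampler(
    weights=sample_weights,
    num_samples=len(sample_weights),
    generator=generator,
    replacement=True,
)
drawn = torch.tensor(list(sampler))
balanced_share = y[drawn].float().mean().item()

# Same num_samples without replacement: every index is drawn exactly once,
# so the result is just a permutation and the class shares do not change
no_replacement = WeightedRandomSampler(
    weights=sample_weights,
    num_samples=len(sample_weights),
    generator=generator,
    replacement=False,
)
permuted = torch.tensor(list(no_replacement))
permuted_share = y[permuted].float().mean().item()

print(f"original: {y.float().mean().item():.2f}")      # 0.10
print(f"with replacement: {balanced_share:.2f}")       # close to 0.50
print(f"without replacement: {permuted_share:.2f}")    # 0.10
```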

OK, we’ll sample from the whole training set, and we have our sequence of weights ready. We are still missing a generator, though. Let’s create both the generator and the sampler now:

generator = torch.Generator()
sampler = WeightedRandomSampler(
    weights=sample_weights,
    num_samples=len(sample_weights),
    generator=generator,
    replacement=True
)

"Didn’t you say we need to set a seed for the generator?! Where is it?"

Indeed, I did. We’ll set it soon, after assigning the sampler to the data loader. You’ll understand the reasoning behind this choice shortly, so please bear with me.
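Setting the seed later works because the sampler stores a reference to the very same Generator object it was given, not a copy, so seeding that object at any point before the indices are drawn still takes effect. The sketch below checks this on a hypothetical ten-sample dataset: the generator is seeded only after the sampler is built, and reseeding it reproduces the same draws regardless of the value passed to torch.manual_seed, since the sampler consumes only its own generator. The weights, the draw helper, and the seeds are assumptions for illustration.

```python
import torch
from torch.utils.data import WeightedRandomSampler

# Hypothetical weights for a tiny dataset: the last two samples are five times as likely
sample_weights = torch.tensor([0.1] * 8 + [0.5] * 2)

generator = torch.Generator()  # no seed yet
sampler = WeightedRandomSampler(
    weights=sample_weights,
    num_samples=len(sample_weights),
    generator=generator,
    replacement=True,
)

def draw(global_seed, sampler_seed):
    """Hypothetical helper: seed both RNGs, then draw one epoch of indices."""
    torch.manual_seed(global_seed)       # seeds the default (global) generator only
    generator.manual_seed(sampler_seed)  # seeds the generator the sampler holds
    return list(sampler)

first = draw(global_seed=0, sampler_seed=42)
second = draw(global_seed=1, sampler_seed=42)
print(first == second)  # True: the global seed plays no part in the sampler's draws
```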

Now, let’s (re-)create the data loaders using the weighted sampler with the training set:
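As a rough, self-contained illustration of this step, here is a minimal sketch that plugs such a sampler into a DataLoader. It assumes a hypothetical TensorDataset with 90 samples of class 0 and 10 of class 1, inverse-frequency weights, and a batch size of 16; none of these come from the text above. Since the sampler now decides which indices are drawn and in what order, shuffle is left at its default: DataLoader raises a ValueError if a sampler is combined with shuffle=True. The generator is seeded only after the loader is created.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Hypothetical imbalanced training set: 90 samples of class 0, 10 of class 1
x = torch.randn(100, 3)
y = torch.cat([torch.zeros(90), torch.ones(10)]).long()
train_dataset = TensorDataset(x, y)

# Assumption: inverse class frequency, assigned to each sample by its label
class_weights = 1.0 / torch.bincount(y).float()
sample_weights = class_weights[y]

generator = torch.Generator()
sampler = WeightedRandomSampler(
    weights=sample_weights,
    num_samples=len(sample_weights),
    generator=generator,
    replacement=True,
)

# No shuffle argument: the sampler already randomizes the order
train_loader = DataLoader(dataset=train_dataset, batch_size=16, sampler=sampler)

generator.manual_seed(42)  # seed the sampler's generator after wiring everything up
labels = torch.cat([yb for _, yb in train_loader])
print(len(train_loader))               # 7 batches: 100 samples / 16 per batch, rounded up
print(labels.float().mean().item())    # roughly balanced, near 0.5
```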

292 | Chapter 4: Classifying Images
