Daniel Voigt Godoy - Deep Learning with PyTorch Step-by-Step: A Beginner's Guide (Leanpub)
train_loader = DataLoader(
    dataset=train_dataset, batch_size=16, sampler=sampler
)
val_loader = DataLoader(dataset=val_dataset, batch_size=16)

Once again, if we're using a sampler, we cannot use the shuffle argument.

There is a lot of boilerplate code here, right? Let's build yet another function, Helper Function #5, to wrap it all up:

Helper Function #5

def make_balanced_sampler(y):
    # Computes weights for compensating imbalanced classes
    classes, counts = y.unique(return_counts=True)
    weights = 1.0 / counts.float()
    sample_weights = weights[y.squeeze().long()]
    # Builds sampler with computed weights
    generator = torch.Generator()
    sampler = WeightedRandomSampler(
        weights=sample_weights,
        num_samples=len(sample_weights),
        generator=generator,
        replacement=True
    )
    return sampler

sampler = make_balanced_sampler(y_train_tensor)

Much better! Its only argument is the tensor containing the labels: the function will compute the weights and build the corresponding weighted sampler on its own.

Seeds and more (seeds)

Time to set the seed for the generator used in the sampler assigned to the data loader. It is a long sequence of objects, but we can work our way through it to retrieve the generator and call its manual_seed() method:
train_loader.sampler.generator.manual_seed(42)
random.seed(42)
Now we can check if our sampler is doing its job correctly. Let’s have it sample a full
run (240 data points in 15 mini-batches of 16 points each), and sum up the labels so
we know how many points are in the positive class:
torch.tensor([t[1].sum() for t in iter(train_loader)]).sum()
Output
tensor(123.)
Close enough! We have 160 images of the positive class, and now, thanks to the
weighted sampler, we’re sampling only 123 of them. It means we’re oversampling
the negative class (which has 80 images) to a total of 117 images, adding up to 240
images. Mission accomplished, our dataset is balanced now.
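We can double-check these numbers with a quick back-of-the-envelope calculation. The sketch below is standalone plain Python (no PyTorch) using the 80/160 class counts from our dataset: with inverse-frequency weights, every class ends up with the same total weight, so each draw picks either class with (roughly) equal probability, and 240 draws should yield about 120 positives, which is consistent with the 123 we observed:

```python
from collections import Counter

# Hypothetical labels mirroring the dataset in the text:
# 80 negatives (class 0) and 160 positives (class 1)
y = [0] * 80 + [1] * 160

# Inverse-frequency weights, as computed by make_balanced_sampler()
counts = Counter(y)
weights = {c: 1.0 / n for c, n in counts.items()}

# Each class's TOTAL weight is count * (1 / count), i.e., the same
# for both classes, so the sampler treats them as equally likely
total_weight = {c: counts[c] * weights[c] for c in counts}

# Probability of drawing a positive, and the expected number of
# positives in a full run of 240 draws
p_positive = total_weight[1] / sum(total_weight.values())
expected_positives = 240 * p_positive
print(expected_positives)
```

The expected value is 120 positives; the 123 we actually got is just sampling noise around that number.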
"Wait a minute! Why on Earth was there an extra seed
(random.seed(42)) in the code above? Don't we have enough
already?"
I agree, too many seeds. Besides one specific seed for the generator, we also have
to set yet another seed for Python’s random module.
Honestly, this came as a surprise to me too when I found out
about it! As weird as it may sound, in Torchvision versions prior
to 0.8, there was still some code that depended upon Python's
native random module instead of PyTorch's own random
generators. The problem happened when some of the random
transformations used for data augmentation, like
RandomRotation(), RandomAffine(), and others, were applied.
It's better to be safe than sorry, so we'd better set yet another seed to ensure the
reproducibility of our code.
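As a minimal, self-contained illustration of what seeding buys us (this uses plain Python's random module only, nothing PyTorch-specific), re-seeding with the same value resets the generator's state, so the same "random" sequence comes out every time:

```python
import random

random.seed(42)
first = [random.random() for _ in range(3)]

random.seed(42)  # re-seeding resets the generator's state
second = [random.random() for _ in range(3)]

print(first == second)  # True: identical sequences
```

The exact same logic applies to PyTorch's own generators via manual_seed(): seeding pins down the sequence, which is what makes a training run reproducible.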
And that’s exactly what we’re going to do! Remember the set_seed() method we