IMPORTANT: I can’t stress this enough: You must use the right combination of model and loss function!

Option 1: nn.LogSoftmax as the last layer, meaning your model is producing log probabilities, combined with the nn.NLLLoss() function.

Option 2: No logsoftmax in the last layer, meaning your model is producing logits, combined with the nn.CrossEntropyLoss() function.

Mixing nn.LogSoftmax and nn.CrossEntropyLoss() is just wrong.

Now that the difference between the arguments is clear, let’s take a closer look at the nn.CrossEntropyLoss() function. It is a higher-order function, and it takes the same three optional arguments as nn.NLLLoss():

• reduction: It takes either mean, sum, or none, and the default is mean.
• weight: It takes a tensor of length C; that is, containing as many weights as there are classes.
• ignore_index: It takes one integer, which corresponds to the one (and only one) class index that should be ignored.

Let’s see a quick example of its usage, taking dummy logits as input:

torch.manual_seed(11)
dummy_logits = torch.randn((5, 3))
dummy_labels = torch.tensor([0, 0, 1, 2, 1])

loss_fn = nn.CrossEntropyLoss()
loss_fn(dummy_logits, dummy_labels)

Output

tensor(1.6553)

No logsoftmax whatsoever, but the same resulting loss, as expected.
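To make the equivalence between the two options concrete, here is a minimal sketch (reusing the same dummy logits and labels as above) showing that Option 1, nn.LogSoftmax followed by nn.NLLLoss(), and Option 2, raw logits fed straight into nn.CrossEntropyLoss(), produce the very same loss:

import torch
import torch.nn as nn

torch.manual_seed(11)
dummy_logits = torch.randn((5, 3))
dummy_labels = torch.tensor([0, 0, 1, 2, 1])

# Option 1: turn logits into log probabilities first,
# then apply the negative log-likelihood loss
log_probs = nn.LogSoftmax(dim=-1)(dummy_logits)
loss_option1 = nn.NLLLoss()(log_probs, dummy_labels)

# Option 2: feed the raw logits straight into cross-entropy
loss_option2 = nn.CrossEntropyLoss()(dummy_logits, dummy_labels)

print(loss_option1, loss_option2)  # both print tensor(1.6553)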
Classification Losses Showdown!
Honestly, I always feel this whole thing is a bit confusing, especially for someone
who’s learning it for the first time.
Which loss functions take logits as inputs? Should I add a (log)softmax layer or not?
Can I use the weight argument to handle imbalanced datasets? Too many
questions, right?
So, here is a table to help you figure out the landscape of loss functions for
classification problems, both binary and multiclass:
                         BCE Loss            BCE With Logits Loss  NLL Loss                    Cross-Entropy Loss
Classification           binary              binary                multiclass                  multiclass
Input (each data point)  probability         logit                 array of log probabilities  array of logits
Label (each data point)  float (0.0 or 1.0)  float (0.0 or 1.0)    long (class index)          long (class index)
Model’s last layer       Sigmoid             -                     LogSoftmax                  -
weight argument          not class weights   not class weights     class weights               class weights
pos_weight argument      n/a                 "weighted" loss       n/a                         n/a
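To illustrate the last two rows of the table, here is a minimal sketch (the weight values are made up for illustration only) contrasting the weight argument of nn.CrossEntropyLoss(), which takes one weight per class, with the pos_weight argument of nn.BCEWithLogitsLoss(), which takes a single multiplier for the positive class, the usual way to compensate for an imbalanced binary dataset:

import torch
import torch.nn as nn

# Multiclass: weight takes a tensor of length C (one weight per class);
# these (made-up) weights boost the two less frequent classes
class_weights = torch.tensor([1.0, 2.0, 2.0])
multi_loss_fn = nn.CrossEntropyLoss(weight=class_weights)

# Binary: pos_weight weighs the positive class only; a value of 3.0
# would compensate for, say, three negative points per positive one
binary_loss_fn = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([3.0]))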
Model Configuration
Let’s build our first convolutional neural network for real! We can use the typical
convolutional block: convolutional layer, activation function, pooling layer. Our
images are quite small, so we only need one of those.
We still need to decide how many channels our convolutional layer is going to
produce. In general, the number of channels increases with each convolutional
block. For the sake of simplicity (and later visualization), let’s keep a single channel.
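In code, the block described above could look like the minimal sketch below, assuming single-channel input images and using ReLU as the activation, one common choice (the kernel size of three is just a placeholder; we turn to that choice next):

import torch.nn as nn

# Typical convolutional block: convolution, activation, pooling;
# a single input channel and a single output channel, as decided above
conv_block = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3),  # kernel size is an assumption
    nn.ReLU(),                    # activation function
    nn.MaxPool2d(kernel_size=2),  # pooling layer
)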
We also need to decide on a kernel size (the receptive field or gray regions in the