Daniel Voigt Godoy - Deep Learning with PyTorch Step-by-Step: A Beginner's Guide (Leanpub)


IMPORTANT: I can’t stress this enough: you must use the right combination of model and loss function!

Option 1: nn.LogSoftmax as the last layer, meaning your model is producing log probabilities, combined with the nn.NLLLoss() function.

Option 2: No logsoftmax in the last layer, meaning your model is producing logits, combined with the nn.CrossEntropyLoss() function.

Mixing nn.LogSoftmax and nn.CrossEntropyLoss() is just wrong.

Now that the difference between the arguments is clear, let’s take a closer look at the nn.CrossEntropyLoss() function. It is a higher-order function, and it takes the same three optional arguments as nn.NLLLoss():

• reduction: It takes either mean, sum, or none, and the default is mean.

• weight: It takes a tensor of length C; that is, containing as many weights as there are classes.

• ignore_index: It takes one integer, which corresponds to the one (and only one) class index that should be ignored.

Let’s see a quick example of its usage, taking dummy logits as input:

torch.manual_seed(11)
dummy_logits = torch.randn((5, 3))
dummy_labels = torch.tensor([0, 0, 1, 2, 1])

loss_fn = nn.CrossEntropyLoss()
loss_fn(dummy_logits, dummy_labels)

Output

tensor(1.6553)

No logsoftmax whatsoever, but the same resulting loss, as expected.
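To see the two valid combinations side by side, here is a minimal sketch using the same seed and dummy logits as above; the dim=-1 argument of nn.LogSoftmax and the explicit imports are the only details not shown in the text. Feeding log probabilities to nn.NLLLoss() (Option 1) and feeding raw logits to nn.CrossEntropyLoss() (Option 2) should produce the same loss value.

import torch
import torch.nn as nn

torch.manual_seed(11)
dummy_logits = torch.randn((5, 3))
dummy_labels = torch.tensor([0, 0, 1, 2, 1])

# Option 1: log probabilities (as if the model ended in LogSoftmax) + NLLLoss
log_probs = nn.LogSoftmax(dim=-1)(dummy_logits)
loss_option1 = nn.NLLLoss()(log_probs, dummy_labels)

# Option 2: raw logits (no logsoftmax in the model) + CrossEntropyLoss
loss_option2 = nn.CrossEntropyLoss()(dummy_logits, dummy_labels)

print(loss_option1, loss_option2)

Output

tensor(1.6553) tensor(1.6553)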


Classification Losses Showdown!

Honestly, I always feel this whole thing is a bit confusing, especially for someone who’s learning it for the first time.

Which loss functions take logits as inputs? Should I add a (log)softmax layer or not? Can I use the weight argument to handle imbalanced datasets? Too many questions, right?

So, here is a table to help you figure out the landscape of loss functions for classification problems, both binary and multiclass:

|                          | BCE Loss           | BCE With Logits Loss | NLL Loss                   | Cross-Entropy Loss |
|--------------------------|--------------------|----------------------|----------------------------|--------------------|
| Classification           | binary             | binary               | multiclass                 | multiclass         |
| Input (each data point)  | probability        | logit                | array of log probabilities | array of logits    |
| Label (each data point)  | float (0.0 or 1.0) | float (0.0 or 1.0)   | long (class index)         | long (class index) |
| Model’s last layer       | Sigmoid            | -                    | LogSoftmax                 | -                  |
| weight argument          | not class weights  | not class weights    | class weights              | class weights      |
| pos_weight argument      | n/a                | "weighted" loss      | n/a                        | n/a                |

Model Configuration

Let’s build our first convolutional neural network for real! We can use the typical convolutional block: convolutional layer, activation function, pooling layer. Our images are quite small, so we only need one of those.
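Just to make the shape of such a block concrete, here is a minimal sketch in PyTorch; the single output channel matches the choice discussed next, while the 3x3 kernel, the ReLU activation, and the 10x10 single-channel dummy input are assumptions for illustration only, not decisions made in the text.

import torch
import torch.nn as nn

# One "typical convolutional block": convolution -> activation -> pooling
conv_block = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3),  # single output channel
    nn.ReLU(),                                                # activation function (assumed)
    nn.MaxPool2d(kernel_size=2),                              # pooling layer
)

# Batch of 16 small single-channel images (the 10x10 size is an assumption)
dummy_images = torch.randn(16, 1, 10, 10)
features = conv_block(dummy_images)
print(features.shape)

Output

torch.Size([16, 1, 4, 4])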

We still need to decide how many channels our convolutional layer is going to produce. In general, the number of channels increases with each convolutional block. For the sake of simplicity (and later visualization), let’s keep a single channel.

We also need to decide on a kernel size (the receptive field or gray regions in the

