The first summation adds up the errors corresponding to the points in the positive class. The second summation adds up the errors corresponding to the points in the negative class. I believe the formula above is quite straightforward and easy to understand. Unfortunately, it is usually skipped over, and only its equivalent is presented:

$$\text{BCE}(y) = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \cdot \log\left(\hat{y}_i\right) + (1 - y_i) \cdot \log\left(1 - \hat{y}_i\right)\right]$$

Equation 3.15 - Binary Cross-Entropy formula, the clever way

The formula above is a clever way of computing the loss in a single expression, sure, but the split between positive and negative points is less obvious. If you pause for a minute, you'll realize that points in the positive class (y=1) have their second term equal to zero, while points in the negative class (y=0) have their first term equal to zero.

Let's see how it looks in code:

summation = torch.sum(dummy_labels * torch.log(dummy_predictions) +
                      (1 - dummy_labels) * torch.log(1 - dummy_predictions))
loss = -summation / n_total
loss

Output

tensor(0.1643)

Of course, we got the same loss (0.1643) as before.

For a very detailed explanation of the rationale behind this loss function, make sure to check my post: "Understanding binary cross-entropy / log loss: a visual explanation." [70]
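The snippet above relies on dummy_labels, dummy_predictions, and n_total, which were created earlier in the chapter. In case you'd like to run it on its own, here is a minimal, self-contained sketch; the two dummy points are an assumption for illustration, and it also computes the "split" version, with one summation per class, to show that both forms agree:

import torch

# Two dummy points; these values are an assumption for illustration
dummy_labels = torch.tensor([1.0, 0.0])
dummy_predictions = torch.tensor([0.9, 0.2])
n_total = dummy_labels.size(0)

# "Split" version: one summation for each class
sum_positive = torch.sum(torch.log(dummy_predictions[dummy_labels == 1]))
sum_negative = torch.sum(torch.log(1 - dummy_predictions[dummy_labels == 0]))
loss_split = -(sum_positive + sum_negative) / n_total

# "Clever" version: Equation 3.15 in a single expression
summation = torch.sum(dummy_labels * torch.log(dummy_predictions) +
                      (1 - dummy_labels) * torch.log(1 - dummy_predictions))
loss_clever = -summation / n_total

print(loss_split, loss_clever)  # tensor(0.1643) tensor(0.1643)

With these particular values, both versions produce the same 0.1643 as above, since each data point contributes to exactly one of the two summations.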
BCELoss

Sure enough, PyTorch implements the binary cross-entropy loss, nn.BCELoss(). Just like its regression counterpart, nn.MSELoss(), introduced in Chapter 1, it is a higher-order function that returns the actual loss function.

The nn.BCELoss() higher-order function takes two optional arguments (the others are deprecated, and you can safely ignore them):
• reduction: It takes either mean, sum, or none. The default, mean, corresponds to our Equation 3.15 above. As expected, sum returns the sum of the errors instead of the average. The last option, none, corresponds to the unreduced form; that is, it returns the full array of errors. All three options are illustrated in the sketch below.
• weight: The default is None, meaning every data point has equal weight. If
supplied, it needs to be a tensor with a size equal to the number of elements in a
mini-batch, representing the weights assigned to each element in the batch. In
other words, this argument allows you to assign different weights to each
element of the current batch, based on its position. So, the first element would
have a given weight, the second element would have a different weight, and so
on, regardless of the actual class of that particular data point. Sounds
confusing? Weird? Yes, this is weird; I think so too. Of course, this is not useless
or a mistake, but the proper usage of this argument is a more advanced topic
and outside the scope of this book.
This argument DOES NOT help with weighting imbalanced
datasets! We’ll see how to handle that shortly.
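To make these options concrete, here is a short sketch; the probability and label tensors below, as well as the per-position weights, are assumptions chosen only for illustration:

import torch
import torch.nn as nn

# Dummy probabilities and labels; these values are assumptions for illustration
probs = torch.tensor([0.9, 0.2, 0.7])
labels = torch.tensor([1.0, 0.0, 0.0])

# reduction='none' returns the unreduced form: one error per data point
errors = nn.BCELoss(reduction='none')(probs, labels)
print(errors)  # tensor([0.1054, 0.2231, 1.2040])

# reduction='sum' adds the errors up; 'mean' (the default) averages them
print(nn.BCELoss(reduction='sum')(probs, labels))   # tensor(1.5325)
print(nn.BCELoss(reduction='mean')(probs, labels))  # tensor(0.5108)

# weight assigns a weight to each POSITION in the batch, not to each class:
# here, the second element counts twice as much, whatever its class
weights = torch.tensor([1.0, 2.0, 1.0])
weighted = nn.BCELoss(weight=weights, reduction='mean')(probs, labels)
print(torch.isclose(weighted, (weights * errors).mean()))  # tensor(True)

Notice that the weights apply by position, regardless of the class of each point, which is exactly why this argument does not help with imbalanced datasets.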
We’ll be sticking with the default arguments, corresponding to Equation 3.15
above.
loss_fn = nn.BCELoss(reduction='mean')
loss_fn
Output
BCELoss()
As expected, nn.BCELoss() returned another function; that is, the actual loss
function. The latter takes both predictions and labels to compute the loss.
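For instance, feeding it a couple of dummy points (the values below are an assumption for illustration; note that the predictions come first, then the labels):

dummy_predictions = torch.tensor([0.9, 0.2])
dummy_labels = torch.tensor([1.0, 0.0])
loss_fn(dummy_predictions, dummy_labels)

Output

tensor(0.1643)

With these values, the result matches the manual computation from before.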