
The first summation adds up the errors corresponding to the points in the positive class. The second summation adds up the errors corresponding to the points in the negative class. I believe the formula above is quite straightforward and easy to understand. Unfortunately, it is usually skipped over, and only its equivalent is presented:

\text{BCE} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log(\hat{y}_i) + (1 - y_i)\log(1 - \hat{y}_i)\right]

Equation 3.15 - Binary Cross-Entropy formula, the clever way

The formula above is a clever way of computing the loss in a single expression, sure, but the split of positive and negative points is less obvious. If you pause for a minute, you’ll realize that points in the positive class (y=1) have their second term equal zero, while points in the negative class (y=0) have their first term equal zero.

Let’s see how it looks in code:

summation = torch.sum(dummy_labels * torch.log(dummy_predictions) +
                      (1 - dummy_labels) * torch.log(1 - dummy_predictions))
loss = -summation / n_total
loss

Output

tensor(0.1643)

Of course, we got the same loss (0.1643) as before.

For a very detailed explanation of the rationale behind this loss function, make sure to check my post: "Understanding binary cross-entropy / log loss: a visual explanation." [70]
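The sketch below (not the book's own snippet) makes the positive/negative split explicit by computing each summation separately and checking that the result matches the single-expression version above. The dummy_labels and dummy_predictions values are illustrative stand-ins, assumed here, chosen so that they also reproduce the 0.1643 loss:

import torch

# Illustrative stand-ins (assumed values) for the tensors built earlier
dummy_labels = torch.tensor([1.0, 0.0])
dummy_predictions = torch.tensor([0.9, 0.2])
n_total = dummy_labels.size(0)

# First summation: errors of the points in the positive class (y=1)
sum_pos = torch.sum(torch.log(dummy_predictions[dummy_labels == 1]))
# Second summation: errors of the points in the negative class (y=0)
sum_neg = torch.sum(torch.log(1 - dummy_predictions[dummy_labels == 0]))

loss = -(sum_pos + sum_neg) / n_total
loss

Output

tensor(0.1643)

Splitting the points by class or folding everything into a single expression yields exactly the same value; the single expression is simply more convenient to vectorize.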

BCELoss

Sure enough, PyTorch implements the binary cross-entropy loss, nn.BCELoss(). Just like its regression counterpart, nn.MSELoss(), introduced in Chapter 1, it is a higher-order function that returns the actual loss function.

The nn.BCELoss() higher-order function takes two optional arguments (the others are deprecated, and you can safely ignore them):

• reduction: It takes either mean, sum, or none. The default, mean, corresponds to our Equation 3.15 above. As expected, sum will return the sum of the errors, instead of the average. The last option, none, corresponds to the unreduced form; that is, it returns the full array of errors.

• weight: The default is none, meaning every data point has equal weight. If supplied, it needs to be a tensor with a size equal to the number of elements in a mini-batch, representing the weights assigned to each element in the batch. In other words, this argument allows you to assign different weights to each element of the current batch, based on its position. So, the first element would have a given weight, the second element would have a different weight, and so on, regardless of the actual class of that particular data point. Sounds confusing? Weird? Yes, this is weird; I think so too. Of course, this is not useless or a mistake, but the proper usage of this argument is a more advanced topic and outside the scope of this book.

This argument DOES NOT help with weighting imbalanced datasets! We’ll see how to handle that shortly.

We’ll be sticking with the default arguments, corresponding to Equation 3.15 above.

loss_fn = nn.BCELoss(reduction='mean')
loss_fn

Output

BCELoss()

As expected, nn.BCELoss() returned another function; that is, the actual loss function. The latter takes both predictions and labels to compute the loss.
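To illustrate the reduction argument discussed above, here is a minimal sketch of calling the returned loss function with each option. It reuses the same illustrative stand-in tensors as before (assumed values, not the book's own snippet); note that the loss function takes the predictions first and the labels second:

import torch
import torch.nn as nn

# Same illustrative stand-ins as before (assumed values)
dummy_labels = torch.tensor([1.0, 0.0])
dummy_predictions = torch.tensor([0.9, 0.2])

loss_fn = nn.BCELoss(reduction='mean')  # as created above
print(loss_fn(dummy_predictions, dummy_labels))                        # average of errors
print(nn.BCELoss(reduction='sum')(dummy_predictions, dummy_labels))    # sum of errors
print(nn.BCELoss(reduction='none')(dummy_predictions, dummy_labels))   # full array of errors

Output

tensor(0.1643)
tensor(0.3285)
tensor([0.1054, 0.2231])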
