Output

OrderedDict([('linear.weight', tensor([[0.5406, 0.5869]])),
             ('linear.bias', tensor([-0.1657]))])

Did you notice that state_dict() contains parameters from the linear layer only? Even though the model has a second sigmoid layer, this layer does not contain any parameters, since it does not need to learn anything: the sigmoid function will be the same regardless of which model it is a part of.
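
To make this concrete, here is a minimal sketch of a model that would produce state_dict() keys like the ones above. The model definition itself is not part of this excerpt, so the construction below (a Sequential holding a layer named "linear", an nn.Linear(2, 1), followed by a layer named "sigmoid") is an assumption inferred from the parameter names and the surrounding text; the randomly initialized values will differ from the output shown above.

import torch
import torch.nn as nn

# Hypothetical reconstruction: a named linear layer plus a named sigmoid layer
model = nn.Sequential()
model.add_module('linear', nn.Linear(2, 1))   # two features in, one output
model.add_module('sigmoid', nn.Sigmoid())     # no learnable parameters

print(model.state_dict())
# OrderedDict with 'linear.weight' (shape [1, 2]) and 'linear.bias' (shape [1]) only;
# the sigmoid layer contributes no entries because it has nothing to learn.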

Loss

We already have a model, and now we need to define an appropriate loss for it. A binary classification problem calls for the binary cross-entropy (BCE) loss, sometimes known as log loss.

The BCE loss requires the predicted probabilities, as returned by the sigmoid function, and the true labels (y) for its computation. For each data point i in the training set, it starts by computing the error corresponding to the point's true class.

If the data point belongs to the positive class (y = 1), we would like our model to predict a probability close to one, right? A perfect prediction (a probability of exactly one) would result in the logarithm of one, which is zero. It makes sense; a perfect prediction means zero loss. It goes like this:

Equation 3.11 - Error for a data point in the positive class

\( y_i = 1 \Rightarrow \text{error}_i = \log(\hat{y}_i) \)

What if the data point belongs to the negative class (y = 0)? Then we cannot simply use the predicted probability. Why not? Because the model outputs the probability of a point belonging to the positive, not the negative, class. Luckily, the latter can be easily computed:

Equation 3.12 - Probability of a data point belonging to the negative class

\( P(y_i = 0) = 1 - P(y_i = 1) = 1 - \hat{y}_i \)

And thus, the error associated with a data point belonging to the negative class goes like this:

Equation 3.13 - Error for a data point in the negative class

\( y_i = 0 \Rightarrow \text{error}_i = \log(1 - \hat{y}_i) \)

Once all errors are computed, they are aggregated into a loss value. For the binary cross-entropy loss, we simply take the average of the errors and invert its sign.

Equation 3.14 - Binary Cross-Entropy formula, the intuitive way

\( \text{BCE}(y, \hat{y}) = -\frac{1}{N_{pos} + N_{neg}} \left[ \sum_{i \in pos} \log(\hat{y}_i) + \sum_{i \in neg} \log(1 - \hat{y}_i) \right] \)

Let's assume we have two dummy data points, one for each class. Then, let's pretend our model made predictions for them: 0.9 and 0.2. The predictions are not bad, since the model predicts a 90% probability of being positive for an actual positive, and only 20% of being positive for an actual negative. How does this look in code? Here it is:

dummy_labels = torch.tensor([1.0, 0.0])
dummy_predictions = torch.tensor([.9, .2])

# Positive class (labels == 1)
positive_pred = dummy_predictions[dummy_labels == 1]
first_summation = torch.log(positive_pred).sum()
# Negative class (labels == 0)
negative_pred = dummy_predictions[dummy_labels == 0]
second_summation = torch.log(1 - negative_pred).sum()
# n_total = n_pos + n_neg
n_total = dummy_labels.size(0)

loss = -(first_summation + second_summation) / n_total
loss

Output

tensor(0.1643)

Plugging the two predictions into the formula gives the same value: -(log(0.9) + log(1 - 0.2)) / 2 ≈ 0.1643.
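
As a sanity check on the manual computation above, PyTorch's built-in nn.BCELoss should produce the same value. This is a minimal sketch assuming the same dummy_labels and dummy_predictions tensors defined earlier; by default the loss uses reduction='mean', which corresponds to the averaging and sign inversion done by hand above.

import torch
import torch.nn as nn

dummy_labels = torch.tensor([1.0, 0.0])
dummy_predictions = torch.tensor([.9, .2])

# nn.BCELoss expects probabilities, i.e., values already passed through a sigmoid
loss_fn = nn.BCELoss()
loss_fn(dummy_predictions, dummy_labels)  # should match the manual result, tensor(0.1643)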
