Output

OrderedDict([('linear.weight', tensor([[0.5406, 0.5869]])),
             ('linear.bias', tensor([-0.1657]))])
Did you notice that state_dict() contains parameters from the linear layer only?
Even though the model has a second layer, the sigmoid, this layer does not contain any
parameters since it does not need to learn anything: The sigmoid function will be
the same regardless of which model it is a part of.
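The model definition itself appeared a bit earlier, so the sketch below is an assumption about its structure: a minimal nn.Sequential with a linear layer registered under the name "linear" and a sigmoid registered under the name "sigmoid". Only the former contributes entries to state_dict():

import torch
import torch.nn as nn

torch.manual_seed(42)
# Assumed structure: two features in, one output, followed by a sigmoid;
# the names 'linear' and 'sigmoid' match the keys shown above
model = nn.Sequential()
model.add_module('linear', nn.Linear(2, 1))
model.add_module('sigmoid', nn.Sigmoid())

# Only 'linear.weight' and 'linear.bias' show up: nn.Sigmoid() has no
# learnable parameters, so it adds nothing to the state dict
print(model.state_dict())

The exact tensor values you get depend on the random seed used when the model was created.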
Loss
We already have a model, and now we need to define an appropriate loss for it. A
binary classification problem calls for the binary cross-entropy (BCE) loss,
sometimes known as log loss.
The BCE loss requires the predicted probabilities, as returned by the sigmoid
function, and the true labels (y) for its computation. For each data point i in the
training set, it starts by computing the error corresponding to the point’s true
class.
If the data point belongs to the positive class (y=1), we would like our model to
predict a probability close to one, right? A perfect one would result in the
logarithm of one, which is zero. It makes sense; a perfect prediction means zero
loss. It goes like this:
$$y_i = 1 \;\Rightarrow\; \mathrm{error}_i = \log\big(P(y_i = 1)\big)$$

Equation 3.11 - Error for a data point in the positive class
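For instance, with the 0.9 prediction used for the actual positive in the example further down, the corresponding term is close to zero:

$$\mathrm{error} = \log(0.9) \approx -0.1054$$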
What if the data point belongs to the negative class (y=0)? Then we cannot simply
use the predicted probability. Why not? Because the model outputs the probability
of a point’s belonging to the positive, not the negative, class. Luckily, the latter can
be easily computed:
$$P(y_i = 0) = 1 - P(y_i = 1)$$

Equation 3.12 - Probability of a data point's belonging to the negative class
And thus, the error associated with a data point's belonging to the negative class
goes like this:

$$y_i = 0 \;\Rightarrow\; \mathrm{error}_i = \log\big(P(y_i = 0)\big) = \log\big(1 - P(y_i = 1)\big)$$

Equation 3.13 - Error for a data point in the negative class

Once all errors are computed, they are aggregated into a loss value. For the
binary cross-entropy loss, we simply take the average of the errors and invert
its sign.

$$\mathrm{BCE}(y) = -\frac{1}{N_{pos}+N_{neg}}\left[\sum_{i:\,y_i=1}\log\big(P(y_i=1)\big)+\sum_{i:\,y_i=0}\log\big(1-P(y_i=1)\big)\right]$$

Equation 3.14 - Binary Cross-Entropy formula, the intuitive way

Let's assume we have two dummy data points, one for each class. Then, let's
pretend our model made predictions for them: 0.9 and 0.2. The predictions are
not bad, since the model assigns a 90% probability of being positive to the
actual positive, and only a 20% probability of being positive to the actual
negative. How does this look in code? Here it is:

import torch

dummy_labels = torch.tensor([1.0, 0.0])
dummy_predictions = torch.tensor([.9, .2])

# Positive class (labels == 1): sum of the log-probabilities
# predicted for the actual positives
positive_pred = dummy_predictions[dummy_labels == 1]
first_summation = torch.log(positive_pred).sum()
# Negative class (labels == 0): sum of the log-probabilities of the
# complements of the predictions made for the actual negatives
negative_pred = dummy_predictions[dummy_labels == 0]
second_summation = torch.log(1 - negative_pred).sum()
# n_total = n_pos + n_neg
n_total = dummy_labels.size(0)

loss = -(first_summation + second_summation) / n_total
loss

Output

tensor(0.1643)

Sure enough: averaging log(0.9) ≈ -0.1054 and log(0.8) ≈ -0.2231 and inverting
the sign gives 0.1643.
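In practice, we do not have to code the summations by hand. As a sanity check, here is a minimal sketch, reusing the dummy tensors above, that gets the same value from PyTorch's nn.BCELoss, which takes predicted probabilities and true labels and averages the pointwise errors by default:

import torch
import torch.nn as nn

dummy_labels = torch.tensor([1.0, 0.0])
dummy_predictions = torch.tensor([.9, .2])

# nn.BCELoss expects probabilities (i.e., sigmoid outputs) and labels;
# the default reduction='mean' averages the errors, as in Equation 3.14
loss_fn = nn.BCELoss(reduction='mean')
print(loss_fn(dummy_predictions, dummy_labels))
# prints tensor(0.1643), matching the manual computation above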