The trade-off between precision and recall
Here, too, there is no free lunch. The trade-off is a bit different, though.
Let’s say false negatives are bad for our application, and we want to improve
recall. Once again, let’s make a model that only predicts the positive class,
using a threshold of zero. We get no false negatives whatsoever (because the
model never predicts the negative class). Our recall is 100%. Now you’re
probably waiting for the bad news, right?
If all points are predicted to be positive, every negative example becomes a
false positive. Since precision is TP / (TP + FP), TP now counts every actual
positive and FP every actual negative, so precision is exactly the proportion
of positive samples in the dataset.
What if false positives are the problem instead? We would like to increase
precision. It’s time to make a model that only predicts the negative class by
using a threshold of one. We get no false positives whatsoever (because the
model never predicts the positive class). Our precision is 100%.
Of course, this is too good to be true. If all points are predicted to be
negative, there are no true positives. Our recall is 0%.
No free lunch, no cake, just another couple of useless models.
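To see both degenerate models in code, here is a minimal sketch (the
precision_recall helper and the sample arrays are illustrative, not part of
the book’s code):

import numpy as np

def precision_recall(y_true, probs, threshold):
    """Precision and recall at a given classification threshold."""
    y_pred = (probs >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    # A zero denominator means 0/0; following the text, we report
    # 100% by convention in that case
    precision = tp / (tp + fp) if (tp + fp) > 0 else 1.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 1.0
    return precision, recall

# Illustrative labels: 6 positives out of 10 (not the book's data)
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
probs = np.random.rand(10)  # any probabilities in [0, 1)

# Threshold of zero: everything is predicted positive
print(precision_recall(y_true, probs, 0.0))   # (0.6, 1.0)
# Threshold just above one: everything is predicted negative
print(precision_recall(y_true, probs, 1.01))  # (1.0, 0.0)

Notice that, at a threshold of zero, precision (0.6) is exactly the
proportion of positive samples, just as argued above.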
There is one metric left to explore.
Accuracy
This is the simplest and most intuitive of them all: how many times your model got
it right, considering all data points; in other words, the ratio of correct
predictions (true positives plus true negatives) to the total number of points.
Totally straightforward!
In our example, the model got 17 points right out of a total of 20 data points. Its
accuracy is 85%. Not bad, right? The higher the accuracy, the better, but it does not
tell the whole story. If you have an imbalanced dataset, relying on accuracy can be
misleading.
Let’s say we have 1,000 data points: 990 points are negative, and only 10 are
positive. Now, let’s take that model that uses a threshold of one and only predicts
the negative class. This way, we get all 990 negative points right, at the cost of
ten false negatives. This model’s accuracy is 99%. But the model is still useless
because it will never get a positive example right.
Accuracy may be misleading because it does not involve a trade-off with another
metric, like the previous ones.
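A quick sketch makes this trap concrete (the arrays below are illustrative,
mirroring the 990/10 split just described):

import numpy as np

# Hypothetical imbalanced dataset: 990 negatives, only 10 positives
y_true = np.array([0] * 990 + [1] * 10)
# A model that only predicts the negative class (threshold of one)
y_pred = np.zeros_like(y_true)

# Accuracy = correct predictions / total data points
accuracy = (y_pred == y_true).mean()
print(accuracy)  # 0.99 -> 99% accuracy, yet recall is 0%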
Speaking of trade-offs…
Trade-offs and Curves
We already know there are trade-offs between true and false positive rates, as well
as between precision and recall. We also know that there are many confusion
matrices, one for each threshold. What if we combine these two pieces of
information? I present to you the receiver operating characteristic (ROC) and
precision-recall (PR) curves! Well, they are not curves yet, but they will be soon
enough!
Figure 3.12 - Trade-offs for a threshold of 50%
We’ve already computed TPR (recall) (91%), FPR (22%), and precision (83%) for our
model using the threshold of 50%. If we plot them, we’ll get the figure above.
Time to try different thresholds.
Low Threshold
What about 30%? If the predicted probability is greater than or equal to 30%, we
classify the data point as positive, and as negative otherwise. That’s a very loose
threshold, since we don’t require the model to be very confident to consider a data
point positive. What can we expect from it? More false positives, fewer false
negatives.
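To check this numerically, and to build toward the ROC and PR curves, here is
a minimal sketch that computes one (TPR, FPR, precision) triple per threshold
(the rates_at_threshold helper and the sample arrays are illustrative, not the
book’s actual code or data):

import numpy as np
from sklearn.metrics import confusion_matrix

def rates_at_threshold(y_true, probs, threshold):
    """TPR (recall), FPR, and precision at a given threshold."""
    y_pred = (probs >= threshold).astype(int)
    # For binary problems, ravel() yields tn, fp, fn, tp in this order
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    # (at extreme thresholds you'd need the zero-division care
    # from the earlier sketch)
    return tp / (tp + fn), fp / (fp + tn), tp / (tp + fp)

# Illustrative stand-ins for validation labels and predicted probabilities
y_true = np.array([0, 0, 0, 1, 0, 1, 1, 0, 1, 1])
probs = np.array([0.1, 0.25, 0.4, 0.45, 0.55, 0.6, 0.7, 0.35, 0.8, 0.9])

# Sweeping many thresholds traces out the ROC and PR curves;
# lowering the threshold raises both TPR and FPR
for t in [0.3, 0.5, 0.7]:
    print(t, rates_at_threshold(y_true, probs, t))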