



The trade-off between precision and recall

Here, too, there is no free lunch. The trade-off is a bit different, though.

Let’s say false negatives are bad for our application, and we want to improve recall. Once again, let’s make a model that only predicts the positive class, using a threshold of zero. We get no false negatives whatsoever (because there aren’t any negative predictions in the first place). Our recall is 100%. Now you’re probably waiting for the bad news, right?

If all points are predicted to be positive, every negative example will be a false positive. The precision is exactly the proportion of positive samples in the dataset.

What if false positives are the problem instead? We would like to increase precision. It’s time to make a model that only predicts the negative class by using a threshold of one. We get no false positives whatsoever (because there aren’t any positive predictions in the first place). Our precision is 100%.

Of course, this is too good to be true. If all points are predicted to be negative, there are no true positives. Our recall is 0%.

No free lunch, no cake, just another couple of useless models.
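To make this concrete, here is a quick sketch of those two extreme models using Scikit-Learn’s metric functions. The labels below are made up (nine negative and eleven positive points), not the chapter’s actual dataset, and the ill-defined 0/0 precision of the all-negative model is handled explicitly with the zero_division argument:

import numpy as np
from sklearn.metrics import precision_score, recall_score

# Made-up labels: 9 negative and 11 positive points
y_true = np.array([0] * 9 + [1] * 11)

# Threshold of zero: every point is predicted as positive
all_positive = np.ones_like(y_true)
print(recall_score(y_true, all_positive))     # 1.0  -> no false negatives at all
print(precision_score(y_true, all_positive))  # 0.55 -> the proportion of positive samples

# Threshold of one: every point is predicted as negative
all_negative = np.zeros_like(y_true)
print(precision_score(y_true, all_negative, zero_division=1))  # 1.0 -> no false positives (0/0 counted as 1)
print(recall_score(y_true, all_negative))                      # 0.0 -> no true positives either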

There is one metric left to explore.

Accuracy

This is the simplest and most intuitive of them all: how many times your model got it right, considering all data points. Totally straightforward!

In our example, the model got 17 points right out of a total of 20 data points. Its accuracy is 85%. Not bad, right? The higher the accuracy, the better, but it does not tell the whole story. If you have an imbalanced dataset, relying on accuracy can be misleading.

Let’s say we have 1,000 data points: 990 points are negative, and only 10 are positive. Now, let’s take that model that uses a threshold of one and only predicts the negative class. This way, we get all 990 negative points right, at the cost of ten false negatives. This model’s accuracy is 99%. But the model is still useless because it will never get a positive example right.

Accuracy may be misleading because it does not involve a trade-off with another metric, like the previous ones.
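A minimal sketch of that pitfall, again with made-up labels (990 negative, 10 positive):

import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Made-up labels with the same 990/10 split as the example above
y_true = np.array([0] * 990 + [1] * 10)

# The model that only predicts the negative class (threshold of one)
all_negative = np.zeros_like(y_true)

print(accuracy_score(y_true, all_negative))  # 0.99 -> looks impressive...
print(recall_score(y_true, all_negative))    # 0.0  -> ...but it never catches a positive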

Speaking of trade-offs…

Trade-offs and Curves

We already know there are trade-offs between true and false positive rates, as well as between precision and recall. We also know that there are many confusion matrices, one for each threshold. What if we combine these two pieces of information? I present to you the receiver operating characteristic (ROC) and precision-recall (PR) curves! Well, they are not curves yet, but they will be soon enough!

Figure 3.12 - Trade-offs for a threshold of 50%

We’ve already computed TPR (recall) (91%), FPR (22%), and precision (83%) for our model using the threshold of 50%. If we plot them, we’ll get the figure above.
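For reference, those three percentages (and the 85% accuracy from before) follow directly from the confusion-matrix counts they imply for the 20-point example. The counts below are inferred from the quoted percentages, not read off Figure 3.12:

# Counts consistent with the percentages quoted above (20 points in total)
tn, fp, fn, tp = 7, 2, 1, 10

tpr = tp / (tp + fn)        # 10/11 ~ 0.91 (recall)
fpr = fp / (fp + tn)        # 2/9   ~ 0.22
precision = tp / (tp + fp)  # 10/12 ~ 0.83
accuracy = (tp + tn) / (tn + fp + fn + tp)  # 17/20 = 0.85
print(tpr, fpr, precision, accuracy)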

Time to try different thresholds.

Low Threshold

What about 30%? If the predicted probability is greater than or equal to 30%, we classify the data point as positive, and as negative otherwise. That’s a very loose threshold since we don’t require the model to be very confident to consider a data point to be positive. What can we expect from it? More false positives, fewer false negatives.
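Below is a sketch of what lowering the threshold does, using made-up predicted probabilities rather than the chapter’s model, so the exact counts are only illustrative; the direction of the change is what matters:

import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up probabilities for 9 negative and 11 positive points
y_true = np.array([0] * 9 + [1] * 11)
probs = np.array([.05, .12, .18, .22, .28, .35, .45, .55, .72,             # negatives
                  .38, .52, .57, .61, .66, .70, .78, .83, .88, .92, .95])  # positives

def counts_at(threshold):
    preds = (probs >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, preds).ravel()
    return dict(tn=int(tn), fp=int(fp), fn=int(fn), tp=int(tp))

print(counts_at(0.5))  # {'tn': 7, 'fp': 2, 'fn': 1, 'tp': 10}
print(counts_at(0.3))  # {'tn': 5, 'fp': 4, 'fn': 0, 'tp': 11} -> more FPs, fewer FNs

Sweeping every possible threshold like this, instead of trying just two of them, is essentially what Scikit-Learn’s roc_curve and precision_recall_curve functions do, and it is what turns these isolated points into actual curves.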
