
Figure 4.18 - Losses (before and after activations)

It took only a handful of epochs for our new model to outperform the previous one.

Clearly, this model is not equivalent to a logistic regression: It is much, much better.

To be completely honest with you, both models are kinda crappy. They perform quite poorly if you look at their accuracies (ranging from 43% to 65% for the validation set). The sole purpose of this exercise is to demonstrate that activation functions, by breaking the equivalence to a logistic regression, are capable of achieving better results in minimizing losses.
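To make that distinction concrete, here is a minimal sketch of both architectures. The layer sizes (25 inputs, 5 hidden units) are placeholders for illustration only, not necessarily the exact dimensions used in this chapter's model.

import torch.nn as nn

# Without an activation, two linear layers collapse into a single
# linear transformation (one matrix multiplication plus a bias), so
# this model is still equivalent to a logistic regression once a
# sigmoid / BCE loss is applied on top of its output.
equivalent_to_logistic = nn.Sequential(
    nn.Linear(25, 5),
    nn.Linear(5, 1),
)

# A non-linear activation between the layers breaks that equivalence:
# the composition can no longer be rewritten as a single linear layer.
not_equivalent_anymore = nn.Sequential(
    nn.Linear(25, 5),
    nn.ReLU(),
    nn.Linear(5, 1),
)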

This particular model also exhibits a validation loss lower than the training loss, which isn't what you generally expect. We've already seen a case like this in Chapter 3: The validation set was easier than the training set. The current example is a bit more nuanced than that. Here is the explanation:

• Short version: This is a quirk!

• Long version: First, our model is not so great and has a tendency to predict more points in the positive class (high FPR and TPR); second, one of the mini-batches from the validation set has almost all of its points in the positive class, so its loss is very low; third, there are only four mini-batches in the validation set, so the average loss is easily affected by a single mini-batch (see the sketch after this list).
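The sketch below uses made-up per-mini-batch losses (not actual values from our model) just to show how a single easy mini-batch can drag the average down when there are only four of them.

import numpy as np

# Hypothetical losses for the four validation mini-batches: three
# typical batches and one very easy batch made up almost entirely of
# positive-class points.
val_batch_losses = np.array([0.70, 0.68, 0.72, 0.10])

# The epoch's validation loss is simply the mean over mini-batches,
# so the single low loss pulls the average well below the other three.
print(val_batch_losses.mean())  # 0.55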

It’s time to ask ourselves two questions:

• Why is the equivalence to a logistic regression broken?

• What exactly are the activation functions doing under the hood?

The first question is answered in the next subsection, "Show Me the Math Again!"
