Deep Learning with PyTorch Step-by-Step: A Beginner’s Guide, by Daniel Voigt Godoy
Model Training

n_epochs = 100

sbs = StepByStep(model, loss_fn, optimizer)
sbs.set_loaders(train_loader, val_loader)
sbs.train(n_epochs)

print(model.state_dict())

Output

OrderedDict([('linear.weight',
              tensor([[ 1.1822, -1.8684]], device='cuda:0')),
             ('linear.bias', tensor([-0.0587], device='cuda:0'))])

Evaluating

logits_val = sbs.predict(X_val)
probabilities_val = sigmoid(logits_val).squeeze()
cm_thresh50 = confusion_matrix(y_val, (probabilities_val >= 0.5))
cm_thresh50

Output

array([[ 7,  2],
       [ 1, 10]])

Recap

In this chapter, we’ve gone through many concepts related to classification problems. This is what we’ve covered:

• defining a binary classification problem
• generating and preparing a toy dataset using Scikit-Learn’s make_moons() method
• defining logits as the result of a linear combination of features
• understanding what odds ratios and log odds ratios are
• figuring out we can interpret logits as log odds ratios
• mapping logits into probabilities using a sigmoid function (see the first sketch after this list)
• defining a logistic regression as a simple neural network with a sigmoid function in the output
• understanding the binary cross-entropy loss and its PyTorch implementation, nn.BCELoss()
• understanding the difference between nn.BCELoss() and nn.BCEWithLogitsLoss() (compared in a sketch below)
• highlighting the importance of choosing the correct combination of the last layer and loss function
• using the arguments of PyTorch’s loss functions to handle imbalanced datasets
• configuring model, loss function, and optimizer for a classification problem
• training a model using the StepByStep class
• understanding that the validation loss may be lower than the training loss
• making predictions and mapping predicted logits to probabilities
• using a classification threshold to convert probabilities into classes
• understanding the definition of a decision boundary
• understanding the concept of separability of classes and how it’s related to dimensionality
• exploring different classification thresholds and their effect on the confusion matrix
• reviewing typical metrics for evaluating classification algorithms, like true and false positive rates, precision, and recall (sketched below)
• building ROC and precision-recall curves out of metrics computed for multiple thresholds
• understanding the reason behind the quirk of losing precision while raising the classification threshold
• defining the best and worst possible ROC and PR curves
• using the area under the curve to compare different models (see the last sketch below)
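Before wrapping up, let’s tie a few of these bullets together with some minimal, self-contained sketches. First, the round trip between probabilities and logits. The sigmoid() function matches the one we defined earlier in the chapter; log_odds_ratio() is an illustrative helper written here just to show the inverse mapping.

import numpy as np

def sigmoid(z):
    # maps a logit (a log odds ratio) into a probability in (0, 1)
    return 1 / (1 + np.exp(-z))

def log_odds_ratio(p):
    # the inverse mapping: from a probability back to a logit
    return np.log(p / (1 - p))

p = 0.75
z = log_odds_ratio(p)  # the odds ratio is .75/.25 = 3, so z = log(3) ~ 1.0986
sigmoid(z)             # recovers 0.75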
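Next, the difference between nn.BCELoss() and nn.BCEWithLogitsLoss() in a nutshell: the former expects probabilities, the latter expects raw logits (and is more numerically stable). A minimal sketch, using made-up tensors for illustration:

import torch
import torch.nn as nn

logit = torch.tensor([0.5])   # a raw model output (a logit)
label = torch.tensor([1.0])

# nn.BCEWithLogitsLoss takes the logit directly
loss_fn_logits = nn.BCEWithLogitsLoss()
loss1 = loss_fn_logits(logit, label)

# nn.BCELoss expects a probability, so we must apply the sigmoid first
loss_fn_probs = nn.BCELoss()
loss2 = loss_fn_probs(torch.sigmoid(logit), label)

print(loss1, loss2)  # same value, up to floating-point error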
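The confusion matrix from the "Evaluating" section is all we need to compute the typical metrics. This sketch assumes Scikit-Learn’s [[TN, FP], [FN, TP]] layout and the cm_thresh50 computed above; split_cm() and precision_recall() are hypothetical helpers written for this sketch, not necessarily the chapter’s exact implementations.

def split_cm(cm):
    # cm is laid out as [[TN, FP], [FN, TP]]
    tn, fp = cm[0]
    fn, tp = cm[1]
    return tn, fp, fn, tp

def tpr_fpr(cm):
    tn, fp, fn, tp = split_cm(cm)
    # true positive rate (recall) and false positive rate
    return tp / (tp + fn), fp / (fp + tn)

def precision_recall(cm):
    tn, fp, fn, tp = split_cm(cm)
    return tp / (tp + fp), tp / (tp + fn)

tpr_fpr(cm_thresh50)           # (0.909..., 0.222...) at the 50% threshold
precision_recall(cm_thresh50)  # (0.833..., 0.909...)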
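Finally, curves and areas under them. A sketch using Scikit-Learn’s roc_curve(), precision_recall_curve(), and auc(), assuming y_val and probabilities_val from the "Evaluating" section are still in scope:

from sklearn.metrics import roc_curve, precision_recall_curve, auc

# each function sweeps over multiple thresholds and returns the
# corresponding metrics, plus the thresholds themselves
fpr, tpr, thresholds_roc = roc_curve(y_val, probabilities_val)
prec, rec, thresholds_pr = precision_recall_curve(y_val, probabilities_val)

auroc = auc(fpr, tpr)  # area under the ROC curve
aupr = auc(rec, prec)  # area under the precision-recall curve
print(auroc, aupr)     # higher areas mean a better-performing model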
Wow! That’s a whole lot of material! Congratulations on finishing yet another big step in your journey!