As you can see, in PyTorch the coefficient of leakage is called negative_slope, with a default value of 0.01.

"Any particular reason to choose 0.01 as the coefficient of leakage?"

Not really; it is just a small number that enables the update of the weights. Which leads to another question: why not try a different coefficient? Sure enough, people started using other coefficients to improve performance.

"Maybe the model can learn the coefficient of leakage too?"

Sure it can!

Parametric ReLU (PReLU)

The Parametric ReLU is the natural evolution of the Leaky ReLU: instead of arbitrarily choosing a coefficient of leakage (such as 0.01), let's make it a parameter (a). Hopefully, the model will learn how to prevent dead neurons, or how to bring them back to life (zombie neurons?!). Jokes aside, that's an ingenious solution to the problem.

Figure 4.15 - Parametric ReLU function and its gradient

As you can see in Figure 4.15, the slope on the left-hand side is much larger now: 0.25, to be precise, which is PyTorch's default value for the parameter a.
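In functional terms, PReLU keeps positive inputs unchanged and scales negative inputs by the learnable coefficient. Here is a minimal sketch of that rule (the prelu function below is ours, for illustration only, not PyTorch's):

import torch

def prelu(z, a):
    # identity for positive inputs; scale negative inputs
    # by the (learnable) coefficient a
    return torch.where(z > 0, z, a * z)

prelu(torch.tensor([-3., 0., 3.]), a=torch.tensor(0.25))

Output

tensor([-0.7500, 0.0000, 3.0000])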
We can set the parameter a using the functional version (argument weight in F.prelu()):

import torch
import torch.nn as nn
import torch.nn.functional as F

dummy_z = torch.tensor([-3., 0., 3.])
F.prelu(dummy_z, weight=torch.tensor(0.25))
Output
tensor([-0.7500, 0.0000, 3.0000])
But in the regular module (nn.PReLU), it doesn't make sense to set the parameter, since it is going to be learned, right? We can still set its initial value, though:
nn.PReLU(init=0.25)(dummy_z)
Output
tensor([-0.7500, 0.0000, 3.0000], grad_fn=<PreluBackward>)
Did you notice the grad_fn attribute on the resulting tensor? It shouldn’t be a
surprise, after all—where there is learning, there is a gradient.
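To make this concrete, here is a small sketch (not from the book) that inspects the module's learnable coefficient and its gradient after a backward pass; the printed values assume the same dummy_z as above:

import torch
import torch.nn as nn

prelu = nn.PReLU(init=0.25)
dummy_z = torch.tensor([-3., 0., 3.])
print(prelu.weight)  # the coefficient a is a regular learnable parameter
prelu(dummy_z).sum().backward()
print(prelu.weight.grad)  # only the negative input contributes to it

Output

Parameter containing:
tensor([0.2500], requires_grad=True)
tensor([-3.])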
Deep Model
Now that we’ve learned that activation functions break the equivalence to a
shallow model, let’s use them to transform our former deep-ish model into a real
deep model. It has the same architecture as the previous model, except for the activation functions applied to the outputs of the hidden layers. A minimal sketch of this change follows, and then the diagram of the updated model.
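The sketch below is ours, with illustrative layer sizes based on the earlier deep-ish model (flattened 5x5 single-channel images, hidden layers of five and three units); treat the sizes as placeholders, not the book's exact configuration:

import torch.nn as nn

model_relu = nn.Sequential(
    nn.Flatten(),      # 5x5 single-channel image -> 25 features
    nn.Linear(25, 5),  # first hidden layer (same as before)
    nn.ReLU(),         # NEW: activation function
    nn.Linear(5, 3),   # second hidden layer (same as before)
    nn.ReLU(),         # NEW: activation function
    nn.Linear(3, 1),   # output layer, producing a logit
)

The two nn.ReLU() modules are the only difference from the previous architecture.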