Daniel Voigt Godoy - Deep Learning with PyTorch Step-by-Step: A Beginner's Guide (Leanpub)


As you can see, in PyTorch the coefficient of leakage is called negative_slope, with a default value of 0.01.

"Any particular reason to choose 0.01 as the coefficient of leakage?"

Not really, it is just a small number that enables the update of the weights, which leads to another question: Why not try a different coefficient? Sure enough, people started using other coefficients to improve performance.

"Maybe the model can learn the coefficient of leakage too?"

Sure it can!

Parametric ReLU (PReLU)

The Parametric ReLU is the natural evolution of the Leaky ReLU: Instead of arbitrarily choosing a coefficient of leakage (such as 0.01), let's make it a parameter (a). Hopefully, the model will learn how to prevent dead neurons, or how to bring them back to life (zombie neurons?!). Jokes aside, that's an ingenious solution to the problem.

Figure 4.15 - Parametric ReLU function and its gradient

As you can see in Figure 4.15, the slope on the left-hand side is much larger now, 0.25 to be precise, PyTorch's default value for the parameter a.
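To make the contrast concrete, here is a minimal sketch (assuming torch and torch.nn have been imported as usual; the dummy_z tensor is just an illustrative input) showing how small the Leaky ReLU's default slope of 0.01 really is compared to PReLU's default of 0.25:

dummy_z = torch.tensor([-3., 0., 3.])
# Leaky ReLU keeps a fixed, tiny slope on the negative side
nn.LeakyReLU(negative_slope=0.01)(dummy_z)

Output

tensor([-0.0300, 0.0000, 3.0000])

The PReLU examples below use the same dummy_z tensor, so you can compare the outputs directly.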


We can set the parameter a using the functional version (argument weight in F.prelu()):

dummy_z = torch.tensor([-3., 0., 3.])
F.prelu(dummy_z, weight=torch.tensor(0.25))

Output

tensor([-0.7500, 0.0000, 3.0000])

But, in the regular module (nn.PReLU), it doesn't make sense to set it, since it is going to be learned, right? We can still set the initial value for it, though:

nn.PReLU(init=0.25)(dummy_z)

Output

tensor([-0.7500, 0.0000, 3.0000], grad_fn=<PreluBackward>)

Did you notice the grad_fn attribute on the resulting tensor? It shouldn't be a surprise; after all, where there is learning, there is a gradient.
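In fact, the coefficient is exposed as a regular parameter of the module (its weight attribute), so the optimizer updates it together with every other weight. A quick sketch to verify this, using the same initial value as above (the prelu variable name is ours, for illustration only):

prelu = nn.PReLU(init=0.25)
# the coefficient of leakage is a learnable parameter of the module
prelu.weight

Output

Parameter containing:
tensor([0.2500], requires_grad=True)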

Deep Model

Now that we've learned that activation functions break the equivalence to a shallow model, let's use them to transform our former deep-ish model into a real deep model. It has the same architecture as the previous model, except for the activation functions applied to the outputs of the hidden layers. Here is the diagram of the updated model.
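In code, a model with that structure could be built like the sketch below. The layer sizes (a flattened input of 25 features and hidden layers of 5 and 3 units) are illustrative assumptions, not necessarily the exact dimensions used in the book; the point is where the activations go, right after each hidden layer:

model = nn.Sequential()
model.add_module('flatten', nn.Flatten())
model.add_module('hidden0', nn.Linear(25, 5, bias=False))
model.add_module('activation0', nn.ReLU())  # activation after the first hidden layer
model.add_module('hidden1', nn.Linear(5, 3, bias=False))
model.add_module('activation1', nn.ReLU())  # activation after the second hidden layer
model.add_module('output', nn.Linear(3, 1, bias=False))
model.add_module('sigmoid', nn.Sigmoid())

Without activation0 and activation1, the three linear layers would collapse into the equivalent of a single one; with them, the model is genuinely deep.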

