Daniel Voigt Godoy - Deep Learning with PyTorch Step-by-Step: A Beginner's Guide (Leanpub)


The sigmoid's gradient is only 0.25 (for z = 0), and it gets close to zero as the absolute value of z reaches a value of five. Also, remember that the activation values of any given layer are the inputs of the following layer and, given the range of the sigmoid, the activation values are going to be centered around 0.5, instead of zero. This means that, even if we normalize our inputs to feed the first layer, it will not be the case anymore for the other layers.

"Why does it matter if the outputs are centered around zero or not?"

In previous chapters, we standardized features (zero mean, unit standard deviation) to improve the performance of gradient descent. The same reasoning applies here, since the outputs of any given layer are the inputs of the following layer. There is actually more to it, and we'll briefly touch on this topic again in the section on the ReLU activation function, when talking about the "internal covariate shift."

PyTorch has the sigmoid function available in two flavors, as we've already seen in Chapter 3: torch.sigmoid() and nn.Sigmoid. The first one is a simple function, and the second one is a full-fledged class inherited from nn.Module, thus being, for all intents and purposes, a model on its own.

dummy_z = torch.tensor([-3., 0., 3.])
torch.sigmoid(dummy_z)

Output

tensor([0.0474, 0.5000, 0.9526])

nn.Sigmoid()(dummy_z)

Output

tensor([0.0474, 0.5000, 0.9526])
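As a quick sanity check on those gradient values, we can let autograd compute the sigmoid's derivative, sigmoid(z) * (1 - sigmoid(z)), which peaks at 0.25. The snippet below is a minimal sketch, not part of the book's code, and the dummy_grad_z tensor is just an illustrative choice of points:

import torch  # repeated here only to keep the snippet self-contained

dummy_grad_z = torch.tensor([-5., 0., 5.], requires_grad=True)
# backpropagate through the sigmoid to get d(sigmoid)/dz at each point
torch.sigmoid(dummy_grad_z).sum().backward()
dummy_grad_z.grad  # should be close to tensor([0.0066, 0.2500, 0.0066])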


Hyperbolic Tangent (TanH)

The hyperbolic tangent activation function was the evolution of the sigmoid, as its outputs are values with a zero mean, different from its predecessor.

Figure 4.12 - TanH function and its gradient

As you can see in Figure 4.12, the TanH activation function "squashes" the input values into the range (-1, 1). Therefore, being centered at zero, the activation values are already (somewhat) normalized inputs for the next layer, making the hyperbolic tangent a better activation function than the sigmoid.

Regarding the gradient, it has a much larger peak value of 1.0 (again, for z = 0), but its decrease is even faster, approaching zero for absolute values of z as low as three. This is the underlying cause of what is referred to as the problem of vanishing gradients, which causes the training of the network to be progressively slower.
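To make the vanishing gradients concrete, we can run the same kind of autograd check for the hyperbolic tangent, whose derivative is 1 - tanh(z)^2. Again, this is just a sketch with illustrative points, not the book's code:

import torch  # repeated here only to keep the snippet self-contained

dummy_grad_z = torch.tensor([-3., 0., 3.], requires_grad=True)
# backpropagate through tanh to get d(tanh)/dz at each point
torch.tanh(dummy_grad_z).sum().backward()
dummy_grad_z.grad  # should be close to tensor([0.0099, 1.0000, 0.0099])

Since the gradients of successive layers get multiplied together by the chain rule, values this small at |z| = 3 are enough to shrink the overall gradient very quickly in a deep network.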

Just like the sigmoid function, the hyperbolic tangent also comes in two flavors: torch.tanh() and nn.Tanh.

dummy_z = torch.tensor([-3., 0., 3.])
torch.tanh(dummy_z)

Output

tensor([-0.9951, 0.0000, 0.9951])
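The module version behaves exactly like the functional one and, being a full-fledged module, it can also be dropped straight into a model. The snippet below is a sketch rather than the book's code, and the dummy_model is just a hypothetical example of placing nn.Tanh between layers:

from torch import nn  # repeated here only to keep the snippet self-contained

nn.Tanh()(dummy_z)  # same values as torch.tanh(dummy_z) above

# as a module, it can be used as an activation layer, for instance:
dummy_model = nn.Sequential(nn.Linear(2, 10), nn.Tanh(), nn.Linear(10, 1))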

