is only 0.25 (for z = 0) and that it gets close to zero as the absolute value of z
reaches a value of five. Also, remember that the activation values of any given
layer are the inputs of the following layer and, given the range of the sigmoid, the
activation values are going to be centered around 0.5, instead of zero. This means
that, even if we normalize our inputs to feed the first layer, it will not be the case
anymore for the other layers.

"Why does it matter if the outputs are centered around zero or not?"

In previous chapters, we standardized features (zero mean, unit standard
deviation) to improve the performance of gradient descent. The same reasoning
applies here, since the outputs of any given layer are the inputs of the following
layer. There is actually more to it, and we'll briefly touch on this topic again in the
section on the ReLU activation function, when talking about the "internal
covariate shift."

PyTorch has the sigmoid function available in two flavors, as we've already seen in
Chapter 3: torch.sigmoid() and nn.Sigmoid. The first one is a simple function, and
the second one is a full-fledged class that inherits from nn.Module, thus being, for
all intents and purposes, a model on its own.

dummy_z = torch.tensor([-3., 0., 3.])
torch.sigmoid(dummy_z)

Output

tensor([0.0474, 0.5000, 0.9526])

nn.Sigmoid()(dummy_z)

Output

tensor([0.0474, 0.5000, 0.9526])
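To make the gradient figures above concrete, here is a minimal sketch (our own, not
part of the book's code) that uses autograd to recover them: the peak of 0.25 at
z = 0, and near-zero gradients once the absolute value of z reaches five. The values
in the comment are the approximate results you should see.

import torch

# evaluate the sigmoid at z = -5, 0, and 5, tracking gradients
dummy_z = torch.tensor([-5., 0., 5.], requires_grad=True)
torch.sigmoid(dummy_z).sum().backward()
print(dummy_z.grad)  # roughly tensor([0.0066, 0.2500, 0.0066])

At z = 0 the gradient is exactly 0.25, while at |z| = 5 it has all but vanished.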
Hyperbolic Tangent (TanH)
The hyperbolic tangent activation function is an evolution of the sigmoid: its
outputs are zero-centered, unlike those of its predecessor.
Figure 4.12 - TanH function and its gradient
As you can see in Figure 4.12, the TanH activation function "squashes" the input
values into the range (-1, 1). Therefore, being centered at zero, the activation
values are already (somewhat) normalized inputs for the next layer, making the
hyperbolic tangent a better activation function than the sigmoid.
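A quick way to see this (a sketch of ours, not the book's code) is to push a batch of
standardized random values through both activations and compare the means;
dummy_inputs below is just an illustrative tensor.

import torch

torch.manual_seed(42)
dummy_inputs = torch.randn(1000)           # standardized: roughly zero mean, unit std
print(torch.sigmoid(dummy_inputs).mean())  # roughly 0.5 (not zero-centered)
print(torch.tanh(dummy_inputs).mean())     # roughly 0.0 (zero-centered)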
Regarding the gradient, it has a much larger peak value of 1.0 (again, for z = 0), but
it decreases even faster, approaching zero for absolute values of z as low as three.
This is the underlying cause of what is referred to as the problem of vanishing
gradients, which makes the training of the network progressively slower.
Just like the sigmoid function, the hyperbolic tangent also comes in two flavors:
torch.tanh() and nn.Tanh.
dummy_z = torch.tensor([-3., 0., 3.])
torch.tanh(dummy_z)
Output
tensor([-0.9951, 0.0000, 0.9951])
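As with the sigmoid, we can use autograd to check the gradient values quoted
above; this is a minimal sketch of ours, not the book's code, and the comment shows
the approximate result.

import torch

dummy_z = torch.tensor([-3., 0., 3.], requires_grad=True)
torch.tanh(dummy_z).sum().backward()
print(dummy_z.grad)  # roughly tensor([0.0099, 1.0000, 0.0099])

The peak gradient of 1.0 at z = 0 is four times that of the sigmoid, but it is already
below 0.01 at |z| = 3, which is exactly the behavior behind the vanishing gradients
problem discussed above.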