In the model above, the sigmoid function isn’t an activation function: It is there only to convert logits into probabilities.
You may be wondering: "Can I mix different activation functions in the same model?" It is definitely possible, but it is also highly unusual. In general, models are built using the same activation function across all hidden layers. ReLU or one of its variants is the most common choice because it leads to faster training, while TanH and sigmoid activation functions are used in very specific cases (recurrent neural networks, for instance).
More importantly, since the model can now perform two transformations (and two activations, obviously), this is how it is working:
Figure B.11 - Activated feature space—deeper model
First of all, these plots were built using a model trained for 15 epochs only (compared to 160 epochs in all previous models). Adding another hidden layer surely makes the model more powerful, thus leading to a satisfactory solution in a much shorter amount of time.
"Great, let’s just make ridiculously deep models and solve everything! Right?"
Not so fast! As models grow deeper, other issues start popping up, like the (in)famous vanishing gradients problem. We’ll get back to that later. For now, adding one or two extra layers is likely safe, but please don’t get carried away with it.
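For reference, a deeper model along these lines could be built as follows. This is only a sketch: the two-unit hidden layers and the ReLU activations are assumptions here, not necessarily the exact configuration behind Figure B.11.
deeper_model = nn.Sequential()
deeper_model.add_module('hidden0', nn.Linear(2, 2))    # first transformation
deeper_model.add_module('activation0', nn.ReLU())
deeper_model.add_module('hidden1', nn.Linear(2, 2))    # second transformation
deeper_model.add_module('activation1', nn.ReLU())
deeper_model.add_module('output', nn.Linear(2, 1))
deeper_model.add_module('sigmoid', nn.Sigmoid())       # logits -> probabilities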
More Dimensions, More Boundaries
We can also make a model more powerful by adding more units to a hidden layer. By doing this, we’re increasing dimensionality; that is, mapping our two-dimensional feature space into, say, a ten-dimensional feature space (which we cannot visualize). But we can map it back to two dimensions in a second hidden layer, with the sole purpose of taking a peek at it.
I am skipping the diagram, but here is the code:
fs_model = nn.Sequential()
fs_model.add_module('hidden0', nn.Linear(2, 10))   # 2D -> 10D
fs_model.add_module('activation0', nn.PReLU())
fs_model.add_module('hidden1', nn.Linear(10, 2))   # 10D -> 2D projection
fs_model.add_module('output', nn.Linear(2, 1))
fs_model.add_module('sigmoid', nn.Sigmoid())
Its first hidden layer has ten units now and uses PReLU as an activation function.
The second hidden layer, however, has no activation function: This layer is
working as a projection of 10D into 2D, such that the decision boundary can be
visualized in two dimensions.
In practice, this extra hidden layer is redundant. Remember,
without an activation function between two layers, they are
equivalent to a single layer. We are doing this here with the sole
purpose of visualizing it.
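If you want to check that equivalence numerically, here is a quick standalone sketch (the shapes mirror the last two linear layers above, but are otherwise arbitrary): composing two linear layers yields a single linear layer whose weight is the product of the two weight matrices, and whose bias combines both biases.
import torch
import torch.nn as nn

torch.manual_seed(42)
first = nn.Linear(10, 2)    # plays the role of 'hidden1'
second = nn.Linear(2, 1)    # plays the role of 'output'

# A single layer equivalent to the composition second(first(x))
combined = nn.Linear(10, 1)
with torch.no_grad():
    combined.weight.copy_(second.weight @ first.weight)
    combined.bias.copy_(second.weight @ first.bias + second.bias)

x = torch.randn(8, 10)
print(torch.allclose(second(first(x)), combined(x), atol=1e-6))  # True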
And here are the results, after training it for ten epochs only.
Figure B.12 - Activated feature space—wider model
By mapping the original feature space into some crazy ten-dimensional one, we make it easier for our model to figure out a way of separating the data. Remember, the more dimensions, the more separable the data points are.
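By the way, if you’d like to take that peek at the 2D projection yourself, one way is to slice the model right after 'hidden1' and plot its output. This is a minimal sketch, not the book’s own plotting code; it assumes a reasonably recent PyTorch where nn.Sequential supports slicing, and the random X below is just a stand-in for your actual feature tensor.
import torch
import matplotlib.pyplot as plt

# Everything up to and including 'hidden1': the learned 10D -> 2D projection
projector = fs_model[:3]

with torch.no_grad():
    X = torch.randn(200, 2)    # stand-in for real features, shape (N, 2)
    z = projector(X)           # projected feature space, shape (N, 2)

plt.scatter(z[:, 0], z[:, 1], s=10)
plt.xlabel('z0')
plt.ylabel('z1')
plt.show()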