discussing it, let me illustrate it.

Ball Dataset and Block Model

Let’s use a dataset of 1,000 random points drawn from a ten-dimensional ball (this seems fancier than it actually is; you can think of it as a dataset with 1,000 points with ten features each) such that each feature has zero mean and unit standard deviation. In this dataset, points situated within half of the radius of the ball are labeled as negative cases, while the remaining points are labeled as positive cases. It is a familiar binary classification task.

Data Generation
1 X, y = load_data(n_points=1000, n_dims=10)

Next, we can use these data points to create a dataset and a data loader (no mini-batches this time):

Data Preparation
1 ball_dataset = TensorDataset(
2     torch.as_tensor(X).float(), torch.as_tensor(y).float()
3 )
4 ball_loader = DataLoader(ball_dataset, batch_size=len(X))

The data preparation part is done. What about the model configuration? To illustrate the vanishing gradients problem, we need a deeper model than the ones we’ve built so far. Let’s call it the "block" model: it is a block of several hidden layers (and activation functions) stacked together, every layer containing the same number of hidden units (neurons).

Instead of building the model manually, I’ve created a function, build_model(), that allows us to configure a model like that. Its main arguments are the number of features, the number of layers, the number of hidden units per layer, the activation function to be placed after each hidden layer, and whether it should add a batch normalization layer after every activation function or not. We’ll call it in the Model Configuration (1) listing right below.
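The load_data() helper itself isn’t shown here. As a rough idea of what it could be doing, here is a minimal sketch consistent with the description above; it samples points uniformly from a ball whose radius is chosen so that, in expectation, every feature has zero mean and unit standard deviation. This is only my own guess, not necessarily the actual implementation:

import numpy as np

def load_data(n_points=1000, n_dims=10, seed=13):
    # Sketch only, NOT the book's actual helper: points sampled uniformly
    # from an n_dims-dimensional ball, labeled by distance from the center.
    rng = np.random.default_rng(seed)
    # For a uniform ball of radius r, Var[x_i] = r**2 / (n_dims + 2),
    # so this radius gives each feature unit variance (in expectation).
    r = np.sqrt(n_dims + 2)
    # Uniform directions: normalize standard normal vectors
    directions = rng.standard_normal((n_points, n_dims))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    # Radii distributed so the points fill the ball uniformly
    radii = r * rng.uniform(size=(n_points, 1)) ** (1 / n_dims)
    X = directions * radii
    # Points inside half the radius are negative cases, the rest are positive
    y = (np.linalg.norm(X, axis=1) > r / 2).astype(float)
    return X, y

However the points are actually generated, from this point on only X and y matter.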
Model Configuration (1)
1 torch.manual_seed(11)
2 n_features = X.shape[1]
3 n_layers = 5
4 hidden_units = 100
5 activation_fn = nn.ReLU
6 model = build_model(
7     n_features, n_layers, hidden_units,
8     activation_fn, use_bn=False
9 )
Let’s check the model out:
print(model)
Output
Sequential(
(h1): Linear(in_features=10, out_features=100, bias=True)
(a1): ReLU()
(h2): Linear(in_features=100, out_features=100, bias=True)
(a2): ReLU()
(h3): Linear(in_features=100, out_features=100, bias=True)
(a3): ReLU()
(h4): Linear(in_features=100, out_features=100, bias=True)
(a4): ReLU()
(h5): Linear(in_features=100, out_features=100, bias=True)
(a5): ReLU()
(o): Linear(in_features=100, out_features=1, bias=True)
)
Exactly as expected! The layers are labeled sequentially, from one up to the
number of layers, and have prefixes according to their roles: h for linear layers, a for
activation functions, bn for batch normalization layers, and o for the last (output)
layer.
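For reference, a function that produces exactly this structure could be written along the lines of the sketch below. Again, this is my own reconstruction based on the arguments and the printout above, not necessarily the book’s actual build_model():

from collections import OrderedDict
import torch.nn as nn

def build_model(n_features, n_layers, hidden_units,
                activation_fn, use_bn=False):
    # Sketch only: a stack of identically sized hidden blocks
    # (linear + activation [+ batch norm]) followed by a single output unit
    layers = OrderedDict()
    in_features = n_features
    for i in range(1, n_layers + 1):
        layers[f'h{i}'] = nn.Linear(in_features, hidden_units)
        layers[f'a{i}'] = activation_fn()
        if use_bn:
            layers[f'bn{i}'] = nn.BatchNorm1d(hidden_units)
        in_features = hidden_units
    layers['o'] = nn.Linear(in_features, 1)
    return nn.Sequential(layers)

Since each module is registered under its name, it can also be reached as an attribute of the model: model.h1, for instance, is the first linear layer, and model.h1.weight has shape [100, 10].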
We’re only missing a loss function and an optimizer, and then we’re done with the
model configuration part too:
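The book’s own Model Configuration (2) listing follows right after this point. As a rough placeholder, a typical pairing for a single-logit binary classifier like this one would be something along these lines (the specific loss, optimizer, and learning rate below are my assumptions, not necessarily the book’s choices):

import torch.nn as nn
import torch.optim as optim

# Placeholder choices for a single-logit binary classifier:
# BCE-with-logits loss and plain SGD with an arbitrary learning rate
loss_fn = nn.BCEWithLogitsLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)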