import numpy as np

true_b = 1
true_w = 2
N = 100

# Data Generation
np.random.seed(42)

# We divide w by 10
bad_w = true_w / 10
# And multiply x by 10
bad_x = np.random.rand(N, 1) * 10

# So, the net effect on y is zero - it is still
# the same as before
y = true_b + bad_w * bad_x + (.1 * np.random.randn(N, 1))

Then, I performed the same split as before for both original and bad datasets and plotted the training sets side by side, as you can see below:

# Generates train and validation sets
# It uses the same train_idx and val_idx as before,
# but it applies to bad_x
bad_x_train, y_train = bad_x[train_idx], y[train_idx]
bad_x_val, y_val = bad_x[val_idx], y[val_idx]

Figure 0.13 - Same data, different scales for feature x
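The split above reuses train_idx and val_idx from the earlier chapter. In case you are running this cell in isolation, here is a minimal sketch of one way to build them, assuming the same 80/20 shuffled split used before (the exact cell in the earlier chapter may differ slightly):

# Minimal sketch of an 80/20 shuffled split
# (assumption: mirrors the earlier chapter's split cell)
idx = np.arange(N)
np.random.shuffle(idx)

# Uses the first 80 random indices for training
train_idx = idx[:int(N * .8)]
# Uses the remaining indices for validation
val_idx = idx[int(N * .8):]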
The only difference between the two plots is the scale of feature x. Its range was [0, 1]; now it is [0, 10]. The label y hasn't changed, and I did not touch true_b.

Does this simple scaling have any meaningful impact on our gradient descent? Well, if it hadn't, I wouldn't be asking, right? Let's compute a new loss surface and compare it to the one we had before.
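If you want to reproduce a surface like the one in Figure 0.14 yourself, here is a minimal sketch: it evaluates the MSE loss over a grid of candidate b and w values for the scaled training set. The grid ranges and resolution are my assumptions, not necessarily the ones used to draw the figure:

# Minimal sketch: MSE loss over a grid of (b, w) candidates
# (grid ranges and resolution are assumptions)
b_range = np.linspace(-2, 4, 101)
w_range = np.linspace(-1, 5, 101)
bs, ws = np.meshgrid(b_range, w_range)

# For each (b, w) pair, predictions for all training points,
# then the mean squared error over those points
all_preds = bs[:, :, np.newaxis] + ws[:, :, np.newaxis] * bad_x_train.squeeze()
all_errors = all_preds - y_train.squeeze()
loss_surface = (all_errors ** 2).mean(axis=-1)

# plt.contour(bs, ws, loss_surface) would draw contours like Figure 0.14
print(loss_surface.min())  # loss at the grid point closest to the minimum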
Figure 0.14 - Loss surface—before and after scaling feature x (Obs.: left plot looks a bit different than Figure 0.6 because it is centered at the "after" minimum)
Look at the contour values of Figure 0.14: the dark blue line was 3.0, and now it is 50.0! For the same range of parameter values, loss values are much higher.
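Why does the surface get so much steeper along the w axis? For an MSE loss, the second derivative with respect to w is 2 * mean(x ** 2), so scaling x by 10 scales the curvature in the w direction by 100. A quick sanity check (this assumes x still holds the original [0, 1] feature generated earlier in the chapter):

# Curvature of the MSE loss along the w direction is 2 * mean(x ** 2)
# (assumes x is the original feature in [0, 1] from the first dataset)
print(2 * (x ** 2).mean())      # roughly 0.67 for x uniform in [0, 1]
print(2 * (bad_x ** 2).mean())  # roughly 67, about 100 times larger

This steepness mismatch between the b and w directions is exactly the problem that scaling the feature, coming up next, is meant to fix.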