Output

(tensor([0.], device='cuda:0'),
 tensor([0.], device='cuda:0'))

What does the underscore (_) at the end of the method's name mean? Do you remember? If not, go back to the previous section and find out. (There is also a quick refresher after the code cell below.)

So, let's ditch the manual computation of gradients and use both the backward() and zero_() methods instead.

That's it? Well, pretty much … but there is always a catch, and this time it has to do with the update of the parameters.

Updating Parameters

"One does not simply update parameters…"

Boromir

Unfortunately, our Numpy code for updating parameters is not enough. Why not?! Let's try it out, simply copying and pasting it (this is the first attempt), changing it slightly (second attempt), and then asking PyTorch to back off (yes, it is PyTorch's fault!).

Notebook Cell 1.6 - Updating parameters

# Sets learning rate - this is "eta" ~ the "n"-like Greek letter
lr = 0.1

# Step 0 - Initializes parameters "b" and "w" randomly
torch.manual_seed(42)
b = torch.randn(1, requires_grad=True, \
                dtype=torch.float, device=device)
w = torch.randn(1, requires_grad=True, \
                dtype=torch.float, device=device)

# Defines number of epochs
n_epochs = 1000

for epoch in range(n_epochs):
    # Step 1 - Computes model's predicted output - forward pass
    yhat = b + w * x_train_tensor

    # Step 2 - Computes the loss
    # We are using ALL data points, so this is BATCH gradient
    # descent. How wrong is our model? That's the error!
    error = (yhat - y_train_tensor)
    # It is a regression, so it computes mean squared error (MSE)
    loss = (error ** 2).mean()

    # Step 3 - Computes gradients for both "b" and "w" parameters.
    # No more manual computation of gradients!
    # b_grad = 2 * error.mean()
    # w_grad = 2 * (x_tensor * error).mean()
    # We just tell PyTorch to work its way BACKWARDS
    # from the specified loss!
    loss.backward()

    # Step 4 - Updates parameters using gradients and
    # the learning rate. But not so fast...

    # FIRST ATTEMPT - just using the same code as before
    # AttributeError: 'NoneType' object has no attribute 'zero_'
    # b = b - lr * b.grad
    # w = w - lr * w.grad
    # print(b)

    # SECOND ATTEMPT - using in-place Python assignment
    # RuntimeError: a leaf Variable that requires grad
    # has been used in an in-place operation.
    # b -= lr * b.grad
    # w -= lr * w.grad

    # THIRD ATTEMPT - NO_GRAD for the win!
    # We need to use NO_GRAD to keep the update out of
    # the gradient computation. Why is that? It boils
    # down to the DYNAMIC GRAPH that PyTorch uses...
    with torch.no_grad():
        b -= lr * b.grad
        w -= lr * w.grad

    # PyTorch is "clingy" to its computed gradients; we
    # need to tell it to let it go...
    b.grad.zero_()
    w.grad.zero_()
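The third attempt deserves a closer look. Here is a minimal sketch of both the failure and the fix, stripped of our regression problem; the tensor b below is just an illustrative stand-in, not the model's parameter:

import torch

b = torch.randn(1, requires_grad=True)
loss = (b ** 2).mean()
loss.backward()

# An in-place update of a leaf tensor that requires grad fails:
# Autograd would have to record the update itself in the dynamic graph
try:
    b -= 0.1 * b.grad
except RuntimeError as err:
    print(err)

# Wrapping the update in no_grad() tells Autograd NOT to track this
# operation, so the update succeeds and "b" still requires grad
with torch.no_grad():
    b -= 0.1 * b.grad

print(b.requires_grad)  # True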
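The last two lines of the cell exist because gradients accumulate by default. A quick sketch of what happens if we never zero them out (the tensor w below is hypothetical, not our model's parameter):

import torch

w = torch.ones(1, requires_grad=True)

for i in range(3):
    loss = (2 * w).sum()
    loss.backward()
    print(w.grad)  # tensor([2.]), tensor([4.]), tensor([6.])

# The gradient of (2 * w) w.r.t. w is always 2, yet the stored value
# grows: each backward() call ADDS to .grad. zero_() resets it in place
w.grad.zero_()
print(w.grad)  # tensor([0.])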
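And, as promised, a quick refresher on the trailing underscore convention from the aside above, using the add() / add_() pair (any out-of-place / in-place pair of tensor methods would illustrate the same point):

import torch

t = torch.ones(3)

# add() is OUT-of-place: it returns a new tensor and leaves t alone
print(t.add(1))  # tensor([2., 2., 2.])
print(t)         # tensor([1., 1., 1.])

# add_() is IN-place: the trailing underscore means t itself changes
t.add_(1)
print(t)         # tensor([2., 2., 2.])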