Output

[0.49671415] [-0.1382643]
[0.80119529] [0.04511107]

Step 5 - Rinse and Repeat!

Now we use the updated parameters to go back to Step 1 and restart the process.

Definition of Epoch

An epoch is complete whenever every point in the training set (N) has already
been used in all steps: forward pass, computing loss, computing gradients, and
updating parameters.

During one epoch, we perform at least one update, but no more than N updates.

The number of updates (N/n) will depend on the type of gradient descent being
used:

• For batch (n = N) gradient descent, this is trivial, as it uses all points
for computing the loss—one epoch is the same as one update.

• For stochastic (n = 1) gradient descent, one epoch means N updates, since
every individual data point is used to perform an update.

• For mini-batch (of size n), one epoch has N/n updates, since a mini-batch of
n data points is used to perform an update.

Repeating this process over and over for many epochs is, in a nutshell,
training a model.
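As a quick sanity check of that arithmetic, the snippet below prints the number
of updates per epoch for each flavor (the values of N and the batch sizes are
just illustrative, and we assume n divides N evenly):

N = 100  # points in the training set
for label, n in [("batch", N), ("stochastic", 1), ("mini-batch", 10)]:
    print(f"{label:>10} (n = {n:>3}): {N // n} update(s) per epoch")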
Linear Regression in Numpy

It’s time to implement our linear regression model using gradient descent and
Numpy only.
"Wait a minute … I thought this book was about PyTorch!" Yes, it is,
but this serves two purposes: first, to introduce the structure of
our task, which will remain largely the same; and second, to show
you the main pain points so you can fully appreciate how much
PyTorch makes your life easier :-)
For training a model, there is a first initialization step (line numbers refer to
Notebook Cell 1.2 code below):
• Random initialization of parameters / weights (we have only two, b and
w)—lines 3 and 4
• Initialization of hyper-parameters (in our case, only learning rate and number of
epochs)—lines 9 and 11
Make sure to always initialize your random seed to ensure the
reproducibility of your results. As usual, the random seed is 42,[42]
the (second) least random[43] of all random seeds one could
possibly choose.
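Notebook Cell 1.2 itself is not reproduced in this excerpt, so here is a
minimal sketch of its initialization step, consistent with the description
above (the variable names and the data generation are assumptions standing in
for the book’s earlier cells, and the line numbers cited refer to the book’s
cell, not to this sketch):

import numpy as np

# Synthetic data in the spirit of the book's setup: y = 1 + 2x + noise
# (for simplicity, all N points serve as the training set here,
# skipping the train-validation split)
true_b, true_w, N = 1, 2, 100
np.random.seed(42)
x_train = np.random.rand(N, 1)
y_train = true_b + true_w * x_train + 0.1 * np.random.randn(N, 1)

# Step 0 - Initializes parameters "b" and "w" randomly
np.random.seed(42)
b = np.random.randn(1)  # array([0.49671415])
w = np.random.randn(1)  # array([-0.1382643])

# Hyper-parameters: learning rate and number of epochs
lr = 0.1
n_epochs = 1000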
For each epoch, there are four training steps (line numbers refer to Notebook Cell
1.2 code below):
• Compute model’s predictions—this is the forward pass—line 15
• Compute the loss, using predictions and labels and the appropriate loss function
for the task at hand—lines 20 and 22
• Compute the gradients for every parameter—lines 25 and 26
• Update the parameters—lines 30 and 31
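Continuing the sketch with the same assumed names, a single pass through the
four steps could look like the code below; for the mean squared error loss,
the gradient expressions follow from the chain rule:

# Step 1 - Computes model's predicted output (forward pass)
yhat = b + w * x_train

# Step 2 - Computes the loss: for a regression, the mean
# squared error (MSE) between predictions and labels
error = yhat - y_train
loss = (error ** 2).mean()

# Step 3 - Computes gradients for both "b" and "w":
# d(loss)/db = 2 * mean(error)
# d(loss)/dw = 2 * mean(x * error)
b_grad = 2 * error.mean()
w_grad = 2 * (x_train * error).mean()

# Step 4 - Updates parameters, stepping against the gradients,
# scaled by the learning rate
b = b - lr * b_grad
w = w - lr * w_grad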
For now, we will be using batch gradient descent only, meaning we’ll use all
data points in each of the four steps above. It also means that going once
through all of the steps completes one epoch. Then, if we want to train our
model over 1,000 epochs, we just need to add a single loop, as sketched below.
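Wrapped in that single loop, the sketch’s training procedure becomes (same
assumed names, with the four steps shown compactly):

for epoch in range(n_epochs):
    yhat = b + w * x_train                    # Step 1: forward pass
    error = yhat - y_train
    loss = (error ** 2).mean()                # Step 2: MSE loss
    b_grad = 2 * error.mean()                 # Step 3: gradients
    w_grad = 2 * (x_train * error).mean()
    b = b - lr * b_grad                       # Step 4: update
    w = w - lr * w_grad

print(b, w)  # should end up close to the true values, 1 and 2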
In Chapter 2, we’ll introduce mini-batch gradient descent, and
then we’ll have to include a second inner loop.
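To make that aside concrete: within each epoch, the inner loop would draw
mini-batches of size n and run the same four steps on each one. The sketch
below hand-rolls the batching purely for illustration; Chapter 2 builds it
properly:

n = 10  # mini-batch size
for epoch in range(n_epochs):
    indices = np.random.permutation(N)  # reshuffle every epoch
    for start in range(0, N, n):
        batch = indices[start:start + n]
        x_batch, y_batch = x_train[batch], y_train[batch]
        # same four steps, now computed on the mini-batch only
        error = (b + w * x_batch) - y_batch
        b = b - lr * 2 * error.mean()
        w = w - lr * 2 * (x_batch * error).mean()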