Daniel Voigt Godoy - Deep Learning with PyTorch Step-by-Step: A Beginner's Guide


Output

[0.49671415] [-0.1382643]
[0.80119529] [0.04511107]

Step 5 - Rinse and Repeat!

Now we use the updated parameters to go back to Step 1 and restart the process.

Definition of Epoch

An epoch is complete whenever every point in the training set (N) has already been used in all steps: forward pass, computing loss, computing gradients, and updating parameters.

During one epoch, we perform at least one update, but no more than N updates.

The number of updates (N/n) will depend on the type of gradient descent being used:

• For batch (n = N) gradient descent, this is trivial, as it uses all points for computing the loss: one epoch is the same as one update.

• For stochastic (n = 1) gradient descent, one epoch means N updates, since every individual data point is used to perform an update.

• For mini-batch (of size n), one epoch has N/n updates, since a mini-batch of n data points is used to perform an update.

Repeating this process over and over for many epochs is, in a nutshell, training a model.

Linear Regression in Numpy

It's time to implement our linear regression model using gradient descent and Numpy only.
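Before walking through the training loop, here is a minimal sketch of how a dataset for this regression could be prepared with Numpy. The true parameters (b = 1, w = 2), the noise level, and the 80% training split are illustrative assumptions for this sketch, standing in for the synthetic data the book generates in an earlier section:

import numpy as np

# Illustrative assumptions: true parameters used to generate synthetic data
true_b, true_w = 1, 2
N = 100

np.random.seed(42)
x = np.random.rand(N, 1)
epsilon = 0.1 * np.random.randn(N, 1)  # Gaussian noise
y = true_b + true_w * x + epsilon

# Shuffle the indices and keep 80% of the points for training
idx = np.arange(N)
np.random.shuffle(idx)
train_idx = idx[:int(N * 0.8)]
x_train, y_train = x[train_idx], y[train_idx]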

"Wait a minute … I thought this book was about PyTorch!" Yes, it is,but this serves two purposes: first, to introduce the structure ofour task, which will remain largely the same and, second, to showyou the main pain points so you can fully appreciate how muchPyTorch makes your life easier :-)For training a model, there is a first initialization step (line numbers refer toNotebook Cell 1.2 code below):• Random initialization of parameters / weights (we have only two, b andw)—lines 3 and 4• Initialization of hyper-parameters (in our case, only learning rate and number ofepochs)—lines 9 and 11Make sure to always initialize your random seed to ensure thereproducibility of your results. As usual, the random seed is 42 [42] ,the (second) least random [43] of all random seeds one couldpossibly choose.For each epoch, there are four training steps (line numbers refer to Notebook Cell1.2 code below):• Compute model’s predictions—this is the forward pass—line 15• Compute the loss, using predictions and labels and the appropriate loss functionfor the task at hand—lines 20 and 22• Compute the gradients for every parameter—lines 25 and 26• Update the parameters—lines 30 and 31For now, we will be using batch gradient descent only, meaning, we’ll use all datapoints for each one of the four steps above. It also means that going once throughall of the steps is already one epoch. Then, if we want to train our model over 1,000epochs, we just need to add a single loop.In Chapter 2, we’ll introduce mini-batch gradient descent, andthen we’ll have to include a second inner loop.68 | Chapter 1: A Simple Regression Problem

"Wait a minute … I thought this book was about PyTorch!" Yes, it is,

but this serves two purposes: first, to introduce the structure of

our task, which will remain largely the same and, second, to show

you the main pain points so you can fully appreciate how much

PyTorch makes your life easier :-)

For training a model, there is a first initialization step (line numbers refer to

Notebook Cell 1.2 code below):

• Random initialization of parameters / weights (we have only two, b and

w)—lines 3 and 4

• Initialization of hyper-parameters (in our case, only learning rate and number of

epochs)—lines 9 and 11

Make sure to always initialize your random seed to ensure the

reproducibility of your results. As usual, the random seed is 42 [42] ,

the (second) least random [43] of all random seeds one could

possibly choose.

For each epoch, there are four training steps (line numbers refer to Notebook Cell

1.2 code below):

• Compute model’s predictions—this is the forward pass—line 15

• Compute the loss, using predictions and labels and the appropriate loss function

for the task at hand—lines 20 and 22

• Compute the gradients for every parameter—lines 25 and 26

• Update the parameters—lines 30 and 31

For now, we will be using batch gradient descent only, meaning, we’ll use all data

points for each one of the four steps above. It also means that going once through

all of the steps is already one epoch. Then, if we want to train our model over 1,000

epochs, we just need to add a single loop.

In Chapter 2, we’ll introduce mini-batch gradient descent, and

then we’ll have to include a second inner loop.

68 | Chapter 1: A Simple Regression Problem
