Deep Learning with PyTorch Step-by-Step: A Beginner's Guide, by Daniel Voigt Godoy

np.concatenate([dummy_points[:5].numpy(),
                dummy_sbs.predict(dummy_points)[:5]], axis=1)

Output

array([[-0.9012059 , -0.9012059 ],
       [ 0.56559485,  0.56559485],
       [-0.48822638, -0.48822638],
       [ 0.75069577,  0.75069577],
       [ 0.58925384,  0.58925384]], dtype=float32)

It looks like the model actually learned the identity function … or did it? Let's check its parameters:

dummy_model.state_dict()

Output

OrderedDict([('linear.weight', tensor([[0.1488]], device='cuda:0')),
             ('linear.bias', tensor([-0.3326], device='cuda:0'))])

For an input value equal to zero, the output of the linear layer will be -0.3326, which, in turn, will be chopped off by the ReLU activation. Now I have a question for you:

"Which input values produce outputs greater than zero?"

The answer: input values above 2.2352 (= 0.3326/0.1488) will produce positive outputs, which, in turn, will pass through the ReLU activation. But I have another question for you:

"Guess what is the highest input value in our dataset?"

Close enough! I am assuming you answered 2.2352, but it is just a little bit less than that:

dummy_points.max()

Output

tensor(2.2347)

"So what? Does it actually mean anything?"

It means the model learned to stay out of the way of the inputs! Now that the model has the ability to use the raw inputs directly, its linear layer learned to produce only negative values, so its nonlinearity (ReLU) produces only zeros. Cool, right?

The Power of Shortcuts

The residual connection works as a shortcut, enabling the model to skip the nonlinearities when it pays off to do so (if it yields a lower loss). For this reason, residual connections are also known as skip connections.

"I'm still not convinced … what's the big deal about this?"

The big deal is, these shortcuts make the loss surface smoother, so gradient descent has an easier job finding a minimum. Don't take my word for it: go and check the beautiful loss landscape visualizations produced by Li et al. in their paper "Visualizing the Loss Landscape of Neural Nets." [129]

Awesome, right? Those visualizations are projections of a multi-dimensional loss surface for the ResNet model, with and without skip connections. Guess which one is easier to train? :-)

If you're curious to see more landscapes like these, make sure to check their website: "Visualizing the Loss Landscape of Neural Nets." [130]
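To make the "staying out of the way" behavior concrete, here is a minimal sketch of a residual block with the structure suggested by the state_dict() above: a single linear layer named linear followed by a ReLU, with the raw input added back to the activation's output. The class name DummyResidual is hypothetical (this excerpt doesn't show the model's actual definition), and the weight and bias values are simply the ones reported above.

import torch
import torch.nn as nn

class DummyResidual(nn.Module):  # hypothetical name; the original definition isn't shown here
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(1, 1)
        self.activation = nn.ReLU()

    def forward(self, x):
        identity = x                           # raw input kept for the shortcut
        out = self.activation(self.linear(x))  # linear layer followed by ReLU
        return out + identity                  # residual (skip) connection

model = DummyResidual()
with torch.no_grad():  # plug in the trained parameters reported above
    model.linear.weight.fill_(0.1488)
    model.linear.bias.fill_(-0.3326)

x = torch.tensor([[2.2347]])  # the highest input value in the dataset
print(model.linear(x))        # ≈ -7.7e-05: still (barely) negative, so ReLU outputs zero
print(model(x))               # tensor([[2.2347]]): the block returns the input untouched

Any input above 2.2352 would switch the ReLU branch back on and the output would no longer match the input; the trained parameters place that threshold just above the dataset's maximum, so the block behaves as the identity over the whole dataset.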

