
Close enough! I am assuming you answered 2.2352, but it is just a little bit less than that:

dummy_points.max()

Output

tensor(2.2347)

"So what? Does it actually mean anything?"

It means the model learned to stay out of the way of the inputs! Now that the model has the ability to use the raw inputs directly, its linear layer learned to produce only negative values, so its nonlinearity (ReLU) produces only zeros. Cool, right?
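
To see why that happens, here is a minimal sketch of such a residual block, assuming a toy setup: the DummyResidual class, the forced negative outputs, and the dummy_points tensor below are illustrative assumptions, not the chapter's actual model. The block adds its raw input to the ReLU-activated output of a linear layer, so if that linear layer produces only negative values, the ReLU branch is all zeros and the block simply reproduces its inputs.

import torch
import torch.nn as nn

class DummyResidual(nn.Module):
    """Minimal residual block: output = x + ReLU(Linear(x))."""
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(1, 1)
        self.activation = nn.ReLU()

    def forward(self, x):
        identity = x                           # the shortcut / skip connection
        out = self.activation(self.linear(x))  # the "regular" path
        return identity + out                  # add the (possibly zero) branch back

torch.manual_seed(42)
dummy_points = torch.randn(100, 1)             # illustrative inputs

dummy_model = DummyResidual()
with torch.no_grad():
    # Force the linear layer to produce only negative values (illustration only)
    dummy_model.linear.weight.fill_(0.)
    dummy_model.linear.bias.fill_(-1.)
    output = dummy_model(dummy_points)

print(torch.equal(output, dummy_points))

Output

True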

The Power of Shortcuts

The residual connection works as a shortcut, enabling the model to skip the nonlinearities when it pays off to do so (if it yields a lower loss). For this reason, residual connections are also known as skip connections.

"I’m still not convinced … what’s the big deal about this?"

The big deal is that these shortcuts make the loss surface smoother, so gradient descent has an easier job finding a minimum. Don’t take my word for it; go and check the beautiful loss landscape visualizations produced by Li et al. in their paper "Visualizing the Loss Landscape of Neural Nets." [129]

Awesome, right? These are projections of a multi-dimensional loss surface for the ResNet model, with and without skip connections. Guess which one is easier to train? :-)

If you’re curious to see more landscapes like these, make sure to check their website: "Visualizing the Loss Landscape of Neural Nets." [130]

