Daniel Voigt Godoy - Deep Learning with PyTorch Step-by-Step: A Beginner's Guide


Another advantage of these shortcuts is that they provide a shorter path for the gradients to travel back to the initial layers, thus preventing the vanishing gradients problem.

Residual Blocks

We're finally ready to tackle the main component of the ResNet model (the top performer of ILSVRC-2015), the residual block.

Figure 7.10 - Residual block

The residual block isn't so different from our own DummyResidual model, except for the fact that the residual block has two consecutive weight layers and a ReLU activation at the end. Moreover, it may have more than two consecutive weight layers, and the weight layers do not necessarily need to be linear.

For image classification, it makes much more sense to use convolutional instead of linear layers, right? Right! And why not throw some batch normalization layers in the mix? Sure! The residual block looks like this now:

import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1):
        super(ResidualBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels,
                               kernel_size=3, padding=1, stride=stride,
                               bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(out_channels, out_channels,
                               kernel_size=3, padding=1,
                               bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.downsample = None
        if out_channels != in_channels:
            self.downsample = nn.Conv2d(in_channels, out_channels,
                                        kernel_size=1, stride=stride)

    def forward(self, x):
        identity = x

        # First "weight layer" + activation
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        # Second "weight layer"
        out = self.conv2(out)
        out = self.bn2(out)

        # What is that?!
        if self.downsample is not None:
            identity = self.downsample(identity)

        # Adding inputs before activation
        out = out + identity
        out = self.relu(out)

        return out
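That 1x1 convolution stored in downsample exists because, whenever the number of output channels differs from the number of input channels, the identity cannot be added to the block's output as-is, so it is projected to the right number of channels first. The short sketch below is not from the book; it simply runs a dummy batch through the block (assuming the ResidualBlock class defined above and a standard PyTorch install) to confirm the shapes match in both cases:

import torch

# Dummy batch: 16 images, 64 channels, 28x28 pixels
dummy_images = torch.randn(16, 64, 28, 28)

# Same number of channels in and out: the identity is added as-is
block_same = ResidualBlock(in_channels=64, out_channels=64)
print(block_same(dummy_images).shape)
# torch.Size([16, 64, 28, 28])

# More channels out than in: the 1x1 convolution projects the identity
# from 64 to 128 channels, so the addition works
block_wider = ResidualBlock(in_channels=64, out_channels=128)
print(block_wider(dummy_images).shape)
# torch.Size([16, 128, 28, 28])

Notice that the 1x1 convolution uses the same stride as the first convolution, so the identity also matches the main path spatially whenever the stride is greater than one (although, in this simplified version, the projection is only created when the channel counts differ).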

