
Figure 2.2.2: A comparison between a block in a typical CNN and a block in ResNet. To prevent degradation in gradients during backpropagation, a shortcut connection is introduced.

To alleviate the degradation of the gradient in deep networks, ResNet introduced the concept of a deep residual learning framework. Let's analyze a block, a small segment of our deep network.

The preceding figure shows a comparison between a typical CNN block and a ResNet residual block. The idea of ResNet is that, in order to prevent the gradient from degrading, we'll let the information flow through the shortcut connections to reach the shallow layers.
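
To make the comparison concrete, the following is a minimal, hypothetical sketch of the two blocks using the Keras functional API (tf.keras); it is not the book's own listing. The filter count and kernel size are assumptions, and the shortcut add assumes the block input already has a matching number of channels.

from tensorflow.keras.layers import Activation, Add, BatchNormalization, Conv2D

def plain_block(x, filters=64):
    """A typical CNN block: two Conv2D-BN-ReLU stages, no shortcut."""
    y = Conv2D(filters, 3, padding='same')(x)
    y = BatchNormalization()(y)
    y = Activation('relu')(y)
    y = Conv2D(filters, 3, padding='same')(y)
    y = BatchNormalization()(y)
    return Activation('relu')(y)

def residual_block(x, filters=64):
    """A ResNet-style block: the shortcut lets x flow around the conv layers."""
    y = Conv2D(filters, 3, padding='same')(x)
    y = BatchNormalization()(y)
    y = Activation('relu')(y)
    y = Conv2D(filters, 3, padding='same')(y)
    y = BatchNormalization()(y)
    y = Add()([x, y])  # shortcut connection around the conv layers
    return Activation('relu')(y)

In the sketch, the add operation is the shortcut: the block input x is carried around the convolutional layers and merged back in before the final activation.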

Next, let's look in more detail at the differences between the two blocks. Figure 2.2.3 shows the details of a CNN block from another commonly used deep network, VGG [3], alongside a ResNet block. We'll represent the layer feature maps as x; the feature maps at layer l are x_l. The operations in the CNN layer are Conv2D-Batch Normalization (BN)-ReLU.

Let's suppose we represent this set of operations in the form of H() = Conv2D-Batch Normalization (BN)-ReLU; this then means that:

x_{l-1} = H(x_{l-2})    (Equation 2.2.1)

x_l = H(x_{l-1})    (Equation 2.2.2)
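
As an illustration (not the book's listing), H() could be sketched with the Keras functional API as follows; the filter count and kernel size are placeholder assumptions:

from tensorflow.keras.layers import Activation, BatchNormalization, Conv2D

def H(x, filters=64, kernel_size=3):
    """One Conv2D-Batch Normalization (BN)-ReLU transformation."""
    x = Conv2D(filters, kernel_size, padding='same')(x)
    x = BatchNormalization()(x)
    return Activation('relu')(x)

With this helper, Equation 2.2.1 corresponds to x_l_minus_1 = H(x_l_minus_2) and Equation 2.2.2 to x_l = H(x_l_minus_1), that is, two consecutive calls to the same set of operations.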

In other words, the feature maps at layer l - 2 are transformed to x_{l-1} by H() = Conv2D-Batch Normalization (BN)-ReLU. The same set of operations is applied to transform x_{l-1} to x_l. To put this another way, if we have an 18-layer VGG, then there are 18 H() operations before the input image is transformed to the 18th layer feature maps.
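
Continuing the same hypothetical H() helper, those 18 H() operations amount to calling it 18 times in a row. The input shape and filter count below are assumptions, and a real VGG also interleaves pooling layers and varies the filter counts; this only illustrates the repeated application of H():

from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Activation, BatchNormalization, Conv2D

def H(x, filters=64):
    """One Conv2D-BN-ReLU transformation, as sketched above."""
    x = Conv2D(filters, 3, padding='same')(x)
    x = BatchNormalization()(x)
    return Activation('relu')(x)

inputs = Input(shape=(224, 224, 3))  # assumed input image shape
x = inputs
for _ in range(18):  # 18 H() operations before the 18th layer feature maps
    x = H(x)
vgg_like = Model(inputs, x)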

