
Pooling layers

Let's suppose that we want to summarize the output of a feature map. Again, we can use the spatial contiguity of the output produced from a single feature map and aggregate the values of a sub-matrix into one single output value that synthetically describes the "meaning" associated with that physical region.

Max pooling

One easy and common choice is the so-called max pooling operator, which simply outputs the maximum activation observed in the region. In Keras, if we want to define a max pooling layer of size 2×2, we write:

model.add(layers.MaxPooling2D((2, 2)))

An example of the max pooling operation is given in Figure 4:

Figure 4: An example of max pooling

Average pooling

Another choice is average pooling, which simply aggregates a region into the average of the activations observed in that region.

Note that Keras implements a large number of pooling layers and a complete list is available online (https://keras.io/layers/pooling/). In short, all the pooling operations are nothing more than summary operations over a given region.

ConvNets summary

So far, we have described the basic concepts of ConvNets. CNNs apply convolution and pooling operations in one dimension for audio and text data along the time dimension, in two dimensions for images along the (height × width) dimensions, and in three dimensions for videos along the (height × width × time) dimensions. For images, sliding the filter over an input volume produces a map that gives the responses of the filter for each spatial position.
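To make the pooling operators described above concrete, here is a minimal sketch (assuming TensorFlow 2.x with tf.keras; the 4×4 input values are made up for illustration) that applies 2×2 max and average pooling to a single feature map:

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# One 4x4 feature map with a single channel, batch size 1: shape (1, 4, 4, 1).
x = np.arange(16, dtype="float32").reshape(1, 4, 4, 1)

# 2x2 max pooling keeps the maximum of each 2x2 region: (1, 4, 4, 1) -> (1, 2, 2, 1).
max_pool = layers.MaxPooling2D((2, 2))(x)
print(max_pool.numpy().reshape(2, 2))   # [[ 5.  7.] [13. 15.]]

# 2x2 average pooling averages each 2x2 region instead.
avg_pool = layers.AveragePooling2D((2, 2))(x)
print(avg_pool.numpy().reshape(2, 2))   # [[ 2.5  4.5] [10.5 12.5]]

# Keras also provides 1D and 3D variants (for example MaxPooling1D and MaxPooling3D)
# for the sequence and volumetric cases mentioned in the summary above.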

To recap, a CNN has multiple filters stacked together that learn to recognize specific visual features independently of their location in the image. Those visual features are simple in the initial layers of the network and become more and more sophisticated deeper in the network. Training a CNN requires identifying the right values for each filter so that an input, when passed through multiple layers, activates certain neurons of the last layer and the network predicts the correct values.

An example of DCNN ‒ LeNet

Yann LeCun, who very recently won the Turing Award, proposed [1] a family of ConvNets named LeNet, trained for recognizing MNIST handwritten characters with robustness to simple geometric transformations and distortions. The core idea of LeNet is to have lower layers alternating convolution operations with max pooling operations. The convolution operations are based on carefully chosen local receptive fields with shared weights for multiple feature maps. Then, the higher levels are fully connected, based on a traditional MLP with hidden layers and a softmax output layer.

LeNet code in TensorFlow 2.0

To define a LeNet in code, we use a convolutional 2D module:

layers.Convolution2D(20, (5, 5), activation='relu', input_shape=input_shape)

Note that tf.keras.layers.Conv2D is an alias of tf.keras.layers.Convolution2D, so the two can be used interchangeably. See https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv2D.

Here, the first parameter is the number of output filters in the convolution, and the next tuple is the extension of each filter. An interesting optional parameter is padding. There are two options: padding='valid' means that the convolution is only computed where the input and the filter fully overlap, and therefore the output is smaller than the input, while padding='same' means that the output has the same size as the input, achieved by padding the area around the input with zeros.

In addition, we use a MaxPooling2D module:

layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2))
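Putting these pieces together, a possible completion of the LeNet definition is sketched below. This is a hedged example rather than the exact network from the text: the second convolutional block (50 filters) and the size of the dense hidden layer (500 units) are assumptions chosen to match a classic LeNet-style layout.

import tensorflow as tf
from tensorflow.keras import layers, models

def build_lenet(input_shape=(28, 28, 1), classes=10):
    model = models.Sequential()
    # First block: convolution with 20 output filters of size 5x5, then 2x2 max pooling.
    model.add(layers.Convolution2D(20, (5, 5), activation='relu',
                                   input_shape=input_shape))
    model.add(layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
    # Second block: a deeper convolution followed by pooling again (filter count assumed).
    model.add(layers.Convolution2D(50, (5, 5), activation='relu'))
    model.add(layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
    # Higher levels: a traditional MLP with a hidden layer and a softmax output layer.
    model.add(layers.Flatten())
    model.add(layers.Dense(500, activation='relu'))
    model.add(layers.Dense(classes, activation='softmax'))
    return model

Calling build_lenet().summary() is a quick way to check that each layer produces the expected output shape.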

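For completeness, a minimal usage sketch follows; it is an assumption rather than part of the text, and it relies on the hypothetical build_lenet helper defined above. The optimizer, batch size, and number of epochs are arbitrary choices.

import tensorflow as tf

# Load MNIST, add the channel dimension, and scale pixel values to [0, 1].
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 28, 28, 1).astype("float32") / 255.0
x_test = x_test.reshape(-1, 28, 28, 1).astype("float32") / 255.0

model = build_lenet()  # hypothetical helper from the sketch above
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=128, epochs=5,
          validation_data=(x_test, y_test))
model.evaluate(x_test, y_test)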
