
Convolutional Neural Networks

In the previous chapters we discussed dense networks, in which each layer is fully connected to the adjacent layers. We looked at one application of these dense networks: classifying the MNIST handwritten digits dataset. In that context, each pixel in the input image is assigned to a neuron, for a total of 784 (28 × 28 pixels) input neurons. However, this strategy does not leverage the spatial structure and relationships within each image. In particular, the following piece of code transforms the bitmap representing each written digit into a flat vector, where the local spatial structure is removed. Removing the spatial structure is a problem because important information is lost:

```python
# X_train is 60000 rows of 28x28 values --> reshaped to 60000 x 784
X_train = X_train.reshape(60000, 784)
X_test = X_test.reshape(10000, 784)
```

Convolutional neural networks (convnets or CNNs for short) leverage spatial information, and they are therefore very well suited for classifying images. These nets use an ad hoc architecture inspired by biological data taken from physiological experiments performed on the visual cortex. As we discussed in Chapter 2, TensorFlow 1.x and 2.x, our vision is based on multiple cortex levels, each one recognizing more and more structured information. First we see single pixels; from those we recognize simple geometric forms, and then more and more sophisticated elements such as objects, faces, human bodies, animals, and so on.

Convolutional neural networks are a fascinating subject. Over a short period of time, they have shown themselves to be a disruptive technology, breaking performance records in multiple domains from text, to video, to speech, going well beyond the initial image processing domain where they were originally conceived.
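To make the contrast concrete, here is a minimal sketch (using NumPy with synthetic data in place of the real MNIST arrays) of the two representations: the flattened vectors produced by the code above versus a shape that preserves each image's 2D grid, which is what a convolutional layer expects:

```python
import numpy as np

# Synthetic stand-in for MNIST: 100 grayscale images of 28x28 pixels
X_train = np.random.rand(100, 28, 28)

# Dense-network style: flatten each image into a 784-dimensional vector,
# discarding the local spatial structure
X_flat = X_train.reshape(100, 784)

# Convnet style: keep the 28x28 grid and add a trailing channel axis
# (1 channel for grayscale), as most convolutional layers expect
X_grid = X_train.reshape(100, 28, 28, 1)

print(X_flat.shape)  # (100, 784)
print(X_grid.shape)  # (100, 28, 28, 1)
```

In the flattened form, two vertically adjacent pixels end up 28 positions apart in the vector; in the grid form they remain neighbors, which is exactly what a convolution exploits.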


In this chapter we will introduce CNNs, a particular type of neural network that is of great importance for deep learning.

Deep Convolutional Neural Network (DCNN)

A Deep Convolutional Neural Network (DCNN) consists of many neural network layers. Two different types of layers, convolutional and pooling (that is, subsampling), are typically alternated. The depth of each filter increases from left to right in the network. The last stage is typically made of one or more fully connected layers:

Figure 1: An example of a DCNN
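As a rough sketch of how this alternation plays out numerically, the hypothetical helper below (not from the book's code) traces the spatial size of a square feature map through a small stack of convolutional and pooling layers, assuming unpadded convolutions with stride 1 and 2×2 pooling with stride 2:

```python
# Trace the spatial size of a square feature map through a small DCNN.
# Assumptions: 'valid' convolutions (no padding, stride 1) shrink the
# map by kernel_size - 1; 2x2 pooling with stride 2 halves it.

def conv_out(size, kernel):
    return size - kernel + 1

def pool_out(size, window=2):
    return size // window

size = 28                 # e.g. one MNIST image
size = conv_out(size, 5)  # 5x5 convolution -> 24
size = pool_out(size)     # 2x2 pooling     -> 12
size = conv_out(size, 5)  # 5x5 convolution -> 8
size = pool_out(size)     # 2x2 pooling     -> 4
print(size)  # 4: this 4x4 map is flattened for the fully connected stage
```

Each convolution trims the borders while the filters grow deeper, and each pooling step halves the resolution, which is why the feature maps get spatially smaller from left to right in the figure.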

There are three key underlying concepts for convnets: local receptive fields, shared weights, and pooling. Let's review them together.

Local receptive fields

If we want to preserve the spatial information of an image or other form of data, then it is convenient to represent each image with a matrix of pixels. Given this, a simple way to encode the local structure is to connect a submatrix of adjacent input neurons to one single hidden neuron belonging to the next layer. That single hidden neuron represents one local receptive field. Note that this operation is named convolution, which is where the name for this type of network derives from. You can think of convolution as the treatment of a matrix by another matrix, referred to as a kernel. Of course, we can encode more information by using overlapping submatrices. For instance, suppose that the size of each single submatrix is 5×5 and that those submatrices are used with MNIST images of 28×28 pixels; then we will be able to generate 24×24 local receptive field neurons in the hidden layer, since a 5×5 window can occupy only 28 − 5 + 1 = 24 positions along each dimension.
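A quick way to check that arithmetic is to slide a 5×5 window over a 28×28 array and count the valid positions. The sketch below (plain NumPy, not the book's code) does exactly that, producing one hidden value per local receptive field:

```python
import numpy as np

image = np.random.rand(28, 28)  # a stand-in for one MNIST image
kernel = 5                      # side length of each 5x5 submatrix

# Each valid top-left corner of the 5x5 window yields one hidden neuron
rows = image.shape[0] - kernel + 1
cols = image.shape[1] - kernel + 1
print(rows, cols)  # 24 24

# One activation per local receptive field (here simply the window sum,
# standing in for a weighted sum with a learned kernel)
hidden = np.array([[image[i:i + kernel, j:j + kernel].sum()
                    for j in range(cols)]
                   for i in range(rows)])
print(hidden.shape)  # (24, 24)
```

Because the windows overlap, neighboring hidden neurons see mostly the same pixels, which is how the local structure of the image is carried into the hidden layer.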

