Advanced Convolutional Neural Networks

In this chapter we will see some more advanced uses for convolutional neural networks (CNNs). We will explore how CNNs can be applied within the areas of computer vision, video, textual documents, audio, and music. We'll conclude with a section summarizing convolution operations. We'll begin our look into CNNs with image processing.

Computer vision

In this section we'll look at the ways in which the CNN architecture can be utilized when applied to the area of image processing, and the interesting results that can be generated.

Composing CNNs for complex tasks

We have discussed CNNs quite extensively in the previous chapter, and at this point you are probably convinced about the effectiveness of the CNN architecture for image classification tasks. What you may find surprising, however, is that the basic CNN architecture can be composed and extended in various ways to solve a variety of more complex tasks.

In this section, we will look at the computer vision tasks in the following diagram and show how they can be solved by composing CNNs into larger and more complex architectures:

Figure 1: Different computer vision tasks. Source: Introduction to Artificial Intelligence and Computer Vision Revolution (https://www.slideshare.net/darian_f/introduction-to-the-artificial-intelligence-andcomputer-vision-revolution).

Classification and localization

In the classification and localization task, not only do you have to report the class of object found in the image, but also the coordinates of the bounding box where the object appears in the image. This type of task assumes that there is only one instance of the object in an image.

This can be achieved by attaching a "regression head" in addition to the "classification head" in a typical classification network. Recall that in a classification network, the final output of the convolution and pooling operations, called the feature map, is fed into a fully connected network that produces a vector of class probabilities. This fully connected network is called the classification head, and it is tuned using a categorical loss function (L_c) such as categorical cross-entropy.

Similarly, a regression head is another fully connected network that takes the feature map and produces a vector (x, y, w, h) representing the top-left x and y coordinates, width, and height of the bounding box. It is tuned using a continuous loss function (L_r) such as mean squared error. The entire network is tuned using a linear combination of the two losses, that is:

L = α L_c + (1 − α) L_r

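The two-headed architecture described above can be sketched with the Keras functional API. This is a minimal illustration, not the chapter's exact model: the convolutional backbone, layer sizes, NUM_CLASSES, and the α value are all placeholder choices. The key point is that both heads share one feature extractor, and `loss_weights` implements the linear combination L = α L_c + (1 − α) L_r.

```python
# Minimal sketch of a classification + localization network with two heads.
# Backbone depth, layer widths, NUM_CLASSES, and ALPHA are illustrative choices.
import tensorflow as tf
from tensorflow.keras import layers, Model

NUM_CLASSES = 10  # hypothetical number of object classes
ALPHA = 0.5       # weight balancing the two losses

inputs = layers.Input(shape=(224, 224, 3))

# Shared convolutional backbone producing the feature map
x = layers.Conv2D(32, 3, activation="relu")(inputs)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(64, 3, activation="relu")(x)
x = layers.GlobalAveragePooling2D()(x)

# Classification head: class probabilities, trained with a categorical loss (L_c)
class_out = layers.Dense(NUM_CLASSES, activation="softmax", name="class")(x)

# Regression head: (x, y, w, h) bounding box, trained with a continuous loss (L_r)
bbox_out = layers.Dense(4, name="bbox")(x)

model = Model(inputs, [class_out, bbox_out])
model.compile(
    optimizer="adam",
    loss={"class": "categorical_crossentropy", "bbox": "mse"},
    loss_weights={"class": ALPHA, "bbox": 1.0 - ALPHA},  # L = α·L_c + (1−α)·L_r
)
```

During training, each example would supply both a one-hot class label for the `class` output and a four-element box target for the `bbox` output; Keras sums the two per-output losses using the given weights.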
