Here α is a hyperparameter and can take a value between 0 and 1. Unless the value is determined by some domain knowledge about the problem, it can be set to 0.5.

The following figure shows a typical classification and localization network architecture. As you can see, the only difference with respect to a typical CNN classification network is the additional regression head on the top right:

Figure 2: Network architecture for image classification and localization

Semantic segmentation

Another class of problem that builds on the basic classification idea is "semantic segmentation." Here the aim is to classify every single pixel of the image as belonging to a single class.

An initial method of implementation could be to build a classifier network for each pixel, where the input is a small neighborhood around that pixel. In practice, this approach is not very performant, so an improvement over this implementation might be to run the image through convolutions that increase the feature depth while keeping the image width and height constant. Each pixel then has a feature map that can be sent through a fully connected network that predicts the class of the pixel. However, in practice, this is also quite expensive, and it is not normally used.

A third approach is to use a CNN encoder-decoder network, where the encoder decreases the width and height of the image but increases its depth (number of features), while the decoder uses transposed convolution operations to increase its size and decrease its depth. Transposed convolution (or upsampling) is the process of going in the opposite direction of a normal convolution. The input to this network is the image and the output is the segmentation map.
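The encoder-decoder idea above can be sketched with a few Keras layers. This is a minimal illustration, not a production architecture: the input size, filter counts, and the number of pixel classes (`NUM_CLASSES`) are all illustrative assumptions. Strided convolutions halve the spatial size in the encoder, and `Conv2DTranspose` layers double it back in the decoder, ending with one class probability per pixel:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 3  # illustrative number of pixel classes

def build_encoder_decoder(input_shape=(64, 64, 3), num_classes=NUM_CLASSES):
    inputs = layers.Input(shape=input_shape)
    # Encoder: decrease width/height, increase depth (number of features)
    x = layers.Conv2D(16, 3, strides=2, padding="same", activation="relu")(inputs)  # -> 32x32x16
    x = layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")(x)       # -> 16x16x32
    # Decoder: transposed convolutions increase size and decrease depth
    x = layers.Conv2DTranspose(16, 3, strides=2, padding="same", activation="relu")(x)  # -> 32x32x16
    x = layers.Conv2DTranspose(num_classes, 3, strides=2, padding="same")(x)            # -> 64x64xC
    # Per-pixel class probabilities form the segmentation map
    outputs = layers.Softmax(axis=-1)(x)
    return models.Model(inputs, outputs)

model = build_encoder_decoder()
seg_map = model(tf.zeros((1, 64, 64, 3)))
print(seg_map.shape)  # (1, 64, 64, 3): one probability per class, per pixel
```

Note that the output has the same width and height as the input, which is exactly what per-pixel classification requires.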
A popular implementation of this encoder-decoder architecture is U-Net (a good implementation is available at https://github.com/jakeret/tf_unet), originally developed for biomedical image segmentation, which has additional skip connections between corresponding layers of the encoder and decoder. The U-Net architecture is shown in the following figure:

Figure 3: U-Net architecture. Source: Pattern Recognition and Image Processing (https://lmb.informatik.uni-freiburg.de/people/ronneber/u-net/).

Object detection

The object detection task is similar to the classification and localization tasks. The big difference is that now there are multiple objects in the image, and for each one we need to find the class and bounding box coordinates. In addition, neither the number of objects nor their size is known in advance. As you can imagine, this is a difficult problem, and a fair amount of research has gone into it.

A first approach to the problem might be to create many random crops of the input image and, for each crop, apply the classification and localization networks we described earlier. However, such an approach is very wasteful in terms of computation and unlikely to be very successful.
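The classification and localization network referred to above can be sketched as a single backbone with two heads: a softmax head for the class and a regression head for the four bounding box coordinates, combined with the α-weighted loss from earlier in the chapter. This is a minimal sketch under assumed sizes: the layer sizes, head names, and `NUM_CLASSES` are illustrative, and Keras' `loss_weights` is used here as one convenient way to express the weighted sum:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

ALPHA = 0.5       # hyperparameter weighting classification vs. localization loss
NUM_CLASSES = 10  # illustrative number of object classes

def build_classification_localization_net(input_shape=(64, 64, 3)):
    inputs = layers.Input(shape=input_shape)
    # Shared convolutional backbone
    x = layers.Conv2D(16, 3, activation="relu")(inputs)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(32, 3, activation="relu")(x)
    x = layers.GlobalAveragePooling2D()(x)
    # Classification head: one probability per class
    class_out = layers.Dense(NUM_CLASSES, activation="softmax", name="class_out")(x)
    # Regression head: four bounding box coordinates
    box_out = layers.Dense(4, name="box_out")(x)
    return models.Model(inputs, [class_out, box_out])

model = build_classification_localization_net()
# Total loss = ALPHA * classification loss + (1 - ALPHA) * regression loss
model.compile(
    optimizer="adam",
    loss={"class_out": "sparse_categorical_crossentropy", "box_out": "mse"},
    loss_weights={"class_out": ALPHA, "box_out": 1.0 - ALPHA},
)
```

Applying such a network to many crops means running the whole backbone once per crop, which is what makes the random-crop approach so expensive.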