The output of this network is combined with the feature map and passed through
a pipeline similar to that of the Fast R-CNN network, as shown in Figure 7. The
Faster R-CNN network is about 10x faster than the Fast R-CNN network, making
it approximately 250x faster than an R-CNN network:

Chapter 5

Figure 7: Faster R-CNN network architecture

Another, somewhat different class of object detection networks is Single Shot
Detectors (SSD), such as You Only Look Once (YOLO). In these networks, each image is
split into a predefined number of parts using a grid. In the case of YOLO, a 7×7 grid
is used, resulting in 49 sub-images. A predetermined set of crops with different aspect
ratios is applied to each sub-image. Given B bounding boxes per sub-image and C object
classes, the output for each image is a vector of size 7 × 7 × (5B + C). Each bounding
box contributes five values, a confidence and the coordinates (x, y, w, h), and each
grid cell has C prediction probabilities for the different objects detected within it.
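The size formula above can be checked with a few lines of Python. This is only a sketch: the values B = 2 and C = 20 are illustrative assumptions rather than figures from the text, and the row-major ordering of grid cells inside the flat vector is likewise assumed for the example.

```python
from typing import Tuple

# Illustrative values only: S = 7 comes from the text, B = 2 and C = 20 are assumptions.
S, B, C = 7, 2, 20


def yolo_output_size(s: int, b: int, c: int) -> int:
    """Length of the flattened prediction vector: s * s * (5 * b + c)."""
    return s * s * (5 * b + c)


def cell_slice(row: int, col: int, s: int, b: int, c: int) -> Tuple[int, int]:
    """Start and end offsets of one grid cell's values in the flat vector.

    Assumes the cells are stored in row-major order.
    """
    per_cell = 5 * b + c
    start = (row * s + col) * per_cell
    return start, start + per_cell


print(yolo_output_size(S, B, C))  # 1470
print(cell_slice(0, 1, S, B, C))  # (30, 60)
```

With these assumed values, each of the 49 grid cells owns 30 numbers: 10 for its two boxes and 20 for its class probabilities.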

The YOLO network is a CNN that performs this transformation. The final predictions
and bounding boxes are found by aggregating the findings from this vector. In
YOLO, a single convolutional network predicts the bounding boxes and the related
class probabilities. YOLO is the faster solution for object detection, but the algorithm
might fail to detect smaller objects (an implementation can be found at
https://www.kaggle.com/aruchomu/yolo-v3-object-detection-in-tensorflow).
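To make the aggregation step more concrete, the following sketch decodes the values of a single grid cell into scored detections. It is not taken from the linked implementation. The function name decode_cell, the value layout (B boxes of [x, y, w, h, confidence] followed by C class probabilities), and the 0.5 threshold are all assumptions made for illustration. The score used here is the box confidence multiplied by the class probability.

```python
from typing import List, Tuple

# (class_id, score, (x, y, w, h))
Detection = Tuple[int, float, Tuple[float, float, float, float]]


def decode_cell(values: List[float], b: int, c: int,
                threshold: float = 0.5) -> List[Detection]:
    """Turn one grid cell's 5*b + c numbers into scored detections.

    Assumed layout: b boxes of [x, y, w, h, confidence], then c class probabilities.
    """
    if len(values) != 5 * b + c:
        raise ValueError("expected 5*b + c values per cell")
    class_probs = values[5 * b:]
    # The cell predicts one set of class probabilities shared by all its boxes.
    best_class = max(range(c), key=lambda k: class_probs[k])
    detections = []
    for i in range(b):
        x, y, w, h, conf = values[5 * i: 5 * i + 5]
        score = conf * class_probs[best_class]  # class-specific confidence
        if score >= threshold:
            detections.append((best_class, score, (x, y, w, h)))
    return detections


# Toy cell with b = 2 boxes and c = 3 classes (made-up numbers).
cell = [0.5, 0.5, 0.2, 0.3, 0.9,   # box 0: high confidence
        0.4, 0.6, 0.1, 0.1, 0.1,   # box 1: low confidence
        0.1, 0.8, 0.1]             # class probabilities
print(decode_cell(cell, b=2, c=3))
```

In the toy cell only the first box survives the threshold. A full pipeline would repeat this for all 49 cells and typically apply non-maximum suppression to remove overlapping duplicates, which is omitted here.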

Instance segmentation

Instance segmentation is similar to semantic segmentation (the process of
associating each pixel of an image with a class label), with a few important
distinctions. First, it needs to distinguish between different instances of the same
class in an image. Second, it is not required to label every single pixel in the
image. In some respects, instance segmentation is also similar to object detection,
except that instead of bounding boxes, we want to find a binary mask that covers
each object.
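The difference between a per-object binary mask and a bounding box can be illustrated on a toy image. In this sketch, the instance-ID map, the helper names instance_masks and bounding_box, and the convention that 0 means "unlabeled" are assumptions chosen for the example, not part of any particular library.

```python
from typing import Dict, List, Tuple


def instance_masks(instance_map: List[List[int]]) -> Dict[int, List[List[int]]]:
    """Split an instance-ID map into one binary mask per instance (0 = unlabeled)."""
    ids = sorted({v for row in instance_map for v in row if v != 0})
    return {i: [[1 if v == i else 0 for v in row] for row in instance_map]
            for i in ids}


def bounding_box(mask: List[List[int]]) -> Tuple[int, int, int, int]:
    """Tightest (row_min, col_min, row_max, col_max) around a non-empty binary mask."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, v in enumerate(row) if v]
    return min(rows), min(cols), max(rows), max(cols)


# Toy 4x6 image: two instances (IDs 1 and 2) of the same class; the rest is unlabeled.
instance_map = [
    [0, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 2, 0],
    [0, 1, 0, 2, 2, 2],
    [0, 0, 0, 0, 2, 0],
]
masks = instance_masks(instance_map)
for instance_id, mask in masks.items():
    print(instance_id, bounding_box(mask))
# 1 (0, 0, 2, 2)
# 2 (1, 3, 3, 5)
```

Both objects belong to the same class but receive separate masks, and pixels with ID 0 carry no label at all. Each bounding box covers nine pixels, whereas the corresponding masks cover only six and five, which is the extra precision that instance segmentation provides.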
