03.09.2020 Views

Getting started with Computer Vision

A guide to the knowledge and application of visual systems

1 An introduction to computer vision

a. FAQs

What is computer vision?

Of the five human senses, vision is the one that provides most of the data we receive and is considered our dominant sense. It provides us with a detailed description of the surrounding world, which is constantly changing. Although vision involves a huge amount of information and complex processing, the human visual system can interpret this information easily. The ability to see, process and then act on visual input is something that most humans take for granted.

Computer vision engineering is the practice of using technology and machines to replicate, and even improve upon, human vision. The technology captures and stores images before transforming them into information that can be further acted upon.

This requires expertise across a range of fields, including sensor technology, image and signal processing, computer graphics, computer architecture, algorithms and machine learning.

What are the fundamental computer vision techniques?

Image classification: giving a computer the ability to interpret the input from an image sensor and categorise what it ‘sees’.

Object detection: detecting instances of a certain class (such as vehicles, humans, buildings) in images or videos.

Object tracking: detecting and recognising a defined item in each frame of a video, so that it can be followed and distinguished from other objects in the scene.

3D image reconstruction: capturing the shape and appearance of real objects.

Semantic image segmentation: labelling specific regions of an image according to what the object is.

What’s the difference between image processing, computer vision and machine learning?

Each of these fields is based on the input of an image. They process the pixels and give us an altered output in return. While their names imply their goals and methodologies, these fields depend substantially on one another.

Figure: Relationship between AI, machine learning and deep learning (diagram labels: artificial intelligence, machine learning, deep learning).

Artificial intelligence: the theory and development of computer systems to perform tasks normally requiring human intelligence.

Machine learning: an application of AI based around the idea of giving machines access to data and letting them learn for themselves.

Deep learning: a special type of machine learning algorithm that uses multiple layers of neural networks, which mimic the connectivity of the human brain, to process data and create patterns for use in decision making.

As a minimum, an AI system must be able to reproduce aspects of human intelligence.

Image processing takes an image as an input and provides a processed image as an output. The purpose of the processing is usually to improve the quality of the image. Typical methods used are filtering, noise removal, sharpening and edge detection.
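As a rough illustration of "image in, image out", the sketch below applies two small 3x3 kernels to a tiny greyscale image: a box blur (a simple smoothing filter) and a horizontal Sobel kernel (a common edge detector). The image values, the helper name `apply_kernel` and the choice to leave border pixels at 0 are assumptions made for this example, not something taken from the guide.

```python
# Hypothetical 5x5 greyscale image (0 = black, 255 = white):
# a dark region on the left and a bright region on the right.
image = [[10, 10, 10, 200, 200] for _ in range(5)]

def apply_kernel(img, kernel):
    """Slide a 3x3 kernel over every interior pixel (no kernel flip).

    Border pixels are left at 0 to keep the sketch short.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            acc = 0
            for ky in range(3):
                for kx in range(3):
                    acc += kernel[ky][kx] * img[y + ky - 1][x + kx - 1]
            out[y][x] = acc
    return out

# Box blur: each output pixel is the mean of its 3x3 neighbourhood.
BOX_BLUR = [[1 / 9] * 3 for _ in range(3)]
# Horizontal Sobel kernel: responds strongly where brightness changes left to right.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]

smoothed = apply_kernel(image, BOX_BLUR)
edges = apply_kernel(image, SOBEL_X)

print([round(v, 1) for v in smoothed[2]])
print(edges[2])
```

Both results are still images of the same size as the input: the blur softens the step between the dark and bright regions, while the Sobel output is zero in flat areas and large only around the step.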

Computer vision broadens the purpose of image processing to include the extraction of quantitative and qualitative information from visual data. Similar to the process of human visual reasoning, computer vision can distinguish between objects, classify them and sort them according to their attributes. Computer vision, like image processing, takes an image as an input. However, it returns an output with additional information interpreted from the image, such as size, colour, number, location or orientation.
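To make "information out rather than an image out" concrete, here is a minimal sketch that takes a binary mask (1 = object pixel, 0 = background) and reports the number of objects together with each object's size, location and a crude orientation. It uses a plain flood fill over 4-connected pixels; the mask, the function name `describe_objects` and the orientation rule are illustrative assumptions.

```python
from collections import deque

# Hypothetical binary mask: a 2x2 blob and a 1x3 vertical bar.
mask = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1, 0],
]

def describe_objects(mask):
    """Group touching object pixels and summarise each group."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    objects = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] == 0 or seen[y][x]:
                continue
            # Breadth-first flood fill from this unvisited object pixel.
            queue = deque([(y, x)])
            seen[y][x] = True
            pixels = []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            ys = [p[0] for p in pixels]
            xs = [p[1] for p in pixels]
            height = max(ys) - min(ys) + 1
            width = max(xs) - min(xs) + 1
            if height > width:
                orientation = "vertical"
            elif width > height:
                orientation = "horizontal"
            else:
                orientation = "square"
            objects.append({
                "size": len(pixels),                                   # area in pixels
                "location": (sum(xs) / len(xs), sum(ys) / len(ys)),    # centroid (x, y)
                "orientation": orientation,
            })
    return objects

objects = describe_objects(mask)
print("number of objects:", len(objects))
for obj in objects:
    print(obj)
```

The input is still an image, but the output is a list of measurements that a downstream system can act on.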

This can be extended beyond the extraction of meaningful information from a single image to multiple images or video, for example, to count the number of cars passing by a point on the street as they are recorded by a video camera. Temporal information therefore plays a role in computer vision, much as it does with our own understanding of the world.
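The car-counting example can be sketched as follows, assuming that detection and tracking have already been done and that each video frame is reduced to a mapping from a track ID to the car's horizontal position. The frame data, the IDs and the position of the counting line are all hypothetical; the point is only that the count depends on comparing consecutive frames, not on any single image.

```python
# Horizontal position (in pixels) of the virtual counting line. Assumed value.
COUNT_LINE_X = 50

# Hypothetical tracker output: one dict per frame, track ID -> x position.
frames = [
    {"car_a": 10},
    {"car_a": 40, "car_b": 5},
    {"car_a": 60, "car_b": 30},
    {"car_a": 90, "car_b": 55},
    {"car_b": 80},
]

def count_crossings(frames, line_x):
    """Count tracks that move from the left of line_x to on or past it."""
    count = 0
    previous = {}
    for positions in frames:
        for track_id, x in positions.items():
            last_x = previous.get(track_id)
            # A crossing needs two observations of the same car:
            # before the line in the last frame, at or past it in this one.
            if last_x is not None and last_x < line_x <= x:
                count += 1
        previous = positions
    return count

print(count_crossings(frames, COUNT_LINE_X))
```

A single frame can never produce a count here, because a crossing is only defined by the change in position between two frames.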

Machine learning is an application of artificial intelligence that provides a computer system with the ability to learn automatically and improve from experience without being explicitly programmed. In computer vision terms, this means ‘training’ a system. Algorithms and statistical models perform image analysis using patterns and inference learned from data sets of many thousands of images, rather than using explicit instructions as image processing would.
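As a toy illustration of ‘training’ rather than programming explicit rules, the sketch below learns to tell dark images from bright ones using a nearest-centroid classifier: it averages the labelled examples of each class and assigns a new image to the class whose average it is closest to. The four-pixel "images", the labels and the function names are assumptions for this example, and a real system would use far more data and a more capable model.

```python
# Hypothetical training set: flattened 2x2 greyscale images with labels.
training_set = [
    ([10, 20, 15, 5], "dark"),
    ([30, 10, 25, 20], "dark"),
    ([200, 220, 210, 230], "bright"),
    ([180, 240, 190, 250], "bright"),
]

def train(examples):
    """Learn one average image (centroid) per label from labelled examples."""
    sums, counts = {}, {}
    for pixels, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(pixels)
            counts[label] = 0
        sums[label] = [s + p for s, p in zip(sums[label], pixels)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}

def predict(model, pixels):
    """Return the label whose centroid is nearest in squared distance."""
    def squared_distance(centroid):
        return sum((c - p) ** 2 for c, p in zip(centroid, pixels))
    return min(model, key=lambda label: squared_distance(model[label]))

model = train(training_set)
print(predict(model, [40, 35, 50, 20]))
print(predict(model, [210, 190, 205, 215]))
```

No brightness threshold is written into the code; the decision boundary comes entirely from the examples, so changing the training data changes the behaviour.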
