03.09.2020

Getting started with Computer Vision

A guide to the knowledge and application of visual systems



censis.org.uk


The goal of computer vision is to extract meaning from pixels and perform visual tasks similar to the human visual system. Interest in how machines ‘see’ and how computer vision can be used to build products for consumers and businesses is growing rapidly.

Capabilities of computer vision: identification, recognition, tracking and real-time analysis.

If you are reading the printed version of this brochure, you can download a hyperlinked pdf at censis.org.uk/brochures



Contents

1 An introduction to computer vision
   a. FAQs
   b. A brief history of computer vision
   c. The evolution of computer vision
   d. Deep learning breakthrough
2 Application examples
   a. Smart homes
   b. Smart cities
   c. Industry
   d. Healthcare
   e. Agriculture
   f. Security
   g. Autonomous vehicles
   h. AR/VR & immersive technologies
3 How is computer vision used in business?
   a. Benefits for business, industry and society
   b. Technical challenges
   c. Privacy
4 How to set up a computer vision system
   a. Basic components
   b. Hardware platforms
   c. Software tools
   d. Digital imaging system stack
5 How to process and interpret images
   a. Image as an array
   b. Image processing
   c. Machine learning
   d. Deep learning
   e. Choosing machine learning or deep learning
   f. Image processing libraries
   g. Machine learning frameworks
6 Embedded vision
   a. Embedded vision platforms
   b. Camera modules
   c. Interfaces
7 Computer vision & IoT
   a. Cloud vs edge processing
   b. Cloud platform and machine learning vendors
   c. Machine vision and IoT
8 Implementing computer vision
   a. Your first prototype
   b. How CENSIS can help
   c. IoT2Go Vision kit
9 Incubators and learning resources
10 The computer vision community in Scotland
   a. Companies in Scotland
   b. Research in Scotland
Glossary



1 An introduction to computer vision

a. FAQs

What is computer vision?

Of the five human senses, vision is the one that provides most of the data we receive and is considered our dominant sense. It provides us with a detailed description of the surrounding world, which is constantly changing. Although vision involves a huge amount of information and complex processing, the human visual system can interpret this information easily. The ability to see, process and then act on visual input is something that most humans take for granted.

Computer vision engineering is the practice of using technology and machines to replicate, and even improve upon, human vision. The technology captures and stores images before transforming them into information that can be further acted upon.

This requires expertise across a range of fields, including sensor technology, image and signal processing, computer graphics, computer architecture, algorithms and machine learning.

What are the fundamental computer vision techniques?

Image classification: giving a computer the ability to interpret the input from an image sensor and categorise what it ‘sees’.

Object detection: detecting instances of a certain class (such as vehicles, humans, buildings) in images or videos.

Object tracking: detecting and recognising a defined item in each frame of a video to distinguish it from other objects in the scene.

3D image reconstruction: the process of capturing the shape and appearance of real objects.

Semantic image segmentation: labelling specific regions of an image according to what the object is.

What’s the difference between image processing, computer vision and machine learning?

Each of these fields is based on the input of an image. They process the pixels and give us an altered output in return. While their names imply their goals and methodologies, these fields depend substantially on one another.

Relationship between AI, machine learning and deep learning: deep learning is a subset of machine learning, which in turn is a subset of artificial intelligence.

Artificial intelligence: AI is the theory and development of computer systems to perform tasks normally requiring human intelligence. As a minimum, an AI system must be able to reproduce aspects of human intelligence.

Machine learning: an application of AI based around the idea of giving machines access to data and letting them learn for themselves.

Deep learning: a special type of machine learning algorithm built from multiple layers of neural networks that mimic the connectivity of the human brain in processing data and creating patterns for use in decision making.

Image processing takes an image as an input and provides a processed image as an output. The purpose of the processing is usually to improve the quality of the image. Typical methods used are filtering, noise removal, sharpening and edge detection.
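The filtering and edge-detection methods just mentioned can be sketched in a few lines of plain Python. This is a minimal illustration, not production code; a real system would use a library such as OpenCV, and the image and kernel values here are invented:

```python
# Minimal sketch: a 3x3 mean (box) filter and a simple horizontal
# edge detector applied to a tiny grayscale image held as nested lists.

def convolve3x3(img, kernel):
    """Apply a 3x3 kernel to every interior pixel of a 2D list."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            acc = 0.0
            for ky in range(3):
                for kx in range(3):
                    acc += kernel[ky][kx] * img[y + ky - 1][x + kx - 1]
            out[y][x] = acc
    return out

# A 6x6 image: dark left half (10), bright right half (200).
img = [[10, 10, 10, 200, 200, 200] for _ in range(6)]

mean_kernel = [[1 / 9] * 3 for _ in range(3)]          # smoothing
sobel_x = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]         # horizontal gradient

smoothed = convolve3x3(img, mean_kernel)
edges = convolve3x3(img, sobel_x)

# The gradient response is strongest at the dark/bright boundary
# and zero in the flat regions.
print(edges[2])
```

The same sliding-window idea underlies most classical filters; only the kernel values change.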

Computer vision broadens the purpose of image processing to include quantitative and qualitative information from visual data. Similar to the process of human visual reasoning, computer vision can distinguish between objects, classify them and sort them according to their attributes. Computer vision, like image processing, takes an image as an input. However, it returns an output with additional information interpreted from the image, such as size, colour, number, location or orientation.

This can be extended beyond the extraction of meaningful information from a single image to multiple images or video, for example, to count the number of cars passing by a point on the street as they are recorded by a video camera. Temporal information therefore plays a role in computer vision, much as it does with our own understanding of the world.

Machine learning is the application of intelligence that gives a computer system the ability to learn and improve from experience automatically, without having to be explicitly programmed. In computer vision terms, this means ‘training’ a system: algorithms and statistical models perform image analysis using patterns and inference learned from data sets of many thousands of images, rather than following the explicit instructions that image processing would use.



What is artificial intelligence (AI)?

Artificial intelligence is intelligence demonstrated by machines, where a device can perceive its environment and mimic human functions such as ‘learning’ and ‘problem solving’. AI is the broad concept of machines being able to carry out tasks in a way that is considered ‘smart’.

What are neural networks?

Neural networks are a means of machine learning in which a computer learns to perform a task by analysing training examples or datasets. Usually, the examples have been manually labelled in advance. An object recognition system might be fed thousands of labelled images of cars, houses and cups, and would find visual patterns in the images that correlate consistently with each label.

What is deep learning?

Deep learning is the use of neural network methods to perform image analysis, moving away from statistical methods towards neural network algorithms developed to mimic the neurons of the human brain.

What applications can computer vision be used for?

Applications of computer vision are many and varied. Common applications you may be familiar with include augmented reality, facial recognition, gesture and handwriting recognition, machine vision, remote sensing, robotics, autonomous vehicles, people counting and iris recognition.

What business sectors use computer vision?

Computer vision has numerous applications in remote sensing, healthcare (particularly around medical imaging such as MRI scans or ultrasound imaging), security, manufacturing, automotive, transport, robotics, sports, gaming and many others.

The computer vision market is expected to reach close to $22 billion by 2026.
https://www.verifiedmarketresearch.com/product/globalcomputer-vision-market-size-and-forecast-to-2025/

b. A brief history of computer vision

Computer vision has a long history in commercial and government use, where light wave sensors in various spectrum ranges have been deployed in many applications such as:

• Remote sensing for environmental observation and management
• High resolution cameras that collect intelligence over battlefields
• Thermal imagers to detect people during police operations
• X-ray sensors for airport security.

The sensors can be stationary or attached to moving objects such as satellites, drones and vehicles. When combined with connectivity technologies such as Wi-Fi, Bluetooth or 3G/4G/5G, they create a new set of applications that were not possible before.

Computer vision, connectivity, advanced data analytics and artificial intelligence are catalysts for each other, giving rise to revolutionary leaps in IoT innovations and applications.



c. The evolution of computer vision

1960s

Computer vision technology started in the early 1960s with the aim of mimicking human vision systems and asking computers to tell us what they see.

Computers ‘see’ the world differently from humans:
• They capture an image as an array of pixels
• Borders between objects are discerned by measuring shades of colour
• Spatial relations between objects can be estimated.

3D models and representations of the environment began to be developed from 2D images. Research continued with ways to analyse real-world images, which led to techniques such as edge detection and segmentation. These were the foundations for low-level scene understanding and steps towards automating the process of image analysis.

1970s

The 1970s saw the first commercial application of computer vision technology: an optical character recognition program. Combined with text-to-speech technology, it provided the first print-to-speech reading machine for the blind.

1980s

In 1980 the precursor of modern convolutional neural networks was developed. As neural networks evolved throughout the 1980s, algorithms started to be programmed to solve individual challenges.

2000s

Real-time face detection was first developed in 2001 by Viola & Jones; it was the first object detection framework to perform successfully in real time. Robot cars were tested on roads by Google in 2010.

2010s

• Hardware technology evolution
Throughout the 2010s, single board computers with increasingly powerful GPUs, FPGAs and mobile hardware platforms have been designed, built and adapted to accelerate machine learning based computer vision algorithms. Increased power and efficiency at lower cost have allowed breakthroughs in using machine learning for computer vision, and deployment is increasing at an exponential rate.

• Sensor technology developments
Advances are also happening rapidly in many areas beyond conventional camera sensors. For example, infrared sensors and lasers combine to sense depth and distance, one of the critical enablers of self-driving cars and 3D mapping applications.

• Data generation
One of the driving factors behind the growth of computer vision is the amount of data generated, which can be used to create datasets to train and improve computer vision.

d. Deep learning breakthrough

Although computer vision techniques started in the late 1950s and many of the machine learning algorithms were developed in the 1980s, computer vision has grown exponentially in the last decade due to the increased computational power offered by processing chips, cloud technologies and other advancements. Alongside these dedicated hardware developments, the emergence of deep learning algorithms in recent years has reinvigorated computer vision. Throughout the 2010s, computer performance, accelerated by graphics processing units (GPUs), has grown powerful enough for us to realise the capabilities of neural network algorithms.



Timeline:
1950: computer vision emerges
1957: pixel invented, first digital image
1966: MIT Artificial Intelligence Lab
1969: CCD invented
1970s: first commercial computer vision application (OCR)
1975: first commercial digital camera
1980s: mathematical and quantitative analysis developments
1990s: computer graphics & computer vision (image morphing, view interpolation, panoramic image stitching); projective 3D reconstructions, stereo imaging, statistical learning techniques for facial recognition
2001: real-time face detection
2010s: GPUs and neural networks
2012: AlexNet, deep neural network for image recognition



2 Application examples

a. Smart homes

Computer vision-based user data will increasingly become a feature of the home. When systems can detect and recognise objects, they can deliver smart actions according to what they were programmed to do.

• Facial recognition will be used to unlock the door, or to keep it locked if an unfamiliar person approaches
• Indoor security cameras will send an alert to a smartphone if an elderly family member falls, or if a toddler is climbing up stairs
• Person detection can be used to adjust lighting and temperature to the number of people in a room, ensuring a comfortable environment and saving electricity, while a TV box that recognises individuals can turn on a tailored interface for entertainment

b. Smart cities

Smart cities employ a combination of low power sensors, cameras and machine learning software to monitor the efficient working of the city.

• Computer vision and related technologies can play a significant role in managing smart cities, as they serve as the ‘eyes’ of the city
• Smart city applications include monitoring traffic and pedestrian flows using energy efficient, intelligent street lighting with ambient light sensors
• Smart parking systems could also direct motorists to a free parking spot

c. Industry

Computer vision can be combined with other methods and technologies to provide applications in industry. Computer vision used in this field is often referred to as ‘machine vision’.

• Automated applications such as package inspection, barcode reading, 3D inspection, and track and trace are commonly used
• Machine vision combined with robotics provides applications such as product and component assembly
• Predictive maintenance and defect reduction also typically use machine vision technology

d. Healthcare

Computer vision applications in healthcare have been developed to aid healthcare professionals with medical imaging diagnosis, surgery and health monitoring.

• These can be used to detect if elderly people have fallen or require other forms of assistance
• Healthcare robotics can assist nurses and help clean hospitals
• Robots will need to be able to navigate the world around them through 3D computer vision


e. Agriculture

The agriculture industry is increasingly using computer vision technology for a range of applications.

• This can help with better productivity, crop monitoring, precision agriculture and locating weeds and pests
• The quality of food products can be assessed and sorted into specific grades, while detecting defects
• Properties such as colour, shape, size, surface defects and contamination can also be estimated

f. Security

Intelligent scene monitoring systems are playing an increasingly significant role in society.

• Examples include Automatic Number Plate Recognition (ANPR), people and vehicle tracking, crowd analysis and zone detection for health & safety
• Cameras can be placed in offices, hospitals, banks, ports, car parks, stadiums, shopping centres, airports and more
• The challenge is to identify the scene and context, understanding what demands immediate attention, what is valuable and what can be ignored

g. Autonomous vehicles

Self-driving vehicles can be made intelligent, self-reliant and reliable using computer vision technology.

• Computer vision technology is being applied to autonomous vehicles to make them safe for passengers and pedestrians
• Self-driving vehicles must be able to capture visual data in real time to create 3D maps of the surroundings, while detecting and classifying objects in their path such as traffic lights and pedestrians
• High quality images and videos must be obtained in low light conditions as well as daylight, using LiDAR sensors and thermal cameras alongside visible camera sensors

h. AR/VR and immersive technologies

Computer vision aids virtual reality with capabilities like SLAM (simultaneous localisation and mapping), user body tracking and gaze tracking.

• Computer vision-based AR overlays imagery or audio onto existing real-world scenery
• AR/VR applications in e-commerce allow the user to visualise products within their homes or virtually try on clothes to find the perfect fit
• AR/VR applications in the healthcare industry empower professionals to provide better diagnosis and make surgery safer


3 How is computer vision used in business?

There is a huge range of applications where the ability to extract meaning from ‘seeing’ visual data is useful: facial recognition, financial institutions, autonomous vehicles, medicine, manufacturing, agriculture, digital marketing, and handwriting extraction and analysis.

a. Benefits for business, industry and society

Computer vision has the potential to revolutionise many everyday aspects of our lives. Having the ability to see and interpret a scene reliably and without tiring, computer vision systems automate tasks without needing human intervention. As a result, business users can gain benefits such as:

• Faster and simpler processes – computer vision systems can carry out monotonous, repetitive tasks at a faster rate, making the entire process simpler
• Accurate outcomes – computer vision systems can provide high quality image processing capabilities
• Cost reductions – errors, and therefore faulty products or services, can be minimised, so companies can save money that would otherwise be spent fixing flawed processes and products

b. Technical challenges

A high level of technical understanding is required to create software that collects and interprets visual data. To train a computer vision system powered by machine learning, companies need a team of professionals with technical expertise. They may also need a dedicated team for regular monitoring and evaluation of the vision system’s performance.

c. Privacy

Privacy is the biggest social threat that computer vision poses. The capabilities of computer vision – identification, recognition, tracking and real-time analysis – impact directly on individual rights to privacy. With computers learning from many thousands of images and videos, they are getting better at recognising individuals by their facial features, identifying their behaviour and monitoring their habits, and this information is often stored in the cloud.

It is important to understand the potential negative effects of computer vision applications on society. This is crucial to ensure that computer vision applications make our lives more comfortable and efficient, and are not used for purposes of constraint and control.



4 How to set up a computer vision system

Almost everyone has experienced computer vision and machine learning, often without even knowing. This section explains how to set up a computer vision system.

a. Basic components

The components of a standard computer vision system are:

• Digital camera/image sensor – at the heart of any camera is the sensor. Modern sensors are solid-state electronic devices containing up to millions of discrete photodetector sites called pixels.
• Lighting devices – many computer vision systems are optimised by illuminating the scene to be captured, and may require filters to enhance the sensor characteristics.
• Lens – to focus or enhance the scene
• Frame grabber – to capture individual frames
• Image processing software – to analyse the captured scene
• Machine learning algorithms – for pattern recognition

b. Hardware platforms

CPU – the central processing unit of a computer, used to perform arithmetic computations. Most modern CPUs have 2 to 256 cores.

GPU – the graphics processing unit of a computer, used to process graphics. GPUs start at a couple of hundred cores and can run into the thousands. The greater number of cores allows multiple calculations to be worked on at the same time, which allows image processing to be performed efficiently.

FPGA – field programmable gate arrays have parallel processing capabilities which make them suitable for image processing.

c. Software tools

There are many software tools with the necessary techniques to perform image and video processing tasks as well as machine learning algorithms.

CPU: OpenCV • Scilab • Octave • R • Matlab
GPU: Tensorflow • PyTorch • Keras • Caffe

d. Digital imaging system stack

Software processing:
6. Visualisation and reproduction – viewing the image in visual format (presentation)
5. Image post-processor – image data optimisation
4. Image storage – formatting and storing image data (numeric presentation)

Hardware processing:
3. Digital signal processor – manipulation of the digital signal
2. Sensor – converting light to an electrical signal
1. Optics – gathering light (input: light)



5 How to process and interpret images

a. Image as an array

A digital image is an array of pixels, where each pixel is a combination of numerical values representing the colours and intensities at a particular point in the image.

A pixel, or picture element, is the smallest visual element of an image and typically contains three component intensities, or channels, such as red, green and blue. Colour digital images are created by combining the channels to reproduce the broad range of colours seen by the human eye.

A grayscale image refers to the number of different shades, or depth, of a particular colour. A grayscale image can be created from any single channel or colour of the image.

The image resolution gives the number of pixels and the aspect ratio gives the width:height pixel ratio.

The channel count gives the number of samples per point: grayscale images have a single sample per pixel, whereas colour images have three samples per pixel (red, green, blue).

Pixel: smallest visual element. Digital image: a multidimensional array of numbers. Aspect ratio: width:height. Resolution: width x height. Channel: number of samples per point – a single plane for grayscale/B&W images, three planes for colour images. (Diagram © maxEmbedded.com 2012)
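The array view described above can be made concrete in a few lines of plain Python. This is a sketch only; the pixel values are invented for illustration:

```python
# A tiny 2x3 RGB image as nested lists: each pixel is an
# (red, green, blue) tuple with values 0-255.
image = [
    [(255, 0, 0), (0, 255, 0), (0, 0, 255)],
    [(10, 20, 30), (40, 50, 60), (70, 80, 90)],
]

height = len(image)                 # 2 rows of pixels
width = len(image[0])               # 3 pixels per row
resolution = width * height         # total number of pixels: 6
aspect_ratio = f"{width}:{height}"  # width:height, i.e. 3:2

# A grayscale image built from a single channel (here, green):
# one sample per pixel instead of three.
gray = [[pixel[1] for pixel in row] for row in image]
print(gray)  # [[0, 255, 0], [20, 50, 80]]
```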



b. Image processing

The main purposes of image processing are to improve the quality of the image by sharpening and restoration; to extract the features of an image to help discriminate objects and/or classes of objects; and to classify objects, locate their position and get an overall understanding of the scene.

Standard methods and algorithms include edge detection, corner detection, blob detection, correlation and thresholding. These techniques are used to extract as many features as possible from images of a specific class of object (e.g., bicycles, horses, etc.) and treat those features as a ‘definition’ of the object. These ‘definitions’ are then searched for in other images. If a significant number of features from one type of object are found in another image, the image can be classified as containing that specific object (bicycle, horse, etc.).

When the number of classes goes up or the image clarity goes down, traditional computer vision algorithms find it harder to cope and machine learning techniques become more suitable.
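As a small illustration, binary thresholding, one of the standard methods named above, fits in a few lines of plain Python; the image values and threshold are invented:

```python
# Sketch: binary thresholding. Pixels brighter than the threshold
# become 255 (foreground), everything else 0 (background).

def threshold(img, t):
    """Return a binary image: 255 where pixel > t, else 0."""
    return [[255 if p > t else 0 for p in row] for row in img]

# A tiny grayscale image: bright pixels on a dark background.
img = [
    [12, 200, 13],
    [220, 230, 11],
]

print(threshold(img, 128))  # [[0, 255, 0], [255, 255, 0]]
```

Separating foreground from background this way is often the first step before blob detection or feature extraction.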

The main steps for image processing are:

Image acquisition – capture the image with a sensor or camera and convert it into a manageable format
Image enhancement – the input image quality is enhanced and important details extracted
Image restoration – any corruption, such as blur, noise or camera misfocus, is removed to get a cleaner image
Colour image processing – coloured images are processed with RGB or other colour space methods
Image compression and decompression – to allow for changes in image resolution and size, reducing or restoring images depending on the requirement
Morphological processing – defines the object structure and shape in the image
Feature extraction – the specific features of a particular object are identified in the image, using techniques like object detection
Representation and description – store and visualise the processed data with a suitable file format and output
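A few of the steps above can be sketched end to end in plain Python. This is illustrative only; the image, the contrast-stretching enhancement and the bright-pixel ‘feature’ are invented stand-ins for real acquisition, enhancement and feature extraction stages:

```python
# Illustrative sketch of three pipeline steps: acquisition (here, a
# hard-coded grayscale image), enhancement (contrast stretching) and
# feature extraction (counting bright pixels).

def stretch_contrast(img):
    """Rescale pixel values to span the full 0-255 range."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    span = (hi - lo) or 1
    return [[(p - lo) * 255 // span for p in row] for row in img]

def count_bright(img, t=128):
    """Count pixels brighter than a threshold (a crude 'feature')."""
    return sum(p > t for row in img for p in row)

# 1. Acquisition: a low-contrast 3x3 image.
img = [[100, 100, 140], [100, 140, 140], [100, 100, 100]]

# 2. Enhancement: stretch to the full intensity range.
enhanced = stretch_contrast(img)

# 3. Feature extraction: count the bright pixels.
print(enhanced)            # [[0, 0, 255], [0, 255, 255], [0, 0, 0]]
print(count_bright(enhanced))  # 3
```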

c. Machine learning<br />

Machine learning uses patterns in large data to perform tasks<br />

<strong>with</strong>out being explicitly told what to do.<br />

There are mainly three different ways machines can learn:<br />

Supervised learning algorithms<br />

These are designed to learn by example. When training a<br />

supervised learning algorithm, the training data will consist<br />

of inputs paired <strong>with</strong> the correct outputs. During training, the<br />

algorithm will search for patterns in the data that correlate<br />

<strong>with</strong> the desired outputs. After training, a supervised learning<br />

algorithm will take in new unseen inputs and will determine<br />

which label the new inputs will be classified as, based on prior<br />

training data. The objective of a supervised learning model is<br />

to predict the correct label for newly presented input data.<br />
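The supervised setting can be illustrated with a toy nearest-centroid classifier (the data points and class names below are invented for the example): training learns one prototype per class from labelled pairs, and prediction labels new, unseen inputs by the nearest prototype.<br />

```python
import numpy as np

# Labelled training data: inputs paired with the correct outputs.
# Two invented classes: "small" values near 1 and "large" values near 10.
X_train = np.array([[1.0], [1.2], [0.8], [9.8], [10.1], [10.4]])
y_train = np.array(["small", "small", "small", "large", "large", "large"])

# Training: learn one centroid (prototype) per class.
classes = np.unique(y_train)
centroids = {c: X_train[y_train == c].mean(axis=0) for c in classes}

# Prediction: assign a new, unseen input to the nearest centroid.
def predict(x):
    return min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))

print(predict(np.array([0.9])))   # falls near the "small" examples
print(predict(np.array([11.0])))  # falls near the "large" examples
```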

Unsupervised learning<br />

Give the machine unlabelled data and it will find patterns in<br />

the data. The algorithm picks up the differences between<br />

objects as it finds logical patterns.<br />
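A classic unsupervised example is k-means clustering: given unlabelled values, it discovers the grouping on its own. The data values and initial centre guesses below are invented for the illustration.<br />

```python
import numpy as np

# Unlabelled data: two groups of values, but no labels are given.
data = np.array([1.0, 1.1, 0.9, 9.9, 10.0, 10.2])

# k-means with k=2 clusters, starting from two guessed centres.
centres = np.array([0.0, 5.0])
for _ in range(10):
    # Assign each point to its nearest centre...
    labels = np.abs(data[:, None] - centres[None, :]).argmin(axis=1)
    # ...then move each centre to the mean of its assigned points.
    centres = np.array([data[labels == k].mean() for k in range(2)])

print(labels)   # cluster indices discovered without any labels
print(centres)  # roughly the two group means
```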

Reinforcement learning<br />

The algorithm is trained through a reward and punishment<br />

mechanism. The agent is rewarded for correct moves and<br />

punished for the wrong ones. In doing so, the algorithm tries<br />

to minimize wrong moves and maximize the right ones.<br />
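The reward-and-punishment loop can be sketched with a minimal action-value update. The two-action game, its reward probabilities, and the learning-rate and exploration values below are all invented for the example: the agent tries actions, is rewarded or punished, and gradually prefers the action with the higher expected reward.<br />

```python
import random

random.seed(0)

# Two possible actions; action "b" secretly pays off more often.
# Reward +1 for a good move, -1 (punishment) for a bad one.
def reward(action):
    return 1 if random.random() < (0.8 if action == "b" else 0.2) else -1

values = {"a": 0.0, "b": 0.0}  # estimated value of each action
alpha = 0.1                    # learning rate

for step in range(500):
    # Explore occasionally, otherwise exploit the best-known action.
    if random.random() < 0.1:
        action = random.choice(["a", "b"])
    else:
        action = max(values, key=values.get)
    # Nudge the estimate towards the observed reward or punishment.
    values[action] += alpha * (reward(action) - values[action])

print(max(values, key=values.get))  # the action with the higher learned value
```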

The three approaches compare as follows:<br />

Supervised: labelled data, direct feedback, predict outcome/future<br />

Unsupervised: no labels, no feedback, ‘find hidden structure’<br />

Reinforcement: decision process, reward system, learn series of actions<br />



d. Deep learning<br />

Deep learning is a special subset of machine learning and has<br />

revolutionised computer vision. Many problems that once<br />

seemed intractable are now solved to the point<br />

where machines achieve better results than humans.<br />

Deep learning introduced the concept of end-to-end learning<br />

where the machine is just given a dataset of images which<br />

have been annotated <strong>with</strong> what class of object is present in<br />

each image.<br />

e. Choosing machine learning or deep learning<br />

Classic computer vision analysis excels at measurements,<br />

finding defects or matching patterns. It is the ideal solution<br />

for repeatable dimension measurements of an object in a<br />

controlled environment, such as examining machined parts or<br />

printed circuit boards. Traditional techniques work very well in<br />

constrained environments. However they don’t handle novel<br />

situations very well.<br />

In comparison, machine learning is trainable, and as it gains<br />

access to a wider data set, it’s able to locate, identify and<br />

segment a wider number of objects or faults <strong>with</strong> more<br />

variable appearance or perspective, such as identifying and<br />

counting foods such as broccoli on a conveyor belt.<br />

Breakthroughs in the field of artificial neural networks in recent<br />

years have driven companies across industries to implement<br />

deep learning solutions, from chatbots in customer service<br />

to image and object recognition in retail, and many more.<br />

Deep learning has unlocked a myriad of sophisticated new AI<br />

applications.<br />

The performance of deep learning algorithms on complex<br />

tasks has made the approach particularly appealing.<br />

However, it is not always the best approach to computer<br />

vision and machine learning related problems. Deep learning<br />

methods are ideal for replacing human eyes for object<br />

classification problems, or to emulate expertise by interpreting<br />

images such as medical x-rays. Deep learning algorithms take<br />

a long time to train, however, and involve a lot of code compared<br />

<strong>with</strong> the relatively few lines of a classic computer vision solution.<br />

Each use case is unique, but business objectives, AI maturity,<br />

timescale, data and resources are all general considerations to<br />

take into account before deciding whether or not to use deep<br />

learning to solve a given problem.<br />

f. Image processing libraries<br />

OpenCV<br />

The Open Source <strong>Computer</strong> <strong>Vision</strong> Library (OpenCV)<br />

is one of the most popular computer vision libraries that<br />

provides many algorithms and functions. It includes<br />

modules such as image processing, object detection and<br />

deep learning to name just a few.<br />

The library is written in C++ and supports C++, Java, Python<br />

and MATLAB interfaces.<br />

Scilab: open-source software similar to MATLAB,<br />

<strong>with</strong> a computer vision and image processing module.<br />

Octave: open-source software also similar to MATLAB,<br />

<strong>with</strong> a computer vision and image processing module.<br />

R: an open-source data analysis language <strong>with</strong> packages for<br />

image processing.<br />



g. Machine learning frameworks<br />

<strong>Computer</strong>s learn by viewing thousands of labelled images to understand the traits of what’s being visualised. They learn to<br />

associate characteristics they detect in the images <strong>with</strong> each label. This method of machine learning means that the same<br />

principle can be applied to diverse areas such as:<br />

• Evaluating the quality of packages in a factory<br />

• Diagnosing organ function from an MRI scan<br />

• Identifying trends in the stock market<br />

• Locating traffic signs and many more.<br />

There are a great variety of free open-source tools to help get <strong>started</strong> <strong>with</strong> machine learning tasks.<br />

TensorFlow<br />

An open-source platform for machine learning created by<br />

Google. It has a comprehensive, flexible ecosystem of tools,<br />

libraries and community resources that lets researchers push the<br />

state-of-the-art in ML and developers easily build and deploy<br />

ML-powered applications. TensorFlow works best for image<br />

classification, image recognition, image segmentation and<br />

image-to-image translation. It includes a set of libraries for<br />

creating and training custom deep learning models and neural<br />

networks, and supports several popular programming<br />

languages, including C++, Python and Java.<br />

PyTorch<br />

A Python-based scientific computing package that uses the<br />

power of GPUs, currently one of the preferred deep learning<br />

research platforms built to provide maximum flexibility and<br />

speed.<br />

Keras<br />

Keras is an open-source Python library for creating deep<br />

learning models. It’s a great solution for those just beginning<br />

to use machine learning algorithms in their projects as it<br />

simplifies the creation of a deep learning model from scratch.<br />

Accord.NET<br />

Accord.NET is a machine learning framework<br />

combined <strong>with</strong> audio and image processing libraries written in<br />

C#. It is a good framework for both creative and general tasks.<br />

The image processing algorithms can be used for tasks such<br />

as face recognition, image joining, or tracking moving objects.<br />

Accord also includes libraries that provide a more traditional<br />

range of machine learning functions, from neural<br />

networks to decision tree systems.<br />

Caffe<br />

Convolutional Architecture for Fast Feature Embedding (Caffe)<br />

is an open-source framework that can be used for creating and<br />

training popular types of deep learning architectures. Caffe is<br />

good for tasks such as image classification, segmentation and<br />

recognition. Caffe is written in C++ but it also has a Python<br />

interface.<br />

Google Colab<br />

Google Colaboratory, or simply Colab, is one of the top<br />

image processing services. While it’s a cloud service rather<br />

than a framework, it can still be used for building custom<br />

deep learning applications from scratch. Tasks such as image<br />

classification, segmentation and object detection can be<br />

performed. Google Colab offers free usage of both CPU- and<br />

GPU-based acceleration.<br />

NVIDIA DeepStream SDK<br />

An SDK to build and deploy AI-powered Intelligent Video Analytics<br />

apps and services. DeepStream offers a multi-platform<br />

scalable framework <strong>with</strong> TLS security to deploy on the edge<br />

and connect to any cloud. https://developer.nvidia.com/<br />

deepstream-sdk<br />

<strong>Computer</strong> vision developments are evolving very quickly <strong>with</strong> new frameworks being written, new networks and datasets being<br />

released and new chips being designed at an increasing pace. There are many more frameworks and platforms available, both open-source<br />

and subscription-based. Picking the right framework for the machine learning application is an important step of project development.<br />



6 Embedded vision<br />

<strong>Computer</strong> vision systems have traditionally relied on a PC due to the processing power required to perform image analysis.<br />

A frame grabber or interface card sends image data from the camera to the computer which then analyses the images and<br />

relays information to another part of the system. These systems can be bulky or complex, however they offer good<br />

performance specifications.<br />

Industry is now using more and more single-board<br />

computers, and camera electronics have also become smaller.<br />

New camera and computer systems for applications<br />

are now:<br />

• Highly compact<br />

• Powerful<br />

• Low-cost<br />

• Large memory<br />

• Energy-efficient<br />

Driven by the need to integrate small cameras into mobile<br />

phones, embedded vision technology advances are now at<br />

the stage where it is practical to incorporate computer vision<br />

capabilities almost anywhere.<br />

Embedded vision systems are usually easier to use and<br />

integrate than PC-based systems. They often only include a<br />

small camera <strong>with</strong>out a housing connected to a processing<br />

board (embedded board/module) via a connector. The<br />

components are combined into one device and images<br />

sent from the camera are processed directly on the system’s<br />

processing board.<br />

a. Embedded vision platforms<br />

There are many popular devices that are commonly used for running computer vision algorithms<br />

Provider | Board | CPU | GPU | RAM | Price<br />

Raspberry Pi | Zero / Zero W | 1GHz, single core | - | 512MB | $5 and $10<br />

NVIDIA | Jetson Nano | Quad-core ARM A57 | 128-core Maxwell GPU | 4GB 64-bit | $99<br />

Raspberry Pi | RPi 4 | Quad-core Broadcom Cortex-A72 | VideoCore VI | 1, 2 or 4GB | $35 - $55<br />

Google | Coral dev board | NXP i.MX 8M SoC (quad Cortex-A53, Cortex-M4F) | Integrated GC7000 Lite Graphics | |<br />

Seeed Studio | Rock Pi N10 | Dual Cortex-A72 1.8GHz, quad Cortex-A53 | Mali T860MP4 | 4/6/8GB | $99 - $169<br />

Cheapest: Raspberry Pi Zero / Zero W<br />

Best for beginners: Raspberry Pi 4<br />

Best flexibility: NVIDIA Jetson Nano Dev Kit<br />

Best for machine learning <strong>with</strong> Tensorflow: Google Coral Dev Board<br />



b. Camera modules<br />

As image sensor components become smaller, cheaper and more efficient, the range of applications they can be<br />

applied to increases. Key factors when choosing a camera module include:<br />

• Image quality, <strong>with</strong> true colours, clear contrast and resolution as important factors<br />

• Easy operation and prototyping capabilities, often <strong>with</strong> a development kit and plug and play interfaces<br />

• Easy system integration <strong>with</strong> well-defined interfaces and software protocols<br />

c. Interfaces<br />

Choosing the right interface is crucial for any imaging application. Understanding the application’s requirements in terms of<br />

resolution, frame rate and transfer speed, among others, will determine the best interface to use. A comparison of<br />

popular digital camera interfaces is shown in the table below.<br />

Comparison of popular digital camera interfaces<br />

Interface FireWire 1394.b Camera Link® USB 2.0 USB 3.0 GigE<br />

Data Transfer Rate 800 Mb/s 3.6 Gb/s 480 Mb/s 5Gb/s 1000 Mb/s<br />

Max Cable Length 100m 10m 5m 3m 100m<br />

No. devices Up to 63 1 Up to 127 Up to 127 Unlimited<br />

Connector 9pin-9pin 26pin USB USB RJ45/Cat5e or 6<br />

Capture board Optional Required Optional Optional Not required<br />

Power Optional Required Optional Optional Required<br />

Source: Edmund Optics: https://www.edmundoptics.co.uk/knowledge-center/application-notes/imaging/camera-types-and-interfaces-for-machine-vision-applications/<br />
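A quick way to narrow down the table above is to compare the required data rate against each interface's nominal transfer rate. This back-of-envelope sketch uses an assumed example camera (1920x1080, 8-bit monochrome, 60 fps) and the table's headline rates, ignoring protocol overhead:<br />

```python
# Nominal interface transfer rates in Mb/s (from the table above).
interfaces = {
    "FireWire 1394.b": 800,
    "Camera Link": 3600,
    "USB 2.0": 480,
    "USB 3.0": 5000,
    "GigE": 1000,
}

# Assumed example camera: 1920x1080, 8 bits per pixel, 60 frames/s.
width, height, bits_per_pixel, fps = 1920, 1080, 8, 60
required_mbps = width * height * bits_per_pixel * fps / 1e6

# Interfaces whose nominal rate meets the requirement.
candidates = [name for name, rate in interfaces.items() if rate >= required_mbps]

print(f"required: {required_mbps:.0f} Mb/s")
print(candidates)
```

Note that GigE only just clears this example's ~995 Mb/s requirement; in practice, protocol overhead would push a real design towards a faster interface or a compressed stream.<br />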



7 <strong>Computer</strong> vision & IoT<br />

Connecting computer vision systems to the Internet of Things<br />

(IoT) creates a powerful network capability. Being able to<br />

identify objects from cameras allows the local node to be<br />

more intelligent and have greater autonomy, thus reducing<br />

the processing load on central servers and allowing a more<br />

distributed control architecture.<br />

Devices such as smartphones and IoT sensors are generating<br />

data that needs to be analysed in real time using machine<br />

learning or used to train deep learning models. However,<br />

machine learning inference and training require substantial<br />

computational and memory resources to run quickly.<br />

Edge computing, where computer nodes are placed close to<br />

end devices, is a viable way to meet the high computation and<br />

low-latency requirements of deep learning on edge devices<br />

and also provides additional benefits in terms of privacy,<br />

bandwidth efficiency and scalability.<br />

a. Cloud vs edge processing<br />

<strong>Computer</strong> vision tasks typically require fast processing capabilities – particularly for real-time image and scene understanding.<br />

<strong>Vision</strong> processing in the cloud<br />

To use cloud resources, data must be moved from the data source location on the network edge (i.e. camera modules,<br />

smartphones) to a centralised location on the cloud using a remote server or data centre. Moving data from the source to the<br />

cloud can introduce several challenges:<br />

Latency<br />

There is a time lag between the collection and processing of data in the cloud, which is unnoticeable in many use cases.<br />

However, in time-sensitive applications this time lag, which may only be milliseconds, becomes critical. Real-time inference<br />

is critical to many applications such as autonomous vehicles or voice-based assistance solutions. Sending data to the cloud for<br />

inference or training may incur delays from the network.<br />
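To make the time lag concrete, a rough estimate helps. All figures here are assumptions for illustration (a 0.5 MB compressed frame, a 50 Mb/s uplink, a 40 ms network round trip and 15 ms of cloud inference):<br />

```python
# Assumed figures for illustration only.
frame_megabytes = 0.5      # compressed frame size
uplink_mbps = 50.0         # upload bandwidth, megabits per second
network_rtt_ms = 40.0      # round-trip network latency
cloud_inference_ms = 15.0  # model inference time in the cloud

# Time to push one frame up the link, in milliseconds.
upload_ms = frame_megabytes * 8 / uplink_mbps * 1000
total_ms = upload_ms + network_rtt_ms + cloud_inference_ms

print(f"upload {upload_ms:.0f} ms, total {total_ms:.0f} ms per frame")
# At 135 ms per frame, the cloud path cannot keep up with a
# 30 fps (33 ms per frame) real-time budget.
```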

Scalability<br />

Sending data from the sources to the cloud consumes significant bandwidth, which in turn increases data processing and<br />

transfer times, introducing scalability issues, as network access to the cloud can become a bottleneck as the number of<br />

connected devices increases.<br />

Privacy<br />

Sending data to the cloud risks privacy concerns from users who own the data or whose behaviours are captured in the data.<br />

Users may be wary of uploading sensitive information to the cloud and how an application may use that data.<br />



<strong>Vision</strong> processing at the edge<br />

Cloud processing is not ideal for real time and mission-critical applications. Once sensors detect an anomaly in a high volume<br />

continuous manufacturing process, for example, the system must take corrective action immediately, otherwise the defect will<br />

propagate. The time from detection to correction must be in seconds.<br />

In the case of a self-driving car, the response time must be in milliseconds. For these applications, the round trip from device to<br />

gateway to the cloud and back takes too long. A different architecture is needed where the data collection and processing are<br />

closer to the devices (or edge).<br />

Edge processing moves compute, storage and networking closer to the source of the data, significantly reducing travel time<br />

and latency. Embedded smart devices enable more sophisticated processing at the sensor level.<br />

Key to this has been the introduction of lower-cost, compact embedded boards <strong>with</strong> processing power required for real-time<br />

image analysis. Placing the processing at the edge of the network allows for real-time results, low power consumption, strong<br />

privacy and is a viable solution to meet the challenges introduced by cloud processing. Embedded smart devices are ideal for<br />

repeated and automated robotic processes, such as edge detection in a pick-and-place system.<br />

b. Cloud platform and machine learning vendors<br />

Cloud capabilities and resources for machine learning are increasing significantly. Today, cloud computing providers increasingly<br />

offer GPU and FPGA co-processors to accelerate processing workloads, including:<br />

Amazon Rekognition: https://aws.amazon.com/rekognition/<br />

Amazon Sagemaker: https://aws.amazon.com/sagemaker/<br />

Google Cloud <strong>Vision</strong> API: https://cloud.google.com/vision<br />

IBM Watson Visual Recognition: https://www.ibm.com/uk-en/<br />

cloud/watson-visual-recognition<br />

Microsoft Azure <strong>Computer</strong> <strong>Vision</strong> API: https://azure.microsoft.<br />

com/en-gb/services/cognitive-services/computer-vision/<br />

c. Machine vision and IoT<br />

Machine vision systems (machine vision is the general term for computer<br />

vision used for industrial applications) connected to the IoT can<br />

create a powerful network capability. Allowing the local node<br />

to be more intelligent and have greater autonomy, reducing<br />

the processing load on central servers, can provide efficient<br />

operations and open up a wide range of applications,<br />

offering valuable insights into the operation of industrial<br />

systems. This in turn is opening up new ways of monitoring<br />

equipment and connecting autonomous robotic systems<br />

to the IoT infrastructure.<br />



8 Implementing<br />

computer vision<br />

As the demand for intelligent vision solutions grows, tools must integrate computer vision, processing, analytics,<br />

machine learning and connectivity into applications to help translate visual data into meaningful insights.<br />

a. Your first prototype<br />

To develop a computer vision prototype, there are many important considerations to be made regarding choices<br />

of hardware and software to suit the application requirements. A brief summary of the main areas are:<br />

Cameras<br />

• Image sensor performance<br />

• Camera features and characteristics<br />

• Data rate/transfer<br />

• Camera/PC interfaces<br />

Optics<br />

• Focal length<br />

• Field of view<br />

• Magnification<br />

• Image quality<br />

Illumination<br />

• How to enhance features of interest<br />

• Angle of illumination<br />

• Monochrome or colour image<br />

Processing<br />

• Suitable development environment<br />

• Software development kits<br />

• Language flexibility<br />

• Single processor/multi-core support<br />

• Image processing tools<br />

• GPU utilisation<br />

• Cloud-based analytics<br />

• Memory requirements<br />

Output/Display<br />

• Processed image or video<br />

• Data analytics of object count or location<br />

• System information or monitoring<br />
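For the optics choices above, the field of view follows directly from focal length and sensor size. A minimal sketch, where the 8 mm lens, 5.76 mm sensor width and 500 mm scene width are assumed example values:<br />

```python
import math

# Assumed example values: an 8 mm lens on a sensor 5.76 mm wide.
focal_length_mm = 8.0
sensor_width_mm = 5.76

# Horizontal angle of view: 2 * atan(sensor_width / (2 * focal_length)).
fov_deg = 2 * math.degrees(math.atan(sensor_width_mm / (2 * focal_length_mm)))

# Approximate working distance to image a 500 mm wide scene:
# magnification m = sensor_width / scene_width, distance ~ f / m.
scene_width_mm = 500.0
distance_mm = focal_length_mm * scene_width_mm / sensor_width_mm

print(f"FOV {fov_deg:.1f} deg, working distance ~{distance_mm:.0f} mm")
```

Running the same arithmetic the other way round (fixing the mounting distance and solving for focal length) is a common first step when specifying a lens.<br />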



b. How CENSIS can help<br />

CENSIS launched the <strong>Vision</strong> Lab, a dedicated facility to help<br />

businesses adopt or deliver innovative computer vision or<br />

imaging solutions.<br />

CENSIS is uniquely positioned to help kickstart or accelerate<br />

businesses’ use of computer vision due to our connections<br />

<strong>with</strong> academia and industry and the funding we can bring to<br />

innovative technology projects. Our in-house technical and<br />

business development teams can also provide engineering<br />

support and consultancy.<br />

The hardware and software we have in the <strong>Vision</strong> Lab greatly<br />

develops our technical capabilities in computer vision and<br />

related fields and includes:<br />

• Development kits for image sensing, machine learning<br />

• 3D time-of-flight sensor for industrial machine<br />

vision applications<br />

• Machine vision cameras<br />

• Camera modules for embedded vision<br />

• MVTec Halcon, a comprehensive standard software<br />

package for machine vision industries, <strong>with</strong> capabilities<br />

in areas such as analysis, matching, measuring,<br />

identification, 3D vision and deep learning algorithms<br />

The CENSIS <strong>Vision</strong> Lab can help SMEs <strong>with</strong> product<br />

development or product enhancement around new<br />

technology, through funding, collaboration, consultancy<br />

and access to equipment and expertise.<br />

c. IoT2Go <strong>Vision</strong> Kit<br />

CENSIS has created IoT2Go, a series of plug and play IoT<br />

development kits for organisations to try out an IoT solution in<br />

their own premises. IoT2Go was developed as part of a Scottish<br />

Government programme to raise awareness of IoT technologies.<br />

The kits are quick and easy to set up and can be used by people<br />

<strong>with</strong> no technical or coding experience.<br />

The IoT2Go <strong>Vision</strong> kit can capture a real-time count of<br />

people and objects and has image classification and object<br />

detection demos.<br />



9 Incubators & learning<br />

resources<br />

Incubators<br />

• NVIDIA Inception<br />

https://www.nvidia.com/en-us/deep-learning-ai/startups/<br />

• NVIDIA Deep Learning Institute<br />

https://www.nvidia.com/en-us/deep-learning-ai/education/<br />

• Intel Edge AI Incubator<br />

https://www.siliconrepublic.com/start-ups/intel-edge-ai-incubator-ireland-computer-vision-start-up<br />

• Imagimob AI Early Access Program<br />

https://www.imagimob.com/imagimob-ai-early-access-program<br />

Learning Resources<br />

Links to useful online courses, videos and resources:<br />

• Introduction to <strong>Computer</strong> <strong>Vision</strong> on Udacity, free course<br />

https://www.udacity.com/course/introduction-to-computer-vision--ud810<br />

• Awesome <strong>Computer</strong> <strong>Vision</strong>, a list of resources on Github<br />

https://github.com/jbhuang0604/awesome-computer-vision<br />

• <strong>Computer</strong> <strong>Vision</strong> course by Subhransu Maji<br />

https://sites.google.com/view/cmpsci670/lecture-slides<br />

• Video Tutorial by Alberto Romay<br />

https://www.youtube.com/playlist?list=PL7v9EfkjLswLfjcI-qia-Z-e3ntl9l6vp<br />



10 The computer vision<br />

community in Scotland<br />

<strong>Computer</strong> vision research and development in Scotland has a long history<br />

going back to the 1960s <strong>with</strong> the Department of Machine Intelligence<br />

and Perception at the University of Edinburgh. Research robot Freddy,<br />

built in the 1960s, was one of the earliest systems to integrate perception<br />

and action. Freddy utilised a heavy robot arm fixed to an overhead gantry<br />

<strong>with</strong> adaptive grippers. A binocular vision system was also mounted to<br />

the gantry. Freddy was able to recognise a variety of objects and could be<br />

instructed to assemble simple artefacts, such as a toy car, from a random<br />

heap of components.<br />

http://www.aiai.ed.ac.uk/project/freddy/<br />

The Department of Machine Intelligence and Perception, later the<br />

Department of Artificial Intelligence, was the forerunner to both the Turing<br />

Institute in Glasgow, formed in 1983 and developed to combine research<br />

in AI <strong>with</strong> technology transfer to industry, and also to the current School of<br />

Informatics at Edinburgh which has leading research expertise in computer<br />

vision and machine learning.<br />

a. Companies in Scotland<br />

Advances in computer vision and machine learning are making it possible to build exciting new solutions for a range of industrial<br />

applications. Scotland has a strong base of computer vision companies - a selection is listed below.<br />

Company | City | Specialist Areas | Link<br />

Odos Imaging | Edinburgh | 3D sensing and imaging solutions | www.odos-imaging.com<br />

Peacock Technology | Stirling | Robotics, automation, image processing, machine vision | www.peacocktech.co.uk<br />

Five AI | Edinburgh | Autonomous vehicles | www.five.ai<br />

Machines <strong>with</strong> <strong>Vision</strong> | Edinburgh | Highly accurate train positioning system for continuously monitoring track condition | www.machineswithvision.com<br />

Optos | Dunfermline | Retina imaging devices and development | www.optos.com<br />

STMicroelectronics Design Centre | Edinburgh | CMOS image sensor development, imaging systems, optical engineering, semiconductor solutions for autonomous driving and IoT | www.st.com<br />

NCTech | Edinburgh | High-resolution 360deg imagery and LiDAR | www.nctechimaging.com<br />

Sense Photonics | Edinburgh | 3D perception systems for mobility, industrial and robotics autonomy | www.sensephotonics.com<br />



b. Research in Scotland<br />

There is an important and growing imaging and computer vision research community in universities throughout Scotland.<br />

A selection of research areas and universities are listed in the table below.<br />

University | Research Group | Areas of Research | Link<br />

Heriot-Watt University | <strong>Vision</strong>lab | Robotics, automotive driver assistance, surveillance, human behaviour inference, detection & tracking, analysis of shape in 2D, range and LiDAR analysis | http://visionlab.eps.hw.ac.uk/<br />

Heriot-Watt University | Signal & Image Processing Laboratory | MRI, ultrasound imaging, novel imaging modalities, imaging techniques in radio and optical astronomy | https://www.hw.ac.uk/uk/schools/engineering-physical-sciences/institutes/sensors-signals-systems/siplab.htm<br />

Dundee University | <strong>Computer</strong> <strong>Vision</strong> & Image Processing | Healthcare and biomedical imaging, visual perception of people and places | https://cvip.computing.dundee.ac.uk/<br />

The University of Edinburgh | Edinburgh Centre for Robotics | Virtual reality environments | https://www.edinburgh-robotics.org/<br />

The University of Edinburgh | Machine <strong>Vision</strong> Unit | Iconic vision in 2D | http://www.ipab.inf.ed.ac.uk/mvu/<br />

The University of Edinburgh | Institute of Perception, Action and Behaviour | Statistical machine learning, computer vision, mobile and humanoid robotics, motor control, graphics and visualisation | https://www.inf.ed.ac.uk/research/ipab/<br />

University of Glasgow | <strong>Computer</strong> <strong>Vision</strong> & Autonomous Systems | 3D vision systems | https://www.gla.ac.uk/schools/computing/research/researchsections/ida-section/computervisionandautonomoussystems/<br />

University of Glasgow | <strong>Computer</strong> <strong>Vision</strong> & Graphics | Human body modelling in 3D | http://www.dcs.gla.ac.uk/cvg/<br />

University of Stirling | <strong>Vision</strong> & Image Processing Special Interest Group | Text and language processing, visual perception | http://vip.cs.stir.ac.uk/<br />

Glasgow Caledonian University | School of Computing, Engineering & Built Environment | Neural networks applied to condition monitoring | https://www.gcu.ac.uk/cebe/<br />

Glasgow School of Art | School of Simulation and Visualisation | Real-time 3D visualisation, interaction technologies | http://www.gsa.ac.uk/research/research-centres/school-of-simulation-and-visualisation/<br />

SINAPSE (Scottish Imaging Network, a consortium of Aberdeen, Dundee, Edinburgh, Glasgow, St Andrews, Stirling and Strathclyde) | | Medical imaging – MRI, PET, SPECT, EEG, deep learning in medical imaging | http://www.sinapse.ac.uk/<br />



Glossary<br />

AI: Artificial Intelligence<br />

CMOS: Complementary Metal-Oxide Semiconductor<br />

CPU: Central Processing Unit<br />

FPGA: Field Programmable Gate Array<br />

GPU: Graphics Processing Unit<br />

IoT: Internet of Things<br />

IoT2Go: CENSIS IoT starter kit<br />

LiDAR: Light Detection And Ranging<br />

SDK: Software Development Kit<br />

SME: Small and Medium Enterprises<br />

Join our<br />

community at<br />

censis.org.uk<br />



CENSIS is the centre of excellence for sensor and imaging<br />

systems (SIS) and Internet of Things (IoT) technologies.<br />

We help organisations of all sizes explore innovation<br />

and overcome technology barriers to achieve business<br />

transformation.<br />

As one of Scotland’s Innovation Centres, our focus is not<br />

only creating sustainable economic value in the Scottish<br />

economy, but also generating social benefit. Our industry-experienced<br />

engineering and project management teams<br />

work <strong>with</strong> companies or in collaborative teams <strong>with</strong> university<br />

research experts.<br />

We act as independent trusted advisers, allowing organisations<br />

to implement quality, efficiency and performance<br />

improvements and fast-track the development of new<br />

products and services for global markets.<br />

Contact details:<br />

CENSIS<br />

The Inovo Building<br />

121 George Street<br />

Glasgow<br />

G1 1RD<br />

Tel: 0141 330 3876<br />

Email: info@censis.org.uk<br />

Join the CENSIS mailing list at www.censis.org.uk<br />

Follow us on Twitter<br />

@CENSIS121<br />

Interest in how<br />

machines ‘see’<br />

and how computer<br />

vision can be used<br />

is growing.<br />

20.9.v1.Vis
