Slides - Tamara L Berg

Advanced Multimedia

• HW2 due today. Questions?


Source: L. Lazebnik

• To perceive the story behind the picture 

What we see What a computer sees 

Source: S. Narasimhan

Source: C. Fowlkes

Source: C. Fowlkes

Controlling processes (e.g. an industrial robot or an 

autonomous vehicle). 

Detecting events (e.g. for visual surveillance). 

Organizing information (e.g. for indexing and retrieval from 

collections of images and videos). 

Modeling objects or environments (e.g. industrial inspection, 

or medical image analysis). 

Interaction (e.g. as the input to a device for human 

computer interaction). 

Source: L. Lazebnik

• To perceive the story behind the picture 

• What exactly does this mean? 

– Vision as a source of metric 3D information 

– Vision as a source of semantic information 

Source: L. Lazebnik

Real-time stereo Structure from motion 

NASA Mars Rover 

Pollefeys et al. 

Multi-view stereo for 

community photo collections 

Goesele et al. 

Source: L. Lazebnik

Vision as a source of semantic information 

slide credit: Fei-Fei, Fergus & Torralba

Object categorization 

sky 

flag 

banner 

bus 

face 

building 

cars 

street lamp 

bus 

wall 


Scene and context categorization 

• outdoor 

• city 

• traffic 

• … 


Qualitative spatial information 

rigid moving 

object 

vertical 

slanted 

horizontal 

non-rigid moving 

object 

rigid moving 

object 


• Vision is useful: Images and video are everywhere! 

Personal photo albums 

Surveillance and security 

Movies, news, sports 

Medical and scientific images 

Source: L. Lazebnik

• Vision is useful 

• Vision is interesting 

• Vision is difficult 

– Half of primate cerebral cortex is devoted to visual processing 

– Achieving human-level visual perception is probably “AI-complete” 

Source: L. Lazebnik

Source: L. Lazebnik

Challenges: viewpoint variation 

Michelangelo 1475-1564 slide credit: Fei-Fei, Fergus & Torralba

Challenges: illumination 


image credit: J. Koenderink

Challenges: scale 


Challenges: deformation 

Xu, Beihong 1943 


Challenges: occlusion 

Magritte, 1957 slide credit: Fei-Fei, Fergus & Torralba

Challenges: background clutter 

Source: L. Lazebnik

Challenges: Motion 

Source: L. Lazebnik

Challenges: object intra-class 

variation 


Challenges: local ambiguity 









• Images are confusing, but they also reveal the structure of 

the world through numerous cues 

• Our job is to interpret the cues! 


Image source: J. Koenderink

Source: L. Lazebnik

Source: J. Koenderink

Source: L. Lazebnik




Source: L. Lazebnik

Image credit: Arthus-Bertrand (via F. Durand)

• Perception is an inherently ambiguous problem 

– Many different 3D scenes could have given rise to a particular 2D picture 

Image source: F. Durand

• Perception is an inherently ambiguous problem 

– Many different 3D scenes could have given rise to a particular 2D picture 

• Possible solutions 

– Bring in more constraints (more images) 

– Use prior knowledge about the structure of the world 

• Need a combination of different methods 
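The ambiguity can be made concrete with a pinhole camera model. The sketch below is only an illustration: the function name `project` and the default focal length are assumptions, not part of the slides. It shows that two different 3D points on the same ray through the camera center land on the same 2D image location, so a single image cannot tell them apart.

```python
def project(point, focal_length=1.0):
    """Pinhole projection of a 3D point (x, y, z), with z > 0, onto the image plane."""
    x, y, z = point
    return (focal_length * x / z, focal_length * y / z)


# Two different 3D points on the same ray through the camera center.
near_point = (1.0, 2.0, 4.0)
far_point = tuple(2.5 * c for c in near_point)  # (2.5, 5.0, 10.0)

near_pixel = project(near_point)
far_pixel = project(far_point)

# Both points map to the same image location, so depth is lost.
print(near_pixel, far_pixel)
```

A second image from a different viewpoint (more constraints) or an assumption about object size (prior knowledge) would be needed to separate the two points.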

Image source: F. Durand

Robotics 

Computer Graphics 

Artificial Intelligence 

Computer Vision 

Image Processing 

Machine Learning 

Psychology 

Neuroscience 

Source: L. Lazebnik

L. G. Roberts, Machine Perception 

of Three Dimensional Solids, 

Ph.D. thesis, MIT Department of 

Electrical Engineering, 1963. 

Source: L. Lazebnik

• Basic image formation and processing 




Cameras and sensors 

Light and color 

* 

Feature extraction: corner and blob detection 

= 

Linear filtering 

Edge detection 

source: Svetlana Lazebnik

3D world 2D image 


Segmentation and grouping 


• Separate image into coherent “objects” 

image human segmentation 

Berkeley segmentation database: 

http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/ 

segbench/ 


• Segmentation, grouping, perceptual organization: gathering features that belong together 

• Top-down segmentation: pixels belong together because they come from the same object 

• Bo

• Grouping is key to visual perception 

• Elements in a collection can have properties that result from relationships 

• “The whole is greater than the sum of its parts” 

subjective contours 

http://en.wikipedia.org/wiki/Gestalt_psychology 

occlusion 

familiar 

configuration 



• These factors make intuitive sense, but are very difficult to translate into algorithms 


Source: K. Grauman

• Divide data points into subsets (clusters) so that the data 

in each subset share some common trait (often proximity 

or appearance) 

• Need some distance/similarity measure

• Want to minimize sum of squared Euclidean distances between points x_i and their nearest cluster centers m_k 

Algorithm: 


• Randomly initialize K cluster centers 

• Iterate until convergence: 

• Assign each data point to the nearest center 

• Recompute each cluster center as the mean of all points 

assigned to it 
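The clustering procedure described here can be sketched in plain Python. This is a minimal illustration, not the course's reference implementation. The helper names, the choice to initialize centers by sampling data points, the fixed `seed`, and the rule of keeping the old center when a cluster becomes empty are all assumptions made for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, max_iters=100, seed=0):
    rng = random.Random(seed)
    # Random initialization: pick k distinct data points as starting centers.
    centers = rng.sample(points, k)
    assignment = None
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest center.
        new_assignment = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: each center becomes the mean of its assigned points.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # assumption: keep the old center if a cluster is empty
                centers[j] = mean(members)
    return centers, assignment


def sum_squared_error(points, centers, assignment):
    """The objective: sum of squared distances from points to their centers."""
    return sum(squared_distance(p, centers[a]) for p, a in zip(points, assignment))


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, assignment = kmeans(points, k=2)
print(centers, assignment, sum_squared_error(points, centers, assignment))
```

For image segmentation, the points would typically be per-pixel feature vectors such as intensity or color, possibly with position appended. Results in general depend on the random initialization, so practical use often runs several restarts and keeps the lowest objective value.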


Source: K. Grauman

• Represent features and their relationships using a graph 

• Cut the graph to get subgraphs with strong 

interior links and weaker exterior links 


• Node for every pixel 

• Edge between every pair of pixels (or every pair of 

“sufficiently close” pixels) 

• Each edge is weighted by the affinity or similarity 

of the two nodes 
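One way to build such a graph for a small grayscale image is sketched below. The slides do not specify the affinity function. The Gaussian of the intensity difference used here, the `sigma` value, the 4-connected neighborhood standing in for "sufficiently close", and the function name `build_affinity_graph` are all assumptions for illustration.

```python
import math


def build_affinity_graph(image, sigma=0.1):
    """Return {(pixel_a, pixel_b): weight} over 4-connected neighboring pixels.

    Pixels are identified by (row, col). The weight is a Gaussian of the
    intensity difference, so similar neighbors get affinity close to 1.
    """
    rows, cols = len(image), len(image[0])
    edges = {}
    for r in range(rows):
        for c in range(cols):
            # Link each pixel to its right and down neighbors only, so every
            # neighboring pair is stored exactly once.
            for dr, dc in ((0, 1), (1, 0)):
                nr, nc = r + dr, c + dc
                if nr < rows and nc < cols:
                    diff = image[r][c] - image[nr][nc]
                    weight = math.exp(-(diff ** 2) / (2 * sigma ** 2))
                    edges[((r, c), (nr, nc))] = weight
    return edges


# Tiny image: a dark region (0.0) next to a bright column (1.0).
image = [
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
]
edges = build_affinity_graph(image)
for (a, b), w in sorted(edges.items()):
    print(a, b, w)
```

Edges inside the dark region get weight 1.0, while edges that cross into the bright column get weights near zero, which is what makes them cheap to cut.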

i 

w_ij 

j 

Source: S. Seitz

A B C 

• Break Graph into Segments 

• Delete links that cross between segments 

• Easiest to break links that have low affinity 

– similar pixels should be in the same segments 

– dissimilar pixels should be in different segments 

i 

w_ij 

j 

Source: S. Seitz

A 

• Set of edges whose removal makes a graph 

disconnected 

• Cost of a cut: sum of weights of cut edges 

• A graph cut gives us a segmentation 
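The cost of a cut can be computed directly from the definition. The sketch below uses a hypothetical four-node graph with made-up weights, and the function name `cut_cost` is an assumption. It only evaluates the cost of a given partition and does not search for the best cut.

```python
def cut_cost(edges, segment_a):
    """Sum the weights of edges with exactly one endpoint in segment_a."""
    return sum(
        weight
        for (i, j), weight in edges.items()
        if (i in segment_a) != (j in segment_a)
    )


# Hypothetical graph: strong links inside {a, b} and {c, d}, weak links across.
edges = {
    ("a", "b"): 0.9,
    ("c", "d"): 0.8,
    ("b", "c"): 0.1,
    ("a", "d"): 0.2,
}

good_cut = cut_cost(edges, {"a", "b"})  # removes only the two weak links
bad_cut = cut_cost(edges, {"a"})        # removes the strong a-b link as well
print(good_cut, bad_cut)
```

The partition that separates {a, b} from {c, d} is cheaper because it deletes only low-affinity links, which matches the idea that similar pixels should stay in the same segment.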

B 

Source: S. Seitz



Finding correspondences Clustering and visual vocabularies 

Bag-of-features models 

Classification 

Sources: D. Lowe, L. Fei-Fei

sky 

flag 

banner 

bus 

face 

building 

cars 

street lamp 

bus 

wall 

source: Fei-Fei, Fergus & Torralba

source: Svetlana Lazebnik 

Biederman 1987

So what does object recognition involve? 


Verification: is that a lamp? 


Detection: where are the people? 


Identification: is that Potala Palace? 


Object categorization 

tree 

banner 

people 

mountain 

building 

street lamp 

vendor 


Scene and context categorization 

• outdoor 

• city 

• … 


Progress to date 

The next slides show some examples of what 

current vision systems can do 

Source: L. Lazebnik

Optical character recognition (OCR) 

Technology to convert scanned docs to text 

• If you have a scanner, it probably came with OCR software 

Digit recognition, AT&T labs 

http://www.research.att.com/~yann/ 

Also used for zipcode reading by the USPS 

License plate readers 

http://en.wikipedia.org/wiki/Automatic_number_plate_recognition 

Source: S. Seitz

Face detection 

Many new digital cameras now detect faces 

• Canon, Sony, Fuji, … 

Source: S. Seitz

Face Detection for Privacy 

Face Blurring for Google Street View


Smile detection? 



Sony Cyber-shot® T70 Digital Still Camera Source: S. Seitz

Object recognition (in supermarkets) 

LaneHawk by EvolutionRobotics 

“A smart camera is flush-mounted in the checkout lane, continuously watching 

for items. When an item is detected and recognized, the cashier verifies the 

quantity of items that were found under the basket, and continues to close the 

transaction. The item can remain under the basket, and with LaneHawk, you are 

assured to get paid for it…” 

Source: S. Seitz

Face recognition 

Who is she? Source: S. Seitz

Vision-based biometrics 

“How the Afghan Girl was Identified by Her Iris Patterns” Read the story 

Source: S. Seitz

Login without a password… 

Fingerprint scanners on 

many new laptops, 

other devices 

Face recognition systems now 

beginning to appear more widely 

http://www.sensiblevision.com/ 

Source: S. Seitz

Object recognition (in mobile phones) 

This is becoming real: 

• Microsoft Research 

• Point & Find 

• Google Goggles 

Source: S. Seitz

iPhone Apps: (www.kooaba.com) 

Source: L. Lazebnik

iPhone Apps: (www.snaptell.com) 

Source: L. Lazebnik

Special effects: shape capture 

The Matrix movies, ESC Entertainment, XYZRGB, NRC 

Source: S. Seitz

Special effects: motion capture 

Pirates of the Caribbean, Industrial Light and Magic 

Source: S. Seitz

Sports 

Sportvision first down line 

Nice explanation on www.howstuffworks.com 

Source: S. Seitz

Smart cars 

Mobileye 

Slide content courtesy of Amnon Shashua 

• Vision systems currently in high-end BMW, GM, Volvo models 

• By 2010: 70% of car manufacturers. 

Source: S. Seitz

Source: C. Fowlkes

Vision-based interaction (and games) 

Nintendo Wii has camera-based IR 

tracking built in. See Lee’s work at 

CMU on clever tricks on using it to 

create a multi-touch display! 


Assistive technologies 

Sony EyeToy 

Source: S. Seitz

Vision in space 

NASA's Mars Exploration Rover Spirit captured this westward view from atop 

a low plateau where Spirit spent the closing months of 2007. 

Vision systems (JPL) used for several tasks 

• Panorama stitching 

• 3D terrain modeling 

• Obstacle detection, position tracking 

• For more, read “Computer Vision on Mars” by Matthies et al. 

Source: S. Seitz

Robotics 

NASA’s Mars Spirit Rover 

http://en.wikipedia.org/wiki/Spirit_rover 

http://www.robocup.org/ 

Source: S. Seitz

Source: C. Fowlkes

Earth viewers (3D modeling) 

Image from Microsoft’s Virtual Earth 

(see also: Google Earth) 

Source: S. Seitz

Photosynth 

Photosynth 

Source: S. Seitz

• A list of companies here: 

http://www.cs.ubc.ca/spider/lowe/vision.html 

Source: L. Lazebnik
