14.08.2013 Views

Slides - Tamara L Berg


Advanced Multimedia


• HW2 due today. Questions?



Source: L. Lazebnik


• To perceive the story behind the picture

What we see / What a computer sees

Source: S. Narasimhan


Source: C. Fowlkes


Source: C. Fowlkes


Controlling processes (e.g. an industrial robot or an autonomous vehicle).

Detecting events (e.g. for visual surveillance).

Organizing information (e.g. for indexing and retrieval from collections of images and videos).

Modeling objects or environments (e.g. industrial inspection or medical image analysis).

Interaction (e.g. as the input to a device for human-computer interaction).

Source: L. Lazebnik


• To perceive the story behind the picture

• What exactly does this mean?

– Vision as a source of metric 3D information

– Vision as a source of semantic information

Source: L. Lazebnik


Real-time stereo (NASA Mars Rover)

Structure from motion (Pollefeys et al.)

Multi-view stereo for community photo collections (Goesele et al.)

Source: L. Lazebnik


Vision as a source of semantic information

slide credit: Fei-Fei, Fergus & Torralba


Object categorization

Image labels: sky, flag, banner, bus, face, building, cars, street lamp, bus, wall

slide credit: Fei-Fei, Fergus & Torralba


Scene and context categorization

• outdoor

• city

• traffic

• …

slide credit: Fei-Fei, Fergus & Torralba


Qualitative spatial information

Image labels: rigid moving object, vertical, slanted, horizontal, non-rigid moving object, rigid moving object

slide credit: Fei-Fei, Fergus & Torralba


• Vision is useful: Images and video are everywhere!

Personal photo albums

Surveillance and security

Movies, news, sports

Medical and scientific images

Source: L. Lazebnik


• Vision is useful

• Vision is interesting

• Vision is difficult

– Half of primate cerebral cortex is devoted to visual processing

– Achieving human-level visual perception is probably “AI-complete”

Source: L. Lazebnik


Source: L. Lazebnik


Challenges: viewpoint variation

Michelangelo 1475-1564

slide credit: Fei-Fei, Fergus & Torralba


Challenges: illumination

image credit: J. Koenderink


Challenges: scale

slide credit: Fei-Fei, Fergus & Torralba


Challenges: deformation

Xu, Beihong 1943

slide credit: Fei-Fei, Fergus & Torralba


Challenges: occlusion

Magritte, 1957

slide credit: Fei-Fei, Fergus & Torralba


Challenges: background clutter

Source: L. Lazebnik


Challenges: motion

Source: L. Lazebnik


Challenges: object intra-class variation

slide credit: Fei-Fei, Fergus & Torralba


Challenges: local ambiguity

slide credit: Fei-Fei, Fergus & Torralba




• Images are confusing, but they also reveal the structure of the world through numerous cues

• Our job is to interpret the cues!

Image source: J. Koenderink


Source: L. Lazebnik


Source: J. Koenderink


Source: L. Lazebnik


Source: J. Koenderink


Source: J. Koenderink


Source: L. Lazebnik


Image credit: Arthus-Bertrand (via F. Durand)


• Perception is an inherently ambiguous problem

– Many different 3D scenes could have given rise to a particular 2D picture

Image source: F. Durand


• Perception is an inherently ambiguous problem

– Many different 3D scenes could have given rise to a particular 2D picture

• Possible solutions

– Bring in more constraints (more images)

– Use prior knowledge about the structure of the world

• Need a combination of different methods

Image source: F. Durand
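
The ambiguity can be made concrete with a small projection sketch. The pinhole camera model, the focal length `f`, and the function name `project` are assumptions chosen for illustration and do not come from the slides. The sketch shows that two different 3D points lying on the same ray through the camera center land on the same 2D image point.

```python
def project(point, f=1.0):
    """Pinhole projection of a 3D point (X, Y, Z), Z > 0, onto the image plane.

    The pinhole model and focal length f are illustrative assumptions.
    """
    X, Y, Z = point
    return (f * X / Z, f * Y / Z)


# Two different 3D points on the same ray through the camera center.
near = (1.0, 2.0, 4.0)
far = (2.0, 4.0, 8.0)

print(project(near))  # (0.25, 0.5)
print(project(far))   # (0.25, 0.5)
```

From a single image the two points cannot be told apart, which is why the slide suggests adding more images or prior knowledge.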


Robotics

Computer Graphics

Artificial Intelligence

Computer Vision

Image Processing

Machine Learning

Psychology

Neuroscience

Source: L. Lazebnik


L. G. Roberts, Machine Perception of Three Dimensional Solids, Ph.D. thesis, MIT Department of Electrical Engineering, 1963.

Source: L. Lazebnik


• Basic image formation and processing

Cameras and sensors

Light and color

Feature extraction: corner and blob detection

Linear filtering

Edge detection

source: Svetlana Lazebnik


3D world 2D image


Segmentation and grouping

source: Svetlana Lazebnik


• Separate image into coherent “objects”

image / human segmentation

Berkeley segmentation database:

http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/segbench/

source: Svetlana Lazebnik


• Segmentation, grouping, perceptual organization: gathering features that belong together

• Top-down segmentation: pixels belong together because they come from the same object

• Bo


• Grouping is key to visual perception

• Elements in a collection can have properties that result from relationships

• “The whole is greater than the sum of its parts”

Figure labels: subjective contours, occlusion, familiar configuration

http://en.wikipedia.org/wiki/Gestalt_psychology

source: Svetlana Lazebnik


• These factors make intuitive sense, but are very difficult to translate into algorithms

source: Svetlana Lazebnik


Source: K. Grauman


• Divide data points into subsets (clusters) so that the data in each subset share some common trait (often proximity or appearance)

• Need some distance/similarity measure


• Want to minimize sum of squared Euclidean distances between points x_i and their nearest cluster centers m_k

Algorithm:

• Randomly initialize K cluster centers

• Iterate until convergence:

– Assign each data point to the nearest center

– Recompute each cluster center as the mean of all points assigned to it

source: Svetlana Lazebnik
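
The K-means steps above can be sketched in plain Python. This is a minimal illustration, not the course's reference code. The function names, the `max_iters` and `seed` parameters, the choice to initialize centers from randomly picked data points, and the rule of leaving an empty cluster's center unchanged are all assumptions.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, k, max_iters=100, seed=0):
    """Cluster `points` (a list of equal-length tuples) into k groups."""
    rng = random.Random(seed)
    # Random initialization: pick k distinct data points as centers (an assumption).
    centers = rng.sample(points, k)
    assignment = None
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest center.
        new_assignment = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: each center becomes the mean of its assigned points.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, assignment


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = kmeans(points, k=2)
print(labels)  # the first three points share one label, the last three the other
```

K-means only finds a local minimum of the objective, so in practice it is often run from several random initializations and the lowest-cost result is kept.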


Source: K. Grauman


• Represent features and their relationships using a graph

• Cut the graph to get subgraphs with strong interior links and weaker exterior links

source: Svetlana Lazebnik


• Node for every pixel

• Edge between every pair of pixels (or every pair of “sufficiently close” pixels)

• Each edge is weighted by the affinity or similarity of the two nodes

Figure labels: pixels i and j joined by an edge of weight w_ij

Source: S. Seitz
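
A pixel graph of this kind might be built as follows. The slide does not specify an affinity function. The Gaussian of the intensity difference used here, the `sigma` and `radius` parameters, and the function name `build_affinity_graph` are assumptions for illustration. "Sufficiently close" is taken to mean within a square window of the given radius.

```python
import math


def build_affinity_graph(image, radius=1, sigma=0.1):
    """Return {(pixel_i, pixel_j): w_ij} for a grayscale image (list of rows).

    Pixels are (row, col) tuples. The Gaussian affinity is an assumed choice.
    """
    rows, cols = len(image), len(image[0])
    weights = {}
    for r in range(rows):
        for c in range(cols):
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    r2, c2 = r + dr, c + dc
                    if (dr, dc) == (0, 0):
                        continue  # no self-edges
                    if not (0 <= r2 < rows and 0 <= c2 < cols):
                        continue  # neighbor falls outside the image
                    i, j = (r, c), (r2, c2)
                    if i < j:  # store each undirected edge once
                        diff = image[r][c] - image[r2][c2]
                        weights[(i, j)] = math.exp(-diff * diff / (2 * sigma ** 2))
    return weights


# A 2x3 image: a dark region on the left and a bright column on the right.
image = [[0.0, 0.0, 1.0],
         [0.0, 0.0, 1.0]]
weights = build_affinity_graph(image)
print(weights[((0, 0), (0, 1))])  # similar pixels: affinity 1.0
print(weights[((0, 1), (0, 2))])  # dissimilar pixels: affinity close to 0
```

Restricting edges to a small window keeps the graph sparse. Connecting every pair of pixels would need a number of edges quadratic in the pixel count.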


• Break Graph into Segments

• Delete links that cross between segments

• Easiest to break links that have low affinity

– similar pixels should be in the same segments

– dissimilar pixels should be in different segments

Figure labels: segments A, B, C; pixels i and j joined by an edge of weight w_ij

Source: S. Seitz


• Set of edges whose removal makes a graph disconnected

• Cost of a cut: sum of weights of cut edges

• A graph cut gives us a segmentation

Figure labels: segments A and B

Source: S. Seitz
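
The cost of a cut can be computed directly from the edge weights. The sketch below is illustrative. The function name `cut_cost`, the dictionary representation of the graph, and the node names and weights in the example are made up for the demonstration.

```python
def cut_cost(weights, segment_a):
    """Sum the weights of edges with exactly one endpoint in segment_a.

    `weights` maps undirected edges (i, j) to affinities w_ij.
    """
    segment_a = set(segment_a)
    return sum(
        w for (i, j), w in weights.items()
        if (i in segment_a) != (j in segment_a)  # edge crosses the cut
    )


# Hypothetical four-node graph: p-q and r-s are strongly linked.
weights = {
    ("p", "q"): 0.9,
    ("q", "r"): 0.1,
    ("r", "s"): 0.8,
    ("p", "r"): 0.2,
}

good_cut = cut_cost(weights, {"p", "q"})  # removes q-r and p-r, about 0.3
bad_cut = cut_cost(weights, {"p"})        # removes p-q and p-r, about 1.1
print(good_cut < bad_cut)  # True
```

The cheaper cut separates {p, q} from {r, s}, which matches the idea that low-affinity links are the easiest to break.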


source: Svetlana Lazebnik


Finding correspondences

Clustering and visual vocabularies

Bag-of-features models

Classification

Sources: D. Lowe, L. Fei-Fei


Image labels: sky, flag, banner, bus, face, building, cars, street lamp, bus, wall

source: Fei-Fei, Fergus & Torralba


source: Svetlana Lazebnik

Biederman 1987


So what does object recognition involve?

source: Svetlana Lazebnik


Verification: is that a lamp?

source: Svetlana Lazebnik


Detection: where are the people?

source: Svetlana Lazebnik


Identification: is that Potala Palace?

source: Svetlana Lazebnik


Object categorization

Image labels: tree, banner, people, mountain, building, street lamp, vendor

source: Svetlana Lazebnik


Scene and context categorization

• outdoor

• city

• …

source: Svetlana Lazebnik


Progress to date

The next slides show some examples of what current vision systems can do

Source: L. Lazebnik


Optical character recognition (OCR)

Technology to convert scanned docs to text

• If you have a scanner, it probably came with OCR software

Digit recognition, AT&T labs

http://www.research.att.com/~yann/

Also used for zipcode reading by the USPS

License plate readers

http://en.wikipedia.org/wiki/Automatic_number_plate_recognition

Source: S. Seitz


Face detection

Many new digital cameras now detect faces

• Canon, Sony, Fuji, …

Source: S. Seitz


Face Detection for Privacy

Face Blurring for Google Streetview


Smile detection?

Sony Cyber-shot® T70 Digital Still Camera

Source: S. Seitz


Object recognition (in supermarkets)

LaneHawk by EvolutionRobotics

“A smart camera is flush-mounted in the checkout lane, continuously watching for items. When an item is detected and recognized, the cashier verifies the quantity of items that were found under the basket, and continues to close the transaction. The item can remain under the basket, and with LaneHawk, you are assured to get paid for it…”

Source: S. Seitz


Face recognition

Who is she?

Source: S. Seitz


Vision-based biometrics

“How the Afghan Girl was Identified by Her Iris Patterns” Read the story

Source: S. Seitz


Login without a password…

Fingerprint scanners on many new laptops, other devices

Face recognition systems now beginning to appear more widely

http://www.sensiblevision.com/

Source: S. Seitz


Object recognition (in mobile phones)

This is becoming real:

• Microsoft Research

• Point & Find

• Google Goggles

Source: S. Seitz


iPhone Apps: (www.kooaba.com)

Source: L. Lazebnik


iPhone Apps: (www.snaptell.com)

Source: L. Lazebnik


Special effects: shape capture

The Matrix movies, ESC Entertainment, XYZRGB, NRC

Source: S. Seitz


Special effects: motion capture

Pirates of the Caribbean, Industrial Light and Magic

Source: S. Seitz


Sports

Sportvision first down line

Nice explanation on www.howstuffworks.com

Source: S. Seitz


Smart cars

Mobileye

Slide content courtesy of Amnon Shashua

• Vision systems currently in high-end BMW, GM, Volvo models

• By 2010: 70% of car manufacturers.

Source: S. Seitz


Source: C. Fowlkes


Vision-based interaction (and games)

Nintendo Wii has camera-based IR tracking built in. See Lee’s work at CMU on clever tricks for using it to create a multi-touch display!

Assistive technologies

Sony EyeToy

Source: S. Seitz


Vision in space

NASA's Mars Exploration Rover Spirit captured this westward view from atop a low plateau where Spirit spent the closing months of 2007.

Vision systems (JPL) used for several tasks

• Panorama stitching

• 3D terrain modeling

• Obstacle detection, position tracking

• For more, read “Computer Vision on Mars” by Matthies et al.

Source: S. Seitz


Robotics

NASA’s Mars Spirit Rover

http://en.wikipedia.org/wiki/Spirit_rover

http://www.robocup.org/

Source: S. Seitz


Source: C. Fowlkes


Earth viewers (3D modeling)

Image from Microsoft’s Virtual Earth

(see also: Google Earth)

Source: S. Seitz


Photosynth

Source: S. Seitz


• A list of companies here:

http://www.cs.ubc.ca/spider/lowe/vision.html

Source: L. Lazebnik
