
Study and Implementation
of Stereo Vision Systems
for Robotic Applications

Lazaros Nalpantidis

Thesis submitted for the degree of Doctor of Philosophy

Department of Production and Management Engineering
Democritus University of Thrace, Greece

Xanthi, September 2010


Title: Study and Implementation of Stereo Vision Systems for Robotic Applications

Author: Lazaros Nalpantidis

Thesis submitted for the degree of Doctor of Philosophy
to the
Production and Management Engineering Department
Democritus University of Thrace, Greece

Advising Committee:
Chairman: Assistant Professor Antonios Gasteratos, P.&M.E. Dept., DUTH
Member: Professor Vassilios Tourassis, P.&M.E. Dept., DUTH
Member: Associate Professor Dimitrios Koulouriotis, P.&M.E. Dept., DUTH

Xanthi, Greece
September 2010


Dedicated to my wonderful parents
Georgios and Sofia,
and to my own "Penelope"
Efi.


Summary - Contribution to the State of the Art

Stereo vision has been chosen by natural selection as the most common way to estimate the depth of objects: a pair of two-dimensional images suffices to retrieve the third dimension of the observed scene. The method is of great importance not only to living creatures but to sophisticated machine systems as well. During the last years robotics has made significant progress, and the state of the art now aims at achieving autonomous behaviors. For robots to be able to move and act autonomously, accurate representations of their environments are required. Both of these fields, stereo vision and autonomous robotic behaviors, lie at the center of this PhD thesis. The issue of robots using machine stereo vision is not a new one. The number and significance of the researchers involved, as well as the publishing rate of relevant scientific papers, indicate an issue that is interesting and still open to solutions and fresh ideas rather than a banal, solved one.

The motivation for this PhD thesis has been the observation that stereo vision and autonomous robots are usually combined in a simplistic manner, as two independent technologies used simultaneously. This situation is owed to the fact that the two technologies have evolved independently and within different scientific communities: stereo vision has mainly evolved within the field of computer vision, whereas autonomous robots are a branch of the robotics and mechatronics field. Methods proposed within the frame of computer vision are generally not satisfactory for robotic applications, since an autonomous robot places strict constraints on the required computation speed and the available computational resources. Moreover, their inefficiency is commonly owed to factors related to the environments and the conditions of operation. As a result, the algorithms used, in this case the stereo vision algorithms, should take these factors into consideration during their development, and the required compromises have to retain the functionality of the integrated system.

The objective of this PhD thesis is the development of stereo vision systems customized for use in autonomous robots. Initially, a literature survey was conducted concerning stereo vision algorithms and corresponding robotic applications. The survey revealed the state of the art in the specific field and pointed out issues that had not yet been answered in a satisfactory manner. Afterwards, novel stereo vision algorithms were developed, which satisfy the demands posed by robotic systems and propose solutions to the open issues indicated by the literature survey. Finally, systems have been developed that embody the proposed algorithms and treat open issues of robotic applications.

Within this dissertation, various computational tools and ideas originating from different scientific fields have been used for the first time and combined in a novel way. Biologically and psychologically inspired methods have been employed, such as the logarithmic response law (Weber-Fechner law) and the gestalt laws of perceptual organization (proximity, similarity and continuity). Furthermore, sophisticated computational methods, such as 2D and 3D cellular automata and fuzzy inference systems, have been used for computer vision applications. Additionally, ideas from the field of video coding have been incorporated into stereo vision applications. The resulting methods have been applied to basic computer vision depth extraction tasks and even to advanced autonomous robotic behaviors.

In more detail, the possibility of implementing effective, hardware-implementable stereo correspondence algorithms has been investigated. Specifically, an algorithm is presented that combines rapid execution and a simple, straightforward structure with high-quality results. These features render it an ideal candidate for hardware implementation and for real-time applications. The algorithm utilizes Gaussian aggregation weights and 3D cellular automata in order to achieve high-quality results. This algorithm formed the basis of a multi-view stereo vision system, whose final depth map is produced by a certainty assessment procedure. Moreover, a new hierarchical correspondence algorithm is presented, inspired by motion estimation techniques originally used in video encoding. The algorithm performs a 2D correspondence search using a similar hierarchical search pattern, and the intermediate results are refined by 3D cellular automata. This algorithm can process uncalibrated and non-rectified stereo image pairs while maintaining the computational load within reasonable levels. It is well known that non-ideal environmental conditions, such as differences in illumination depending on the viewpoint, heavily affect the performance of stereo algorithms. In this PhD thesis a new illumination-invariant pixel dissimilarity measure is presented that can substitute for the established intensity-based ones. The proposed measure can be adopted by almost any existing stereo algorithm, enhancing it with robustness. The algorithm using the proposed dissimilarity measure outperformed all the other examined algorithms, exhibiting tolerance to illumination differences and robust behavior. Moreover, a novel stereo correspondence algorithm is presented that incorporates many biologically and psychologically inspired features into an adaptive weighted sum of absolute differences framework. In addition to ideas already exploited, such as the utilization of color information and the gestalt laws of proximity and similarity, new ones have been adopted: the algorithm introduces the use of circular support regions, the gestalt law of continuity, and the psychophysically-based logarithmic response law. All the aforementioned perceptual tools act complementarily inside a straightforward computational algorithm.
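To make the flavor of such aggregation-based correspondence concrete, the following Python sketch computes a disparity map from a rectified grayscale pair using a sum of absolute differences cost aggregated with a fixed 2D Gaussian mask, followed by winner-takes-all selection. It is a minimal illustration of the aggregation principle only, not the thesis implementation; all function names and parameter values are illustrative.

```python
import numpy as np

def gaussian_mask(radius, sigma):
    """2D Gaussian mask used as fixed aggregation weights."""
    ax = np.arange(-radius, radius + 1)
    xx, yy = np.meshgrid(ax, ax)
    return np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))

def weighted_sad_disparity(left, right, max_disp=16, radius=3, sigma=1.5):
    """Winner-takes-all disparity from a Gaussian-weighted SAD cost.

    left, right: rectified grayscale images as 2D float arrays.
    Returns a left-image disparity map (borders remain 0).
    """
    h, w = left.shape
    weights = gaussian_mask(radius, sigma)
    cost = np.full((h, w, max_disp), np.inf)
    for d in range(max_disp):
        # Raw matching cost: |left(y, x) - right(y, x - d)|
        ad = np.abs(left[:, d:] - right[:, :w - d])
        for y in range(radius, h - radius):
            for x in range(radius, ad.shape[1] - radius):
                window = ad[y - radius:y + radius + 1, x - radius:x + radius + 1]
                # Aggregate the cost over the support window, weighted by the mask
                cost[y, x + d, d] = np.sum(weights * window)
    # Winner-takes-all disparity selection
    return np.argmin(cost, axis=2)
```

In the algorithms presented in the thesis, the intermediate cost volume is additionally refined (e.g., by 3D cellular automata) before the final selection; a real-time implementation would also replace the explicit loops with separable filtering or a hardware pipeline.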

Furthermore, stereo correspondence algorithms have been exploited as the basis of more advanced robotic behaviors. Vision-based obstacle avoidance algorithms for autonomous mobile robots are presented. These algorithms avoid, as much as possible, computationally complex processes, and the only sensor required is a stereo camera. The algorithms consist of two building blocks. The first one is a stereo algorithm able to provide reliable depth maps of the scenery at frame rates suitable for a robot to move autonomously. The second building block is either a simple decision-making algorithm or a fuzzy logic-based one, which analyzes the depth maps and deduces the most appropriate direction for the robot to avoid any existing obstacles. Finally, a visual Simultaneous Localization and Mapping (SLAM) algorithm suitable for indoor applications is proposed. The algorithm focuses on computational effectiveness, and the only sensor used is a stereo camera placed onboard a moving robot. The algorithm processes the acquired images, calculating the depth of the scenery, detecting occupied areas and progressively building a map of the environment. The stereo vision-based SLAM algorithm embodies a custom-tailored stereo correspondence algorithm, the robust scale and rotation invariant "Speeded Up Robust Features" (SURF) feature detection and matching method, a computationally effective v-disparity image calculation scheme, a novel map-merging module, as well as a sophisticated cellular automata-based enhancement stage.
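A minimal sketch of the simple (threshold-based) decision-making block may clarify its role: the disparity map is divided into three vertical windows, and the least obstructed direction is chosen, falling back to a backward move when every window is blocked. The window layout, threshold values and function name below are assumptions made for illustration, not the parameters used in the thesis.

```python
import numpy as np

def avoidance_direction(disparity, obstacle_disp=40, clear_ratio=0.1):
    """Threshold-based steering decision from a disparity map.

    High disparity means a nearby object. Each of the three windows is
    considered blocked if the fraction of near pixels exceeds clear_ratio.
    (Both thresholds are illustrative assumptions.)
    """
    h, w = disparity.shape
    windows = {
        "left": disparity[:, : w // 3],
        "forward": disparity[:, w // 3 : 2 * w // 3],
        "right": disparity[:, 2 * w // 3 :],
    }
    # Fraction of pixels in each window closer than the obstacle threshold
    near = {name: np.mean(win > obstacle_disp) for name, win in windows.items()}
    if near["forward"] < clear_ratio:
        return "forward"                            # path ahead is clear
    open_dirs = {k: v for k, v in near.items() if v < clear_ratio}
    if open_dirs:
        return min(open_dirs, key=open_dirs.get)    # least obstructed side
    return "backward"                               # everything blocked: retreat
```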



Preface

"It was as if God had decided to put to the test every capacity for surprise and was keeping the inhabitants of Macondo in a permanent alternation between excitement and disappointment, doubt and revelation, to such an extreme that no one knew for certain where the limits of reality lay."

Gabriel Garcia Marquez
"One Hundred Years of Solitude"

This quote from Gabriel Garcia Marquez could serve as a brief summary of the impressions left by this (and maybe many other) PhD theses. I could not have had the slightest idea of how expressive these few lines would sound to me when I first read them. The alternation of emotions, from sunny moments of private "glory" and excitement to dark moments of frustration and doubt, is the predominant impression left by a long road of effort. Thankfully, much less than "one hundred years of solitude" was required for this work to be concluded! Testing my strength and persistence was the personal reward from this work, which I keep for myself. Any scientific knowledge gained, I share with the readers of this thesis.

The subject of this thesis is stereo vision systems for robotic applications. Although I had some experience in designing analog imaging sensors, it was my advisor who encouraged me to get involved with this specific topic. The importance of vision systems is indisputable in fields such as image processing, computer vision and robotics. After all, this thesis is being published shortly after the latest (2009) Nobel prize in Physics was awarded to "the masters of light", one half of it to Willard S. Boyle and George E. Smith "for the invention of an imaging semiconductor circuit - the CCD sensor". Using and exploiting the possibilities of such sensors forms the basis of the fields this thesis deals with. Even if we are too small to achieve the glory of such giants, we can always stand on their shoulders and present our own efforts.

The workflow of these years is to a large extent mirrored in the structure of this thesis, which is organized in five chapters. The first chapter is introductory: it presents basic concepts that are used later in the thesis and provides a first contact with the issue of stereo vision. The second chapter contains a literature survey covering stereo correspondence algorithms, their hardware implementations and robotic applications of stereo vision; it also presents some open issues of robotics-oriented stereo vision, as they emerged from the literature analysis. The third chapter presents the novel stereo correspondence algorithms that were developed within this dissertation, as well as experimental and comparative results. The fourth chapter presents robotic applications of stereo vision systems and the corresponding experimental results. Finally, the fifth chapter presents the conclusions reached during the course of this dissertation, discusses the results and describes further future directions of this work.

The completion of this thesis would not have been feasible without the help and support of many others. First of all, I would like to thank my advisor and chair of my advising committee, Assistant Professor Antonios Gasteratos. Antonios has been both an academic and a personal tutor. Our relationship, which started with my doctoral studies, has grown into one of respect, trust and, finally, friendship. Antonios has encouraged and guided my endeavors. His support, both scientific and moral, was influential and motivational for me. Antonios trusted me to be his colleague in various funded research projects for which he was the scientific responsible. More specifically, he enrolled me in the following projects, all funded by the European Commission: "Vision and Chemiresistor Equipped Web-connected Finding Robots (View-Finder), FP6-IST-2006-045541"; "Autonomous Collaborative Robots to Swing and Work in Everyday EnviRonment (ACROBOTER), FP6-IST-2006-045530"; and "Innovative Novel First Responders Applications (INFRA), FP7-ICT-SEC-2007-1-225272".

I would also like to express my gratitude to the other two members of my advising committee, Vice-Rector Professor Vassilios Tourassis and Associate Professor Dimitrios Koulouriotis, for their interest in the progress of my efforts and their help whenever an issue concerning my doctoral studies arose.

A special position among the people I have met and worked with since my arrival in Xanthi belongs to Assistant Professor Georgios Ch. Sirakoulis. Georgios has been truly supportive since my first steps and gave me his insight whenever I asked for it. I had the chance to work with him, and I have learned many things from him at a scientific, academic and moral level. His deep scientific knowledge and his kind and polite manner during our frequent conversations have been rather influential for me.

I also owe many thanks to the people I worked with in the same laboratory during my years in Xanthi. Dimitrios Chrysostomou, my closest and always helpful friend in Xanthi, Rigas Kouskouridas, Nikolaos Kyriakoulis and I shared the same worries and dreams and became valuable colleagues and close friends. The same holds for the younger members of our laboratory, my good friends Vasileios Belagiannis and Ioannis Kostavelis.

I kept this last place for my family. They were always there for me, in a quiet and gentle manner. Their support and love have played the most important role for me. A lot of changes have happened during these years, good and bad emotional situations, each one having something to teach. I owe much respect and gratitude to the memory of my uncle Eleftherios, whose encouragement, trust and enlightened advice have motivated and guided me towards where I stand right now. I would also like to thank my other half, my beloved fiancee Efi, whose love, understanding and support have always been crucial for me. These last words I dedicate to my wonderful parents, whom I love and respect endlessly: Georgios and Sofia.

Xanthi, September 2010
Lazaros Nalpantidis


2.4.3 Uncalibrated Stereo Images
2.4.4 Non-ideal Lighting Conditions
2.4.5 Biologically Inspired Methods

3 Stereo Correspondence Algorithms
3.1 Stereo Correspondence Algorithm with Enhanced Disparity Selection
3.1.1 Algorithm Description
3.1.2 Experimental Results
3.1.3 Discussion
3.2 Quad-view Stereo Correspondence Algorithm
3.2.1 Algorithm Description
3.2.2 Experimental Results and Applications
3.2.3 Discussion
3.3 Hierarchical Stereo Correspondence Algorithm for Uncalibrated Images
3.3.1 Algorithm Description
3.3.2 Experimental Results
3.3.3 Discussion
3.4 Biologically and Psychophysically Inspired Stereo Correspondence Algorithm
3.4.1 Novel Concepts
3.4.2 Algorithm Description
3.4.3 Experimental Results
3.4.4 Discussion
3.5 Illumination-Invariant Dissimilarity Measure and Stereo Correspondence Algorithm
3.5.1 Description of Illumination-Invariant Dissimilarity Measure
3.5.2 Algorithm Description
3.5.3 Experimental Results
3.5.4 Discussion

4 Robotic Applications of Stereo Vision
4.1 Stereo Vision-based Obstacle Avoidance Algorithm
4.1.1 Threshold Algorithm Description
4.1.2 Experimental Results for the Threshold Algorithm
4.1.3 Fuzzy Algorithm Description
4.1.4 Experimental Results for the Fuzzy Algorithm
4.1.5 Discussion
4.2 Stereo Vision-based SLAM
4.2.1 Stereo Vision Algorithm
4.2.2 Camera's Motion Estimation
4.2.3 Local Map Generation
4.2.4 Global Map Generation
4.2.5 Experimental Results
4.2.6 Discussion

5 Conclusion and Future Work
5.1 Conclusion
5.2 Future Work

References
Abbreviations
Thesis Publications


List of Figures

1.1 RGB colorspace
1.2 HSL colorspace
1.3 HSV colorspace
1.4 CIELab colorspace
1.5 Views of a 3D volume and of a 3D neighborhood defined in it
1.6 Images whose parts are differentiated from the wholesome pattern, explainable through Gestalt theory
1.7 Rectification of a stereo pair. The two images Il, Ir of an object d (a) are replaced by two pictures Il,rect, Ir,rect that lie on a common plane P (b)
1.8 Geometry of epipolar lines, where C1 and C2 are the left and right camera lens centers, respectively. (a) Point P1 in one image plane may have arisen from any of the points in the line C1P1, and may appear in the alternate image plane at any point on the epipolar line E2; (b) In the case of non-rectified images, point P1 may have arisen from any of the points inside the block B
1.9 Human eye's (left) and a typical camera's (right) color sensitivities
1.10 DSI containing matching costs for every pixel of the image and for all its potential disparity values
1.11 General structure of a stereo correspondence algorithm
2.1 Categorization of stereo vision algorithms
2.2 Left image of the stereo pair (left) and ground truth (right) for the Tsukuba (a), Sawtooth (b), Map (c), Venus (d), Cones (e) and Teddy (f) stereo pairs
2.3 Diagrammatic representation of the local methods' categorization
2.4 Diagrammatic representation of the global methods' categorization
2.5 An ASIC chip (a) and an FPGA development board (b)
2.6 Generalized block diagram of a hardware implementable stereo correspondence algorithm
2.7 Mobile robots in real environments
2.8 A real-life stereo pair suffering from different illumination
3.1 Block diagram of the presented stereo correspondence algorithm
3.2 2D Gaussian mask producing the weight for the pixel summation
3.3 Results for the Middlebury data sets. From left to right: the Tsukuba, Venus, Teddy and Cones images. From top to bottom: the reference (left) images (a), the provided ground truth disparity maps (b), the disparity maps calculated by the presented method (c), maps of signed disparity error (d), and maps of pixels with absolute computed disparity error bigger than 1 (e)
3.4 Self-recorded scenes. (a) outdoor scene, (b) indoor scene. From left to right: left image, right image, calculated disparity map
3.5 (a) The quad-camera configuration and (b) the results (up-left) and scene capturing (right) using the quad-camera configuration
3.6 Algorithm's steps and results for the Tsukuba data set. (column 1) the reference image (up-left), (column 2) the three target images (up-right, down-left, down-right), (column 3) the certainty maps for the horizontal, vertical and diagonal pairs, (column 4) the computed disparity map for each stereo pair, (column 5) the fused (top) and the ground truth (bottom) disparity maps
3.7 Results of the presented fusion system (left), the computationally equivalent simple stereo algorithm (middle) and the preliminary simple stereo algorithm applied on the horizontal image pair (right). From top to bottom: the computed disparity maps, pixels with absolute computed disparity error bigger than 1 and maps of signed disparity error
3.8 Algorithm's steps and results for a synthetic room scene. (column 1) The reference image (up-left), (column 2) the three target images (up-right, down-left, down-right), (column 3) the certainty maps for the horizontal, vertical and diagonal pairs, (column 4) the computed disparity map for each stereo pair, (column 5) the final fused depth map
3.9 Application results obtained using the calculated depth maps. (a) View of the reconstructed Tsukuba scene and (b) obstacle detection in the virtual room scene
3.10 Quadruple, double and single pixel sample matching algorithm
3.11 General scheme of the presented hierarchical matching disparity algorithm. The search block is enlarged for viewing purposes
3.12 Block diagram of the hierarchical disparity search algorithm
3.13 (a), (b) The uncalibrated, diagonally captured input images and the resulting disparity maps of the presented algorithm for (c) the quadruple, (d) double and (e) single pixel estimation respectively. The results of (f) Ogale & Aloimonos (2007) and (g) Yoon & Kweon (2006a) for the same input images
3.14 From left to right: the left and right 10% distorted input images and the calculated final disparity map for the (from top to bottom:) Tsukuba, Venus, Teddy and Cones image sets respectively
3.15 (from left to right:) The left and right distorted input images and the calculated final disparity maps for various percentages of the induced lens distortion
3.16 The NMSE for the Tsukuba image pair for various distortion percentages
3.17 (a), (b) The self-captured input images of an alley, and the resulting disparity maps for (c) the quadruple, (d) double and (e) single pixel estimation respectively
3.18 (a), (b) The self-captured input images of a building, and the resulting disparity maps for (c) the quadruple, (d) double and (e) single pixel estimation respectively
3.19 (a), (b) The self-captured input images of a corner, and the resulting disparity maps for (c) the quadruple, (d) double and (e) single pixel estimation respectively
3.20 Perceived intensity response according to the Weber-Fechner law
3.21 Block diagram of the algorithm's structure
3.22 Results for the Middlebury data sets. From left to right: the Tsukuba, Venus, Teddy and Cones images. From top to bottom: the reference (left) images, the provided ground truth disparity maps, the disparity maps calculated by the presented method, maps of signed disparity error and maps of pixels with absolute computed disparity error bigger than 1
3.23 Results for new data sets. From top to bottom: Aloe, Baby3, Bowling2, Cloth1, Cloth3, Cloth4 and Flowerpots. From left to right: the reference (left) image of the stereo pair, the provided ground truth, the disparity map computed by the presented algorithm and error map
3.24 Views of the HSL color space representation
3.25 Block diagram of the utilized stereo correspondence algorithm
3.26 Results for the Middlebury data sets. From top to bottom: the Tsukuba, Venus, Teddy and Cones image sets. From left to right: the reference (left) input images, the right input images, the disparity maps calculated by the presented LCDM-based method, maps of pixels with absolute computed disparity error bigger than 1 and maps of signed disparity error
3.27 Left input images (a), right input images with altered luminosity (b) and calculated disparity maps for the presented (c), its RGB-based AD version (d), the ZNCC stereo (e) and the Ogale-Aloimonos (f) algorithms for various lightness conditions
3.28 Percentage of erroneously calculated pixels for the presented, its RGB-based AD version, the ZNCC and the Ogale-Aloimonos stereo algorithms for various lightness conditions
3.29 From left to right: left input images with constant luminosity, right input images with luminosity grading from -50% to +50% along the horizontal direction, and calculated disparity maps for the standard image sets using the presented LCDM (c), the AD-based variant algorithm (d), the AD-based variant algorithm with histogram equalization (e), the AD-based variant algorithm with Retinex enhancement (f), the AD-based variant algorithm with enhanced pictures according to Vonikakis et al. (2008) (g)
3.30 Percentage of pixels whose absolute disparity error is greater than 1 for standard image sets calculated using the presented LCDM, the AD-based variant algorithm, the AD-based variant algorithm with histogram equalization, the AD-based variant algorithm with Retinex enhancement, the AD-based variant algorithm with enhanced pictures according to Vonikakis et al. (2008)
3.31 From left to right: left input images with constant luminosity, right input images with luminosity grading from -50% to +50% along the horizontal direction, and calculated disparity maps for the standard image sets using the presented LCDM (c), the ZNCC algorithm (d), the ZNCC algorithm with histogram equalization (e), the ZNCC algorithm with Retinex enhancement (f), the ZNCC algorithm with enhanced pictures according to Vonikakis et al. (2008) (g)
3.32 Percentage of pixels whose absolute disparity error is greater than 1 for standard image sets calculated using the presented LCDM, the ZNCC algorithm, the ZNCC algorithm with histogram equalization, the ZNCC algorithm with Retinex enhancement, the ZNCC algorithm with enhanced pictures according to Vonikakis et al. (2008)
3.33 From left to right: left input images with constant luminosity, right input images with luminosity grading from -50% to +50% along the horizontal direction, and calculated disparity maps for the standard image sets using the presented LCDM (c), the Ogale-Aloimonos algorithm (d), the Ogale-Aloimonos algorithm with histogram equalization (e), the Ogale-Aloimonos algorithm with Retinex enhancement (f), the Ogale-Aloimonos algorithm with enhanced pictures according to Vonikakis et al. (2008) (g)
3.34 Percentage of pixels whose absolute disparity error is greater than 1 for standard image sets calculated using the presented LCDM, the Ogale-Aloimonos algorithm, the Ogale-Aloimonos algorithm with histogram equalization, the Ogale-Aloimonos algorithm with Retinex enhancement, the Ogale-Aloimonos algorithm with enhanced pictures according to Vonikakis et al. (2008)
3.35 Various self-recorded outdoor input image pairs and the resulting disparity maps. From left to right: the left and right input images and the disparity maps calculated with: the presented LCDM-based algorithm, the RGB AD-based algorithm applied on the raw images, the RGB AD-based algorithm applied on the histogram equalized images, the RGB AD-based algorithm applied on the Retinex enhanced images, the RGB AD-based algorithm applied on the images enhanced according to Vonikakis et al. (2008)
4.1 Flow chart of the implemented threshold-based obstacle avoidance algorithm
4.2 Image enhancement steps of the presented stereo algorithm
4.3 Depth map's division in three windows
4.4 A sample outdoor route and the algorithm's outputs
4.5 Percentage of the algorithm's correct decisions
4.6 Percentage of certainty for the algorithm's decisions
4.7 (a) Stereo camera equipped mobile robotic platform and (b) floor plan of the robot's environment
4.8 Fuzzy membership functions
4.9 Test images and disparity maps where the algorithm chose to move forward
4.10 Test images and disparity maps where the algorithm chose to move left
4.11 Test images and disparity maps where the algorithm chose to move right
4.12 Test images and disparity map where the algorithm chose to move backwards
4.13 Outline of the presented SLAM algorithm
4.14 Reference image (a) of an indoor scene and sparse disparity map (b) obtained with the presented stereo correspondence algorithm
4.15 Depth vs. camera's motion estimation
4.16 Environment's maps for the scene of Figure 4.14 obtained with the presented algorithm
4.17 V-disparity images for the image of Figure 4.14(a) and the corresponding disparity map of Figure 4.14(b)
4.18 Features detected and matched using SURF for various consecutive images of the used dataset
4.19 Experimental results after processing 1 (first row), 2 (second row), 6 (third row), and 10 (fourth row) image pairs of the scene


List of Tables

2.1 Characteristics of the most common stereo image sets
2.2 Characteristics of local algorithms
2.3 Characteristics of global algorithms that use global optimization
2.4 Characteristics of global algorithms that use DP
2.5 Characteristics of the algorithms that cannot be clearly assigned to any category
2.6 Characteristics of the algorithms that produce sparse output
2.7 FPGA implementations' characteristics
3.1 Percentage of pixels whose absolute disparity error is greater than 1 in various regions of the images
3.2 Calculated NMSE for various versions of the presented algorithm
3.3 Percentage of pixels whose absolute disparity error is greater than 1 in various regions for the Tsukuba pairs
3.4 Calculated NMSE for the presented algorithm for various pairs with constant distortion 10%
3.5 Calculated NMSE for the presented algorithm for the Tsukuba pair with various distortion percentages
3.6 Variation of the presented algorithm's results for the Tsukuba image set when excluding one of the new concepts
3.7 Evaluation of various ASW and local algorithms
3.8 Percentage of pixels whose absolute disparity error is greater than 1 for standard image sets using the presented LCDM-based algorithm
4.1 Results for the cases where the algorithm chose to move forward
4.2 Results for the cases where the algorithm chose to move left
4.3 Results for the cases where the algorithm chose to move right
4.4 Results for the cases where the algorithm chose to move backwards


Chapter 1

Introduction

Vision begins with light being captured by a sensor, i.e. the eye for living creatures or the camera sensor for machines. Stereo vision involves the simultaneous use of two such sensors and leads to the perception of depth. This fact was first realized about two centuries ago by Sir Charles Wheatstone, who stated: "...the mind perceives an object of three dimensions by means of the two dissimilar pictures projected by it on the two retinae..." (Wheatstone 1838). The successful operation of stereo vision systems relies on the contribution of various basic concepts. This chapter serves as an introduction to the most essential of these concepts and to the mechanisms of robotic stereo vision.

1.1 Basic Concepts

Stereo vision algorithms involve the use of more basic tools in order to provide accurate depth results. Many of those tools model aspects of the operation of the Human Visual System (HVS). The most essential of them concern color representation and perceptual processing.

1.1.1 Color Models

Contemporary cameras are able to provide reliable color information about the depicted objects and scenes. Color models, also referred to as colorspaces, provide coordinate systems that allow each color to be specified as a point within the coordinate system in a standard manner (Gonzalez & Woods 1992).

Color cameras commonly use a combination of red-, green-, and blue-sensitive photoreceptors in order to capture color information in each of their pixels, in accordance with the way the HVS performs the same task. As a result, the directly available color information is coded in the popular RGB colorspace. However, many machine vision applications require different colorspaces in order to facilitate the demanded operations. Consequently, HSL, HSV, CIELab as well as numerous other color models have been developed and are being used.


The RGB color model

The Red-Green-Blue (RGB) color model considers Red, Green and Blue to be the primary spectral components. The colorspace is represented as a cube on a Cartesian coordinate system, as shown in Figure 1.1. Each color is represented by a triad of values (r,g,b) or, equivalently, as a vector extending from the origin of the coordinate system. The normalized value of each variable lies within the range [0,1]. Point (0,0,0) represents black and (1,1,1) white. Thus, in the RGB color model every other color arises as a mixture of the three primary colors.

Fig. 1.1 RGB colorspace

The HSL color model

The representation of the Hue-Saturation-Luminosity/Lightness (HSL) color model, often also called Hue-Luminosity/Lightness-Saturation (HLS), is a double cone, as shown in Figure 1.2. In this colorspace H stands for hue and determines the human impression of which color (red, green, blue, etc.) is depicted. Each color is represented by an angular value ranging between 0 and 360 degrees (0 being red, 120 green and 240 blue). S stands for saturation and determines how vivid or gray the particular color seems. Its value ranges from 0 for gray to 1 for fully saturated (pure) colors. The L channel of the HSL colorspace stands for the luminosity and determines the intensity of a specific color. It ranges from 0 for completely dark colors (black) to 1 for fully illuminated colors (white).

The transition from the RGB colorspace, which is the usual output of contemporary cameras, to HSL is straightforward and does not involve any complicated mathematical computations (Gonzalez & Woods 1992), as shown in Eqs. 1.1-1.3 for M = max(R, G, B) and m = min(R, G, B):

$$H = \begin{cases} 0 & \text{if } M = m,\\ \left(60^{\circ} \times \dfrac{G-B}{M-m} + 360^{\circ}\right) \bmod 360^{\circ} & \text{if } M = R,\\ 60^{\circ} \times \dfrac{B-R}{M-m} + 120^{\circ} & \text{if } M = G,\\ 60^{\circ} \times \dfrac{R-G}{M-m} + 240^{\circ} & \text{if } M = B. \end{cases} \tag{1.1}$$

$$L = \frac{M+m}{2} \tag{1.2}$$
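In code, the conversion is equally direct. The following Python sketch implements Eqs. 1.1 and 1.2 for a single normalized RGB triplet; since the saturation equation (Eq. 1.3) falls on a page not reproduced here, the standard HSL saturation definition is assumed in its place.

```python
def rgb_to_hsl(r, g, b):
    """Convert normalized RGB (each in [0, 1]) to HSL per Eqs. 1.1-1.2.

    The saturation formula is the standard one; the thesis' Eq. 1.3
    is not reproduced above, so it is assumed here.
    """
    M, m = max(r, g, b), min(r, g, b)
    # Hue (Eq. 1.1): angle on the color circle, in degrees
    if M == m:
        h = 0.0                # achromatic: hue is undefined, set to 0
    elif M == r:
        h = (60.0 * (g - b) / (M - m) + 360.0) % 360.0
    elif M == g:
        h = 60.0 * (b - r) / (M - m) + 120.0
    else:  # M == b
        h = 60.0 * (r - g) / (M - m) + 240.0
    # Lightness (Eq. 1.2): mid-range of the extrema
    l = (M + m) / 2.0
    # Saturation (standard definition, assumed for Eq. 1.3)
    if M == m:
        s = 0.0
    elif l <= 0.5:
        s = (M - m) / (M + m)
    else:
        s = (M - m) / (2.0 - M - m)
    return h, s, l

# Example: pure red maps to hue 0, full saturation, mid lightness
assert rgb_to_hsl(1.0, 0.0, 0.0) == (0.0, 1.0, 0.5)
```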


Fig. 1.3 HSV colorspace

On the other hand, the value of b* represents how blue or yellow the color is. Negative values of b* indicate blue, while positive values of b* indicate yellow.

Fig. 1.4 CIELab colorspace

1.1.2 Cellular Automata

Cellular Automata (CA) were first introduced by von Neumann, who was thinking of imitating the behavior of a human brain in order to build a machine able to solve very complex problems (Von Neumann 1966). His ambitious project was to show that complex phenomena can, in principle, be reduced to the dynamics of many identical, very simple primitives, capable of interacting and maintaining their identity (Chopard & Droz 1998). Following a suggestion by Ulam (Ulam 1952), von Neumann adopted a fully discrete approach, in which space, time and even the dynamical variables were defined to be discrete. CA thus comprise a very effective computational tool for simulating physical systems and solving scientific problems, because they can capture the essential features of systems whose global behavior arises from the collective effect of simple components that interact locally (Feynman 1982, Wolfram 1986).

In CA analysis, physical processes and systems are described by a cell array and a local rule, which defines the new state of a cell depending on the states of its neighbors. All cells can work in parallel, due to the fact that each cell can independently update its own state. Therefore, CA models are massively parallel and comprise ideal candidates for hardware implementation (Mardiris et al. 2008, Kotoulas, Gasteratos, Sirakoulis, Georgoulas & Andreadis 2005, Sirakoulis et al. 2003). CA can easily handle complicated boundary and initial conditions (Von Neumann 1966, Wolfram 1986). Using a more formal definition, a CA system requires:

1. a regular lattice of cells covering a portion of a d-dimensional space;

2. a set C(r, t) = {C1(r, t), C2(r, t), ..., Cm(r, t)} of variables attached to each position r of the lattice, giving the local state of each cell at time t = 0, 1, 2, ...;

3. a rule R = {R1, R2, ..., Rm} which specifies the time evolution of the states C(r, t) in the following way:

   Cj(r, t + 1) = Rj{C(r, t), C(r + δ1, t), C(r + δ2, t), ..., C(r + δq, t)}

where r + δk designate the cells belonging to a given neighborhood of cell r.
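As an illustration of this definition, the following minimal Python sketch performs one synchronous update of a one-dimensional CA; the periodic boundary conditions and the use of Wolfram's elementary rule 110 are illustrative assumptions, not part of the definition above.

```python
import numpy as np

def ca_step(state, rule, radius=1):
    """One synchronous CA update: every cell computes C(r, t+1) from the
    states of its neighborhood {r - radius, ..., r + radius} at time t.
    Periodic boundaries are an illustrative assumption."""
    padded = np.pad(state, radius, mode="wrap")
    windows = np.lib.stride_tricks.sliding_window_view(padded, 2 * radius + 1)
    return np.array([rule(w) for w in windows], dtype=state.dtype)

def rule110(neigh):
    """Wolfram's elementary rule 110: the new state is the bit of the
    number 110 indexed by the (left, centre, right) neighborhood pattern."""
    idx = 4 * neigh[0] + 2 * neigh[1] + neigh[2]
    return (110 >> idx) & 1

# One evolution step of a small binary lattice:
state = np.array([0, 0, 0, 1, 0, 0, 0, 1, 1, 0])
state = ca_step(state, rule110)
```

Because each cell's new state depends only on its local neighborhood, the list comprehension could be replaced by fully parallel hardware, which is exactly the property that makes CA attractive for hardware implementation.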

Furthermore, the CA approach is consistent with the modern notion of unified space-time: in computer science, space corresponds to memory and time to the processing unit. In CA, memory (the CA cell state) and the processing unit (the CA local rule) are inseparably related to a CA cell (Nalpantidis et al. 2008a).

While CA are usually considered as tools applicable to two-dimensional structures, three-dimensional structures, such as the one shown in Figure 1.5(a), can also be processed. 3D CA include rules that involve some or all of the three directions of such structures. As a result, the neighborhoods affecting the state of each cell are also 3D. Figure 1.5(b) shows such a 3 × 3 × 3 3D neighborhood.

(a) A 3D volume (x, y, z) (b) A 3×3×3 neighborhood

Fig. 1.5 Views of a 3D volume and of a 3D neighborhood defined in it



1.1.3 Gestalt Laws of Perceptual Organization

Gestalt is a movement of psychology that deals with perceptual organization (Kohler 1969). Gestalt psychology examines the relationships that bond individual elements so as to form a group (Forsyth & Ponce 2002). As a consequence, a pattern emerges instead of separate parts. This pattern generally has completely different characteristics from its parts, as shown in Figure 1.6.

(a) (b) (c)

Fig. 1.6 Images whose parts are differentiated from the whole pattern, explainable through Gestalt theory

Some of the gestalt rules by which elements tend to be associated together and interpreted as a group are the following (Sternberg 2002):

• Proximity: elements that are close to each other.
• Similarity: elements similar in an attribute.
• Continuity: elements that could belong to a smooth larger feature.
• Common fate: elements that exhibit similar behavior.
• Closure: elements that could provide closed curves.
• Parallelism: elements that seem to be parallel.
• Symmetry: elements that exhibit a larger symmetry.

Gestalt laws have proven to be precious tools in interpreting the way humans perceive their environment through vision (Scholl 2001). While all the laws are valuable for understanding the context of an image, basic image processing tasks can be restricted to using the most basic ones. In order to express an image processing task through the prism of gestalt theory, pixels should be considered as the elements, and the correlation degree between them should be treated as the bonding relationship of the elements.

1.2 Stereoscopic Vision

Calculating the distance of various points, or any other primitive, in a scene relative to the position of a camera is one of the important tasks of a computer vision system. The most common method for extracting depth information from intensity images is by means of a pair of synchronized camera signals, acquired by a stereo rig. The point-by-point matching between the two images from the stereo setup (also known as the stereo correspondence problem) derives the depth images, or the so-called disparity maps (Faugeras 1993).

The estimation of the disparity map between two images of the same scene is a long-standing issue for the machine vision community (Marr & Poggio 1976). Stereoscopic vision is based on the principle, first utilized by nature, that two spatially differentiated views of the same scene provide enough information so as to perceive the depth of the portrayed objects. Thus, the importance of stereo correspondence is apparent in the fields of machine vision (Jain et al. 1995), computer vision (Forsyth & Ponce 2002), virtual reality, robot navigation (Metta et al. 2004), simultaneous localization and mapping (Murray & Little 2000, Murray & Jennings 1997), depth measurements (Manzotti et al. 2001) and 3D environment reconstruction (Jain et al. 1995).

1.2.1 Image Rectification

In the general case, the image planes of the two capturing cameras do not belong to the same plane. While stereo algorithms can deal with such cases, the demanded calculations are considerably simplified if the stereo image pair has been rectified. The process of rectification, as shown in Figure 1.7, involves the replacement of the initial image pair Il, Ir by another, projectively equivalent pair Il,rect, Ir,rect (Forsyth & Ponce 2002, Faugeras 1993). The initial images are reprojected onto a common plane P that is parallel to the baseline B joining the optical centers of the initial images.
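In practice, rectification is typically carried out with standard library routines once the cameras have been calibrated. The following is a minimal sketch using OpenCV; the inputs K1, d1, K2, d2, R, T (intrinsics, distortion coefficients and relative pose from a prior stereo calibration) are assumed to be available and are not defined here.

```python
import cv2

def rectify_pair(img_l, img_r, K1, d1, K2, d2, R, T):
    """Reproject both images onto a common plane parallel to the baseline.
    K1, d1, K2, d2, R, T are assumed to come from a prior calibration."""
    size = (img_l.shape[1], img_l.shape[0])          # (width, height)
    R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(K1, d1, K2, d2,
                                                      size, R, T)
    m1l, m2l = cv2.initUndistortRectifyMap(K1, d1, R1, P1, size, cv2.CV_32FC1)
    m1r, m2r = cv2.initUndistortRectifyMap(K2, d2, R2, P2, size, cv2.CV_32FC1)
    return (cv2.remap(img_l, m1l, m2l, cv2.INTER_LINEAR),
            cv2.remap(img_r, m1r, m2r, cv2.INTER_LINEAR))
```

After this step, corresponding points lie on the same image row, which is what allows the one-dimensional search described next.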

1.2.2 Epipolar Geometry

Epipolar geometry provides tools to solve the stereo correspondence problem, i.e. to recognize the same feature in both images. If no rectification is performed, the matching procedure involves searching within two-dimensional regions of the target image, as shown in Figure 1.8(b). However, this matching can be done as a one-dimensional search if accurately rectified stereo pairs are assumed, in which horizontal scan lines reside on the same epipolar line, as shown in Figure 1.8(a). A point P1 in one image plane may have arisen from any of the points in the line C1P1, and may appear in the alternate image plane at any point on the epipolar line E2 (Jain et al. 1995). Thus, the search is theoretically reduced to a scan line, since corresponding pair points reside on the same epipolar line. The difference of the horizontal coordinates of these points is the disparity value. The disparity map consists of all the disparity values of the image.
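The reduction of the search to a single scan line can be illustrated with the following sketch, which matches a small window around a left-image pixel against candidate positions on the same row of the right image; border handling is omitted and the window size is an arbitrary choice.

```python
import numpy as np

def scanline_disparity(left, right, y, x, max_disp, half=2):
    """Search along row y only (the epipolar line of a rectified pair):
    the horizontal offset d of the best-matching window is the disparity.
    Assumes grayscale images and x >= max_disp + half (no border checks)."""
    ref = left[y, x - half:x + half + 1].astype(np.float32)
    costs = [np.abs(ref - right[y, x - d - half:x - d + half + 1]
                    .astype(np.float32)).sum()
             for d in range(max_disp)]
    return int(np.argmin(costs))
```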

1.2.3 Pixel Correlation

Detecting conjugate pairs in stereo images is a challenging research problem known as the correspondence problem, i.e. to find for each pixel in the left image the corresponding pixel in the right one (Barnard & Thompson 1980). To determine that two pixels form a conjugate pair, it is necessary to measure the similarity of these pixels. A pixel to be matched without any ambiguity should be distinctly different from its surrounding pixels. Several algorithms have been proposed in



Fig. 1.9 Human eye's (left) and a typical camera's (right) color sensitivities

Aggregation is traditionally considered as time-consuming. However, Gong and his colleagues (Gong et al. 2007) study the performance of various aggregation approaches suitable for real-time methods.

According to (Hua et al. 2005), using color information instead of gray values during stereo matching significantly improves the accuracy. Recently, the use of the CIELab color space has been proved to yield impressive results (Yoon & Kweon 2006a). However, vision sensors produce color images in the RGB color space, due to their structure. This fact is generally in accordance with the way the human visual system (HVS) perceives colors, as shown in Figure 1.9.

The conversion from RGB to CIELab or similar color spaces demands non-linear transformations and, as a result, is computationally demanding. The use of the RGB color space's chromatic components is the simplest solution. Thus, the absolute differences for each channel of the RGB color space are taken into consideration. However, there are at least two possible methodologies for combining the three color channels. The International Telecommunications Union (ITU), in Recommendation BT.601-6, suggests that luminance (or intensity) information, represented as Y in color spaces such as YCbCr, YUV and XYZ, can be calculated as a weighted linear combination of the available RGB components. The weights for the red, green and blue chromatic channels are 0.299, 0.587 and 0.114, respectively. These values reflect photometric considerations and were derived from measurements of the response of the HVS to color stimuli. This equation is used in grayscale conversion by NTSC and JPEG. According to this, the aforementioned linear combination of the absolute differences (AD) calculated for each RGB channel becomes:

AD = 0.299 ADR + 0.587 ADG + 0.114 ADB    (1.9)

where AD denotes the total absolute luminance (or intensity) difference of two pixels in the two images, and ADR, ADG, ADB denote the absolute differences calculated for the red, green and blue chromatic channels only, respectively. In spite of being representative of the way the HVS accounts for each chromatic channel, this methodology is not the most credible one. The methodology most preferred in the literature (Scharstein & Szeliski 2002, Muhlmann et al. 2002) indicates that the same weight should be assigned to each one of the three chromatic channels, since each one contains the same amount of information. Thus, the total AD is a simple summation of the AD for each specific channel:

AD = ADR + ADG + ADB    (1.10)



This simpler treatment presents better performance than the more sophisticated one, as indicated by the conducted tests.
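A small sketch of the two alternatives, with illustrative names, could look as follows; Eq. 1.9 corresponds to weighted=True and Eq. 1.10 to the default equal-weight summation.

```python
import numpy as np

def absolute_difference(p_left, p_right, weighted=False):
    """Total absolute difference of two RGB pixels.

    weighted=True applies the ITU BT.601 luminance weights (Eq. 1.9);
    weighted=False sums the three channels equally (Eq. 1.10)."""
    ad = np.abs(np.asarray(p_left, dtype=np.float32)
                - np.asarray(p_right, dtype=np.float32))
    if weighted:
        return float(0.299 * ad[0] + 0.587 * ad[1] + 0.114 * ad[2])
    return float(ad.sum())
```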

1.2.4 Structure of Stereo Correspondence Algorithms

The majority of the reported stereo correspondence algorithms can be described using more or less the same structural set (Scharstein & Szeliski 2002). The basic building blocks are:

1. Computation of a matching cost function for every pixel in both input images.
2. Aggregation of the computed matching cost inside a support region for every pixel in each image.
3. Finding the optimum disparity value for every pixel of one image.
4. Refinement of the resulting disparity map.

Every stereo correspondence algorithm makes use of a matching cost function so as to establish correspondence between two pixels, as discussed in Section 1.2.3. The results of the matching cost computation comprise the Disparity Space Image (DSI). The DSI is a 3D matrix containing the computed matching costs for every pixel and for all its potential disparity values (Muhlmann et al. 2002). The structure of a DSI is illustrated in Figure 1.10.

Fig. 1.10 DSI containing matching costs for every pixel of the image and for all its potential disparity values

Usually, the matching costs are aggregated over support regions. These regions can be 2D or even 3D (Zitnick & Kanade 2000, Brockers et al. 2005) ones within the DSI cube. The selection of the appropriate disparity value for each pixel is performed afterwards. It can be a simple Winner-Takes-All (WTA) process or a more sophisticated one. In many cases this is an iterative process, as depicted in Figure 1.11. An additional disparity refinement step is frequently adopted. It is usually intended to interpolate the calculated disparity values, to give sub-pixel accuracy, or to assign values to pixels that were not calculated. The general structure of the majority of stereo correspondence algorithms is shown in Figure 1.11.



Fig. 1.11 General structure of a stereo correspondence algorithm
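The four building blocks can be condensed into a few lines for the simplest local case. The sketch below (an illustration, not one of the algorithms reviewed later) computes a SAD-based DSI, aggregates it with a square box filter and selects disparities by WTA; rectified grayscale inputs are assumed and the refinement step is omitted.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def sad_wta_disparity(left, right, max_disp, window=5):
    """Matching cost (absolute differences), aggregation (box filter over a
    square support region) and WTA selection over the DSI disparity axis."""
    h, w = left.shape
    dsi = np.full((h, w, max_disp), np.inf, dtype=np.float32)
    for d in range(max_disp):
        diff = np.abs(left[:, d:].astype(np.float32)
                      - right[:, :w - d].astype(np.float32))
        dsi[:, d:, d] = uniform_filter(diff, size=window)
    return np.argmin(dsi, axis=2)   # dense disparity map, one value per pixel
```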

1.2.5 Applications of Stereo Vision in Robotics

Autonomous robots' behavior greatly depends on the accuracy of their decision-making algorithms. Reliable depth estimation is commonly needed in numerous autonomous behaviors; autonomous navigation (Hariyama et al. 2000), localization and mapping are just a few of them (Murray & Little 2000, Sim & Little 2009).

Vision-based solutions are becoming more and more attractive due to their decreasing cost, as well as their inherent affinity with human visual mechanisms. In the case of stereo vision-based navigation, the accuracy and the refresh rate of the computed disparity maps are the cornerstones of success (Iocchi & Konolige 1998, Schreer 1998). However, robotic applications place strict requirements on the demanded speed and accuracy of vision depth-computing algorithms. Depth estimation using stereo vision, i.e. the stereo correspondence problem, is known to be very computationally demanding. The computation of dense and accurate depth images, i.e. disparity maps, at frame rates suitable for robotic applications is an open problem for the scientific community. Most of the attempts to confront the demand for accuracy focus on the development of sophisticated stereo correspondence algorithms, which usually increase the computational load exponentially. On the other hand, the need for real-time frame rates inevitably imposes compromises concerning the quality of the results. However, the reliability of the results is crucial for autonomous robotic applications and proper stereo algorithms are required.


Chapter 2

State of the Art in Stereo Vision

Stereo vision is a flourishing field, attracting the attention of many researchers (Forsyth & Ponce 2002, Hartley & Zisserman 2004). New approaches are presented frequently. Such an expanding volume of work makes it difficult for those interested to keep up with it. An up-to-date survey of the stereo vision matching algorithms and corresponding applications would be useful both for those already engaged in the field, giving them a brief overview of the advances accomplished, and for newcomers, allowing for a quick introduction to the state of the art. The stereo correspondence algorithms reviewed in this Chapter follow the taxonomy diagrammatically given in Figure 2.1.

2.1 Stereo Correspondence Algorithms

Since the excellent taxonomy presented by Scharstein and Szeliski (Scharstein & Szeliski 2002) and the interesting work of Sunyoto, Mark and Gavrila (Sunyoto et al. 2004), many new stereo matching, i.e. stereo correspondence, algorithms have been proposed (Yoon & Kweon 2006a, Klaus et al. 2006). The latest trends in the field mainly pursue real-time execution speeds, as well as decent accuracy. Stereo correspondence algorithms can be grouped into those producing sparse output and those giving a dense result. Feature-based methods stem from human vision studies and are based on matching segments or edges between two images, thus resulting in a sparse output. This disadvantage is counterbalanced by the accuracy and speed obtained. However, contemporary applications demand more and more dense output. In order to categorize and evaluate the dense stereo correspondence algorithms a framework has been proposed (Scharstein & Szeliski 2002). According to this, dense matching algorithms are classified into local and global ones. Local methods (area-based) trade accuracy for speed. They are also referred to as window-based methods, because disparity computation at a given point depends only on intensity values within a finite support window. Global methods (energy-based), on the other hand, are time-consuming but very accurate. Their goal is to minimize a global cost function, which combines data and smoothness terms, taking into account the whole image. Of course, there are many other methods (Liu et al. 2006) that are not strictly included in either of these two broad classes. The issue of stereo matching has recruited a variety of computational tools. Advanced computational intelligence techniques are not uncommon and present interesting and promising results (Binaghi et al. 2004, Kotoulas, Gasteratos, Sirakoulis, Georgoulas & Andreadis 2005).

Fig. 2.1 Categorization of stereo vision algorithms.

Contemporary research in stereo matching algorithms, in accordance with the ideas of Maimone and Shafer (Maimone & Shafer 1996), is dominated by the test bench available at the site maintained by Scharstein and Szeliski (Scharstein & Szeliski 2010). As numerous methods have been proposed since then, this Section aspires to review the most recent ones, i.e. mainly those published during and after 2004. Most of the results presented in the rest of this Section are based on the image sets (Scharstein & Szeliski 2002, 2003) and the test provided there. The most common image sets are presented in Figure 2.2. Table 2.1 summarizes their size, as well as the number of disparity levels. Experimental results based on these image sets are given, where available. The preferred metric adopted in order to depict the quality of the resulting disparity maps is the percentage of pixels whose absolute disparity error is greater than 1 in the non-occluded areas of the image. This metric, considered the most representative of the result's quality, was used so as to make comparison easier. Other metrics, like the error rate and the root mean square error, are also employed. The speed with which the algorithms process input image pairs is expressed in frames per second (fps). This metric, of course, depends heavily on the computational platform used and the kind of implementation. Inevitably, speed results are not directly comparable.



Fig. 2.2 Left image of the stereo pair (left) and ground truth (right) for the Tsukuba (a), Sawtooth (b), Map (c), Venus (d), Cones (e) and Teddy (f) stereo pairs.

Table 2.1 Characteristics of the most common stereo image sets

                   Tsukuba    Map        Sawtooth   Venus      Cones      Teddy
Size in pixels     384 × 288  284 × 216  434 × 380  434 × 383  450 × 375  450 × 375
Disparity levels   16         30         20         20         60         60

2.1.1 Dense Disparity Algorithms

Methods that produce dense disparity maps gain popularity as the available computational power grows. Moreover, contemporary applications benefit from, and consequently demand, dense depth information. Therefore, during the latest years, efforts in this direction have been reported much more frequently than towards sparse results.



Dense disparity stereo matching algorithms can be divided into two general classes, according to the way they assign disparities to pixels. Firstly, there are algorithms that decide the disparity of each pixel according to the information provided by its local, neighboring pixels. There are, however, other algorithms which assign disparity values to each pixel depending on information derived from the whole image. Consequently, the former are called local methods, while the latter are called global ones.

Local Methods

Local methods are usually fast and can at the same time produce decent results. Several new methods have been presented. In Figure 2.3 a Venn diagram presents the main characteristics of the local methods discussed below. Under the term color usage we have grouped the methods that take advantage of the chromatic information of the image pair; any algorithm can process color images, but not every one can exploit that information in a more beneficial way. Furthermore, in Figure 2.3, NCC stands for the use of normalized cross correlation and SAD for the use of the sum of absolute differences as the matching cost function. As expected, the use of SAD as the matching cost is far more widespread than any other.

Fig. 2.3 Diagrammatic representation of the local methods' categorization

Legend:
1. Kotoulas, Gasteratos, Sirakoulis, Georgoulas & Andreadis (2005)
2. Yoon & Kweon (2006a)
3. Yoon & Kweon (2006b)
4. Gu et al. (2008)
5. Di Stefano et al. (2004)
6. Zach et al. (2004)
7. Ogale & Aloimonos (2005b)
8. Yoon et al. (2005)
9. Muhlmann et al. (2002)
10. Hosni et al. (2009)
11. Mordohai & Medioni (2006)
12. Binaghi et al. (2004)

Muhlmann and his colleagues (Muhlmann et al. 2002) describe a method that uses the SAD correlation measure for RGB color images. It achieves high speed and reasonable quality. It makes use of the left-to-right consistency and uniqueness constraints and applies a fast median filter to the results. It can achieve 20 fps for a 160×120 pixel image size, making this method suitable for real-time applications. The PC platform is Linux on a dual-processor 800 MHz Pentium III system with 512 MB of RAM.
with 512 MB <strong>of</strong> RAM.



Another fast area-based stereo matching algorithm, which uses the SAD as error function, is presented in (Di Stefano et al. 2004). Based on the uniqueness constraint, it rejects previous matches as soon as better ones are detected. In contrast to bidirectional matching algorithms, this one performs only one matching phase, while having similar results. The results obtained are tested for reliability and sub-pixel refined. It produces dense disparity maps in real time using an Intel Pentium III processor running at 800 MHz. The algorithm achieves a speed of 39.59 fps for 320×240 pixels and 16 disparity levels, and the root mean square error for the standard Tsukuba pair is 5.77%.

On the contrary, Ogale and Aloimonos in (Ogale & Aloimonos 2005b) take into consideration the shape of the depicted objects and demonstrate the importance of vertical and horizontal slanted surfaces. The authors propose the replacement of the standard uniqueness constraint, referring to pixels, with a uniqueness constraint referring to line segments along a scanline, so the method performs interval matching instead of pixel matching. The slants of the surfaces are computed along a scanline, a stretching factor is then obtained and the matching is performed based on the absolute intensity difference. The objective is to achieve minimum segmentation. The experimental results indicate 1.77%, 0.61%, 3.00% and 7.63% error percentages for the Tsukuba, Sawtooth, Venus and Map stereo pairs, respectively. The execution speed of the algorithm varies from 1 to 0.2 fps on a 2.4 GHz processor.

Another method that presents almost real-time performance is reported in (Yoon et al. 2005). It makes use of a refined implementation of the SAD method and a left-right consistency check. The errors in the problematic regions are reduced using differently sized correlation windows. Finally, a median filter is used in order to interpolate the results. The algorithm is able to process 7 fps for 320×240 pixel images and 32 disparity levels. These results are obtained using an Intel Pentium 4 processor at 2.66 GHz.

A window-based method for correspondence search that uses varying support-weights is presented in (Yoon & Kweon 2006a). The support-weights of the pixels in a given support window are adjusted based on color similarity and geometric proximity, in order to reduce the image ambiguity. The difference between pixel colors is measured in the CIELab color space, because the distance of two points in this space is analogous to the stimulus perceived by the human eye. The running time for the Tsukuba image pair with a 35×35 pixel support window is about 0.016 fps on an AMD 2700+ processor. The error ratio is 1.29%, 0.97%, 0.99% and 1.13% for the Tsukuba, Sawtooth, Venus and Map image sets, respectively. These figures can be further improved through a left-right consistency check.

The same authors propose, in (Yoon & Kweon 2006b), a pre-processing step for correspondence search in the presence of specular highlights. For the given input images, specular-free two-band images are generated. The similarity between pixels of these input-image representations can be measured using various correspondence search methods, such as the simple SAD-based method, the adaptive support-weights method (Yoon & Kweon 2006c) and the dynamic programming (DP) method. This pre-processing step can be performed in real time and compensates satisfactorily for specular reflections.

An extension of the previous works can be found in (Gu et al. 2008). Disparity is first approximated by an adaptive support-weight (ASW) and a rank transform method, and then a compact disparity calibration approach is employed to refine the initial disparity, so that an accurate result can be acquired.

The work of (Hosni et al. 2009) proposes a novel support aggregation approach for stereo matching. To derive support weights, the geodesic distances from all pixels of the support window to the window's center point are computed. Based on the concept of connectivity, the ASW algorithm proves to be effective for obtaining improved segmentation results.

Binaghi et al. (Binaghi et al. 2004), on the other hand, have chosen to use the zero-mean normalized cross correlation (ZNCC) as the matching cost. This method integrates a neural network (NN) model, which uses the least-mean-square delta rule for training. The NN decides on the proper window shape and size for each support region. The results obtained are satisfactory, but the 0.024 fps running speed reported for the common image sets, on a Windows platform with a 300 MHz processor, renders this method unsuitable for real-time applications.

Based on the same matching cost function, a more complex area-based method is proposed in (Mordohai & Medioni 2006). A perceptual organization framework, considering both binocular and monocular cues, is utilized. An initial matching is performed by a combination of NCC techniques. The correct matches are selected for each pixel using tensor voting. Matches are then grouped into smooth surfaces. Disparities for the unmatched pixels are assigned so as to ensure smoothness in terms of both surface orientation and color. The percentage of unoccluded pixels whose absolute disparity error is greater than 1 is 3.79, 1.23, 9.76 and 4.38 for the Tsukuba, Venus, Teddy and Cones image sets, respectively. The execution speed reported is about 0.002 fps for the Tsukuba image pair with 20 disparity levels, running on an Intel Pentium 4 processor at 2.8 GHz.

There are, of course, more hardware-oriented proposals as well. Many of them take advantage of contemporary powerful graphics machines to achieve enhanced results in terms of processing time and data volume. A hierarchical disparity estimation algorithm implemented on a programmable 3D graphics processing unit (GPU) is reported in (Zach et al. 2004). This method can process either rectified or uncalibrated image pairs. Bidirectional matching is utilized in conjunction with a locally aggregated sum of absolute intensity differences. This implementation, on an ATI Radeon 9700 Pro, can achieve up to 50 fps for 256×256 pixel input images.

Moreover, the use of CA is exploited in (Kotoulas, Gasteratos, Sirakoulis, Georgoulas & Andreadis 2005). This work presents an architecture for the real-time extraction of disparity maps. It is capable of processing 1 Megapixel image pairs at more than 40 fps. The core of the algorithm relies on matching the pixels of each scan-line using a one-dimensional window and the SAD matching cost, as described in (Kotoulas, Georgoulas, Gasteratos, Sirakoulis & Andreadis 2005). This method involves a pre-processing mean filtering step and a post-processing CA-based filtering one, both of which can be easily implemented in hardware (Nalpantidis et al. 2007, 2008b).

The main features of the discussed local algorithms are summarized in Table 2.2.

Table 2.2 Characteristics of local algorithms (matching cost; features; speed; image size; disparity levels; computational platform)

• Muhlmann et al. (2002): SAD; color usage, occlusion handling, left-right consistency, uniqueness constraints; 20 fps; 160×120; –; Intel Pentium III 800 MHz with 512 MB RAM
• Di Stefano et al. (2004): SAD; occlusion handling, uniqueness constraint; 39.59 fps; 320×240; 16; Intel Pentium III 800 MHz
• Ogale & Aloimonos (2005b): SAD; occlusion handling, interval uniqueness constraint; 1 fps; 384×288; 16; 2.4 GHz processor
• Yoon et al. (2005): SAD; occlusion handling, left-right consistency check, variable windows; 7 fps; 320×240; 32; Intel Pentium 4 2.66 GHz
• Yoon & Kweon (2006a): SAD; color usage, varying support-weights; 0.016 fps; 384×288; 16; AMD 2700+
• Yoon & Kweon (2006b): SAD; color usage, varying support-weights, specular reflection compensation; 0.016 fps; 384×288; 16; AMD 2700+
• Gu et al. (2008): SAD; color usage, varying support-weights; –; –; –; –
• Salmen et al. (2009): SAD; color usage, varying support-weights, occlusion detection; –; –; –; –
• Binaghi et al. (2004): ZNCC; varying windows based on neural networks; 0.024 fps; 284×216; 30; 300 MHz processor
• Mordohai & Medioni (2006): NCC; color usage, occlusion handling, tensor voting; 0.002 fps; 384×288; 20; Intel Pentium 4 2.8 GHz
• Zach et al. (2004): SAD; occlusion handling, implemented on GPU, bidirectional matching; 50 fps; 256×256; 88; ATI Radeon 9700 Pro
• Kotoulas, Gasteratos, Sirakoulis, Georgoulas & Andreadis (2005): SAD; cellular automata; 40 fps; 1000×1000; –; –

Global Methods

Contrary to local methods, global ones produce very accurate results. Their goal is to find the optimum disparity function d = d(x, y) which minimizes a global cost function E that combines data and smoothness terms:

E(d) = Edata(d) + λEsmooth(d)    (2.1)

where Edata takes into consideration the (x, y) pixel's value throughout the image, Esmooth provides the algorithm's smoothness assumptions and λ is a weight factor. The main disadvantage of the global methods is that they are more time-consuming and computationally demanding. The source of these characteristics is the iterative refinement approach that they employ.
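As a minimal illustration of Eq. 2.1, the following sketch evaluates the energy of a candidate disparity map; the L1 penalty on neighboring disparities is only one of many possible smoothness terms, and the names are illustrative.

```python
import numpy as np

def global_energy(disp, cost_volume, lam=1.0):
    """E(d) = Edata(d) + lambda * Esmooth(d) for an integer disparity map.

    cost_volume[y, x, d] holds the matching cost of assigning disparity d
    to pixel (x, y); the smoothness term is an L1 penalty on neighboring
    disparities (an illustrative choice)."""
    h, w = disp.shape
    ys, xs = np.indices((h, w))
    e_data = cost_volume[ys, xs, disp].sum()            # data term over the whole image
    e_smooth = (np.abs(np.diff(disp, axis=0)).sum()     # vertical neighbors
                + np.abs(np.diff(disp, axis=1)).sum())  # horizontal neighbors
    return e_data + lam * e_smooth
```

Global methods differ mainly in how they search for the disparity map that minimizes such a function, since exhaustive search over all maps is intractable.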



Global methods can be roughly divided into those performing a global energy minimization and those pursuing the minimum for independent scanlines using DP.

In Figure 2.4 the main characteristics of the global algorithms discussed below are presented. It is clear that recently published works preferably utilize global optimization rather than DP. This observation is not a surprising one, taking into consideration the fact that under the term global optimization there are actually quite a few different methods. Additionally, DP tends to produce inferior, thus less impressive, results. Therefore, applications without running speed constraints preferably utilize global optimization methods.

Fig. 2.4 Diagrammatic representation of the global methods' categorization

Legend:
1. Yang et al. (2006)
2. Veksler (2006)
3. Yu et al. (2007)
4. Yang et al. (2010)
5. Klaus et al. (2006)
6. Yoon & Kweon (2006c)
7. Gutierrez & Marroquin (2004)
8. Kim & Sohn (2005)
9. Ogale & Aloimonos (2005a)
10. Brockers et al. (2005)
11. Hirschmuller (2005)
12. Hirschmuller (2006)
13. Strecha et al. (2006)
14. Bleyer & Gelautz (2005)
15. Hong & Chen (2004)
16. Brockers (2009)
17. Zitnick et al. (2004)
18. Bleyer & Gelautz (2005)
19. Sun et al. (2005)
20. Yang et al. (2009)
21. Zitnick & Kang (2007)
22. Lei et al. (2006)
23. Wang et al. (2006)
24. Torra & Criminisi (2004)
25. Kim & Sohn (2005)
26. Veksler (2005)
27. Salmen et al. (2009)



Global Optimization

The algorithms that perform global optimization take the whole image into consideration in order to determine the disparity of every single pixel. An increasing portion of the global optimization methodologies involves segmentation of the input images according to their colors.

The algorithm presented in (Bleyer & Gelautz 2005) uses color segmentation. Each segment is described by a planar model and assigned to a layer using a mean-shift-based clustering algorithm. A global cost function is used that takes into account the summed-up absolute differences, the discontinuities between segments and the occlusions. The assignment of segments to layers is iteratively updated until the cost function improves no more. The experimental results indicate that the percentage of unoccluded pixels whose absolute disparity error is greater than 1 is 1.53, 0.16 and 0.22 for the Tsukuba, Venus and Sawtooth image sets, respectively.

The stereo matching algorithm proposed in (Hong & Chen 2004) makes use of color segmentation in conjunction with the graph cuts method. The reference image is divided into non-overlapping segments using the mean-shift color segmentation algorithm. Thus, a set of planes in the disparity space is generated. The goal of minimizing an energy function is pursued in the segment rather than the pixel domain. A disparity plane is fitted to each segment using the graph cuts method. This algorithm presents good performance in textureless and occluded regions, as well as at disparity discontinuities. The running speed reported is 0.33 fps for a 384×288 pixel image pair when tested on a 2.4 GHz Pentium 4 PC. The percentage of badly matched pixels for the Tsukuba, Sawtooth, Venus and Map image sets is found to be 1.23, 0.30, 0.08 and 1.49, respectively.

Brockers in (Brockers 2009) applies the concept of color-dependent adaptive support weights to the definition of local support areas in cooperative stereo methods, in order to improve the accuracy of depth estimation at object borders. The dissimilarity measure used is the ZNCC, and the algorithm detects occlusions and provides sub-pixel precision. The algorithm was coded in standard non-optimized C++ code using full float precision and run on a 2.4 GHz Intel Core2Duo T7700. The calculation speed for the Tsukuba scene with 100 iterations was 0.05 fps using a single core and 0.09 fps using both cores.

The ultimate goal of the work described in (Zitnick et al. 2004) is to render dynamic scenes with interactive viewpoint control, produced by a few cameras. A suitable color segmentation-based algorithm is developed and implemented on a programmable ATI 9800 PRO GPU. Disparities within segments must vary smoothly, each image is treated equally, occlusions are modeled explicitly and consistency between disparity maps is enforced, resulting in higher quality depth maps. The results for each pixel are refined in conjunction with the others.

Another method that uses the concept of image color segmentation is reported in (Bleyer & Gelautz 2005). An initial disparity map is calculated using an adapting window technique, the segments are iteratively combined into larger layers and the assignment of segments to layers is optimized using a global cost function. The quality of the disparity map is measured by warping the reference image to the second view, comparing it with the real image and calculating the color dissimilarity. For the 384×288 pixel Tsukuba and the 434×383 pixel Venus test sets, the algorithm produces results at a 0.05 fps rate. For the 450×375 pixel Teddy image pair, the running speed decreases to 0.01 fps due to the increased scene complexity. Running speeds refer to an Intel Pentium 4 2.0 GHz processor. The root mean square error obtained is 0.73 for the Tsukuba, 0.31 for the Venus and 1.07 for the Teddy image pair.

Moreover, Sun and his colleagues in (Sun et al. 2005) presented a method which treats the two images of a stereo pair symmetrically, within an energy minimization framework that can also embody color segmentation as a soft constraint. This method enforces that the occlusions in the reference image are consistent with the disparities found for the other image. Belief propagation iteratively refines the results. Moreover, the results for the version of the algorithm that incorporates segmentation are better. The percentage of pixels with a disparity error larger than 1 is 0.97, 0.19, 0.16 and 0.16 for the Tsukuba, Sawtooth, Venus and Map image sets, respectively. The running speed for the aforementioned data sets is about 0.02 fps, tested on a 2.8 GHz Pentium 4 processor.

Color segmentation is utilized in (Klaus et al. 2006) as well. The matching cost used here is a self-adapting dissimilarity measure that takes into account the sum of absolute intensity differences, as well as a gradient-based measure. Disparity planes are extracted using a technique insensitive to outliers. Disparity plane labeling is performed using belief propagation. The execution speed varies between 0.07 and 0.04 fps on a 2.21 GHz AMD Athlon 64 processor. The results indicate 1.13, 0.10, 4.22 and 2.48 percent of badly matched pixels in non-occluded areas for the Tsukuba, Venus, Teddy and Cones image sets, respectively.

Finally, one more algorithm that utilizes energy minimization, color segmentation, plane fitting and repeated application of hierarchical belief propagation is presented in (Yang et al. 2009). This algorithm takes into account a color-weighted correlation measure. Discontinuities and occlusions are properly handled. The percentage of pixels with a disparity error larger than 1 is 0.88, 0.14, 3.55 and 2.90 for the Tsukuba, Venus, Teddy and Cones image sets, respectively.

In (Yoon & Kweon 2006c) two new symmetric cost functions for global stereo methods are proposed: a symmetric data cost function for the likelihood, as well as a symmetric discontinuity cost function for the prior in the Markov random field (MRF) model for stereo. Both the reference image and the target image are taken into account to improve performance, without modeling half-occluded pixels explicitly and without using color segmentation. The use of both of the proposed symmetric cost functions in conjunction with a belief propagation based stereo method is evaluated. Experimental results for standard test bed images show that the performance of the belief propagation based stereo method is greatly improved by the combined use of the proposed symmetric cost functions. The percentage of badly matched pixels in the non-occluded areas was found to be 1.07, 0.69, 0.64 and 1.06 for the Tsukuba, Sawtooth, Venus and Map image sets, respectively.

A method based on Bayesian estimation theory, with a prior MRF model for the assigned disparities, is described in (Gutierrez & Marroquin 2004). The continuity, coherence and occlusion constraints, as well as the adjacency principle, are taken into account. The optimal estimator is computed using a Gauss-Markov random field model for the corresponding posterior marginals, which results in a diffusion process in the probability space. The results are accurate, but the algorithm is not suitable for real-time applications, since it needs a few minutes to process a 256×255 stereo pair with up to 32 disparity levels on an Intel Pentium III running at 450 MHz.

On the other hand, Strecha and his colleagues in (Strecha et al. 2006) treat every pixel of the input images as generated either by a process responsible for the pixels visible from the reference camera, which obey the constant brightness assumption, or by an outlier process, responsible for the pixels that cannot be corresponded. Depth and visibility are jointly modeled as a hidden MRF, and the spatial correlations of both are explicitly accounted for by defining a suitable Gibbs prior distribution. An expectation maximization (EM) algorithm keeps track of which points of the scene are visible in which images, and accounts for visibility configurations. The percentages of pixels with a disparity error larger than 1 are 2.57, 1.72, 6.86 and 4.64 for the Tsukuba, Venus, Teddy and Cones image sets, respectively.



Moreover, a stereo method specifically designed for image-based rendering is described in (Zitnick & Kang 2007). This algorithm uses over-segmentation of the input images and computes matching values over entire segments rather than single pixels. Color-based segmentation preserves object boundaries. The depths of the segments for each image are computed using loopy belief propagation within an MRF framework. Occlusions are also considered. The percentage of badly matched pixels in the unoccluded regions is 1.69, 0.50, 6.74 and 3.19 for the Tsukuba, Venus, Teddy and Cones image sets, respectively. The aforementioned results refer to a 2.8 GHz PC platform.

In (Hirschmuller 2005) an algorithm based on a hierarchical calculation of a mutual-information-based matching cost is proposed. Its goal is to minimize a proper global energy function, not by iterative refinements but by aggregating matching costs for each pixel from all directions. The final disparity map is sub-pixel accurate and occlusions are detected. The processing speed for the Teddy image set is 0.77 fps. The error in unoccluded regions is found to be less than 3% for all the standard image sets. Calculations are made on an Intel Xeon processor running at 2.8 GHz.

An enhanced version of the previous method is proposed by the same author in (Hirschmuller 2006). Mutual information is once again used as the cost function. The extensions applied to it result in intensity-consistent disparity selection for untextured areas and discontinuity-preserving interpolation for filling holes in the disparity maps. It successfully treats complex shapes and uses planar models for untextured areas. A bidirectional consistency check, sub-pixel estimation, as well as interpolation of invalid disparities are performed. The experimental results indicate that the percentages of badly matched pixels in unoccluded regions are 2.61, 0.25, 5.14 and 2.77 for the Tsukuba, Venus, Teddy and Cones image sets, respectively, with 64 disparity levels searched each time. However, the reported running speed on a 2.8 GHz PC is less than 1 fps.

The work done by Kim and Sohn (Kim & Sohn 2005) introduces a two-stage algorithm consisting of hierarchical dense disparity estimation and vector field regularization. The dense disparity estimation is accomplished by a region-dividing technique that uses a Canny edge detector and a simple SAD function. The results are refined by regularizing the vector fields by means of minimizing an energy function. The root mean square error obtained by this method is 0.9278 and 0.9094 for the Tsukuba and Sawtooth image pairs, respectively. The running speed is 0.15 fps and 0.105 fps, respectively, on a Pentium 4 PC running Windows XP.

An uncommon measure is used by Ogale and Aloimonos in (Ogale & Aloimonos 2005a). This work describes an algorithm which is focused on achieving contrast-invariant stereo matching. It relies on multiple spatial frequency channels for local matching. The measure for this stage is the deviation of the phase difference from zero. The global solution is found by a fast, non-iterative left-right diffusion process. Occlusions are found by enforcing the uniqueness constraint. The algorithm is able to handle significant changes in contrast between the two images and can handle noise in one of the frequency channels. The Matlab implementation of the algorithm is capable of processing the Middlebury image pairs at a 0.5 to 0.25 fps rate, on a 2 GHz computer platform.

The method described in (Brockers et al. 2005) uses a cost relaxation approach. A similarity measurement is obtained as a preliminary stage of the relaxation process. Relaxation is an iterative process that minimizes a global cost function while taking into account the continuity constraint and the expected similarity of neighboring pixels. The support regions are 3D within the disparity space volume and have Gaussian weights. The disparity is available at any time of the iterative refinement phase, having of course diminished accuracy for few iteration cycles. This feature makes the method suitable for time-critical applications. The percentages of badly matched pixels in unoccluded regions are found to be 4.76, 1.41, 8.18 and 3.91 for the Tsukuba, Venus, Teddy and Cones image sets, respectively.



In (Yu et al. 2007) the feasibility of applying compression techniques to the messages of the belief propagation algorithm, in order to improve its efficiency, is studied. A compression scheme called envelope point transform is proposed. Experimental results on dense stereo reconstruction have shown that envelope point transform-based belief propagation can achieve a compression factor of 8 or more without significant loss of depth accuracy.

The work of (Yang et al. 2010) considers the problem of stereo matching using loopy belief propagation. The algorithm hierarchically reduces the disparity search range. By fixing the number of disparity levels at the original resolution, this method solves the message updating problem in a time linear in the number of pixels contained in the image and requires constant memory space. Specifically, for an 800×600 image with 300 disparity levels the message updating method achieves an execution speed of 0.67 fps and requires little memory. The platform used was a 2.5 GHz Intel Core 2 Duo processor.
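The core operation of such message-passing schemes is the message update. The following sketch shows one min-sum update with a truncated linear smoothness term, computed naively in O(D^2) for clarity, whereas efficient implementations like the cited one use linear-time updates; all names and parameter values are illustrative.

import numpy as np

def send_message(data_cost, incoming, smooth_lambda=1.0, trunc=2.0):
    """One min-sum belief-propagation message for a single pixel.
    data_cost: (D,) unary matching cost of the sending pixel; incoming:
    list of (D,) messages from its neighbours other than the recipient."""
    h = data_cost + sum(incoming)            # aggregate belief at the sender
    d_levels = len(data_cost)
    msg = np.empty(d_levels)
    for dq in range(d_levels):               # candidate label of the recipient
        steps = np.abs(np.arange(d_levels) - dq)
        penalty = smooth_lambda * np.minimum(steps, trunc)
        msg[dq] = np.min(h + penalty)        # best sender label for this dq
    return msg - msg.min()                   # normalise to avoid drift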

Another algorithm that generates high quality results in real time is reported in (Yang et al. 2006). It is based on the minimization of a global energy function comprising a data and a smoothness term. The hierarchical belief propagation iteratively optimizes the smoothness term, achieving fast convergence by removing redundant computations. In order to accomplish real-time operation the authors take advantage of the parallelism of graphics hardware (GPU). Experimental results indicate a processing speed of 16 fps for 320×240 pixel self-recorded images with 16 disparity levels. The percentages of bad matching pixels in unoccluded regions for the Tsukuba, Venus, Teddy and Cones image sets are found to be 1.49, 0.77, 8.72 and 4.61, respectively. The computer used is a 3 GHz PC and the GPU is an NVIDIA GeForce 7900 GTX graphics card with 512 MB of video memory.

The work of Veksler in (Veksler 2006) indicates that the computational cost of the graph cuts stereo correspondence technique can be efficiently decreased by using the results of a simple local stereo algorithm to limit the disparity search range. The idea is to analyze and exploit the failures of local correspondence algorithms. This method can accelerate the processing by a factor of 2.8, compared to the sole use of graph cuts, while the resulting energy is worse only by an average of 1.7%. These figures resulted from an analysis performed on a large dataset of 32 stereo pairs using a Pentium 4 at 2.6 GHz. This is a considerable improvement in efficiency gained for a small price in accuracy, and it moves the graph-cuts based algorithms closer to real-time implementation. The running speeds are 0.77, 0.38, 0.16, 0.17, 0.53 and 1.04 fps for the Tsukuba, Venus, Teddy, Cones, Sawtooth and Map image sets, respectively, while the corresponding error percentages are found to be 2.22, 1.39, 12.8, 8.87, 1.18 and 0.51.

The main features of the discussed algorithms that utilize global optimization are summarized in Table 2.3.

Table 2.3 Characteristics of global algorithms that use global optimization

Author | Method | Features | Speed (fps) | Image Size | Disparity levels | Computational platform
Bleyer & Gelautz (2005) | Global cost function | Color segmentation, occlusion handling | – | – | – | –
Hong & Chen (2004) | Graph cuts | Color segmentation, occlusion handling | 0.33 | 384×288 | 16 | Intel Pentium 4 2.4 GHz
Brockers (2009) | Global cost function | Varying support-weights, cooperative optimization, occlusion detection | 0.09 | 384×288 | 16 | 2.4 GHz Intel Core2Duo T7700
Zitnick et al. (2004) | Global cost function | Color segmentation, occlusion handling, GPU utilization | – | – | – | ATI 9800 PRO GPU
Bleyer & Gelautz (2005) | Global cost function | Color segmentation, occlusion handling | 0.05 | 384×288 | 16 | Intel Pentium 4 2.0 GHz
Sun et al. (2005) | Belief propagation | Symmetrical treatment, color segmentation | 0.02 | 384×288 | 16 | Intel Pentium 4 2.8 GHz
Klaus et al. (2006) | Belief propagation | Color segmentation, occlusion handling | 0.07 | 384×288 | 16 | AMD Athlon 64 2.21 GHz
Yang et al. (2009) | Hierarchical belief propagation | Color segmentation, occlusion handling | – | – | – | –
Yoon & Kweon (2006c) | Belief propagation | Color segmentation, symmetrical cost functions, occlusion handling | – | – | – | –
Gutierrez & Marroquin (2004) | Gauss-Markov random field | Continuity, coherence, adjacency | – | – | – | –
Strecha et al. (2006) | Hidden Markov random field | Occlusion handling, EM algorithm | – | – | – | –
Zitnick & Kang (2007) | Belief propagation within a MRF framework | Color segmentation, occlusion handling | – | – | – | 2.8 GHz
Hirschmuller (2005) | Global cost function | Occlusion handling, mutual information | 0.77 | 450×375 | 60 | Intel Xeon 2.8 GHz
Hirschmuller (2006) | Global cost function | Occlusion handling, mutual information, bidirectional match | <1 | – | 64 | 2.8 GHz PC
Kim & Sohn (2005) | Vector field regularization | Occlusion handling, Canny edge detector | 0.15 | – | – | Intel Pentium 4
Ogale & Aloimonos (2005a) | Left-right diffusion process | Occlusion handling, phase-based matching | 0.5–0.25 | – | – | 2 GHz PC
Brockers et al. (2005) | Cost relaxation | Occlusion handling, 3D support regions | – | – | – | –
Yang et al. (2010) | Hierarchical belief propagation | Hierarchical reduction of the disparity range | 0.67 | 800×600 | 300 | 2.5 GHz Intel Core 2 Duo
Yang et al. (2006) | Hierarchical belief propagation | GPU utilization | 16 | 320×240 | 16 | 3 GHz PC, NVIDIA GeForce 7900 GTX



Dynamic Programming

Many researchers develop stereo correspondence algorithms based on DP. This methodology is a fair trade-off between the complexity of the computations needed and the quality of the results obtained. In every aspect, DP stands between the local algorithms and the global optimization ones. However, its computational complexity still renders it a less preferable option for hardware implementation.
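For reference, the classic scanline formulation can be sketched as follows: each left pixel is either matched to a right pixel or skipped at a fixed occlusion penalty, and the optimal alignment is recovered by backtracking. The sketch assumes rectified grayscale rows, and the occlusion cost value is an arbitrary placeholder.

import numpy as np

def dp_scanline(left_row, right_row, occ_cost=20.0):
    """Dynamic-programming matching of one rectified scanline pair.
    Returns a per-pixel disparity for the left row, with -1 marking pixels
    left unmatched (occluded) by the optimal alignment."""
    n, m = len(left_row), len(right_row)
    cost = np.zeros((n + 1, m + 1))
    cost[:, 0] = np.arange(n + 1) * occ_cost
    cost[0, :] = np.arange(m + 1) * occ_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = cost[i - 1, j - 1] + abs(float(left_row[i - 1]) - float(right_row[j - 1]))
            cost[i, j] = min(match, cost[i - 1, j] + occ_cost, cost[i, j - 1] + occ_cost)
    disparity = -np.ones(n, dtype=int)
    i, j = n, m                                # backtrack the optimal path
    while i > 0 and j > 0:
        match = cost[i - 1, j - 1] + abs(float(left_row[i - 1]) - float(right_row[j - 1]))
        if cost[i, j] == match:
            disparity[i - 1] = (i - 1) - (j - 1)   # matched: store disparity
            i, j = i - 1, j - 1
        elif cost[i, j] == cost[i - 1, j] + occ_cost:
            i -= 1                             # left pixel skipped (occluded)
        else:
            j -= 1                             # right pixel skipped
    return disparity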

The work of Torra and Criminisi (Torra & Criminisi 2004) presents a unified framework that allows the fusion of any partial knowledge about disparities, such as matched features and known surfaces within the scene. It combines the results from corner, edge and dense stereo matching algorithms to impose constraints that act as guide points for the standard DP method. The result is a fully automatic dense stereo system with up to four times faster running speed and greater accuracy compared to the results obtained by the sole use of DP.

Moreover, a generalized ground control points (GGCP) scheme is introduced in (Kim et al. 2005). One or more candidates for the true disparity of each pixel are assigned by local matching using oriented spatial filters. Afterwards, a two-pass DP technique that performs optimization both along and between the scanlines is applied. The result is the reduction of false matches, as well as of the typical inter-scanline inconsistency problem. The percentage of bad matched pixels in unoccluded regions is 1.53, 0.61, 0.94 and 0.706 for the Tsukuba, Sawtooth, Venus and Map image sets, respectively. The running speeds, tested on a Pentium 4 at 2.4 GHz, vary from 0.23 fps for the Tsukuba set with 15 disparity levels down to 0.08 fps for the Sawtooth set with 21 disparity levels.

Salmen et al. in (Salmen et al. 2009) present a refined DP stereo processing algorithm. The concept of multi-path backtracking is introduced in order to exploit the information gained from DP more effectively. All parameters of the algorithm are automatically tuned offline by an evolutionary algorithm. The number of incorrect disparities is reduced by 40% compared to the DP reference implementation, while the overall complexity increases only slightly. The processing speed is 5 fps for the Tsukuba image set, 2.5 fps for Venus, 1.25 fps for Teddy and 1.25 fps for Cones, on a standard desktop PC with a 1.8 GHz processor.

Wang et al. in (Wang et al. 2006) present a stereo algorithm that combines high quality results with real-time performance. DP is used in conjunction with an adaptive aggregation step. The per-pixel matching costs are aggregated in the vertical direction only, resulting in improved inter-scanline consistency and sharp object boundaries. This work exploits the color and distance proximity-based weight assignment for the pixels inside a fixed support window, as reported in (Yoon & Kweon 2006a). The real-time performance is achieved thanks to the parallel use of the CPU and the GPU of a computer. This implementation can process 320×240 pixel images with 16 disparity levels at 43.5 fps and 640×480 pixel images with 16 disparity levels at 9.9 fps. The test system is a 3.0 GHz PC with an ATI Radeon XL1800 GPU.
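The weight assignment borrowed from (Yoon & Kweon 2006a) can be sketched as follows: a pixel inside the support window contributes more the closer its color and position are to the window's central pixel. The CIELab input and the gamma values below are indicative assumptions.

import numpy as np

def support_weights(window_lab, center_lab, gamma_c=5.0, gamma_p=17.5):
    """Color- and distance-based support weights for one (k, k, 3) window
    of CIELab values around a central pixel with color center_lab."""
    k = window_lab.shape[0]
    yy, xx = np.mgrid[0:k, 0:k]
    c = k // 2
    dist = np.sqrt((yy - c) ** 2 + (xx - c) ** 2)              # spatial proximity
    dcol = np.linalg.norm(window_lab - center_lab, axis=2)     # color similarity
    return np.exp(-dcol / gamma_c - dist / gamma_p)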

On the contrary, the algorithm proposed in (Veksler 2005) applies the DP method not to individual scanlines but to a tree structure. Thus, the minimization procedure accounts for all the pixels of the image, compensating for the known streaking effect. The reported running speed is a couple of frames per second for the tested image pairs, so real-time implementations are feasible. At the same time, the quality of the results is comparable to that of the time-consuming global methods. The reported percentages of bad matched pixels are 1.77, 1.44, 1.21 and 1.45 for the tested Tsukuba, Sawtooth, Venus and Map image sets, respectively.

In (Lei et al. 2006) the pixel-tree approach of the previous work is replaced by a region-tree one. First of all, the image is color-segmented using the mean-shift algorithm. During the stereo matching, a corresponding energy function defined on such a region-tree structure is optimized using the DP technique. Occlusions are handled by compensating for border occlusions and by applying cross checking. The obtained results indicate that the percentage of bad matched pixels in unoccluded regions is 1.39, 0.22, 7.42 and 6.31 for the Tsukuba, Venus, Teddy and Cones image sets, respectively. The running speed, on a 1.4 GHz Intel Pentium M processor, ranges from 0.1 fps for the Tsukuba dataset with 16 disparity levels down to 0.04 fps for the Cones dataset with 60 disparity levels.

The main features of the discussed global algorithms that utilize DP are summarized in Table 2.4.

Table 2.4 Characteristics of global algorithms that use DP

Author | Method | Features | Speed (fps) | Image Size | Disparity levels | Computational platform
Torra & Criminisi (2004) | DP | Occlusion handling, prior feature matching | – | – | – | –
Kim et al. (2005) | DP | Occlusion handling, prior disparity candidate assignment, two-pass inter-scanline optimization | 0.23 | 384×288 | 16 | Intel Pentium 4 2.4 GHz
Salmen et al. (2009) | DP | Multi-path backtracking | 5 | 384×288 | 16 | 1.8 GHz
Wang et al. (2006) | DP | Color usage, inter-scanline consistency, adaptive aggregation, parallel usage of CPU and GPU | 43.5 | 320×240 | 16 | 3.0 GHz CPU, ATI Radeon XL1800 GPU
Veksler (2005) | DP | Applied to pixel-tree structure | ∼2 | – | – | –
Lei et al. (2006) | DP | Occlusion handling, color usage, applied to region-tree structure | 0.1 | 384×288 | 16 | 1.4 GHz Intel Pentium M

Other Methods

There are, of course, other methods producing dense disparity maps which can be placed in neither of the previous categories. The methods discussed below use either wavelet-based techniques or combinations of various techniques.

Such a method, based on the continuous wavelet transform (CWT), is found in (Huang & Dubois 2004). It makes use of the redundant information that results from the CWT. Using 1D orthogonal and biorthogonal wavelets, as well as a 2D orthogonal wavelet, the maximum matching rate obtained is 88.22% for the Tsukuba pair. Upsampling the pixels in the horizontal direction by a factor of two, through zero insertion, further decreases the noise, and a matching rate of 84.91% is reported.

Another work (Liu et al. 2006) presents an algorithm based on non-uniform rational B-splines (NURBS) curves. The curves replace the edges extracted with a wavelet-based method. The NURBS are projective invariant and thus reduce false matches due to distortion and image noise. Stereo matching is then performed by estimating the similarity between projections of curves of one image and curves of the other image. A 96.5% matching rate for a self-recorded image pair is reported for this method.

Finally, a different way of confronting the stereo matching issue is proposed in (De Cubber et al. 2008). The authors, acknowledging that there is no all-satisfying method, investigate the possibility of fusing the results from spatially differentiated (stereo vision) scenery images with those from temporally differentiated (structure from motion) ones. This method takes advantage of both methods' merits, improving the overall performance.

The main features of the discussed algorithms that cannot be clearly assigned to any of the aforementioned categories are summarized in Table 2.5.

Table 2.5 Characteristics of the algorithms that cannot be clearly assigned to any category

Author | Method | Features
Huang & Dubois (2004) | Continuous wavelet transform | 1D orthogonal and biorthogonal wavelets; 2D orthogonal wavelet
Liu et al. (2006) | Wavelet-based | Non-uniform rational B-splines curves
De Cubber et al. (2008) | Intensity-based | Stereo vision and structure from motion fusion

2.1.2 Sparse Disparity Algorithms

Algorithms resulting in sparse, or semi-dense, disparity maps tend to be less attractive, as most of the contemporary applications require dense disparity information. However, they are very useful when fast depth estimation is required and, at the same time, detail in the whole picture is not so important. This type of algorithm tends to focus on the main features of the images, leaving occluded and poorly textured areas unmatched. Consequently, high processing speeds and accurate, but of limited density, results are achieved. Very interesting ideas flourish in this direction but, since contemporary interest is directed towards dense disparity maps, only a few indicative algorithms are discussed here.

Veksler in (Veksler 2002) presents an algorithm that detects and matches dense features between the left and right images of a stereo pair, producing a semi-dense disparity map. A dense feature is a connected set of pixels in the left image and a corresponding set of pixels in the right image, such that the intensity edges on the boundary of these sets are stronger than their matching error. All these are computed during the stereo matching process. The algorithm runs at 1 fps with 14 disparity levels for the Tsukuba pair, producing 66% density and 0.06% average error in the non-occluded regions.

Another method developed by Veksler (Veksler 2003) is based on the same basic concepts as the former one. The main difference is that this one uses the graph cuts algorithm for the dense feature extraction. As a consequence, this algorithm produces semi-dense results with significant accuracy in areas where features are detected. The results are significantly better in terms of density and error percentage, but require longer running times. For the Tsukuba pair it achieves a density of up to 75%, the total error in the non-occluded regions is 0.36% and the running speed is 0.17 fps. For the Sawtooth pair the corresponding results are 87%, 0.54% and 0.08 fps. All the results are obtained on a Pentium III PC running at 600 MHz.

On the other hand, Gong and Yang in their paper (Gong & Yang 2005a) propose a DP algorithm, called reliability-based dynamic programming (RDP), that uses a different measure to evaluate the reliabilities of matches. According to this, the reliability of a proposed match is the cost difference between the globally best disparity assignment that includes the match and the best one that does not include it. The interscanline consistency problem, common to DP algorithms, is reduced through a reliability thresholding process. The result is a semi-dense unambiguous disparity map with 76% density, 0.32% error rate and 16 fps for the Tsukuba and 72% density, 0.23% error rate and 7 fps for the Sawtooth image pair. Accordingly, the results for the Venus and Map pairs are 73%, 0.18%, 6.4 fps and 86%, 0.7%, 12.8 fps, respectively. The reported execution speeds, tested on a 2 GHz Pentium 4 PC, are thus encouraging for real-time operation, provided that a semi-dense disparity map is acceptable.
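A simplified, per-pixel proxy of this reliability idea, not the actual RDP measure, which is defined over whole DP assignments, can be sketched as follows: a disparity is kept only when its aggregated cost wins over the runner-up by more than a threshold; the names and the threshold are illustrative.

import numpy as np

def semi_dense(costs, threshold):
    """Semi-dense disparity map by reliability thresholding. costs has
    shape (h, w, D); pixels whose best cost does not beat the second-best
    one by more than threshold are marked unmatched (-1)."""
    part = np.partition(costs, 1, axis=2)       # two smallest costs first
    reliability = part[..., 1] - part[..., 0]   # margin of the best match
    best = costs.argmin(axis=2)
    return np.where(reliability > threshold, best, -1)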

A similar, near-real-time stereo matching technique is presented in (Gong & Yang 2005b) by the same authors, which is also based on the RDP algorithm. This algorithm can generate semi-dense disparity maps. Two orthogonal RDP passes are used to search for reliable disparities along both horizontal and vertical scanlines. Hence, the interscanline consistency is explicitly enforced. It takes advantage of the computational power of programmable graphics hardware, which further improves speed. The algorithm is tested on an Intel Pentium 4 computer running at 3 GHz with a programmable ATI Radeon 9800 XT GPU equipped with 256 MB video memory. It results in an 85% dense disparity map with 0.3% error rate at 23.8 fps for the Tsukuba pair, 93% density and 0.24% error rate at 12.3 fps for the Sawtooth pair, 86% density and 0.21% error rate at 9.2 fps for the Venus pair, and 88% density and 0.05% error rate at 20.8 fps for the Map image pair. If needed, the method can also be used to generate denser disparity maps at the expense of execution speed.

The main features of the discussed algorithms that produce sparse output are summarized in Table 2.6.

Table 2.6 Characteristics of the algorithms that produce sparse output

Author | Method | Density (%) | Speed (fps) | Image Size | Disparity levels | Computational platform
Veksler (2002) | Local | 66 | 1 | 384×288 | 14 | –
Veksler (2003) | Graph cuts | 75 | 0.17 | 384×288 | 16 | Intel Pentium III 600 MHz
Gong & Yang (2005a) | RDP | 76 | 16 | 384×288 | 16 | Intel Pentium 4 2 GHz
Gong & Yang (2005b) | RDP | 85 | 23.8 | 384×288 | 16 | Intel Pentium 4 3 GHz CPU, ATI Radeon 9800 XT GPU

2.2 Hardware Implementations of Stereo Correspondence Algorithms

While the aforementioned categorization involves stereo matching algorithms in general, in practice it is valuable for software implemented algorithms only. Software implementations make use of general purpose personal computers (PC) and usually result in considerably long running times. However, this is not an option when the objective is the development of autonomous robotic platforms, simultaneous localization and mapping (SLAM) or virtual reality (VR) systems. Such tasks require efficient real-time performance and demand dedicated hardware and, consequently, specially developed and optimized algorithms. Only a small subset of the already proposed algorithms is suitable for hardware implementation. Hardware implemented algorithms are characterized by their theoretical algorithm as well as by the implementation itself. There are two broad classes of hardware implementations: the field-programmable gate array (FPGA) and the application-specific integrated circuit (ASIC) based ones. Figure 2.5(a) depicts an ASIC chip and 2.5(b) an FPGA development board. Each one can execute stereo vision algorithms without the necessity of a PC, saving volume, weight and consumed energy. However, the evolution of FPGAs has made them an appealing choice due to their small prototyping times, their flexibility and their good performance.


Fig. 2.5 An ASIC chip (a) and an FPGA development board (b)



There are many applications, as mentioned above, that demand extraction of the disparity map from image pairs in real time. Moreover, most of these applications demand dense output. PCs, due to their serial-processing architecture, find it difficult to meet these requirements. This problem can be efficiently confronted by the use of dedicated hardware. In addition, the need for dedicated hardware is more evident in the case of autonomous units, where the existence of a PC is not a convenient solution. Hardware implementations can accelerate the performance of stereo vision systems. They are able to provide the parallelism that is commonly useful in image processing and vision algorithms. In particular, regular and simple structures such as CA or basic filtering modules can be easily and efficiently implemented in hardware. By processing several parts of the data in parallel and performing specific calculations, their overall performance is considerably better compared to software solutions running on serial general purpose processors.

The hardware implementation of global algorithms is neither an appealing nor an easy option. As stated above, global methods are time and computation demanding because of their iterative nature. This is also the reason that prevents them from being implemented with parallel structures. On the contrary, global algorithms require intricate, rather than simple and straightforward, implementations. DP, though, is inherently the simplest of the global approaches.

In contrast, local methods can greatly benefit from the use of such parallel and straightforward structures. Parallelism and simplicity are key factors, available in dedicated hardware implementations, that can reduce the required running times. There are several works that describe local methods implemented in hardware. What most of them have in common is that they implement a rather simple algorithm and make extensive use of computation concurrency. Performance is refined by custom choices during the hardware architecture development phase. A generalized block diagram of a hardware implementable stereo correspondence algorithm is shown in Figure 2.6.

Fig. 2.6 Generalized block diagram of a hardware implementable stereo correspondence algorithm.

Hardware implementation involves using either FPGAs or ASICs. Digital signal processor (DSP) based solutions have also been reported in the past (Faugeras et al. 1993); however, they are not reported as frequently, due to their inherent difficulty in parallel processing. A survey of the recent bibliography confirms that FPGA implementations are preferable. That is because the time required for fabrication and testing of ASIC implementations is considerably long and its cost is high. Moreover, there is almost no flexibility for future improvements and modifications. On the other hand, FPGAs provide rapid prototyping times, are far less expensive and can be easily adapted to new specifications. In this way FPGAs combine the best parts of hardware solutions with those of software ones.



2.2.1 FPGA Implementations

All the hardware implementations examined in this Section can achieve real-time operation. The use of FPGAs is now the most convenient and reasonable choice for hardware development. They are cheap and perform remarkably well. The available resources of the devices are constantly growing, allowing more complex algorithms to be implemented. The variety of available electronic design automation (EDA) tools and the absence of a fabrication stage make the prototyping times very short. Another advantage is that the resulting hardware implementation remains open for further upgrades. Thus, FPGA implementations are very flexible and fault tolerant.

In the rest of this Section, FPGA implemented methods based on SAD, DP and Local Weighted Phase-Correlation (LWPC) are presented. Table 2.7 summarizes the main characteristics of the works discussed below; the table is populated according to the available data. It is evident that the simplest and most straightforward method of all, i.e. SAD, is the most preferred one.

FPGA Implementations based on SAD

As expected, when it comes to hardware implementations, SAD-based methods are the most preferred ones. SAD calculation requires simple computational modules, as it involves only summations and absolute value calculations.
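In software terms, the whole fixed-window SAD pipeline fits in a few lines; the hardware designs discussed below essentially unroll these loops into parallel logic. The function here is a generic illustration with assumed parameter values, not any of the cited designs.

import numpy as np

def box_sum(img, w):
    """Sum of every (2w+1)x(2w+1) window, via a summed-area table."""
    size = 2 * w + 1
    pad = np.pad(img, w, mode='edge')
    s = np.pad(pad.cumsum(0).cumsum(1), ((1, 0), (1, 0)))
    return (s[size:, size:] - s[:-size, size:]
            - s[size:, :-size] + s[:-size, :-size])

def sad_disparity(left, right, max_d=16, w=3):
    """Fixed-window SAD stereo for grayscale float images: shift the right
    image by each candidate disparity, box-filter the absolute differences
    and pick the disparity with the lowest aggregated cost."""
    h, wd = left.shape
    costs = np.full((h, wd, max_d), np.inf)
    for d in range(max_d):
        diff = np.abs(left[:, d:] - right[:, :wd - d])
        costs[:, d:, d] = box_sum(diff, w)
    return costs.argmin(axis=2)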

The FPGA based architecture presented in (Arias-Estrada & Xicotencatl 2001) is able to produce dense disparity maps in real time. The architecture implements a local algorithm based on SAD aggregated over fixed windows. Input data are processed in parallel on a single chip. An extension of the basic architecture is also proposed in order to compute disparity maps from more than two images. This method can process 320×240 pixel images with 16 disparity levels at speeds higher than 71 fps. The device utilization is 4.2K slices, equivalent to 69K gates.

The system developed in (Jia et al. 2003) is able to compute dense disparity maps in real time using the SAD method over fixed windows. The whole algorithm, including radial distortion correction, Laplacian of Gaussian (LoG) filtering, correspondence finding and disparity map computation, is implemented on a single FPGA. The system can process 640×480 pixel images with 64 disparity levels and 8 bit depth precision at 30 fps, and 320×240 pixel images at 50 fps.

The SAD algorithm aggregated over fixed windows is also the option utilized in (Miyajima & Maruyama 2003). This stereo vision system is implemented on a single FPGA with plenty of external memory. It supports camera calibration and left-right consistency checking. The performance is 20 fps for 640×480 pixel images and 80 fps for 320×240 ones. The number of disparity levels for these results is 200 and the device utilization is 54%. Changing the number of disparity levels only changes the circuit size, not the performance.
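The left-right consistency check used here can be sketched as follows, assuming two integer disparity maps computed with the roles of the two images swapped; the one-pixel tolerance is a typical but assumed choice.

import numpy as np

def lr_consistency(disp_left, disp_right, tol=1):
    """Cross-check two disparity maps: a left-image disparity is kept only
    if the right-image map, sampled at the matched position, agrees with it
    within tol; rejected pixels are marked with -1."""
    h, w = disp_left.shape
    xs = np.arange(w)
    valid = np.zeros((h, w), dtype=bool)
    for y in range(h):
        xr = xs - disp_left[y]                    # matched right-image column
        inside = (xr >= 0) & (xr < w)
        valid[y, inside] = np.abs(
            disp_left[y, inside] - disp_right[y, xr[inside]]) <= tol
    return np.where(valid, disp_left, -1)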

A simplified version of the adaptive window aggregation method in conjunction with SAD is used in (Chonghun et al. 2004). It can process images of size up to 1024×1024 pixels with 32 disparity levels at 47 fps. The resources needed are 3.4K slices, i.e. 10% of the utilized FPGA area.

Another simple implementation of the SAD method with fixed windows is proposed in (Yi et al. 2004). The effect of various window shapes is investigated. The results indicate that 270×270 pixel images with 27 disparity levels can be processed at 30 fps, achieving 90% correct matches. The reported FPGA utilization is in any case less than 46K slices, equivalent to 8M gates.



Table 2.7 FPGA implementations' characteristics

Author | Matching Cost | Aggregation | Image Size | Disparity Levels | Window Size | Speed (fps) | Device
Arias-Estrada & Xicotencatl (2001) | SAD | fixed window | 320×240 | 16 | 7×7 | 71 | Xilinx Virtex XCV800HQ240-6
Jia et al. (2003) | SAD | fixed window | 640×480 | 64 | 9×9 | 30 | –
Miyajima & Maruyama (2003) | SAD | fixed window | 640×480 | 200 | 7×7 | 20 | Xilinx Virtex-II
Chonghun et al. (2004) | SAD | adaptive window | 1024×1024 | 32 | 16×16 (max) | 47 | Xilinx Virtex-II 6000
Yi et al. (2004) | SAD | fixed window | 270×270 | 27 | 9×9 | 30 | Xilinx Virtex-II XC2V8000
Lee et al. (2005) | SAD | fixed window | 640×480 | 64 | 32×32 | 30 | Xilinx Virtex-II XC2V8000
Hariyama, Kobayashi, Sasaki & Kameyama (2005) | SAD | adaptive window | 64×64 | 64 | 8×8 (max) | 30 | Altera APEX20KE
Georgoulas et al. (2008) | SAD | adaptive window | 640×480 | 80 | 7×7 (max) | 275 | Altera Stratix II EP2S180F1020C3
Kalomiros & Lygouras (2008) | SAD | fixed window | 320×240 | 32 | 3×3 | 14 | Altera Cyclone II EP2C35F672C6
Kalomiros & Lygouras (2009) | SAD | fixed window | 640×480 | 64 | 3×3 | 162 | Altera Cyclone II 2C35
Jeong & Park (2004) | DP | single line | 1280×1000 | 208 | – | 15 | Xilinx Virtex-II XC2V8000
Park & Jeong (2007) | DP | 2 lines | 320×240 | 128 | – | 30 | Xilinx Virtex-II pro-100
Kalomiros & Lygouras (2009) | DP | 2 lines | 640×480 | 65 | 3×3 | 81 | Altera Cyclone II 2C35
Darabiha et al. (2006) | LWPC | – | 256×360 | 20 | – | 30 | 4× Xilinx Virtex2000E
Masrani & MacLean (2006) | LWPC | – | 480×640 | 128 | – | 30 | 4× Altera Stratix S80



The same core algorithm as in (Yi et al. 2004) is used in the work reported in (Lee et al. 2005). The shape of the aggregating window is found to play a significant role in this implementation. Using rectangular windows instead of square ones reduces the resource usage to 50%, i.e. less than 10K slices, while at the same time preserving the same output quality. The proposed system can process 640×480 pixel images with 64 disparity levels at 30 fps and 320×240 pixel images with 64 disparity levels at 115 fps.

On the other hand, a slightly more complex implementation than the previous ones is proposed in (Hariyama, Kobayashi, Sasaki & Kameyama 2005). It is based on SAD using adaptively sized windows. The proposed method iteratively refines the matching results by hierarchically reducing the window size. The results obtained by the proposed method are 10% better than those of the fixed-window method. The architecture is fully parallel and, as a result, all the pixels and all the windows are processed simultaneously. The speed for 64×64 pixel images with 8 bit grayscale precision and 64 disparity levels is 30 fps. The resource consumption is 42.5K logic elements, i.e. 82% of the utilized device.

SAD aggregated using adaptive windows is the core of the work presented in (Georgoulas et al. 2008). A hardware based CA parallel-pipelined design is realized on a single FPGA device. The achieved speed is nearly 275 fps for 640×480 pixel image pairs with a disparity range of 80 pixels. The presented hardware-based algorithm provides very high processing speed at the expense of accuracy. The device utilization is 149K gates, that is 83% of the available resources.

The work of (Kalomiros & Lygouras 2008) implements a SAD algorithm on an FPGA board featuring external memory and a Nios II embedded processor clocked at 100 MHz. The implementation produces dense 8-bit disparity maps of 320×240 pixels with 32 disparity levels at a speed of 14 fps. The essential resources are about 16K logic elements, whereas, by migrating to more complex devices, the design can easily grow to support better results.

Finally, the same authors in (Kalomiros & Lygouras 2009) present an improved SAD-based algorithm with a fixed 3×3 aggregation window and a hardware median enhancement filter. The presented system can process 640×480 images with 64 disparity levels at 162 fps. The implementation requires 32K logic elements, equivalent to about 63K gates.

FPGA Implementations based on DP

The use of DP is an alternative as well. The implementation presented in (Jeong & Park 2004) uses the DP search method on a trellis solution space. It copes with the case of vergent cameras, i.e. cameras whose optical axes intersect. The images received from a pair of cameras are rectified using linear interpolation and then the disparity is calculated. The architecture has the form of a linear systolic array using simple processing elements. The design is canonical and simple to implement in parallel. The implementation requires 208 processing elements. The resulting system can process 1280×1000 pixel images with up to 208 disparity levels at 15 fps.

An extension of the previous method is presented in (Park & Jeong 2007). The main difference is that data from the previous line are incorporated so as to enforce better inter-scanline consistency. The running speed is 30 fps for 320×240 pixel images with 128 disparity levels. The number of utilized processing elements is 128. The percentage of pixels with a disparity error larger than 1 in the unoccluded areas is 2.63, 0.91, 3.44 and 1.88 for the Tsukuba, Map, Venus and Sawtooth image sets, respectively.



Finally, the work of (Kalomiros & Lygouras 2009) presents a custom parallelized DP algorithm as well. Once again, a fixed 3×3 aggregation window and a hardware median enhancement filter are used. Moreover, interscanline support is utilized. The presented system can process 640×480 images with 65 disparity levels at 81 fps. The implementation requires 270K logic elements, equivalent to about 1.6M gates.

FPGA Implementations based on Phase Methods

Moreover, phase-based techniques can be implemented in hardware as well. The algorithm implemented in (Darabiha et al. 2006) is called Local Weighted Phase-Correlation (LWPC). The hardware implementation of the algorithm turns out to be more than 300 times faster than the software one. The platform used is the Transmogrifier-3A (TM-3A), containing four Xilinx Virtex2000E FPGAs connected via a 98 bit bus. A description of the programmable hardware platform, the base stereo vision algorithm and the design of the hardware can be found in the paper. 66.6K look-up tables (LUT) and 83K flip-flops (FF) are required. This implementation can produce dense disparity maps of 256×360 pixel image pairs with 20 disparity levels and 8 bit sub-pixel accuracy at a rate of 30 fps.

The same LWPC method is used in (Masrani & MacLean 2006). The platform used is the Transmogrifier-4, containing four Altera Stratix S80 FPGAs. The system performs rectification and left-right consistency checking to improve the accuracy of the results. The speed for 640×480 pixel images with 128 disparity levels is 30 fps. The hardware resources demanded are roughly the same as in (Darabiha et al. 2006), thanks to the reuse of the available temporal information of the input video sequence.

2.2.2 ASIC Implementations

On the other hand, ASIC implementation is an option as well, but a more expensive one, except in the case of mass production. The prototyping times are considerably longer and the result is highly process-dependent. Any further changes are difficult and, additionally, time and money consuming. The performance supremacy of ASICs does not, in most cases, justify choosing them. These are the main reasons that make recent ASIC implementation publications rare, in contrast to the FPGA-based ones.

Published works concerning ASIC implementations of stereo matching algorithms (Hariyama et al. 2000, Hariyama, Sasaki & Kameyama 2005) are restricted to the use of SAD. The reported architectures make extensive use of parallelism and seem promising; however, they lack undisputed experimental results.

2.3 Robotic Applications of Stereo Vision

Stereo vision is a tested, useful and popular tool for inferring the depth of a scene with only passive optical sensors. Robotics, on the other hand, evolves rapidly and demands methods that can serve autonomous behaviors, such as obstacle avoidance and SLAM. Within this context, stereo correspondence algorithms need to provide accurate depth maps at real-time frame-rates, confronting, at the same time, any difficulties imposed by the robots' environments. This Section provides an overview of the state of the art regarding vision-based obstacle avoidance and SLAM robotic applications.

2.3.1 Obstacle Avoidance Applications

A wide range of sensors and various methods have been proposed in the relevant literature, as far as obstacle avoidance techniques are concerned. Some interesting details about the developed sensor systems and the proposed detection and avoidance algorithms can be found in (Borenstein & Koren 1990) and (Ohya et al. 1998). Moravec has proposed the Certainty Grid method in (Moravec 1987) and Borenstein (Borenstein & Koren 1991) has proposed the Virtual Force Field method for robot obstacle avoidance. Later, the Elastic Strips method was proposed in (Khatib 1996, 1999), treating the trajectory of the robot as an elastic material in order to avoid obstacles. Moreover, (Kyung Hyun et al. 2008) present a modified Elastic Strip method for mobile robots operating in uncertain environments. Reviews of popular obstacle avoidance algorithms, covering them in more detail, can be found in (Manz et al. 1993) and (Kunchev et al. 2006). Finally, the concept of using fuzzy logic for obstacle avoidance purposes was covered by (Reignier 1994), but only at a theoretical level.

The obstacle avoidance systems found in the literature involve the use of one of, or a combination of, ultrasonic, laser, infrared (IR) or vision sensors (Siegwart & Nourbakhsh 2004). The use of ultrasonic, laser and IR sensors is well-studied and the depth measurements are quite accurate and easily available. However, such sensors suffer either from achieving only low refresh rates (Vandorpe et al. 1996) or from being extremely expensive. On the other hand, vision sensors combine high frame rates and appealing prices.

Stereo vision is often used in vision-based methods, instead of monocular sensors, due to the simpler calculations involved in the depth estimation. Regarding stereo vision systems, one of the most popular methods for obstacle avoidance is the initial estimation of the so-called v-disparity image. This method is applied in order to confront the noise in low quality disparity images (Labayrade et al. 2002, Zhao et al. 2007, Soquet et al. 2007). However, if detailed and noise-free disparity maps were available, less complicated methods could be used instead.
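The v-disparity image itself is straightforward to form, which partly explains its popularity; a minimal sketch follows, assuming an integer disparity map with negative values marking invalid pixels. In the result, the ground plane appears as a slanted line and obstacles as near-vertical segments.

import numpy as np

def v_disparity(disp, max_d):
    """Row y of the output is the histogram of the disparity values found
    along image row y, so the output size is (height, max_d)."""
    h = disp.shape[0]
    out = np.zeros((h, max_d), dtype=int)
    for y in range(h):
        d = disp[y]
        d = d[(d >= 0) & (d < max_d)]      # ignore invalid disparities
        out[y] = np.bincount(d, minlength=max_d)
    return out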

2.3.2 Simultaneous Localization and Mapping Applications

The SLAM problem is that of estimating a robot's position and of progressively building a map of its environment. The issue has been in the focus of the robotics research community for over two decades (Dissanayake et al. 2001). The difficulties of solving the SLAM problem arise from the finite precision of the sensors and actuators of the robot, given real-life situations such as the ones shown in Figure 2.7.

The computational load of SLAM is another problematic aspect. Much effort has been devoted to reducing the demanded computations (Bailey & Durrant-Whyte 2006, Huang et al. 2008). Vision has been used for navigation purposes since the early days of autonomous robotics (Jung 1994).




Fig. 2.7 Mobile robots in real environments

However, recently, vision-based mapping and measurement methods, either stereo or monocular (Lemaire et al. 2007), have become increasingly popular as the cost of cameras decreases. A review of the advances in the field of vision-based SLAM can be found in (Chen et al. 2007).

The success of the solely vision-based SLAM algorithms is, to a large extent, owed to the development of robust feature detection and description methods, such as the scale-invariant feature transform (SIFT) (Lowe 2004) and the speeded-up robust features (SURF) (Bay et al. 2008). A quantitative evaluation of feature extractors for use in vision-based SLAM algorithms can be found in (Klippenstein & Zhang 2007).

Apart from the feature extraction process, the majority of state of the art SLAM algorithms also rely on some kind of progressive probabilistic framework (Durrant-Whyte & Bailey 2006). Davison's work treats the SLAM problem using a robot equipped with an active stereo head, operating in unknown environments. Features are extracted and an extended Kalman filter (EKF) is used (Davison & Kita 2001, Davison & Murray 2002, Davison et al. 2003, Davison 2003, 2007). A real-time EKF implementation able to significantly reduce the computational requirements is presented in (Guivant & Nebot 2001). Finally, Holmes in (Holmes et al. 2009) presents a square root unscented Kalman filter for performing video-rate visual SLAM using a single camera, while keeping the algorithm's complexity low.

On the other hand, an alternative method called FastSLAM has been proposed (Montemerlo 2003, Montemerlo & Thrun 2007, Stentz et al. 2003). This algorithm recursively estimates the full posterior distribution over the robot pose and landmark locations, and scales logarithmically with the number of landmarks in the map.

Furthermore, the use of particle filters has been reported (Moreno et al. 2009). The work of (Sim et al. 2007, Sim & Little 2009) uses a stereo camera to collect data and a Rao-Blackwellised particle filter to solve the SLAM problem. The use of efficient data structures and a hybrid map representation provides precise robot localization and maps of the environment at high frame rates.

Recently, real-time solutions have been the focus of research. The combination of stereo vision and an inertial measurement unit is used in (Zhu et al. 2007), while bundle adjustment is utilized in (Nister et al. 2006). Finally, highly efficient, solely stereo-based methods have also been reported (Agrawal & Konolige 2008, Mei et al. 2009).



2.4 Open Issues of Stereo Vision for Robotic Applications

Despite the fact that stereo vision has been widely used for various robotic applications, common issues related to outdoor exploration have not yet been addressed in a satisfactory manner. Outdoor robotics places strict constraints on the used algorithms (Konolige et al. 2006, Soquet et al. 2007). The lighting conditions in outdoor environments are far from ideal. A stereo camera, which acquires two displaced views of the same scene, is very sensitive to such conditions (Hogue et al. 2007, Klancar et al. 2004). Moreover, the rough terrain and the bounces it causes to a moving robot often decalibrate the cameras of a stereo acquisition array. Autonomous operation demands high processing frame-rates. On the other hand, a robotic platform can provide only limited computational resources, power and payload capacity for the many different onboard applications. These facts differentiate the priorities of stereo vision algorithms intended for use in outdoor operating robots from those listed in (Scharstein & Szeliski 2010). The algorithms listed in the aforementioned site compete on the basis of their accuracy on four perfectly lighted, calibrated and rectified image sets, without any timing or computational constraints. The rest of this Section discusses the state of the art concerning open issues of stereo vision methods when applied to robotic applications.

2.4.1 Simplicity of Computations

The performance of stereo vision algorithms greatly affects the relevant autonomous robotic behaviors. As discussed previously, stereo correspondence algorithms can be coarsely divided into local and global ones. Dense local stereo correspondence methods calculate depth for almost every pixel of the scenery, taking into consideration only a small neighborhood of pixels each time (Scharstein & Szeliski 2002). On the other hand, global methods are significantly more accurate but at the same time more computationally demanding, as they account for the whole image (Torr & Criminisi 2004). Since the most urgent constraint in autonomous robotics is real-time operation, such applications usually utilize the computationally simpler local algorithms (Labayrade et al. 2002, Soquet et al. 2007, Kelly & Stentz 1998, Zhao et al. 2007, Konolige et al. 2006, Agrawal et al. 2007).

Implementing stereo algorithms in hardware can dramatically improve their efficiency. The allure of hardware implementations is that they easily outperform the algorithms executed on a computer. The achieved frame-rates are generally higher. The power consumed by a dedicated hardware platform, e.g. an ASIC or FPGA, is considerably lower than that of a common microprocessor. Moreover, the computational power of the robot’s onboard PCs is left intact. However, the hardware implementation of the already presented algorithms, as already discussed, is not always straightforward. In general, robotics requires computationally simple and easy-to-implement stereo vision algorithms that will provide accurate and reliable results.

2.4.2 Multi-view Stereo Vision

Early work focused on developing stereo algorithms mostly for binocular camera configurations. However, redundancy can lead to more accurate and reliable depth estimations. More recently, due to the significant boost of the available computational power, vision systems using multiple cameras have become increasingly feasible and practical. The transition from binocular to multi-ocular systems has the advantage of potentially increasing the stability and accuracy of depth calculations.

The continuous price reduction of vision sensors has allowed the development of multiple-camera arrays ready for use in many applications. For instance, Yang et al. (Ruigang et al. 2002) used a five-camera system for real-time rendering using modern graphics hardware, while Schirmacher et al. (Schirmacher et al. 2001) increased the number of cameras and built up a six-camera system for on-the-fly processing of generalized Lumigraphs. Moreover, developers of camera arrays have expanded their systems so as to use tens of cameras, such as the MIT distributed light field camera (Yang et al. 2002) and the Stanford multi-camera array (Wilburn et al. 2002). These systems use 64 and 128 cameras, respectively.

Most of the aforementioned camera arrays are utilized for real-time image rendering. On the other hand, a research area that could also benefit from the use of multiple camera arrays is the so-called cooperative stereo vision, i.e., multiple stereo pairs being considered to improve the overall depth estimation results. To this end, Zitnick (Zitnick & Kanade 2000) presented an algorithm for binocular occlusion detection and Mingxiang (Mingxiang & Yunde 2006) expanded it to trinocular stereo.

2.4.3 Uncalibrated Stereo Images

The two alternatives for efficiently estimating disparity are either to precisely align the stereo camera rig and then perform the demanded rectification (leading to simple scanline searches), or to have an arbitrary stereo camera setup and avoid any calibration (performing searches throughout blocks). Accurately aligned stereo devices are very expensive, as they demand calibration of a series of factors at micrometer scale (Gasteratos & Sandini 2002). On the other hand, non-ideal stereo configurations usually produce inferior results, as they fail to satisfy the epipolar constraint.

The issue of processing uncalibrated images, common to applications where the sensory system is not explicitly specified, is an open one. The plethora of computations most commonly requires the massive parallelization found in custom-tailored hardware implementations. Contemporary powerful graphics machines are able to achieve better results in terms of processing time and data volume. Stereo vision algorithms able to process uncalibrated input images (Zach et al. 2004, Jeong & Park 2004, Masrani & MacLean 2006, Park & Jeong 2007) have also been discussed previously in this Chapter.

2.4.4 Non-ideal Lighting Conditions

The vast majority of stereo correspondence algorithms use some kind of intensity-based metric as the basis of their dissimilarity measure function (Scharstein & Szeliski 2002). The most common methods, due to their simplicity and real-time performance, are the SAD and the SSD. The correctness of their results rests on the assumption that the same feature in the two stereo images should ideally have the same intensity. However, this assumption is often not valid. Even if the gains of the two cameras are perfectly tuned, so as to result in the same intensity for the same features in both images, the fact that the two cameras shoot from a different pose might result in different intensities for the same point, due to shading. Moreover, in real environments, which is the case for robotic applications, the illumination is not ideal (Klancar et al. 2004, Hogue et al. 2007). This fact leads to large variations of the intensity values for the same features between the two images of a stereo pair. Such a situation is shown in the stereo pair of Figure 2.8. The sun in the left image is hidden by a bush, whereas in the right image it directly faces the camera. Thus, the non-ideal illumination causes failures during an intensity-based correspondence procedure.

Fig. 2.8 A real-life stereo pair suffering from different illumination: (a) left image, (b) right image
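For illustration, the following minimal Python/NumPy sketch of a window-based SAD cost (an assumption of this rewrite, not code from the thesis) makes the sensitivity explicit: any brightness change in one image alters every cost value, so the best-matching disparity can shift even though the scene geometry is unchanged.

import numpy as np

def sad_cost(left, right, y, x, d, w=5):
    # SAD between the (2w+1)x(2w+1) window centered at (y, x) in the left
    # image and the window shifted by disparity d in the right image.
    lw = left[y - w:y + w + 1, x - w:x + w + 1].astype(np.float64)
    rw = right[y - w:y + w + 1, x - d - w:x - d + w + 1].astype(np.float64)
    return np.abs(lw - rw).sum()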

2.4.5 Biologically Inspired Methods

The success of the HVS in obtaining depth information from two 2D images still remains a goal to be accomplished by machine vision. Incorporating procedures and features of the HVS into artificial stereo-equipped systems could improve their performance. The key concept behind this transfer of know-how from nature to science is identifying, understanding and expressing the basic principles of natural stereoscopic vision, aiming to improve the state of the art in machine vision. These principles are mainly involved in the aggregation step that most existing algorithms employ.

The HVS has been studied by many branches of the scientific community. Physics has expressed color information through color spaces, while biology has investigated the response and the physiology of the eyes. Psychophysics has studied the relationship between changes in individual stimuli and the perceived intensity, which is applicable to vision as well as to all the other modalities. On the other hand, the gestalt school of psychology suggested grouping as the key for interpreting human vision.

Often, biological and psychological findings are incorporated in the expression of correlation functions. Real life is the ultimate resource for finding the right solutions in many fields of robotics, computer science and electronics (Mead 1990, Shimonomura et al. 2008, Berthouze & Metta 2005). The natural selection process is a strict judge that favors the more effective solutions for each problem. Of course, our understanding of the solutions that emerged from natural selection comes mainly from the sciences of biology, psychology and neuroscience. Applying ideas borrowed from these sciences to technological problems can lead to very effective results. Consequently, further blending of biological and psychological findings with computer vision indicates a promising direction towards simple and accurate computer vision algorithms.


Chapter 3

Stereo Correspondence Algorithms

This Chapter presents the new stereo correspondence algorithms developed within this thesis. Each of the presented algorithms aims to confront some of the open issues identified in the previous Chapter.

3.1 Stereo Correspondence Algorithm with Enhanced Disparity Selection

In this Section an effective, hardware-oriented stereo correspondence algorithm, able to produce dense disparity maps of improved fidelity, is presented. The presented algorithm combines rapid execution and a simple, straightforward structure with comparably high quality of results. These features render it an ideal candidate for hardware implementation and for real-time applications. The algorithm utilizes the AD as matching cost and aggregates the results inside support windows, assigning Gaussian-distributed weights to the support pixels, based on their Euclidean distance. The resulting DSI is further refined by CA acting in all three dimensions of the DSI.

The main merit of the presented algorithm is its simplicity, rendering it an ideal choice for real-time operation and hardware implementation. Its structural elements are summarized as:

1. AD is utilized as the matching cost function, since it is the simplest one, involving no multiplications.

2. The aggregation step is a 2D process performed inside fixed-size square support windows upon a slice of the DSI. The pixels inside each support window are assigned a Gaussian-distributed weight during aggregation. The weight of each pixel is a Gaussian function of its Euclidean distance from the central pixel of the current window (see the sketch after this list).

3. The resulting aggregated values of the DSI are further refined by applying CA. The CA are used inside the 3D DSI, and not as a 2D post-processing disparity map filter (Kotoulas, Gasteratos, Sirakoulis, Georgoulas & Andreadis 2005).

4. Finally, the best disparity value for each pixel is decided by a WTA selection step.
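As a rough illustration of steps 1, 2 and 4, the following Python/NumPy sketch (an assumption of this rewrite, not the thesis’ implementation; the Gaussian’s sigma is an illustrative value) builds the AD-based DSI, aggregates it inside an 11×11 Gaussian distance-weighted window, and selects disparities by WTA:

import numpy as np
from scipy.ndimage import convolve

def gaussian_mask(w=5, sigma=2.0):
    # Weight of each support pixel: a Gaussian function of its Euclidean
    # distance from the window center.
    ax = np.arange(-w, w + 1)
    yy, xx = np.meshgrid(ax, ax, indexing="ij")
    return np.exp(-(xx**2 + yy**2) / (2 * sigma**2))

def disparity_map(left, right, dmax, w=5, sigma=2.0):
    h, width = left.shape
    mask = gaussian_mask(w, sigma)
    dsi = np.full((h, width, dmax + 1), np.inf)
    for d in range(dmax + 1):
        ad = np.abs(left[:, d:] - right[:, :width - d])    # AD matching cost
        dsi[:, d:, d] = convolve(ad, mask, mode="nearest") # weighted aggregation
    return dsi.argmin(axis=2)                              # WTA selection

Since the same mask is applied at every disparity level, leaving it unnormalized does not change the WTA outcome.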

This algorithm was not intended to achieve excellence of results but to provide a simple-to-implement, fast-to-execute, yet credible stereo correspondence methodology. In this way, the presented algorithm can be executed in real-time and be easily implemented in hardware, as demanded by many applications.


Fig. 3.2 2D Gaussian mask producing the weight for the pixel summation

The resulting aggregated values of the DSI are further refined by applying CA. All cells can work in parallel and, as a result, the used CA can be easily implemented in hardware. Two CA transition rules are applied to the DSI. The values of the parameters they use were determined, after extensive testing, so as to perform best. The first rule attempts to resolve disparity ambiguities. It checks for excessive consistency of results along the disparity axis d and, if necessary, corrects on the perpendicular (i, j) plane. The second rule is employed in order to smoothen the results and at the same time preserve the details. It checks and acts on constant-disparity planes. The two rules can be expressed as:

1. If at least one of the two cells lying on either side of a cell along the disparity axis d differs from the central cell by less than half of its value, then the central cell’s value is further aggregated within its 3×3, constant-disparity neighborhood.

First CA rule (the pseudocode, made concrete in Python):

if (abs(DSI[i, j, d] - DSI[i, j, d - 1]) < 0.5 * DSI[i, j, d]
        or abs(DSI[i, j, d] - DSI[i, j, d + 1]) < 0.5 * DSI[i, j, d]):
    # Average the cost over the 3x3 constant-disparity neighborhood.
    DSI[i, j, d] = DSI[i - 1:i + 2, j - 1:j + 2, d].sum() / 9.0

2. If at least 7 cells in the 3×3 neighborhood differ from the central cell by less than half of the central cell’s value, then the central cell’s value is scaled down by a factor of 1.3, as dictated by exhaustive testing.

Second CA rule (the pseudocode, made concrete in Python):

count = 0
for m in (-1, 0, 1):
    for n in (-1, 0, 1):
        if (m, n) != (0, 0) and \
           abs(DSI[i + m, j + n, d] - DSI[i, j, d]) < 0.5 * DSI[i, j, d]:
            count += 1       # this neighbor agrees with the central cell
if count >= 7:
    DSI[i, j, d] /= 1.3      # scale the central cell down
The two rules are applied once. Their outcome comprises the enhanced DSI from which the optimum disparity map is chosen by a simple, non-iterative WTA final step.

In the last stage the best disparity value for each pixel is decided by a WTA selection procedure. For each image pixel with coordinates (i, j), the smallest value along the d axis is searched for, and its position is declared to be the pixel’s disparity value. That is:

D(i, j) = arg min_d DSI(i, j, d)    (3.2)

3.1.2 Experimental Results

The algorithm was applied to standard image sets, as well as to self-recorded real-life ones, in order to be evaluated. Results are presented in terms of calculated images and quantitative metrics.

Standard Image Sets

The standard image sets used were the four stereo images (Scharstein & Szeliski 2002, 2003) provided, along with their corresponding ground truth disparity maps, by Scharstein and Szeliski through their web site (Scharstein & Szeliski 2010). Figure 3.3 depicts the reference (left) images (a), the provided ground truth disparity maps (b), the disparity maps calculated by the presented method (c), maps of signed disparity error where the middle (50%) gray tone equals zero error (d), and maps of pixels with absolute computed disparity error bigger than 1 shown in black (e). The percentages of pixels whose absolute disparity error is greater than 1 in the non-occluded regions, in all regions, and in regions near discontinuities and occlusions are presented in Table 3.1. The presented algorithm leaves uncalculated a frame around the image whose width is equal to the aggregation window width, i.e. 11 pixels. Thus, the results of Table 3.1 slightly underestimate the performance of the presented algorithm, except for the case of the Tsukuba image set, where the ground truth itself ignores that frame as well.

Table 3.1 Percentage of pixels whose absolute disparity error is greater than 1 in various regions of the images

Pair      Non-occluded (%)   All (%)   Discontinuities (%)
Tsukuba   10.3               12.3      23.5
Venus      8.86              10.2      35.8
Teddy     24.5               31.5      35.2
Cones     20.6               28.8      31.1
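The bad-pixel percentages of Table 3.1, as well as the NMSE reported in Table 3.2 below, can be computed along the lines of the following sketch; the NumPy interface and, in particular, the NMSE normalization are assumptions of this rewrite, since the thesis does not spell the formula out:

import numpy as np

def bad_pixel_percentage(disp, gt, mask, thresh=1.0):
    # Middlebury-style measure: share of pixels inside `mask` whose
    # absolute disparity error exceeds `thresh` (here 1 pixel).
    err = np.abs(disp[mask] - gt[mask])
    return 100.0 * (err > thresh).mean()

def nmse(disp, gt):
    # One common normalization (assumed here): mean squared error divided
    # by the mean squared ground-truth disparity.
    return ((disp - gt) ** 2).mean() / (gt ** 2).mean()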

Table 3.2, on the other hand, presents the Normalized Mean Square Error (NMSE) for the calculated disparity maps of the four image sets, excluding the 11-pixel-wide frame. The NMSE is calculated both for a simplified version of the presented algorithm, which makes no use of CA, and for the complete version of the algorithm. The addition of CA substantially improves the quality, as shown by the last column.


Fig. 3.3 Results for the Middlebury data sets. From left to right: the Tsukuba, Venus, Teddy and Cones images. From top to bottom: the reference (left) images (a), the provided ground truth disparity maps (b), the disparity maps calculated by the presented method (c), maps of signed disparity error (d), and maps of pixels with absolute computed disparity error bigger than 1 (e)

Self-recorded Image Sets

The presented algorithm was also applied to two self-recorded real-life stereo pairs. The pairs were captured using a Point Grey Research Bumblebee2 stereo camera system and their size is 512×384 pixels. Two scenes were captured, one outdoor and one indoor. Performance on everyday scenes, even if generally ignored in favor of synthetic datasets, is very important for a system that aspires to be used in robotic applications. The two stereo pairs, along with the calculated disparity maps, are presented in Figure 3.4.

Table 3.2 Calculated NMSE for various versions of the presented algorithm

Data Set   NMSE without CA   NMSE with CA   Improvement (%)
Tsukuba    0.0627            0.0593          5.42
Venus      0.0545            0.0447         17.98
Teddy      0.1149            0.1108          3.57
Cones      0.0809            0.0768          5.07

Fig. 3.4 Self-recorded scenes: (a) outdoor scene, (b) indoor scene. From left to right: left image, right image, calculated disparity map

The results are acceptable, considering that the main merit of the presented algorithm is its simplicity and effectiveness, in conjunction with its ability to be easily implemented in hardware and to run in real-time. A comparison of the aforementioned results to those of other methods listed in (Scharstein & Szeliski 2010) shows that they are comparable to those of the corresponding simple-structured algorithms.

3.1.3 Discussion

The presented algorithm exhibits satisfactory performance despite its simple structure. Gaussian-weighted aggregation and CA refinement inside the DSI have been proven to comprise an effective computational combination. Disparity maps of standard image sets, as well as of self-recorded ones, were calculated. The data show that the presented algorithm is a step in the right direction for a hardware-implementable, real-time solution. However, the quality of the results could be further improved by refining the applied CA rules. The possibilities concerning the nature and the number of the applied CA rules are practically endless, and the chosen ones, although effective, are only one of those possibilities. The presented algorithm’s ability to calculate disparity maps of real-life scenes is highly valued. Finally, it can be concluded that the algorithm’s serial flow and low complexity, combined with the presented satisfactory results, render it an appealing candidate for hardware implementation. Thus, depth calculation could be performed efficiently in real-time by autonomous robotic systems.

3.2 Quad-view Stereo Correspondence Algorithm

This Section proposes a quad-camera based system able to calculate, fast and accurately, a single depth map of a scenery. The four cameras are placed on the corners of a square. Thus, three differently oriented stereo pairs result when considering a single reference image (namely a horizontal, a vertical and a diagonal pair). The presented system applies a slightly modified version of the stereo correspondence algorithm presented in the previous Section to each stereo pair. This way, the computational load is kept within reasonable limits. A reliability measure is used in order to validate each point of the resulting disparity maps. Finally, the three disparity maps are fused together according to their reliabilities. The maximum reliability is chosen for every pixel. The final output of the presented system is a highly reliable depth map which can be used for higher-level robotic behaviors.
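As a rough sketch of this fusion rule (an assumption of this rewrite; the certainty measure itself is the one the thesis defines via Equation 3.3, which is not reproduced here), keeping for every pixel the disparity of the most reliable pair can be written as:

import numpy as np

def fuse_disparities(disps, certs):
    # disps, certs: three HxW maps each (horizontal, vertical, diagonal).
    # For every pixel keep the disparity whose certainty is maximal.
    disps = np.stack(disps)         # shape (3, H, W)
    certs = np.stack(certs)
    best = certs.argmax(axis=0)     # index of the most certain pair per pixel
    return np.take_along_axis(disps, best[None], axis=0)[0]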

3.2.1 Algorithm Description

The presented system is a combination of sensory hardware and a custom-tailored software algorithm. The hardware configuration, i.e. the four cameras’ formation, produces three stereo image pairs. Each pair is submitted to the simple and rapid stereo correspondence algorithm, resulting, thus, in a disparity map. For each disparity map a certainty map is calculated, indicating each pixel’s reliability. Finally, the three disparity maps are fused according to their certainties for each pixel. The outcome is a single disparity map which incorporates the best parts of the disparity maps that produced it. The combined hardware and software system is able to produce accurate dense depth maps at frame rates suitable for autonomous robotic applications.

Hardware Sensory System

The sensory configuration of the presented system consists of four identical cameras. The four cameras are placed so that their optical axes have parallel orientations and their principal points are co-planar, residing on the corners of the same square, as shown in Figure 3.5(a). The images captured by the upper-left camera are considered as the reference images of each tetrad. Each one of the other three cameras produces images to be corresponded to the reference images. Thus, for each tetrad of images three differently oriented stereo pairs result, i.e. a horizontal, a vertical and a diagonal one. The concept, as well as the result, of such a group of cameras is presented in Figure 3.5(b).

Fig. 3.5 (a) The quad-camera configuration and (b) the results (up-left) and scene capturing (right) using the quad-camera configuration

Software Architecture

The presented algorithm consists of two processing steps. The first one is the stereo correspondence algorithm that is applied to each image pair. Then, during a fusion step, the results for all the stereo pairs are merged.

Stereo correspondence algorithm

The presented system utilizes a custom-tailored, simple, rapidly executed stereo correspondence algorithm applied to each stereo pair. Stereo disparity is computed using a three-stage local stereo correspondence algorithm. The algorithm utilized is a slightly modified version of the algorithm presented in Section 3.1. The only difference from the aforementioned stereo correspondence algorithm has to do with the dimensions of the chosen aggregation window. Noise suppression is very important for stereo algorithms that are intended to be applied to outdoor scenes. Outdoor images, which is often the case for autonomous navigation tasks, usually suffer from noise induced by a variety of causes, e.g. lighting differences and reflections. The aggregation window dimensions used in the presented algorithm are bigger, i.e. 13 × 13 pixels. This choice is a compromise between real-time execution speed and sufficient noise cancellation. Overall, the used stereo correspondence algorithm combines low computational complexity with sophisticated data processing. Consequently, it is able to produce dense disparity maps of good quality at frame rates suitable for robotic applications.


Fig. 3.6 Algorithm’s steps and results for the Tsukuba data set. (column 1) the reference image (up-left), (column 2) the three target images (up-right, down-left, down-right), (column 3) the certainty maps for the horizontal, vertical and diagonal pair, (column 4) the computed disparity map for each stereo pair, (column 5) the fused (top) and the ground truth (bottom) disparity maps

Figure 3.7 shows the experimental results of the presented quad-camera algorithm (left), the computationally equivalent simple stereo algorithm (middle) and the utilized single stereo algorithm applied on the horizontal stereo pair (right). The first row shows the calculated disparity maps. The second row shows the maps of pixels with absolute computed disparity error bigger than 1, shown in black. Finally, the third row presents maps of signed disparity error, where the middle (50%) gray tone equals zero error. It is obvious that the simple stereo algorithm, shown in the rightmost column, suffers from noise. The usual confrontation of this issue is to enlarge the utilized 13 × 13 pixel aggregation window during the respective stage. However, window enlargement generally leads to loss of detail and coarse results, as shown in the middle column. This version of the algorithm utilizes a 23 × 23 pixel aggregation window, which results in triple the computational load. Obviously, both of these treatments lack the result quality of the presented method. The final result of the presented algorithm requires roughly the same computational power as the algorithm in the middle column. The outcome is that the presented quad-camera algorithm achieves better results than its computationally equivalent simple two-camera stereo counterpart and than the simple initial stereo algorithm.

The percentages of pixels whose absolute disparity error is greater than 1 in the non-occluded regions, in all regions, and in regions near discontinuities and occlusions are presented in Table 3.3. The presented percentages refer to the three initially computed stereo pairs (namely the horizontal, vertical and diagonal pair), the final fused result of the presented system and, finally, the computationally equivalent two-camera stereo correspondence algorithm.

As shown in Table 3.3, there are cases where the results of the fusion process are marginally worse than those of an initial step. However, the image pair direction that provides the optimum results, and should be considered the most reliable and useful, cannot be anticipated in advance. Moreover, the optimum direction is arbitrary and, therefore, there is little chance that it coincides with any of the three directions available in the presented system throughout the whole scene. However, the goal of the fusion system is to identify the best disparity value for every pixel. Thus, the results will be roughly as accurate as, or occasionally even more accurate than, the best initial results. On the other hand, the final disparity map is, in any case, far more reliable than the initial ones, since it has gone through a validation procedure, guaranteed by Equation 3.3.

Fig. 3.7 Results of the presented fusion system (left), the computationally equivalent simple stereo algorithm (middle) and the preliminary simple stereo algorithm applied on the horizontal image pair (right). From top to bottom: the computed disparity maps, pixels with absolute computed disparity error bigger than 1, and maps of signed disparity error

Table 3.3 Percentage of pixels whose absolute disparity error is greater than 1 in various regions for the Tsukuba pairs

Pair         Non-occluded (%)   All (%)   Discontinuities (%)
Horizontal   16.2               18.1      29.9
Vertical     12.5               13.8      35.1
Diagonal     10.7               12.4      32.3
Presented    10.8               12.6      31.5
Equivalent   15.8               17.6      33.9

The presented algorithm has also been applied to a virtual scenery. A virtual quad-camera system was inserted into the virtual room shown in the first two columns of Figure 3.8 and the demanded tetrad of images was captured. The room scene was chosen as it is a complex and demanding one, having both regions with fine details and low-textured ones. Moreover, the repetitive pattern of the books in the background is a challenging element for stereo correspondence algorithms. Figure 3.8 depicts the reference, i.e. up-left, image in the first column and the three target images, i.e. up-right, down-left and down-right, in the second column. The third and fourth columns show the certainty and disparity maps calculated for the image pairs consisting of the single reference image and the corresponding target ones. Finally, the fifth column of the figure shows the fused final disparity map.

Fig. 3.8 Algorithm’s steps and results for a synthetic room scene. (column 1) The reference image (up-left), (column 2) the three target images (up-right, down-left, down-right), (column 3) the certainty maps for the horizontal, vertical and diagonal pair, (column 4) the computed disparity map for each stereo pair, (column 5) the final fused depth map

The availability of reliable depth maps is the cornerstone of many computer vision, as well as robotic, applications. Figure 3.9(a) shows a screenshot of the 3D-reconstructed Tsukuba scene. The depth map of Figure 3.6, obtained using the presented method, was utilized in order to add the third dimension’s information to the reference image. Thus, a 3D model of the scene was reconstructed and a computer user can virtually navigate around the scene. On the other hand, Figure 3.9(b) shows an obstacle detection application based on the availability of a reliable depth map. Stereo vision can be used by autonomous robotic platforms in order to reliably detect obstacles within their movement range and move accordingly. The previously obtained depth map of Figure 3.8 was used for the calculation of the v-disparity image. Using the Hough transform, the floor plane was calculated and the obstacles were detected. This result is useful for any path-planning algorithm.
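For readers unfamiliar with the representation, the v-disparity image can be built by histogramming each row of the disparity map, as in the sketch below (an assumption of this rewrite, not the thesis implementation); in such an image the ground plane maps to a slanted line that the Hough transform can recover, while obstacles appear as near-vertical segments.

import numpy as np

def v_disparity(disp, dmax):
    # disp: integer-valued HxW disparity map. Entry (v, d) of the
    # v-disparity image counts how many pixels of image row v have disparity d.
    h = disp.shape[0]
    vdisp = np.zeros((h, dmax + 1), dtype=np.int32)
    for v in range(h):
        row = disp[v].astype(np.int64).clip(0, dmax)
        vdisp[v] = np.bincount(row, minlength=dmax + 1)
    return vdisp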


Fig. 3.9 Application results obtained using the calculated depth maps. (a) View of the reconstructed Tsukuba scene and (b) obstacle detection in the virtual room scene

3.2.3 Discussion

A depth-computing system aimed at autonomous robotic applications has been presented. The system utilizes a square formation of four identical cameras capturing the same scene. Selecting one of the images of each tetrad as reference, three image pairs result. Each pair is processed by a simple and rapid custom stereo correspondence algorithm, which results in an initial disparity map as well as in a certainty map. A fusion process evaluates the three initial disparity maps according to their certainty and produces the final combined disparity map.

Autonomous robotic applications demand reliable depth estimations obtained at real-time frame rates, while only limited computational resources are available. The presented system substitutes a special sensor configuration for computational complexity. However, the demanded configuration can be achieved easily and cost-efficiently. The presented results exhibit a fair compromise between the objectives of low computational complexity and result reliability.

The accuracy of local algorithms in various regions of a scene is strongly correlated with the orientation of the depicted objects in that particular region relative to the orientation of the correspondence search procedure. That is, depth discontinuities are more discriminable when they are oriented vertically to the correspondence search direction. This conclusion follows from the inherent way local algorithms operate and can be confirmed by the preliminary disparity maps presented in the fourth column of Figure 3.6 and Figure 3.8. The presented system has the advantage of being able to adapt to various object orientations. The result is that the final fused disparity map is at least as accurate as the most accurate of the initial disparity maps and, at the same time, much more reliable than any of them. Moreover, the structure of the presented software architecture is ideal for execution on the nowadays widely available quad-core processors. Each one of the identical but separate stereo correspondence searches can be assigned to a core, while the fourth core supervises the whole procedure.


3.3 Hierarchical Stereo Correspondence Algorithm for Uncalibrated Images

In motion estimation, the sub-pixel matching technique involves searching sub-sample positions as well as integer-sample positions between the image pairs, choosing the one that gives the best match. Based on this idea, the presented algorithm proposes an estimation method which performs a 2D correspondence search using a hierarchical search pattern. The intermediate results are refined by 3D CA. The disparity value is then defined using the horizontal distance of the matching position. Therefore, the presented algorithm can process uncalibrated and non-rectified stereo image pairs, maintaining the computational load within reasonable levels.

This stereo vision algorithm is inspired by recent motion estimation techniques. The presented algorithm has been adapted to the demands of contemporary outdoor robotic applications. It is based on a fast-executing SAD core for correspondence search in both the vertical and the horizontal directions of the input images. The results of this core are enhanced using sophisticated computational techniques; Gaussian-weighted aggregation and 3D CA rules are used, similarly to Section 3.1. The hierarchical iteration of the basic stereo algorithm is achieved using a fuzzy scaling technique (Amanatiadis et al. 2008). The aforementioned characteristics provide improved quality of results, while remaining easy to implement in hardware. As a result, the presented algorithm is able to cope with uncalibrated input images.

The presented scheme is block-matching based and does not perform scanline pixel matching. As a result, it requires neither camera calibration nor image rectification. However, it is clear that block matching approaches require more computational resources, since the number of pixels to be considered is greatly increased. In order to address this problem, the presented algorithm is a variation of a motion estimation algorithm (Yin et al. 2003) which is used for JVT/H.264 video coding (Wiegand et al. 2003). The adaptation of compression motion estimation algorithms into disparity estimation schemes can be effective both in accuracy and in complexity terms, since compression algorithms also attempt to achieve complexity reduction while maintaining coding efficiency. On the other hand, CA have been employed as an intelligent and efficient way to refine and enhance the stereo algorithm’s intermediate results.

3.3.1 Algorithm Description

The algorithm presented in this Section is an extension of the algorithm found in Section 3.1. The original algorithm has been extended so as to perform a two-dimensional matching search instead of a one-dimensional one. The search is performed hierarchically in three steps, incrementally improving the match precision, while avoiding the computational load of a full two-dimensional search scheme.

Stereo Correspondence Algorithm

The presented system utilizes a simple, rapidly executed stereo correspondence algorithm applied to each stereo pair. The matching cost function utilized is the AD, performed in both dimensions of the image.


Fig. 3.10 Quadruple, double and single pixel sample matching algorithm

Fig. 3.11 General scheme of the presented hierarchical matching disparity algorithm. The search block is enlarged for viewing purposes

The general scheme of the presented hierarchical matching disparity algorithm between a stereo image pair is shown in Figure 3.11. Each of the intermediate disparity maps of the first two steps is used as the initial condition for the succeeding, refining correspondence search.

In order to perform the hierarchical disparity search, three different versions of the input images are employed and the stereo correspondence algorithm is applied to each of these three pairs. The quadruple search step is performed as a normal pixel-by-pixel search on a quarter-size version of the input images. That is, each of the initial images has been down-sampled to 25% of its initial dimensions. The quadruple search is performed by applying the stereo correspondence algorithm in (D/4) × (D/4) search regions on the down-sized image pair (D being the maximum expected horizontal disparity value in the original image pair). The choice of the maximum searched disparity D/4 is reasonable, as the search is performed on a 1/4 version of the original images.

The window value 2w + 1 used in this stage is 9, i.e. w = 4. Once the best match is obtained for each pixel, another correspondence search is performed in 3 × 3 search regions on a half-size version of the initial image pair. Thus, the double pixel search is performed on a 50% down-sampled version of the input images, with the window dimension 2w + 1 being 15, i.e. w = 7. Finally, the single pixel matching is performed in 3 × 3 regions on the original input pair. The window value 2w + 1 used in this final stage is 23, i.e. w = 11. The block diagram of the presented algorithm is shown in Figure 3.12. The choice of 3 × 3 search regions for the last two steps of the hierarchical pattern can be explained as follows. The first stage is expected to find the best match for each pixel. As the next stage uses another version of the same image with double dimensions, the initially matched pixel could have been mapped to any pixel of a 3 × 3 neighborhood in the bigger version of the image.
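The coarse-to-fine flow can be summarized by the sketch below. It is illustrative only: correspondence_search is a hypothetical stand-in for the Section 3.1 algorithm extended to 2D search regions, and the nearest-neighbor resize merely stands in for the fuzzy scaling technique of (Amanatiadis et al. 2008).

import numpy as np

def resize(img, factor):
    # Nearest-neighbor rescaling (a simple stand-in for fuzzy scaling).
    h, w = img.shape
    ys = (np.arange(int(h * factor)) / factor).astype(int).clip(0, h - 1)
    xs = (np.arange(int(w * factor)) / factor).astype(int).clip(0, w - 1)
    return img[np.ix_(ys, xs)]

def hierarchical_disparity(left, right, D, correspondence_search):
    # Quadruple step: quarter-size images, (D/4) x (D/4) region, w = 4.
    d4 = correspondence_search(resize(left, 0.25), resize(right, 0.25),
                               region=(D // 4, D // 4), w=4, init=None)
    # Double step: half-size images, 3 x 3 region around the upscaled
    # (and therefore doubled) disparity guess, w = 7.
    d2 = correspondence_search(resize(left, 0.5), resize(right, 0.5),
                               region=(3, 3), w=7, init=2 * resize(d4, 2.0))
    # Single step: full-size images, 3 x 3 region, w = 11.
    return correspondence_search(left, right,
                                 region=(3, 3), w=11, init=2 * resize(d2, 2.0))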

Fig. 3.12 Block diagram of the hierarchical disparity search algorithm

From the block diagram it is obvious that up-scaling and down-scaling play a critical role in the whole hierarchical process. These two image transformations are realized by interpolation algorithms. Image interpolation can be described as the process of using known data to estimate values at unknown locations. The interpolated value f(x) at coordinate x in a space of dimension q can be expressed as a linear combination of samples f_k evaluated at integer coordinates k = (k1, k2, ..., kq) ∈ Z^q.
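As a concrete, deliberately simple instance of such a linear combination for q = 2, bilinear interpolation weighs the four surrounding integer-coordinate samples; this example is illustrative only and is not the fuzzy scaling technique actually employed:

import numpy as np

def bilinear(img, y, x):
    # Interpolated value at real-valued (y, x) as a linear combination of
    # the four neighboring integer-coordinate samples.
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    dy, dx = y - y0, x - x0
    return ((1 - dy) * (1 - dx) * img[y0, x0] +
            (1 - dy) * dx * img[y0, x0 + 1] +
            dy * (1 - dx) * img[y0 + 1, x0] +
            dy * dx * img[y0 + 1, x0 + 1])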


The rival algorithms that have been tested are those presented in (Ogale & Aloimonos 2007) and (Yoon & Kweon 2006a). Their results for the same distorted image set are shown in Figure 3.13(f) and Figure 3.13(g), respectively. The first of them proposes a compositional approach to unify many early visual modules, such as segmentation, shape and depth estimation, occlusion detection and local signal processing. The second one features an adaptive support weight aggregation scheme based on pixels’ color similarity and geometric proximity. Both rival algorithms are state-of-the-art local ones, since global algorithms, even though more accurate, are generally not suitable for real-time robotic applications. The results obtained by the presented algorithm are clearly better than those of the other two algorithms. This is due to the lack of calibration of the two input images, which can be handled by the presented algorithm but not by the other two.

The final result of the presented algorithm, Figure 3.13(e), has the same dimensions as the input images, while the previous ones have half and a quarter of their dimensions, respectively. A full search algorithm would require D × D calculations for every pixel. On the other hand, the presented algorithm performs only (D/4) × (D/4) + 3 × 3 + 3 × 3 calculations. Considering D = 32, it can be found that the presented algorithm is 15.7 times less computationally demanding.

Additionally, the presented algorithm has been applied to four commonly used image sets. Once again, the image sets were manually distorted with the use of special commercial software, simulating the radial distortion of an optical lens. The induced distortion was 10% for all four image pairs, as well as for their given ground truth disparity maps. The tested distorted image pairs, as well as the calculated disparity maps, are shown in Figure 3.14.

The results shown in Figure 3.14 were compared with the respective ground truth disparity maps, which had been distorted to the same degree as the input images, i.e. 10%. For each distorted image set, the NMSE was calculated as a quantitative measure of the algorithm’s behavior. Moreover, the presented algorithm was applied to the original, undistorted versions of the image sets, and the NMSE was once more calculated. A typical stereo correspondence algorithm would have been able to cope with the undistorted images, but it would have failed to process the distorted ones; the variation of performance would have been significant, and always in favor of the undistorted image pairs. In Table 3.4 the calculated NMSE for the presented algorithm is given, when applied to the distorted and the original versions of the four image sets. The last column presents the percentage of variation, where positive values indicate better results on the original image sets, while negative values indicate better results on the distorted image sets. It is evident that the presented algorithm is not affected by the presence of non-calibration effects in the processed images.

Table 3.4 Calculated NMSE for the presented algorithm for various pairs with constant distortion 10%

Pair      NMSE (Distorted)   NMSE (Original)   Variation (%)
Tsukuba   0.0712             0.0781            -0.097
Venus     0.0491             0.0461            +0.061
Teddy     0.1098             0.0976            +0.111
Cones     0.0500             0.0519            -0.038

The manually induced lens distortion percentage, i.e. 10%, was chosen as a typical value. However, the performance of the presented algorithm was also tested for various values of induced lens distortion. Seven versions of the Tsukuba image set were prepared and tested. In Figure 3.15 the two distorted input images, as well as the calculated disparity maps, are shown for various percentages of distortion. The calculated NMSE for each version is given in Table 3.5 and these results can be visually assessed in Figure 3.16. It can be deduced that the presented algorithm exhibits stable behavior over a large range of distortion values.

Fig. 3.13 (a), (b) The uncalibrated, diagonally captured input images and the resulting disparity maps of the presented algorithm for (c) the quadruple, (d) double and (e) single pixel estimation respectively. The result of (f) Ogale & Aloimonos (2007) and (g) Yoon & Kweon (2006a) for the same input images


Fig. 3.14 From left to right: the left and right 10% distorted input images and the calculated final disparity map for (from top to bottom) the Tsukuba, Venus, Teddy and Cones image sets respectively


Fig. 3.15 (from left to right) The left and right distorted input images and the calculated final disparity maps for various percentages of induced lens distortion: (a) 0%, (b) 2.5%, (c) 5%, (d) 7.5%, (e) 10%, (f) 12.5%, (g) 15%


Table 3.5 Calculated NMSE for the presented algorithm for the Tsukuba pair with various distortion percentages

Distortion (%)   NMSE
 0.0             0.0781
 2.5             0.0712
 5.0             0.0708
 7.5             0.0663
10.0             0.0712
12.5             0.0723
15.0             0.0761

Fig. 3.16 The NMSE for the Tsukuba image pair for various distortion percentages

Self-captured Image Sets

Furthermore, the algorithm has been applied to the self-captured image pairs shown in Figure 3.17 - Figure 3.19. The used pairs suffer from typical outdoor environment issues. Apart from being shot from cameras displaced both horizontally and vertically, with non-parallel directions, they involve textureless areas and difficult lighting conditions. Moreover, examination of Figures 3.18(a), 3.18(b) and Figures 3.19(a), 3.19(b) reveals that the different positions of the cameras result in lighting and chromatic differences.

3.3.3 Discussion<br />

The disparity estimation technique presented is able to process input images from uncalibrated stereo cameras and at the same time retain low computational complexity. The hierarchical search scheme is based on the JVT/H.264 motion estimation algorithm, initially developed for video coding. The presented algorithm searches for stereo correspondences inside D × D search blocks, requiring, however, significantly fewer computations than a typical full search.
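As an illustration of the general coarse-to-fine principle only (the block size, pyramid depth and the +/-1 refinement rule below are illustrative choices, not the exact parameters of the presented algorithm), consider:

import numpy as np

def sad(a, b):
    # Sum of absolute differences between two equally sized blocks.
    return float(np.abs(a.astype(np.float64) - b.astype(np.float64)).sum())

def hierarchical_disparity(left, right, block=8, levels=3, max_d=32):
    # A full search is performed only at the coarsest pyramid level;
    # each finer level merely refines the doubled coarse estimate.
    pl, pr = [left], [right]
    for _ in range(levels - 1):
        pl.append(pl[-1][::2, ::2])
        pr.append(pr[-1][::2, ::2])
    disp = None
    for lvl in range(levels - 1, -1, -1):
        L, R = pl[lvl], pr[lvl]
        gh, gw = L.shape[0] // block, L.shape[1] // block
        new = np.zeros((gh, gw), dtype=int)
        for by in range(gh):
            for bx in range(gw):
                y, x = by * block, bx * block
                ref = L[y:y + block, x:x + block]
                if disp is None:   # coarsest level: exhaustive search
                    cand = range(max_d // 2 ** lvl + 1)
                else:              # finer levels: +/-1 around the estimate
                    c = 2 * int(disp[min(by // 2, disp.shape[0] - 1),
                                     min(bx // 2, disp.shape[1] - 1)])
                    cand = [max(c - 1, 0), c, c + 1]
                costs = [(sad(ref, R[y:y + block, x - d:x - d + block]), d)
                         for d in cand if x - d >= 0]
                new[by, bx] = min(costs)[1] if costs else 0
        disp = new
    return disp   # one disparity value per block at the finest level

Searching only a handful of candidates per block at the finer levels is what keeps the cost well below that of a full D × D search.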

Sophisticated methods and techniques, such as Gaussian weighted aggregation and 3D CA refinement rules, have been applied to a hierarchical process.




Fig. 3.17 (a), (b) The self-captured input images of an alley, and the resulting disparity maps for (c) the quadruple, (d) double and (e) single pixel estimation respectively


Fig. 3.18 (a), (b) The self-captured input images of a building, and the resulting disparity maps for (c) the quadruple, (d) double and (e) single pixel estimation respectively




Fig. 3.19 (a), (b) The self-captured input images of a corner, and the resulting disparity maps for (c) the quadruple, (d) double and (e) single pixel estimation respectively

The presented algorithm's performance is retained practically unaffected by spatial displacements and lens distortions in the input images, as was qualitatively and quantitatively indicated. Moreover, the ability to tolerate poorly calibrated or even uncalibrated input images, in conjunction with its speed and the presented result quality, shows that this algorithm can cope with the demanding issue of autonomous outdoor navigation.

3.4 Biologically and Psychophysically Inspired Stereo Correspondence Algorithm

A more advanced stereo correspondence algorithm has been developed that incorporates many biologically and psychologically inspired features into an adaptive weighted SAD framework in order to determine the correct depth of the scenery. In addition to ideas already found in the relevant literature, such as the utilization of color information and the gestalt laws of proximity and similarity, new ones have been adopted. The presented algorithm introduces the use of circular support regions, the gestalt law of continuity, as well as the psychophysically-based logarithmic response law. All the aforementioned perceptual tools act complementarily inside a straightforward computational algorithm applicable to robotic applications. The results of the algorithm have been evaluated and compared to those of similar algorithms.



3.4.1 Novel Concepts

Circular Windows

The search for pixel correspondences between the two images of a stereo image pair is usually treated by comparing the surrounding regions of the examined pixels, rather than the examined pixels alone. The choice of those support windows, as discussed in the previous Chapter, plays an important role in the accuracy of the results. The support windows may vary in shape or dimensions and can be either of a fixed size or of an adaptive one. However, the use of fixed-size square or rectangular regions is the most common choice.

The adaptive support weights (ASW) aggregation method, as presented in (Yoon & Kweon 2006a), makes use of fixed-size square windows of comparatively large size. However, the biological model of stereo vision seems to be better approximated by using circularly shaped windows (Bharath & Petrou 2008). Aggregation inside circular windows is also preferable since the contribution of the neighboring pixels becomes perfectly isotropic, i.e. the same number of pixels contributes in any direction on the image plane. This fact makes the aggregation results produced by circular windows more reliable than those of any other window shape.
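In practice, a circular support window can be realized as a boolean mask over a square patch; a minimal sketch follows (the 55-pixel diameter matches the window size used later in this Section):

import numpy as np

def circular_mask(diameter):
    # True for pixels inside the inscribed circle of a diameter x diameter patch.
    r = diameter / 2.0
    yy, xx = np.mgrid[:diameter, :diameter]
    return (yy - r + 0.5) ** 2 + (xx - r + 0.5) ** 2 <= r ** 2

mask = circular_mask(55)   # pixels outside the circle are excluded from aggregation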

Gestalt laws

Aggregation is a crucial stage of almost every stereo algorithm. Assigning the right significance weight to each pixel during aggregation is a difficult decision, for which gestalt theory, as discussed in Section 1.1.3, can provide an answer. Within this context, three basic gestalt laws receive the following interpretation:

• Proximity (or equivalently Distance): The closer two pixels are, the more correlated to each other they are.
• Intensity similarity (or equivalently Intensity dissimilarity): The more similar the colors of two pixels are, the more correlated they are.
• Continuity (or equivalently Discontinuity): The more similar the depth of two pixels is, the more probable it is that they belong to the same larger feature and thus the more correlated they are.

Thus, gestalt theory can be used in order to determine the degree to which two pixels are correlated; the sketch below illustrates the three cues.
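For illustration, the three cues can be quantified per pixel pair along the following lines (the function and variable names are illustrative, not the thesis' notation; the precise metrics follow in the remainder of this Section):

import numpy as np

def gestalt_stimuli(p, q, color, disp):
    # p, q: (row, col) coordinates of the two compared pixels.
    # color: H x W x 3 image; disp: current H x W disparity estimate.
    proximity = float(np.hypot(p[0] - q[0], p[1] - q[1]))      # distance on the image plane
    dissimilarity = float(np.abs(color[p].astype(np.float64)
                                 - color[q].astype(np.float64)).sum())  # color difference
    discontinuity = abs(float(disp[p]) - float(disp[q]))       # depth difference
    return proximity, dissimilarity, discontinuity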

Psychophysically-based Weight Assignment

The remaining question is exactly how much one pixel should contribute to another, correlated one during the aggregation process. In other words, it is necessary to establish an appropriate mapping between correlation degree and contribution. It has been well known since the 19th century that the HVS interprets physical stimuli in a psychological, non-linear manner rather than in an absolute, linear one. This psychophysical relationship has been investigated in depth and many explanatory theories have been expressed (Pinoli & Debayle 2007). The Weber-Fechner law is one of those theories and is widely accepted. It indicates a logarithmic relation between the subjective perceived intensity and the objective stimulus intensity.



The mathematical expression of this psychophysical law can be derived by considering that the change of perception is proportional to the relative change of the causing stimulus:

dp = -k dS/S (3.8)

where dp is the differential change in perceived intensity, dS is the differential increase in the stimulus' intensity, S is the stimulus' intensity at the instant and k is a positive constant determined by the nature of the stimulus. However, stimuli whose growth produces decreasing perception intensity, e.g. the distance, dissimilarity and discontinuity used in the presented algorithm, can be described by assuming that the proportionality constant is negative.

Integration of the last equation results in

p = -k ln S + C (3.9)

where C is the constant of integration. Assuming zero perceived intensity, the value of C can be found:

C = k ln S0 (3.10)

where S0 is the stimulus' value that results in zero perception, under which no stimulus change is noticeable. Combining the above formulas it can be derived that

p = -k ln(S/S0) (3.11)

Figure 3.20 presents the response obtained by such a function.

Fig. 3.20 Perceived intensity response according to the Weber-Fechner law
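One plausible reading of this law as a weight-assignment rule is sketched below; k and the zero-perception stimulus S0 are free parameters here, and the clipping anticipates the truncation discussed later in this Section. All of this is illustrative, not the exact formulation of the presented algorithm.

import math

def fechner_weight(stimulus, s_zero, truncation, k=1.0):
    # p = -k * ln(S / S0): zero at the zero-perception stimulus S0,
    # growing logarithmically as the stimulus shrinks. Clipping the
    # stimulus at the truncation value bounds the weight and avoids
    # the divergence as S -> 0.
    s = min(max(float(stimulus), truncation), s_zero)
    return k * math.log(s_zero / s)

The per-cue weights obtained this way can then be combined, for instance multiplicatively in the spirit of the ASW framework, into a single aggregation weight.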



is 1/w, and for dissimilarity and discontinuity it is 1/255, assuming 255 levels for each chromatic channel of an RGB image. The coarsest of them, i.e. 1/w, was adopted as the truncation value. Any value of any of the aforementioned metrics smaller than this truncation value is considered equal to it. This way the problem of obtaining infinite weighting factors is bypassed.

As already discussed, this algorithm proposes three new extensions to the ASW framework, i.e. the addition of the gestalt law of continuity, the logarithmic response to stimuli and the use of circular support windows. In order to evaluate the contribution of each extension separately, the presented overall algorithm was modified so as to exclude one extension at a time. The presented overall algorithm and the resulting three truncated ones were applied to the Tsukuba image set. The percentages of erroneously calculated pixels in various image regions were computed, as well as the variation of each truncated implementation's results with respect to those of the complete algorithm. The results and the respective variations are shown in Table 3.6. The version of the algorithm that does not involve the logarithmic response uses an exponential function instead; as a result, it accounts for each gestalt law in a manner similar to the one described in (Yoon & Kweon 2006a) and followed by (Gu et al. 2008). The version of the algorithm that excludes the use of the 55-pixel-diameter circular window utilizes a 48 × 48 square window instead. Thus, the covered area is the same in terms of pixel population and the processing load is kept constant. The results shown in Table 3.6 indicate that the omission of any of the three extensions leads to increased error percentages.

Table 3.6 Variation of the presented algorithm's results for the Tsukuba image set when excluding one of the new concepts

                   nonocc              all                 disc
                   error   variation   error   variation   error   variation
presented          3.62    --          5.52    --          14.6    --
no continuity      5.19    +43.37%     7.17    +29.89%     21.7    +48.63%
no log. response   8.89    +145.58%    10.5    +90.22%     36.1    +147.26%
no circ. window    3.79    +4.70%      5.62    +1.81%      15.8    +8.22%

The performance of the overall algorithm was evaluated using the standard online test bench hosted by the University of Middlebury (Scharstein & Szeliski 2010). This test provides a common evaluation data set and allows an objective comparison of the various stereo algorithms' results. The standard image sets used were the four stereo image pairs (Scharstein & Szeliski 2002, 2003) provided, along with their corresponding ground truth disparity maps, by Scharstein and Szeliski. Figure 3.22 depicts the reference (left) images, the provided ground truth disparity maps, the disparity maps calculated by the presented method, maps of signed disparity error where the middle gray tone equals zero error, and maps of pixels with absolute computed disparity error bigger than 1 shown in black.
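The error percentages reported throughout this evaluation follow the Middlebury convention of counting pixels whose absolute disparity error exceeds 1; a minimal sketch of the metric:

import numpy as np

def bad_pixel_percentage(disp, ground_truth, region_mask=None, threshold=1.0):
    # Percentage of pixels whose absolute disparity error exceeds the
    # threshold; region_mask optionally restricts the count to one of
    # the evaluated regions (nonocc, all, disc).
    bad = np.abs(disp.astype(np.float64) - ground_truth.astype(np.float64)) > threshold
    if region_mask is not None:
        bad = bad[region_mask]
    return 100.0 * bad.mean()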

The Middlebury results table, available at (Scharstein & Szeliski 2010), presents the results of the submitted algorithms without taking into consideration any complexity or execution speed factors. Moreover, the results of global and local algorithms are directly compared. The presented algorithm can successfully cope with finely detailed images like the Cones data set. On the other hand, problems occur, as in most local algorithms, with large textureless areas like those in the Venus data set. However, most of the higher-ranked entries in the Middlebury results table involve some kind of global optimization, and a direct comparison is neither fair nor does it lead to useful conclusions.



Fig. 3.22 Results for the Middlebury data sets. From left to right: the Tsukuba, Venus, Teddy and Cones images. From top to bottom: the reference (left) images, the provided ground truth disparity maps, the disparity maps calculated by the presented method, maps of signed disparity error and maps of pixels with absolute computed disparity error bigger than 1

Only a small fraction of the listed algorithms are purely local, and the performance of the presented algorithm is above the average when compared to other local stereo algorithms. The presented algorithm stands in-between algorithms providing excellent results at the expense of computational load and algorithms providing poor results in favor of computational simplicity.

The algorithm was also applied to some new image sets (Scharstein & Pal 2007, Hirschmuller & Scharstein 2007). The four previously discussed image sets have been in the focus of research for quite a long period of time. Consequently, various algorithms have been presented that produce impressive results for those image sets. However, this fact does not necessarily imply that those algorithms' results will be equally impressive for different image sets. There is a number of factors, i.e. structure, complexity, detail, illumination etc., for each image set that differentiates the results of the same algorithm applied on them. Thus the need for more image sets, other than the typical ones, is apparent. In Figure 3.23 the results of the presented algorithm applied on 7 new image sets are presented.



Table 3.7 Evaluation of various ASW and local algorithms

                 Tsukuba               Venus                 Teddy                 Cones
                 nonocc  all   disc    nonocc  all   disc    nonocc  all   disc    nonocc  all   disc
AdaptDispCalib   1.19    1.42  6.15    0.23    0.34  2.50    7.80    13.6  17.3    3.62    9.33  9.72
AdaptWeight      1.38    1.85  6.90    0.71    1.19  6.13    7.88    13.3  18.6    3.97    9.79  8.26
RealTimeGPU      2.05    4.22  10.6    1.92    2.98  20.3    7.23    14.4  17.6    6.41    13.7  16.5
presented        3.62    5.52  14.6    3.15    4.20  20.4    11.5    18.2  23.2    4.93    13.0  11.7
PhaseBased       4.26    6.53  15.4    6.71    8.16  26.4    14.5    23.1  25.5    10.8    20.5  21.2
SSD+MF           5.23    7.07  24.1    3.74    5.16  11.9    16.5    24.8  32.9    10.6    19.8  26.3
PhaseDiff        4.89    7.11  16.3    8.34    9.76  26.0    20.0    28.0  29.0    19.8    28.5  27.5

The image sets were once again obtained from the Middlebury web site (Scharstein & Szeliski 2010) and, as shown in Figure 3.23, they are, from top to bottom: Aloe, Baby3, Bowling2, Cloth1, Cloth3, Cloth4 and Flowerpots. Each row of this figure shows, from left to right: the left image of the stereo pair, the provided ground truth, the disparity map computed by the presented algorithm and an error map. The error maps denote in black those pixels whose computed disparity value differs by more than 1 from the ground truth. The outcome of these results is that the presented algorithm exhibits a good behavior for a variety of stereo image pairs.

Another interesting point is the comparison of the presented algorithm to similar ones. Both the ASW-based algorithms and the traditional local ones are considered in this comparison. The results are presented in Table 3.7. The numbers represent the percentage of pixels whose absolute disparity error is greater than 1. The three columns for each data set represent the percentages for the pixels in non-occluded areas, for all pixels, and for pixels near depth discontinuities and occluded regions, respectively. As for the ASW-based algorithms, there are three of them listed in the Middlebury results table apart from the presented one. The method called [AdaptWeight] (Yoon & Kweon 2006a) is the core that all three other rival ASW-based algorithms share. Its structure is similar to that of the presented algorithm apart from its demand for changing the input images' color space from RGB to CIELab. The [RealTimeGPU] method (Wang et al. 2006) is a global algorithm, employing dynamic programming for disparity selection. Finally, [AdaptDispCalib] (Gu et al. 2008) is not a typical local single-stage algorithm. It employs more computational stages than all of the previous algorithms and has a rather complex structure. ASW-based algorithms generally produce more accurate results than the other non-global ones. The comparison of the presented algorithm to other non-global ones, such as [PhaseBased] (El-Etriby et al. 2007), [SSD+MF] (Scharstein & Szeliski 2002) and [PhaseDiff] (El-Etriby et al. 2006), shows its superiority. All these algorithms involve no iterative procedures, in contrast to global algorithms, and have a straightforward structure. This common characteristic allows them to be evaluated and directly compared.

Besides having a simple structure, the presented algorithm has the merit of employing only two user-defined parameters, as discussed earlier in this Section. Other than the almost inevitable choice of the window's size, no empirically defined parameters are used, in contrast to the other ASW-based methods presented. Those methods involve the a priori definition of various parameters that significantly change the algorithms' behavior.



Fig. 3.23 Results for the new data sets. From top to bottom: Aloe, Baby3, Bowling2, Cloth1, Cloth3, Cloth4 and Flowerpots. From left to right: the reference (left) image of the stereo pair, the provided ground truth, the disparity map computed by the presented algorithm and the error map

3.4.4 Discussion

A novel local stereo correspondence algorithm, applicable to robotic applications, was presented. It makes use of the AD as matching function and the ASW aggregation technique for matching the images' regions correctly. Many new features inspired by biology, psychology and psychophysics are incorporated in this algorithm. It comprises a context within which the gestalt laws, the



Weber-Fechner law and the HVS's physiology findings can coexist and act complementarily in a simple manner. In accordance with stereoscopic vision in nature, no iterative procedures are involved. The simple structure of the algorithm was also dictated by the need for rapid execution in robotic applications. Nevertheless, the presented algorithm exhibits remarkably accurate results.

3.5 Illumination-Invariant Dissimilarity Measure and Stereo Correspondence Algorithm

Many robotic and machine-vision applications rely on the accurate results of stereo correspondence algorithms. However, difficult environmental conditions, such as differentiations in illumination depending on the viewpoint, heavily affect the stereo algorithms' performance. This Section presents a new illumination-invariant dissimilarity measure intended to substitute the established intensity-based ones. The presented measure can be adopted by almost any of the existing stereo algorithms, enhancing them with its robust features. The performance of the dissimilarity measure is validated through experimentation with a new ASW stereo correspondence algorithm. Experimental results for a variety of lighting conditions are gathered and compared to those of intensity-based algorithms. The algorithm using the presented dissimilarity measure outperforms all the other examined algorithms, exhibiting tolerance to illumination differentiations and robust behavior.

3.5.1 Description of Illumination-Invariant Dissimilarity Measure

The HSL colorspace inherently expresses the lightness of a color and demarcates it from the color's qualitative characteristics. That is, an object will result in the same values of H and S regardless of the environment's illumination conditions. According to this assumption, the presented dissimilarity measure disregards the values of the L channel in order to calculate the dissimilarity of two colors. The omission of the vertical (L) axis from the colorspace representation leads to a 2D circular disk, defined only by H and S, as shown in Figure 3.24(b).

The transition from the 3D colorspace representation to the 2D one can be conceived as a floor plan projection of the double cone, when observed along the vertical (L) axis. Thus, any color can be described as a planar vector with its initial point at the disc's center. As a consequence, each color Pk can be described as a polar vector or, equivalently, as a complex number with modulus equal to Sk and argument equal to Hk. That is, a color in the luminosity-indifferent colorspace representation can be described as:

Pk = Sk e^(iHk) (3.22)

As a result, the difference, or equivalently the luminosity-compensated dissimilarity measure (LCDM), of two colors P1 and P2, shown with a dashed line in Figure 3.24(b), can be calculated as the difference of the two complex numbers.
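A minimal sketch of the measure, under the assumption that the scalar dissimilarity is taken as the modulus of this complex difference; Python's colorsys returns hue in [0, 1), so it is scaled to radians here:

import colorsys
import math

def lcdm(rgb1, rgb2):
    # Luminosity-compensated dissimilarity of two RGB colors with
    # components in [0, 1]: each color is mapped to S * exp(i * H) on
    # the hue-saturation disc and the L channel is discarded.
    def to_complex(rgb):
        h, l, s = colorsys.rgb_to_hls(*rgb)   # note: colorsys uses the HLS ordering
        angle = 2.0 * math.pi * h
        return complex(s * math.cos(angle), s * math.sin(angle))
    return abs(to_complex(rgb1) - to_complex(rgb2))

print(lcdm((0.8, 0.2, 0.2), (0.4, 0.1, 0.1)))   # ~0: same hue and saturation, different lightness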



3.5.3 Experimental Results

The presented method and, where needed, its AD variant, the ZNCC algorithm's implementation (Corke 2005) and the Ogale and Aloimonos algorithm (Ogale & Aloimonos 2005a,b, 2007) (available for download from (Ogale 2009)) were applied to various stereo image pairs, in order to evaluate their behavior. However, it is not the algorithms' performance that is being considered, but rather the behavior of the used dissimilarity measures (LCDM, AD, ZNCC, phase differences). Within this scope, various image pairs and various lighting non-uniformities were tested.

Standard Image Sets

The performance of the presented algorithm was again evaluated using the standard online test bench hosted by the University of Middlebury (Scharstein & Szeliski 2010). The standard image sets used were the four stereo image pairs (Scharstein & Szeliski 2002, 2003) provided, along with their corresponding ground truth disparity maps, by Scharstein and Szeliski. However, the listed benchmark images have been acquired under perfect lighting conditions and there are no significant variations of luminosity between the left and the right images. Figure 3.26 depicts, from left to right, the reference (left) input images of the stereo pair, the right input images of the stereo pair, the disparity maps as calculated by the presented LCDM-based method, maps of the pixels (shown in black) whose absolute computed disparity error is bigger than 1, and maps of signed disparity error (where the 50% gray tone indicates null error).

The results summarized in Figure 3.26 are quantified in Table 3.8, which presents the percentage of pixels whose absolute disparity error is greater than 1 in the non-occluded regions, in all the images' regions, and in the regions near discontinuities or the occluded ones. These results show the performance of the presented LCDM-based algorithm when applied to standard image sets captured under ideal lighting conditions.

Table 3.8 Percentage of pixels whose absolute disparity error is greater than 1 for standard image sets using the presented LCDM-based algorithm

Image Pair   nonocc   all    disc
Tsukuba      5.98     7.84   22.2
Venus        14.5     15.4   35.9
Teddy        20.8     27.3   38.3
Cones        8.90     17.2   20.0

Standard Image Sets with Altered Illumination

The presented LCDM intentionally excludes some of the pixels' information, contrary to AD, in order to be able to cope with non-symmetrical lighting conditions. As a result, the presented method is expected to perform somewhat worse when applied to ideally lighted stereo pairs. However, deviating from this ideal situation is expected to favor the use of the LCDM.



Fig. 3.26 Results for the Middlebury data sets. From top to bottom: the Tsukuba, Venus, Teddy and Cones image sets. From left to right: the reference (left) input images, the right input images, the disparity maps calculated by the presented LCDM-based method, maps of pixels with absolute computed disparity error bigger than 1 and maps of signed disparity error

The ZNCC algorithm was computed for a 15 × 15 window size, as this value was found to suppress noise better. Finally, the Ogale and Aloimonos algorithm was computed using the default settings. In order to test and compare the performance of the four algorithms, they were applied to a series of stereo pairs. Each pair consisted of the same, original reference (left) image of the Tsukuba image set and a differently illuminated version of the original right image of the pair. The right image of the Tsukuba pair was manually processed with specialized software and its luminosity was altered. The amount of alteration ranged from -25% to +25% in 5% increments. All four stereo algorithms were applied to each one of the resulting stereo pairs. The input images as well as the results of the four algorithms are shown in Figure 3.27.

Column (c) of Figure 3.27 shows the disparity maps computed by the presented LCDM-based algorithm, column (d) shows the disparity maps computed by the RGB-based AD version of the algorithm, column (e) shows the disparity maps computed by the ZNCC stereo algorithm and, finally, column (f) shows the disparity maps computed by the Ogale-Aloimonos algorithm. It can be seen that for ideal lighting conditions (0% difference in luminosity) the Ogale-Aloimonos algorithm produces the best results and that the presented LCDM algorithm produces slightly inferior results compared to its AD counterpart. However, the quality of the LCDM-based algorithm's results remains practically the same for every tested lighting condition, contrary to the Ogale-Aloimonos algorithm and the AD version of the algorithm. The algorithm of Ogale and Aloimonos may be able to cope with contrast variations but is not as successful against lightness differences.



Fig. 3.27 Left input images (a), right input images with altered luminosity (b) and calculated disparity maps for the presented algorithm (c), its RGB-based AD version (d), the ZNCC stereo algorithm (e) and the Ogale-Aloimonos algorithm (f) for various lightness conditions



On the other hand, the ZNCC stereo algorithm's precision is less dependent on the lighting conditions than that of the AD-based algorithm, but still more dependent than that of the presented algorithm. Moreover, the ZNCC algorithm always produces bigger error rates, especially for the discontinuity regions. The results shown in Figure 3.27 are quantified in Figure 3.28. Figure 3.28(a) shows that the performance of the algorithm that uses the presented LCDM is left practically unaffected by any difference in the input images' luminosity. On the contrary, the accuracy of the RGB-based AD version of the algorithm, shown in Figure 3.28(b), deteriorates linearly with the lighting non-uniformity. The ZNCC stereo algorithm, shown in Figure 3.28(c), stands between the two others in terms of results' constancy, but its error percentage is generally higher than that of the presented algorithm. Finally, the algorithm presented by Ogale and Aloimonos, shown in Figure 3.28(d), is the most accurate of all for ideal conditions but fails to compensate for lightness mismatches.

Fig. 3.28 Percentage of erroneously calculated pixels for (a) the presented algorithm, (b) its RGB-based AD version, (c) the ZNCC stereo algorithm and (d) the Ogale-Aloimonos stereo algorithm for various lightness conditions

Standard Image Sets with Variably Altered Illumination

The previously presented results were obtained for image pairs consisting of images with different illumination each. Despite the different illumination between the images of each pair, each single image was



uniformly over- or under-lighted. In this case, a luminosity normalization pre-processing step applied to each image might have assisted the rival stereo algorithms in obtaining results similar to those of the presented method. However, real-life conditions may result in differently illuminated areas within the same image. To this end, various color constancy methods were utilized in order to provide the rival stereo algorithms with images suitable for successful matching even if the original images suffered from illumination non-uniformities. The tested methods were histogram equalization, the patented Retinex algorithm (Jobson et al. 1997) and an HVS-inspired algorithm presented by Vonikakis (Vonikakis et al. 2008). Histogram equalization remaps an image's histogram in order to improve its visual quality. The image is first converted to the HSL color space and the algorithm is applied to the luminance channel. The luminance channel's values are transformed with respect to a reference image, so that the histogram of the output image approximately matches the reference image's histogram. The transformation T minimizes the difference

|c1(T(L)) - c0(L)| (3.29)

where c0 is the cumulative histogram of the image and c1 is the cumulative sum of the desired histogram for all the values L. This minimization is subject to the constraints that T must be monotonic and that c1(T(L)) cannot overshoot c0(L) by more than half the distance between the histogram counts at L. The transformation T maps the luminance values to their new ones.
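A compact sketch of such a histogram-specification step (a standard formulation under the stated constraints, not necessarily the exact implementation used here):

import numpy as np

def match_histogram(lum, ref_lum, levels=256):
    # Build the cumulative histograms c0 (input) and c1 (reference) and
    # derive a monotonic lookup table T so that the cumulative histogram
    # of T(lum) approximately matches that of ref_lum.
    c0 = np.cumsum(np.histogram(lum, bins=levels, range=(0, levels))[0]).astype(np.float64)
    c1 = np.cumsum(np.histogram(ref_lum, bins=levels, range=(0, levels))[0]).astype(np.float64)
    c0 /= c0[-1]
    c1 /= c1[-1]
    T = np.searchsorted(c1, c0).clip(0, levels - 1)   # monotonic by construction
    return T[lum.astype(int).clip(0, levels - 1)]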

Retinex is a NASA and TruView Imaging Co. patented image enhancement algorithm (Jobson et al. 1997). It corrects under-exposed areas of an image without affecting correctly exposed areas and restores rich colors. Both pictures of each image pair were processed with this algorithm using the default values of its parameters. The use of the default parameter values for all the processed pictures ensured that the tested approach would be general, i.e. not specially optimized for a certain image pair. Finally, the HVS-inspired algorithm, presented by Vonikakis in (Vonikakis et al. 2008), performs spatially modulated tone mapping. That is, the method performs image enhancement by lightening the tones in the under-exposed regions while darkening the tones in the over-exposed ones, without affecting the correctly exposed ones. The tone mapping function is inspired by the shunting characteristics of the center-surround cells of the HVS. The images of all the tested pairs were processed with this algorithm using the same values for the parameters.

The algorithms tested in this Section are the presented LCDM-based algorithm, the AD-based RGB variant algorithm, the ZNCC algorithm and, finally, the stereo algorithm presented by Ogale and Aloimonos in (Ogale & Aloimonos 2005a,b, 2007), as implemented in (Ogale 2009). The LCDM-based presented algorithm is compared to each one of the three others. Moreover, the three other algorithms are considered when applied to the original input images, as well as to the images processed by histogram equalization, Retinex and the Vonikakis et al. method.

As a test example, consider an image whose left end is darker than its right end, with the luminosity varying continuously across the picture. This scenario was tested using the four standard image sets of the University of Middlebury. The left image of each pair was left intact, while the right image was processed with specialized software in order to apply a luminosity gradient across the horizontal direction. The gradient ranges linearly from -50% of the original luminosity at the left end of the image to +50% at the right. Figure 3.29 shows the intact left input images in column (a) and the illumination-graded right images in column (b). The disparity maps calculated by the algorithm using the presented LCDM are shown in column (c), the disparity maps calculated by its AD-based variant are shown in column (d), the disparity maps calculated by the AD-based variant algorithm using histogram equalized input images are shown in column (e), the disparity maps calculated by the AD-based variant algorithm using Retinex enhanced input images are shown in



column (f), and finally the disparity maps calculated by the AD-based variant algorithm using input images enhanced by the Vonikakis et al. algorithm's implementation, given by the authors in (Vonikakis 2009), are shown in column (g).

Fig. 3.29 From left to right: left input images with constant luminosity, right input images with luminosity grading from -50% to +50% along the horizontal direction, and calculated disparity maps for the standard image sets using the presented LCDM (c), the AD-based variant algorithm (d), the AD-based variant algorithm with histogram equalization (e), the AD-based variant algorithm with Retinex enhancement (f) and the AD-based variant algorithm with pictures enhanced according to Vonikakis et al. (2008) (g)

The tested stereo algorithms and the corresponding disparity maps presented in Figure 3.29 result in the histograms shown in Figure 3.30. As shown in Figure 3.29 and Figure 3.30, the presented algorithm produces better results than any of the other tested compound algorithms. The pre-processing tone mapping techniques obviously failed to globally compensate for the lighting differentiations. Moreover, a direct comparison of Figure 3.29 and Figure 3.26 shows that the algorithm using the presented LCDM retains the same quality of results regardless of the lighting conditions. Consequently, it can be derived that the presented algorithm compensates for different lighting conditions, exhibiting robust behavior.

Next, the presented algorithm is compared to the ZNCC algorithm. Figure 3.31 shows the intact left input images in column (a) and the illumination-graded right images in column (b). Once again, the disparity maps calculated by the algorithm using the presented LCDM are shown in column (c), the disparity maps calculated by the ZNCC algorithm for a 15 × 15 pixels window are shown in column (d), the disparity maps calculated by the ZNCC algorithm using histogram equalized input images are shown in column (e), the disparity maps calculated by the ZNCC algorithm using Retinex enhanced input images are shown in column (f), and finally the disparity maps calculated by the ZNCC algorithm using input images enhanced by the method of Vonikakis et al. are shown in column (g).



!""#"!$%"&%'()*%<br />

*!<br />

)!<br />

(!<br />

'!<br />

&!<br />

%!<br />

$!<br />

#!<br />

"!<br />

!<br />

+,-.-/0<br />

123-,<br />

+2445<br />

6732,<br />

+,-.-/0<br />

123-,<br />

+2445<br />

6732,<br />

+,-.-/0<br />

123-,<br />

+2445<br />

6732,<br />

+,-.-/0<br />

123-,<br />

+2445<br />

6732,<br />

+,-.-/0<br />

123-,<br />

+2445<br />

6732,<br />

869: ;9 ?!@A-0B?!;9 C2>=32D!;9 173=.0.=,!2>!<br />

0B?!;9<br />

3737EE<br />

0BB<br />

4=,E!F!7EEB<br />

Fig. 3.30 Percentage of pixels whose absolute disparity error is greater than 1 for standard image sets calculated using the presented LCDM, the AD-based variant algorithm, the AD-based variant algorithm with histogram equalization, the AD-based variant algorithm with Retinex enhancement and the AD-based variant algorithm with enhanced pictures according to Vonikakis et al. (2008)


The tested stereo algorithms and the corresponding disparity maps presented in Figure 3.31 result in the histograms shown in Figure 3.32. The results shown in Figure 3.31 and Figure 3.32 indicate that the ZNCC algorithm can effectively compensate for illumination differentiations. However, the window dimensions that resulted in effective noise suppression, i.e. 15 × 15 pixels, could not preserve the fine details of the scenes and as a result produced higher error rates, compared to the presented algorithm, in depth discontinuity regions. Moreover, the ZNCC's computation is obligatorily time-consuming, which makes it a poor candidate for robotic applications.

obligatorily time-consuming <strong>and</strong> would not be a good c<strong>and</strong>idate <strong>for</strong> robotic applications.<br />

Finally, the presented algorithm is compared to the Ogale-Aloimonos algorithm. Figure 3.33<br />

shows the intact left input images in column (a) <strong>and</strong> the illumination-graded right images in column<br />

(b). Again, the disparity maps calculated by the algorithm using the presented LCDM are shown<br />

in column (c), the disparity maps calculated by the Ogale <strong>and</strong> Aloimonos algorithm are shown in<br />

column (d), the disparity maps calculated by the Ogale <strong>and</strong> Aloimonos algorithm using histogram<br />

equalized input images are shown in column (e), the disparity maps calculated by the Ogale <strong>and</strong><br />

Aloimonos algorithm using Retinex enhanced input images are shown in column (f), <strong>and</strong> finally the<br />

disparity maps calculated by the Ogale <strong>and</strong> Aloimonos algorithm using input images enhanced by<br />

the method <strong>of</strong> Vonikakis et al. are shown in column (g).<br />

The tested stereo algorithms and the corresponding disparity maps presented in Figure 3.33 result in the histograms shown in Figure 3.34. The results shown in Figure 3.33 and Figure 3.34 show that the Ogale and Aloimonos algorithm can preserve details better than any other tested algorithm for ideal lighting conditions. However, deviations from the ideal conditions result in significantly worse results compared to the presented LCDM-based algorithm.


Fig. 3.31 From left to right: left input images with constant luminosity, right input images with luminosity grading from -50% to +50% along the horizontal direction, and calculated disparity maps for the standard image sets using the presented LCDM (c), the ZNCC algorithm (d), the ZNCC algorithm with histogram equalization (e), the ZNCC algorithm with Retinex enhancement (f) and the ZNCC algorithm with enhanced pictures according to Vonikakis et al. (2008) (g)

!""#"!$%"&%'()*%<br />

'!<br />

&!<br />

%!<br />

$!<br />

#!<br />

"!<br />

!<br />

()*+*,-<br />

./0*)<br />

(/112<br />

340/)<br />

()*+*,-<br />

./0*)<br />

(/112<br />

340/)<br />

()*+*,-<br />

./0*)<br />

(/112<br />

340/)<br />

5367 8933 :;)?*-@=!<br />

8933<br />

()*+*,-<br />

./0*)<br />

(/112<br />

340/)<br />

()*+*,-<br />

./0*)<br />

(/112<br />

340/)<br />

A/



Fig. 3.33 From left to right: left input images with constant luminosity, right input images with luminosity grading from -50% to +50% along the horizontal direction, and calculated disparity maps for the standard image sets using the presented LCDM (c), the Ogale-Aloimonos algorithm (d), the Ogale-Aloimonos algorithm with histogram equalization (e), the Ogale-Aloimonos algorithm with Retinex enhancement (f) and the Ogale-Aloimonos algorithm with enhanced pictures according to Vonikakis et al. (2008) (g)

!""#"!$%"&%'()*%<br />

"!!<br />

*!<br />

)!<br />

(!<br />

'!<br />

&!<br />

%!<br />

$!<br />

#!<br />

"!<br />

!<br />

+,-.-/0<br />

123-,<br />

+2445<br />

6732,<br />

+,-.-/0<br />

123-,<br />

+2445<br />

6732,<br />

869: ;=7?@737,<br />

+,-.-/0<br />

123-,<br />

+2445<br />

6732,<br />

A?,BC!DE-0=C!<br />

;=7?@737,<br />

+,-.-/0<br />

123-,<br />

+2445<br />

6732,<br />

+,-.-/0<br />

123-,<br />

+2445<br />

6732,<br />

F2B?32G!;=7?@737, 0=C!;=7?@737,<br />

3737HH<br />

0==<br />

4?,H!I!7HH=<br />

Fig. 3.34 Percentage of pixels whose absolute disparity error is greater than 1 for standard image sets calculated using the presented LCDM, the Ogale-Aloimonos algorithm, the Ogale-Aloimonos algorithm with histogram equalization, the Ogale-Aloimonos algorithm with Retinex enhancement and the Ogale-Aloimonos algorithm with enhanced pictures according to Vonikakis et al. (2008)


Non-synthetic Self-recorded Image Sets

The previous Sections have shown that the presented algorithm produces better results than the other examined ones. The two developed ASW-based algorithms, using the LCDM and the AD respectively, have demonstrated superior characteristics, such as preservation of details, some degree of tolerance to lightness differences and adjustable computational load. The LCDM-based and the AD-based algorithms were applied to the extreme case of the self-recorded image pair previously shown in Figure 2.8, as well as to various other self-captured real-life image pairs exhibiting difficult lighting conditions. Examination of the input images reveals the different lighting conditions among each couple's images. In Figure 3.35 the two images of each stereo pair are shown in the first two columns, the respective disparity maps calculated by the presented LCDM-based algorithm are shown in the third column and the four next columns show the disparity maps calculated by the AD-based variant algorithm, the AD-based variant algorithm with histogram equalization, the AD-based variant algorithm with Retinex enhancement and the AD-based variant algorithm with pictures enhanced according to (Vonikakis et al. 2008).

Fig. 3.35 Various self-recorded outdoor input image pairs ((a) Campus, (b) Building, (c) Standing man, (d) Park) and the resulting disparity maps. From left to right: the left and right input images and the disparity maps calculated with: the presented LCDM-based algorithm, the RGB AD-based algorithm applied on the raw images, the RGB AD-based algorithm applied on the histogram equalized images, the RGB AD-based algorithm applied on the Retinex enhanced images and the RGB AD-based algorithm applied on the images enhanced according to Vonikakis et al. (2008)

The input images of Figure 3.35 exhibit large variations and can be considered as extreme cases of various lighting difficulties. They challenge every stereo algorithm, but at the same time they have to be confronted, since they can be found in environments where robots have to operate. While the produced disparity maps are not absolutely accurate, the presented LCDM dissimilarity measure



can compensate for the illumination non-uniformities to a large extent, generally outperforming the AD-based methods. Summarizing the results presented in this and the previous Sections, it can be deduced that, although not always the best, the presented algorithm exhibits a robust and trustworthy behavior in all the examined image sets. On the other hand, other algorithms may produce good results in some image sets but not consistently in all of them. As a result, the presented algorithm, and in particular the presented LCDM dissimilarity measure, can effectively and regularly face difficult lighting situations.

3.5.4 Discussion

A new illumination-invariant dissimilarity measure, the luminosity-compensated dissimilarity measure (LCDM), has been presented. The motivation behind the presented dissimilarity measure and stereo algorithm was the set of problems occurring when using stereo image processing on robots tested in actual outdoor environments. Such environments do not guarantee uniform illumination conditions, regardless of the camera position. As a consequence, the same feature may exhibit different intensity values in different images. The new measure can substitute the traditional RGB intensity-based dissimilarity measures (e.g. AD or SD) in almost any of the available stereo algorithms. Using the HSL colorspace and being calculated on the Hue-Saturation plane, the presented LCDM is able to compensate for lighting differentiations and provide robust and reliable results. As many robotic and machine vision applications rely on the accuracy of stereo algorithms' results, the presented measure could be an ideal choice.

The presented LCDM was tested on various image pairs and was compared to the simple AD measure. In order to obtain reliable, non-biased results for comparison, two identical state-of-the-art stereo algorithms were developed and tested. These stereo algorithms used a gestalt-based ASW aggregation scheme. The only difference was that the first version used the presented LCDM while the second used the AD as the dissimilarity measure. Tests with various image sets and different lighting conditions have shown that the presented LCDM exhibits good and, moreover, robust and reliable behavior.

Moreover, the presented algorithm was tested against the ZNCC algorithm and the algorithm of Ogale and Aloimonos, which are both able to confront lightness non-uniformities. Regarding the ZNCC algorithm, the window dimensions that resulted in effective noise suppression, i.e. 15 × 15 pixels, could not preserve the fine details of the scene and as a result produced higher error rates, compared to the presented algorithm, near depth discontinuity regions. Moreover, the ZNCC algorithm is structurally different from algorithms based on the other dissimilarity measures, i.e. LCDM, AD, SD. While ZNCC computes the dissimilarity of a whole support region at once, the others compute the dissimilarity of single pixel pairs and then aggregate the results in a separate step. Consequently, the ZNCC algorithm is inevitably computationally intensive and thus inappropriate for robotics applications, which should be reasonably fast, as the update rate is critical. On the other hand, the other measures can be adopted in different aggregation schemes, thus resulting in schemes of desirable computational load. This last feature is highly desirable in robotic applications.
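For reference, a sketch of the ZNCC of two support windows makes the structural difference plain: the whole region enters a single normalized correlation, so the cost cannot be decomposed into per-pixel dissimilarities that are aggregated afterwards.

import numpy as np

def zncc(window_a, window_b):
    # Zero-mean normalized cross-correlation of two equally sized
    # windows; values close to 1 indicate a good match.
    a = window_a.astype(np.float64) - window_a.mean()
    b = window_b.astype(np.float64) - window_b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return (a * b).sum() / denom if denom > 0 else 0.0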

The algorithm of Ogale and Aloimonos proposes a compositional approach that unifies many early visual modules. As a result, this method can robustly process images with contrast mismatches, among others. Its results are remarkably accurate for conditions that do not significantly deviate from



the ideal ones. However, even if this method can process contrast differentiations, it does not exhibit the same behavior for luminosity differentiations.

In conclusion, the presented dissimilarity measure is able to compensate for luminosity non-uniformities and at the same time preserve the details of the scene. Additionally, it can be computed at very high frame rates, as its computational load is very small compared to that of other measures such as the ZNCC. The presented measure can be embodied as the first stage of a stereo algorithm whose speed and complexity are subject to the demands. The MATLAB implementation of the presented stereo algorithm that is based on the presented LCDM measure is not fast enough to achieve the real- or near real-time frame rates demanded by robotic applications. However, a C++ version could be reasonably fast. Finally, considering all the aforementioned features combined, it can be concluded that the presented dissimilarity measure and the resulting stereo vision algorithm are ideal candidates for robotic applications.


Chapter 4

Robotic Applications of Stereo Vision

Robotics often replicates human modalities in order to achieve autonomous behaviors (Russell et al. 2004). Above all senses, vision is the most important one to humans. Moreover, we have structured our environments based on this fact. Thus, it comes naturally that autonomous robotics can greatly benefit from employing vision methods (Santini et al. 2009).

In this Chapter robotic applications based on stereo vision systems are presented. The stereo<br />

correspondence algorithms that <strong>for</strong>m the bases <strong>of</strong> these applications are some <strong>of</strong> the ones presented<br />

in the previous Chapter. The robotic applications covered involve new, computationally efficient<br />

obstacle avoidance <strong>and</strong> SLAM algorithms.<br />

4.1 Stereo Vision-based Obstacle Avoidance Algorithm

In order to achieve reliable obstacle-avoiding behavior, many popular methods involve the use of artificial stereo vision systems. As affirmed by its biomimetic origin (Gutmann et al. 2005, Sabe et al. 2004), stereoscopic vision can be effectively used to derive the depth map of a scene. The two versions of the vision-based obstacle avoidance algorithm presented in this Section provide efficient solutions that use a minimum of sensors and avoid, as much as possible, computationally complex processes. The only sensor required is a stereo camera. First, a simple modular algorithm is presented. It employs a stereo algorithm, which is essentially the same stereo algorithm covered previously in Section 3.1, and a threshold-based decision making algorithm that analyzes the depth maps and deduces the most appropriate direction for the robot to avoid any existing obstacles. Then, an improved version of the algorithm is presented, using a fuzzy decision making algorithm instead. The presented methodologies are tested on sequences of self-captured outdoor images and their results are evaluated.

The contribution of the two versions of the developed algorithm is to provide lightweight approaches for obstacle avoidance with the sole use of a stereoscopic camera. The use of only one sensor, and specifically of a stereoscopic camera, diminishes the complexity of the system and allows for easy integration with other vision tasks, such as object recognition or tracking.


4.1.1 Threshold Algorithm Description

The presented vision-based obstacle avoidance algorithm is intended to be used in autonomous mobile robotics. The development of an efficient, solely vision-based method for mobile robot navigation is still an active research topic. Towards this direction, the first step is to avoid any obstacles through vision. However, systems placed on robots have to conform to the restrictions imposed by them. Autonomous robot navigation requires almost real-time frame rates from the responsible algorithms. Furthermore, computing resources are strictly limited onboard a robot. Thus, the omission of popular obstacle detection techniques such as the v-disparity, which require Hough transformations, would be highly appreciated. Instead, simple and efficient solutions are demanded.

The developed algorithm is based only on a stereo camera. The core of the presented approach can be divided into two separate and independent algorithms:

• The stereo vision algorithm. It retrieves information about the environment from a stereo camera and produces a depth image, i.e. disparity map, of the scenery.

• The threshold-based decision making algorithm. It analyzes the data of the previous algorithm and decides the best direction, i.e. forward, right or left, for the robot to move in order to avoid any existing obstacles.

The modularity of the system allows easy modification and debugging, and ensures the adaptability of the overall algorithm. Figure 4.1 presents the flow chart of the implemented algorithm.

Stereo Vision

The stereo correspondence algorithm upon which the presented obstacle avoidance algorithm is based is essentially the one covered in Section 3.1. However, there are a number of differences:

• In contrast to the previously mentioned stereo algorithm, which directly uses the camera's images, this version uses an enhanced version of the captured images as input. The initially captured images are processed in order to extract the edges in the depicted scene. The utilized edge detecting method is the Laplacian of Gaussian (LoG), using a zero threshold. This choice produces the maximum possible edges. The LoG edge detection method smoothens the initial images with a Gaussian filter in order to suppress any possible noise. Then a Laplacian kernel is applied that marks regions of significant intensity change. Actually, the combined LoG filter is applied at once and the zero crossings are found. The extracted edges are afterwards superimposed on the initial images. The steps of the aforementioned process are shown in Figure 4.2, and a code sketch is given after this list. The outcome of this procedure is a new version of the original images having more striking features and textured surfaces, which facilitate the subsequent stereo matching procedure.

• The matching cost function utilized is the truncated AD. The AD are truncated if they exceed 4% of the maximum intensity value. Truncation suppresses the influence of noise in the final result. This is very important for stereo algorithms that are intended to be applied to outdoor scenes. Outdoor pairs usually suffer from noise induced by a variety of reasons, e.g. lighting differences and reflections.

• Moreover, the use of CA is absent in this version of the stereo correspondence algorithm for simplicity reasons.
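To make the enhancement and matching-cost steps concrete, a minimal C++ sketch follows, assuming 8-bit grayscale images stored row-major; the 5 × 5 kernel, sigma = 1.0 and the painting of edges as white pixels are illustrative assumptions rather than the thesis's exact parameters.

#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdlib>
#include <vector>

using Image = std::vector<double>;  // row-major, width * height

// Build a (2*r+1)x(2*r+1) Laplacian-of-Gaussian kernel with scale sigma.
std::vector<double> logKernel(int r, double sigma) {
    const double pi = 3.14159265358979323846;
    std::vector<double> k((2 * r + 1) * (2 * r + 1));
    double s2 = sigma * sigma;
    for (int y = -r; y <= r; ++y)
        for (int x = -r; x <= r; ++x) {
            double q = (x * x + y * y) / (2.0 * s2);
            k[(y + r) * (2 * r + 1) + (x + r)] =
                -(1.0 - q) * std::exp(-q) / (pi * s2 * s2);
        }
    return k;
}

// Plain convolution; border pixels are left at zero for brevity.
Image convolve(const Image& in, int w, int h,
               const std::vector<double>& k, int r) {
    Image out(in.size(), 0.0);
    for (int y = r; y < h - r; ++y)
        for (int x = r; x < w - r; ++x) {
            double acc = 0.0;
            for (int dy = -r; dy <= r; ++dy)
                for (int dx = -r; dx <= r; ++dx)
                    acc += in[(y + dy) * w + (x + dx)] *
                           k[(dy + r) * (2 * r + 1) + (dx + r)];
            out[y * w + x] = acc;
        }
    return out;
}

// Superimpose the zero crossings of the LoG response (zero threshold)
// on the captured image by painting them white.
void enhance(std::vector<uint8_t>& img, int w, int h) {
    Image in(img.begin(), img.end());
    Image resp = convolve(in, w, h, logKernel(2, 1.0), 2);
    for (int y = 0; y < h - 1; ++y)
        for (int x = 0; x < w - 1; ++x)
            if (resp[y * w + x] * resp[y * w + x + 1] < 0.0 ||
                resp[y * w + x] * resp[(y + 1) * w + x] < 0.0)
                img[y * w + x] = 255;
}

// Truncated absolute difference: the AD is capped at 4% of the
// maximum intensity value, suppressing the influence of noise.
inline double truncatedAD(uint8_t a, uint8_t b) {
    const double cap = 0.04 * 255.0;
    return std::min(static_cast<double>(std::abs(int(a) - int(b))), cap);
}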



Fig. 4.1 Flow chart of the implemented threshold-based obstacle avoidance algorithm

Fig. 4.2 Image enhancement steps of the presented stereo algorithm
!



As mentioned before, the resulting disparity maps are equivalent to depth maps of the depicted scene and can be used directly for the subsequent obstacle analysis.

Threshold Direction Decision Algorithm

The previously calculated disparity map is used to extract useful information about the navigation of a robot. Contrary to many implementations that involve complex calculations upon the disparity map, the presented decision making algorithm involves only simple summations and threshold checks. This is feasible due to the absence of significant noise in the produced disparity map. The goal of the developed algorithm is to detect any existing obstacles in front of the robot and to safely avoid them, by steering the robot left or right, or by moving it forward.

In order to achieve that, the developed method divides the disparity map into three windows, as in Figure 4.3. The division of the disparity map excludes the boundary regions, in this case a peripheral frame of 20 pixels width, because the disparity calculation in such regions is often problematic.

Fig. 4.3 Depth map's division in three windows

In the central window, the pixels p whose disparity value D(p) is greater than a defined threshold value T are enumerated. Then, the enumeration result is examined. If it is smaller than a predefined rate r of all the central window's pixels, this means that there are no obstacles detected exactly in front of the robot and in close distance, and thus the robot can move forward. On the other hand, if this enumeration's result exceeds the predefined rate, the algorithm examines the other two windows and chooses the one with the smaller average disparity value. In this way the window with the fewest obstacles will be selected. The pseudocode of the implemented simple decision making algorithm follows:

Threshold-based Decision Making Pseudocode

for all the pixels p of the central window {
    if D(p) > T {
        counter++ }
    numC++ }
if counter < r% of numC {
    GO STRAIGHT }
else {
    for all the pixels p of the left window {
        sumL += D(p)
        numL++ }
    for all the pixels p of the right window {
        sumR += D(p)
        numR++ }
    avL = sumL / numL
    avR = sumR / numR
    if avL < avR {
        GO LEFT }
    else {
        GO RIGHT } }
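For illustration, a minimal C++ rendering of this decision rule follows, assuming a row-major disparity map; the 20-pixel border exclusion and the three equal windows follow the description above, while the function name and return values are illustrative.

#include <cstddef>
#include <string>
#include <vector>

std::string decideDirection(const std::vector<double>& disp, int w, int h,
                            double T, double r /* fraction, e.g. 0.1 for 10% */) {
    const int b = 20;                       // excluded peripheral frame
    const int winW = (w - 2 * b) / 3;       // width of each of the 3 windows
    // Count "near" pixels in the central window.
    std::size_t counter = 0, numC = 0;
    for (int y = b; y < h - b; ++y)
        for (int x = b + winW; x < b + 2 * winW; ++x) {
            if (disp[y * w + x] > T) ++counter;
            ++numC;
        }
    if (counter < r * numC) return "straight";  // nothing close in front
    // Otherwise compare the average disparity of the two side windows.
    double sumL = 0, sumR = 0; std::size_t numL = 0, numR = 0;
    for (int y = b; y < h - b; ++y)
        for (int x = b; x < b + winW; ++x) {
            sumL += disp[y * w + x];            ++numL;
            sumR += disp[y * w + x + 2 * winW]; ++numR;
        }
    return (sumL / numL < sumR / numR) ? "left" : "right";
}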


Fig. 4.4 A sample outdoor route and the algorithm's outputs

A measure of the algorithm's certainty about each decision would be meaningful only in the cases of the left or right decisions. As these two decisions are based on the same heuristic, they can be directly compared. The certainty cert of a direction's decision which yields an average disparity avD1, over the other direction which yields avD2 > avD1, is calculated as:

cert = (avD2 − avD1) / avD2    (4.1)

For instance, avD1 = 30 and avD2 = 40 would give cert = 25%.

The results for the left and right decisions of the algorithm are shown in Figure 4.6. For each decision, the pair's indicating number as well as the algorithm's decision is given. The certainty ranges from 0% for no certainty at all to 100% for absolute certainty. The bigger the area defined by the resulting points, the bigger the algorithm's overall certainty. However, big values of certainty are not always achievable. In the extreme case when both the left and the right direction are fully traversable, the certainty measure would become 0%. Despite this fact, the certainty is useful. Observing the correlation between false decisions and certainty values, a threshold could be decided, below which the algorithm should reconsider its decision.



Fig. 4.5 Percentage of the algorithm's correct decisions

Fig. 4.6 Percentage of certainty for the algorithm's decisions

4.1.3 Fuzzy Algorithm Description

An improved version of the threshold-based obstacle avoidance algorithm has been developed afterwards. Again, the algorithm requires only one stereo camera as input and consists of two independent modules. The first module is exactly the same stereo correspondence algorithm used in the threshold algorithm. The second module is a fuzzy decision making algorithm that analyzes the depth maps.

• The stereo vision algorithm. It retrieves information about the environment from a stereo camera and produces a depth image, i.e. disparity map, of the scene.

• The fuzzy decision making algorithm. It analyzes the data of the previous algorithm and decides the best direction for the robot to move so as to avoid any existing obstacles, based on a simple fuzzy inference system (FIS).

The presented method processes each pair of stereoscopic images and indicates an obstacle-avoiding direction of movement for a robot, such as the one shown in Figure 4.7(a). First, the stereo image pair is given as input to a stereo vision algorithm and a depth map of the scene is obtained. This depth map is thereafter used as input of the fuzzy obstacle analysis and direction decision module. This fuzzy module indicates the proper direction of movement. The direction of movement ranges from −30° to +30°, considering 0° as the current direction of the robot. This angle range is dictated by the used stereo camera, i.e. a Bumblebee2 stereo camera manufactured by Point Grey Research, having a 60° horizontal field of view (HFoV). Furthermore, in cases when the scene is full of obstacles, or the depth map is too noisy to safely conclude a direction, a "move backwards" signal is foreseen. Figure 4.7(b) presents the mobile robot, shown as the "R" in the center, and the possible positions after the application of the presented algorithm, shown by the bold regions of the outer circle.

Fig. 4.7 (a) Stereo camera equipped mobile robotic platform and (b) floor plan of the robot's environment

This method also divides each disparity map into three equal windows, in exactly the same way that has been shown in Figure 4.3. However, this algorithm treats all three windows identically. In each window w, the pixels p whose disparity value D(p) is greater than a defined threshold value T are enumerated. The enumeration results are normalized by the window's pixel population and then examined. The more traversable the corresponding direction is, the smaller the enumeration result should be. Thus, the traversability of the left, central and right window, respectively TRAV_L, TRAV_C and TRAV_R, is assessed.
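A minimal sketch of the traversability estimation for a single window follows, assuming a row-major disparity map; note that, as defined above, smaller values correspond to more traversable directions.

#include <vector>

// Fraction of pixels in the window [x0, x1) x [y0, y1) whose disparity
// exceeds T, i.e. which are closer than the safety threshold.
double traversability(const std::vector<double>& disp, int w,
                      int x0, int x1, int y0, int y1, double T) {
    int near = 0, total = 0;
    for (int y = y0; y < y1; ++y)
        for (int x = x0; x < x1; ++x) {
            if (disp[y * w + x] > T) ++near;  // pixel is "near"
            ++total;
        }
    return static_cast<double>(near) / total; // normalised enumeration result
}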

The results of the traversability estimation for the three windows, i.e. the left, central, and right one, are used as the three input values of a FIS that decides the proper direction of movement for the robot. The outputs of the FIS are the angle of the direction that the robot should follow and an indicator that the robot should move backwards. Figure 4.8 shows the membership functions (MF) for the three inputs (all having the identical MF shown in Figure 4.8(a)) and the two outputs (Figures 4.8(b) and 4.8(c)).
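Since the exact membership functions and rule base are those of Figure 4.8 and the thesis's rule table, the heavily simplified Mamdani-style sketch below only illustrates the mechanics of such a FIS: triangular membership functions, max-min rule evaluation and centroid defuzzification of the steering angle. The shapes and the three rules are illustrative assumptions; the "move backwards" output can be produced analogously from the degree to which all three directions are blocked.

#include <algorithm>

// Triangular membership function with feet at a and c and peak at b.
double tri(double x, double a, double b, double c) {
    if (x <= a || x >= c) return 0.0;
    return x < b ? (x - a) / (b - a) : (c - x) / (c - b);
}

// FIS inputs: traversability values in [0, 1]; smaller = freer direction.
struct Inputs { double left, central, right; };

// Defuzzified steering angle in [-30, +30] degrees (negative = left),
// computed as the centroid of the aggregated output fuzzy set.
double steeringAngle(const Inputs& in) {
    // Illustrative rule strengths: how "free" each direction is.
    double freeL = tri(in.left,    -0.5, 0.0, 0.7);
    double freeC = tri(in.central, -0.5, 0.0, 0.7);
    double freeR = tri(in.right,   -0.5, 0.0, 0.7);
    // Output MFs peaking at -30 (left), 0 (forward), +30 (right) degrees;
    // each rule clips its output MF (min), the clipped sets are merged (max).
    double num = 0, den = 0;
    for (double ang = -30; ang <= 30; ang += 0.5) {
        double mu = std::max({ std::min(freeL, tri(ang, -45, -30,  0)),
                               std::min(freeC, tri(ang, -30,   0, 30)),
                               std::min(freeR, tri(ang,   0,  30, 45)) });
        num += mu * ang;  den += mu;
    }
    return den > 0 ? num / den : 0.0;  // centroid defuzzification
}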


Fig. 4.8 Fuzzy membership functions: (a) input MF, traversability of the left/central/right window; (b) output MF, direction angle; (c) output MF, "move backwards" indicator


Fig. 4.9 Test images and disparity maps, (a)–(l), where the algorithm chose to move forward

As shown by the value of the parameter "move backwards" for the pairs of Figures 4.10(c) and 4.11(d), there is a relatively high tendency to adopt the option to move backwards in these cases. However, since the threshold value had been set to 0.65, the robot chose to steer left and right respectively. On the other hand, Figure 4.12 and Table 4.4 show a scene and the respective FIS variables where the robot actually decided to move backwards. A different threshold value for the "move backwards" parameter would have led to different behaviors for the last three cases.
backwards" parameter would had lead to other behaviors <strong>for</strong> the last three cases.



Table 4.1 Results for the cases where the algorithm chose to move forward

Figure   Left (%)   Central (%)   Right (%)   Angle (deg)   "move backwards"
(columns 2–4 are the FIS inputs; columns 5–6 its outputs)
4.9(a) 0.03 1.68 1.14 0.0 0.50
4.9(b) 0.03 0.02 0.80 0.0 0.50
4.9(c) 19.82 9.82 5.87 0.0 0.50
4.9(d) 54.05 5.65 5.23 0.0 0.50
4.9(e) 4.95 6.00 3.91 0.0 0.50
4.9(f) 5.20 8.16 9.34 0.0 0.50
4.9(g) 20.71 15.21 10.52 0.0 0.50
4.9(h) 18.72 11.65 15.38 0.0 0.50
4.9(i) 13.06 14.60 9.01 0.0 0.50
4.9(j) 71.42 16.05 4.83 0.0 0.50
4.9(k) 9.84 8.44 8.08 0.0 0.50
4.9(l) 0.85 0.13 3.81 0.0 0.50

Fig. 4.10 Test images and disparity maps where the algorithm chose to move left


Table 4.2 Results for the cases where the algorithm chose to move left

Figure   Left (%)   Central (%)   Right (%)   Angle (deg)   "move backwards"
(columns 2–4 are the FIS inputs; columns 5–6 its outputs)
4.10(a) 8.75 89.62 44.53 -17.1 0.50
4.10(b) 9.71 37.54 43.81 -16.6 0.50
4.10(c) 22.86 38.60 43.37 -12.0 0.56
4.10(d) 3.94 73.83 82.03 -19.0 0.50

Fig. 4.11 Test images and disparity maps where the algorithm chose to move right

Table 4.3 Results for the cases where the algorithm chose to move right

Figure   Left (%)   Central (%)   Right (%)   Angle (deg)   "move backwards"
(columns 2–4 are the FIS inputs; columns 5–6 its outputs)
4.11(a) 81.45 66.78 7.62 +18.6 0.50
4.11(b) 83.79 85.95 14.94 +19.3 0.50
4.11(c) 67.27 80.19 9.28 +18.6 0.50
4.11(d) 67.83 67.51 36.54 +9.1 0.61


Fig. 4.12 Test images and disparity map where the algorithm chose to move backwards

Table 4.4 Results for the cases where the algorithm chose to move backwards

Figure   Left (%)   Central (%)   Right (%)   Angle (deg)   "move backwards"
(columns 2–4 are the FIS inputs; columns 5–6 its outputs)
4.12 70.94 87.21 73.51 -2.6 0.69

4.1.5 Discussion

For mobile robots to move towards human-like behaviors, autonomous navigation is an essential milestone. Obstacle avoidance, using a minimum of sensory and processing resources, is the first step in this direction. In this Section two versions of a vision-based obstacle avoidance algorithm for autonomous mobile robots have been presented. The presented algorithms require only one sensor, i.e. a stereo camera, and a low amount of involved computations. The algorithms' structure consists of a specially developed and optimized stereo algorithm that produces noise-free depth maps, and a computationally simple decision making algorithm. The decision making algorithm avoids complex calculations and transformations. Consider as an example the case of the popular v-disparity implementation, where a Hough transformation is needed in order to compensate for the low quality disparity maps. On the other hand, algorithms simpler than the presented direction-deciding ones fail to yield correct results. In this case, consider an algorithm where the three windows of Fig. 4.3 are treated equally and the smallest average disparity is sought. This methodology is doomed to fail in the case, among many others, where only a thin obstacle is close to the robot and other obstacles are in medium range. Such a naive algorithm would choose the direction towards the close thin obstacle, avoiding the medium-range obstacles.

Firstly, the threshold-based version of the algorithm has been presented and tested on sequences of self-captured outdoor images. Its performance has been presented and discussed. The presented algorithm managed to successfully avoid obstacles in the vast majority of the tested image pairs. Despite its simple calculations, both during the disparity map generation and the decision making, the algorithm exhibited promising behavior.

Then, an improved fuzzy-based version of the obstacle avoidance algorithm has been covered. The presented method is based on the same custom developed stereo algorithm and a simple but effective fuzzy obstacle analysis and direction decision module. The robot executing the presented algorithm has effectively detected and avoided any obstacles using only stereo vision as input. The behavior of the method has been validated on real outdoor data sets of various scenes. The algorithm exhibits robust behavior and is able to ensure collision-free autonomous mobility to robots. Moreover, the trajectory of the robot's overall movement is smooth, resembling that of living creatures, due to the fuzzy system's continuous range of output values.

The simple structure of the presented algorithms and the absence of heavy computational payload are characteristics highly desirable in autonomous robotics. The real-time collision-free navigation of autonomous robotic platforms is the first step towards the accomplishment of more complex activities, e.g. path planning and mapping of an area. Consequently, the presented algorithms are suitable for autonomous robotic applications and are able to provide real-time obstacle avoidance behavior, based solely on stereo vision.

4.2 Stereo Vision-based SLAM

A visual SLAM algorithm suitable for indoor applications has been developed. The algorithm is focused on computational effectiveness. The only sensor used is a stereo camera placed onboard a moving robot. The algorithm processes the acquired images, calculating the depth of the scenery, detecting occupied areas and progressively building a map of the environment. The stereo vision-based SLAM algorithm embodies a custom-tailored stereo correspondence algorithm, the robust scale- and rotation-invariant feature detection and matching SURF method, a computationally effective v-disparity image calculation scheme, a novel map-merging module, as well as a sophisticated CA-based enhancement stage. The presented algorithm is suitable for autonomously mapping and measuring indoor areas using robots.

The presented SLAM approach adopts a simple solution that avoids complex update strategies in favor of a computationally efficient one. Emphasis has been given to the development of custom-tailored, non-iterative solutions for each step of the presented algorithm's execution. The specially developed stereo correspondence algorithm is a rapidly executed local SAD algorithm embodying Gaussian weighted aggregation and a double validation scheme based on a certainty estimation criterion, a bidirectional consistency check and sub-pixel accuracy. Concerning the camera's motion estimation, the SURF feature detector and matcher (Bay et al. 2008) has been utilized as the first step of an efficient estimation method. This estimation is further refined afterwards during a sophisticated map merging procedure and sharpened up by CA. The presented algorithm progressively builds a map of the environment, based entirely on stereo vision information. The produced maps indicate the occupied and free regions of the explored environment. The outline of the presented algorithm is summarized in Figure 4.13, and a sketch of the bidirectional consistency check follows.
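As an illustration of the validation stage, the sketch below implements a bidirectional (left-right) consistency check, assuming two row-major disparity maps computed with the left and the right image as reference respectively; the tolerance of one disparity level and the invalid-pixel sentinel are assumptions.

#include <cmath>
#include <vector>

void consistencyCheck(std::vector<double>& dispL,
                      const std::vector<double>& dispR,
                      int w, int h, double invalid = -1.0) {
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            double d = dispL[y * w + x];
            int xr = x - static_cast<int>(std::lround(d)); // matching pixel in right image
            // Reject the match if the right-referenced map disagrees.
            if (xr < 0 || xr >= w || std::abs(dispR[y * w + xr] - d) > 1.0)
                dispL[y * w + x] = invalid;
        }
}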

-*)+"&.,/)#&<br />

0#*%?:&<br />

.,/)#&<br />

@?A#$&<br />

!"#$#%&<br />

'()%$*"+,&<br />

Fig. 4.13 Outline <strong>of</strong> the presented SLAM algorithm<br />

0%1/(&2/3&<br />

4#5#$/6%5&<br />

8/,#$/9:&&<br />

2%6%5&<br />

;:6,/6%5&<br />

4(%7/(&<br />

2/3&<br />

4#5#$/6%5&<br />

=$#>*%?:&<br />

2/3&<br />

@?A#$&<br />

8/,#$/9:&<br />

2%6%5&<br />

4(%7/(&2/3&


Fig. 4.14 Reference image (a) of an indoor scene and sparse disparity map (b) obtained with the presented stereo correspondence algorithm

4.2.2 Camera's Motion Estimation

While the depth of the depicted objects is obtained by examining the two images of each stereo image pair, the motion of the camera is estimated by correlating the reference images from two consecutive image pairs, as shown in Figure 4.15.

Fig. 4.15 Depth vs. camera's motion estimation


Feature Detection and Matching

Feature detection and matching has become a very attractive and useful field for many computer vision applications. Among the variety of possible detectors and descriptors, this work has embodied SURF, as described in (Bay et al. 2008). SURF is a scale and rotation invariant detector and descriptor. It has the advantages of achieving high repeatability, distinctiveness and robustness. However, the most attractive feature of SURF is its computational efficiency, which allows very fast computation times. Preliminary experiments have confirmed the accuracy and effectiveness of SURF for the examined situations. SURF is given two consecutive reference images as input and provides as output a list containing the coordinates of N matched features in the two images.
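A common closed-form way to turn such a list of matched features into a motion estimate is a 2D least-squares (Procrustes) fit. The sketch below assumes the matched features have already been projected onto the floor plane using their stereo depth, and illustrates the principle rather than the thesis's exact estimation method.

#include <cmath>
#include <cstddef>
#include <vector>

struct Pt { double x, y; };
struct Motion { double theta, tx, ty; };  // curr ~= R(theta) * prev + t

Motion estimateMotion(const std::vector<Pt>& prev, const std::vector<Pt>& curr) {
    const std::size_t n = prev.size();   // assumes n == curr.size() and n > 0
    Pt cp{0, 0}, cq{0, 0};
    for (std::size_t i = 0; i < n; ++i) {            // centroids
        cp.x += prev[i].x / n;  cp.y += prev[i].y / n;
        cq.x += curr[i].x / n;  cq.y += curr[i].y / n;
    }
    double s = 0, c = 0;                             // cross terms
    for (std::size_t i = 0; i < n; ++i) {
        double px = prev[i].x - cp.x, py = prev[i].y - cp.y;
        double qx = curr[i].x - cq.x, qy = curr[i].y - cq.y;
        c += px * qx + py * qy;
        s += px * qy - py * qx;
    }
    double theta = std::atan2(s, c);                 // optimal rotation angle
    double tx = cq.x - (std::cos(theta) * cp.x - std::sin(theta) * cp.y);
    double ty = cq.y - (std::sin(theta) * cp.x + std::cos(theta) * cp.y);
    return {theta, tx, ty};
}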

Fig. 4.16 Environment's maps for the scene of Figure 4.14 obtained with the presented algorithm: (a) local map, (b) initial global map, (c) updated global map, (d) CA enhanced global map



4.2.3 Local Map Generation

A local 2D map is computed from each stereo image pair. Figure 4.14(a) presents the reference image of a test image pair. Using the sparse disparity map obtained (see Figure 4.14(b)), a reliable v-disparity image can be computed (Labayrade et al. 2002, Zhao et al. 2007), as shown in Figure 4.17(a). The terrain is modeled in the v-disparity image by a linear equation. The parameters of this linear equation can be found using the Hough transform (De Cubber et al. 2009), if the camera-environment system's geometry is unknown. However, if the geometry of the system is constant and known (which is the case for a camera firmly mounted on a robot exploring a flat, e.g. indoor, environment), the two parameters can be easily computed beforehand and used in all the image pairs during the exploration. A tolerance region on either side of the terrain's linear segment is considered, and any point outside this region is considered an "obstacle". The linear segment denoting the terrain and the tolerance region, overlaid on the v-disparity image, are shown in Figure 4.17(b).
4.17(b).<br />

Fig. 4.17 V-disparity images for the image of Figure 4.14(a) and the corresponding disparity map of Figure 4.14(b): (a) calculated v-disparity image; (b) v-disparity image with the terrain modeled by the continuous line and the tolerance region shown between the two dashed lines

For each pixel corresponding to an "obstacle" the local coordinates are computed. The local map, e.g. the one shown in Figure 4.16(a), is an occupancy grid of the environment consisting of all the points corresponding to "obstacles".

4.2.4 Global Map Generation

The motion estimation technique gives the relative translation T and rotation R required to superimpose the new local map, Figure 4.16(a), to the global map accumulated up to that point, Figure 4.16(b). However, the situation of perfectly matched features that result in exactly precise T and R


Figure 4.18 shows the features detected and matched between various consecutive images of the used dataset, using the SURF algorithm. It can be seen that there are some faulty matches. However, the presented algorithm is not significantly affected by such cases, as shown by the accuracy of the results in Figure 4.19.

Fig. 4.18 Features detected and matched using SURF for various consecutive images of the used dataset

In Figure 4.19 the first column (a) presents the reference images of the first, second, sixth, and tenth image pair of the tested image series. The differences in the illumination conditions are evident, especially in the image of the third row. The second column, Figure 4.19(b), presents the sparse disparity maps computed with the used stereo algorithm. One can observe that very few falsely matched pixels have been produced, while the overall coverage of the scene is more than enough to detect any obstacles in it. The third column, Figure 4.19(c), shows the computed local maps, i.e. the occupancy grids of the obstacles detected in the corresponding disparity maps. The fourth column, Figure 4.19(d), shows the global maps of the environment containing the accumulated local maps up to that point. In the first image pair, the local and the global maps are identical, since no prior knowledge about the environment existed. The gradual superimposition of further local maps is remarkably accurate and results in clear arrangements of "obstacle" points. The final column, Figure 4.19(e), shows the global maps after the presented CA enhancement. This procedure makes the sparse information of the global maps continuous and clearer.

Fig. 4.19 Experimental results after processing 1 (first row), 2 (second row), 6 (third row), and 10 (fourth row) image pairs of the scene: (a) reference images, (b) sparse disparity maps, (c) local maps, (d) updated global maps, (e) CA enhanced global maps
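The thesis's exact CA transition rule is not reproduced in this excerpt; as an illustration, the sketch below applies one majority-vote iteration to the binary occupancy grid, filling small gaps between "obstacle" cells and removing isolated noise points.

#include <vector>

// One CA step on a binary occupancy grid (0 = free, 1 = obstacle).
std::vector<int> caStep(const std::vector<int>& grid, int w, int h) {
    std::vector<int> next(grid);
    for (int y = 1; y < h - 1; ++y)
        for (int x = 1; x < w - 1; ++x) {
            int occupied = 0;                        // occupied 8-neighbours
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    if (dx || dy) occupied += grid[(y + dy) * w + (x + dx)];
            if (occupied >= 5)      next[y * w + x] = 1; // fill dense areas
            else if (occupied <= 1) next[y * w + x] = 0; // remove isolated noise
        }
    return next;
}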

As shown by the lower-right image of Figure 4.19, the result of the algorithm with only ten input image pairs is a clear and reliable representation of the obstacles found in the hallway. The walls, the closed door straight ahead of the camera, and some open doors are all clearly visible.


4.2.6 Discussion

The presented stereo vision-based SLAM algorithm incorporates new methodologies for the generation and the superimposition of the partial maps. The sole use of one sensor, i.e. a stereo camera, and the substitution of computationally demanding procedures are indicative of the algorithm's focus on computational simplicity. The presented experimental results reveal the overall algorithm's accuracy.


Chapter 5

Conclusion and Future Work

5.1 Conclusion

This thesis was motivated by the observation that, even though stereo vision and autonomous robots are often used together, little care is taken to ensure that each component takes the requirements of the other fully into consideration. Consequently, the objective of this thesis was to develop sophisticated stereo vision systems suitable for use in autonomous robots.

Towards this end, first, the relevant literature was surveyed. Stereo vision algorithms and respective robotic applications were covered. This survey pointed out the weaknesses of the available stereo correspondence algorithms when used by robots in real working environments. The compilation of a list containing the major open issues of robotic stereo vision, as given in Section 2.4, is a first step towards confronting these issues. Up to now, little care had been given to developing custom stereo vision algorithms for robotic applications, and simple or even simplistic approaches were often adopted by robotics researchers. As a result, the survey indicated that the most suitable and realistic way to proceed was to develop new custom-made stereo correspondence algorithms.

The next step after recognizing the weaknesses of the current systems was to propose new and efficient ways to deal with those weaknesses. The solutions proposed within this dissertation make use of various sophisticated, interdisciplinary computational tools. Biologically and psychologically inspired methods have been used, such as the logarithmic response law (Weber-Fechner law) and the gestalt laws of perceptual organization (proximity, similarity and continuity). Furthermore, sophisticated computational methods have been used, such as 2D and 3D cellular automata and fuzzy inference systems for robotic vision applications. Additionally, ideas from the field of video coding have been incorporated in stereo vision applications. The experimental results obtained by the proposed algorithms show that efficient robotic stereo vision is achievable with the use of carefully selected computational tools.

What all of the developed stereo correspondence algorithms have in common is that they embody non-iterative computational stages. The proposed algorithms varied from simple local stereo algorithms to sophisticated ASW-based ones. Many open issues detected by the literature survey were addressed by these algorithms. The goals of simple computational schemes, efficient exploitation of input redundancy, tolerance to non-calibrated input, tolerance to difficult lighting conditions and use of biologically inspired concepts have been pursued. The proposed algorithms exhibited advantages over other methods when the objective was use in robotic applications. Moreover, the online test-bench of (Scharstein & Szeliski 2010) has been used to compare the results of the developed algorithms to those of others. The algorithms listed in that site include ones that are computationally intensive and not suitable for robotic applications. Nevertheless, the most advanced of the algorithms proposed within this thesis (Nalpantidis & Gasteratos 2010a,b) are indexed in adequate places in the website's evaluation list.

Finally, the knowledge gained by the development of new stereo correspondence algorithms was exploited in developing stereo vision systems for robotic applications. The depth estimations provided by stereo vision algorithms were further analyzed so that robots can safely navigate in their environments. Obstacle avoidance and SLAM applications have been developed. Once again, the developed systems focused on avoiding complex computational schemes. Instead, new, efficient methods have been proposed so as to keep the computational load low. The experimental results show that the objective set at the beginning of this thesis has been met. The presented systems respect the restrictions set by their host robotic platforms and still achieve accurate results.

Using only vision sensors for robotic navigation purposes is an appealing solution. Vision provides enough information which, if accurate and reliable, can diminish the need for additional sensor input. As a result, the complexity of the system can be significantly reduced. However, solely vision-based autonomous robotic applications demand highly effective vision algorithms. The reliability of vision algorithms and their successful operation under difficult conditions are necessary conditions that have to be met. The success of vision-based robotic applications depends on their underlying vision algorithms.

5.2 Future Work

This thesis has accomplished its initially set objective. However, the course of this work has revealed various other appealing research directions. Even better and more efficient stereo vision systems can be developed. The field of stereo vision has not reached a state of saturation over the last decades and is not expected to do so in the next few years. The knowledge gained through this thesis can provide a stable basis upon which even better results can be achieved.

One possible future research direction has to do with incorporating the latest neuroscience findings in robotic vision algorithms and, beyond that, in robotic vision-based inference systems. Neuroscience has made tremendous progress during the last years, but its findings are neither completely decoded nor adapted to help solve the open problems of the robotic vision community. The use of the HVS's mechanisms in robotic vision issues is a very interesting and challenging prospect, both from a scientific and a technological point of view. The analog nature of brain stimuli and the vast processing power demanded for their processing are both within the reach of contemporary technology. Filling the missing link between these two aspects, i.e. neuromorphic sensors and vision algorithms, requires working in both directions. The pursuit of this endeavor can possibly produce results that will further advance the robotic vision field.

On the other hand, using the already developed stereo vision algorithms and methods as a basis for achieving further and more advanced autonomous behaviors is another interesting research direction. More precisely, problems such as SLAM, human-machine interaction, as well as scene analysis and understanding are still open to a large extent. The need for robust, autonomous capabilities of robotic assistants in defense, security and civil protection applications makes applied research in this area rather interesting and appealing. Additionally, robots are widely expected to play an increasingly important role in many aspects of our lives. For robots to seamlessly adapt to our anthropocentric environments, cognitive capabilities are required, and effective vision systems play an essential role.

Hardware implementation of the presented stereo vision algorithms is also an appealing research direction. The algorithms developed within this thesis focused on the simplicity of the used computational tools and adopted non-iterative schemes. These attributes make the hardware implementation of those algorithms feasible. An implementation in FPGA would provide very rapid execution times and small power consumption, and would avoid the extensive usage of the PC located onboard the host robotic platform.

To sum up, stereo vision is rapidly evolving so as to cover the demands posed by autonomous robots. Numerous and more reliable vision-based applications are expected to emerge as this technology matures. As a result, the axes along which future work on stereo vision is expected to be deployed mainly lie at the application level.


References

Agrawal, M. & Konolige, K. (2008), 'FrameSLAM: From bundle adjustment to real-time visual mapping', IEEE Transactions on Robotics 24(5).

Agrawal, M., Konolige, K. & Bolles, R. (2007), Localization and mapping for autonomous navigation in outdoor terrains: A stereo vision approach, in 'IEEE Workshop on Applications of Computer Vision', Austin, Texas, USA.

Amanatiadis, A., Andreadis, I. & Konstantinidis, K. (2008), 'Design and implementation of a fuzzy area-based image-scaling technique', IEEE Transactions on Instrumentation and Measurement 57(8), 1504–1513.

Arias-Estrada, M. & Xicotencatl, J. M. (2001), Multiple stereo matching using an extended architecture, in 'International Conference on Field-Programmable Logic and Applications', Vol. 2147 of Lecture Notes in Computer Science, Springer-Verlag, pp. 203–212.

Bailey, T. & Durrant-Whyte, H. (2006), 'Simultaneous localization and mapping (SLAM): Part II', IEEE Robotics & Automation Magazine 13(3), 108–117.

Barnard, S. T. & Thompson, W. B. (1980), 'Disparity analysis of images', IEEE Transactions on Pattern Analysis and Machine Intelligence 2(4), 333–340.

Bay, H., Ess, A., Tuytelaars, T. & Van Gool, L. (2008), 'Speeded-up robust features (SURF)', Computer Vision and Image Understanding 110, 346–359.

Berthouze, L. & Metta, G. (2005), 'Epigenetic robotics: modelling cognitive development in robotic systems', Cognitive Systems Research 6(3), 189–192.

Bharath, A. & Petrou, M. (2008), Next Generation Artificial Vision Systems: Reverse Engineering the Human Visual System, Artech House, USA.

Binaghi, E., Gallo, I., Marino, G. & Raspanti, M. (2004), 'Neural adaptive stereo matching', Pattern Recognition Letters 25(15), 1743–1758.

Bleyer, M. & Gelautz, M. (2005), 'A layered stereo matching algorithm using image segmentation and global visibility constraints', ISPRS Journal of Photogrammetry and Remote Sensing 59(3), 128–150.

Borenstein, J. & Koren, Y. (1990), 'Real-time obstacle avoidance for fast mobile robots in cluttered environments', IEEE Transactions on Systems, Man, and Cybernetics 19(5), 1179–1187.


Borenstein, J. & Koren, Y. (1991), 'The vector field histogram - fast obstacle avoidance for mobile robots', IEEE Transactions on Robotics and Automation 7(3), 278–288.

Brockers, R. (2009), Cooperative stereo matching with color-based adaptive local support, in 'International Conference on Computer Analysis of Images and Patterns', Springer-Verlag, Berlin, Heidelberg, pp. 1019–1027.

Brockers, R., Hund, M. & Mertsching, B. (2005), Stereo vision using cost-relaxation with 3D support regions, in 'Image and Vision Computing New Zealand', pp. 96–101.

Chen, Z., Samarabandu, J. & Rodrigo, R. (2007), 'Recent advances in simultaneous localization and map-building using computer vision', Advanced Robotics 21(3), 233–265.

Chonghun, R., Taehyun, H., Sungsik, K. & Jaeseok, K. (2004), Symmetrical dense disparity estimation: algorithms and FPGAs implementation, in 'IEEE International Symposium on Consumer Electronics', pp. 452–456.

Chopard, B. & Droz, M. (1998), Cellular Automata Modeling of Physical Systems, Cambridge University Press.

Corke, P. (2005), 'Machine vision toolbox', IEEE Robotics and Automation Magazine 12(4), 16–25.

Darabiha, A., Maclean, J. W. & Rose, J. (2006), 'Reconfigurable hardware implementation of a phase-correlation stereo algorithm', Machine Vision and Applications 17(2), 116–132.

Davison, A. (2007), Vision-based SLAM in real-time, in 'Pattern Recognition and Image Analysis', Vol. 1 of Lecture Notes in Computer Science, Springer Berlin/Heidelberg, pp. 9–12.

Davison, A. J. (2003), Real-time simultaneous localisation and mapping with a single camera, in 'IEEE International Conference on Computer Vision', Vol. 2, pp. 1403–1410.

Davison, A. J. & Kita, N. (2001), 3D simultaneous localisation and map-building using active vision for a robot moving on undulating terrain, in 'IEEE Conference on Computer Vision and Pattern Recognition', Vol. 1, IEEE Computer Society Press, pp. 384–391.

Davison, A. J., Mayol, W. W. & Murray, D. W. (2003), Real-time localisation and mapping with wearable active vision, in 'IEEE International Symposium on Mixed and Augmented Reality', IEEE Computer Society Press, pp. 18–27.

Davison, A. & Murray, D. (2002), 'Simultaneous localization and map-building using active vision', IEEE Transactions on Pattern Analysis and Machine Intelligence 24(7), 865–880.

De Cubber, G., Doroftei, D., Nalpantidis, L., Sirakoulis, G. C. & Gasteratos, A. (2009), Stereo-based terrain traversability analysis for robot navigation, in 'IARP/EURON Workshop on Robotics for Risky Interventions and Environmental Surveillance', Brussels, Belgium.

De Cubber, G., Nalpantidis, L., Sirakoulis, G. C. & Gasteratos, A. (2008), Intelligent robots need intelligent vision: Visual 3D perception, in 'IARP/EURON Workshop on Robotics for Risky Interventions and Environmental Surveillance', Benicàssim, Spain.

Di Stefano, L., Marchionni, M. & Mattoccia, S. (2004), 'A fast area-based stereo matching algorithm', Image and Vision Computing 22(12), 983–1005.

Dissanayake, G., Newman, P., Durrant-Whyte, H., Clark, S. & Csorba, M. (2001), 'A solution to the simultaneous localisation and map building (SLAM) problem', IEEE Transactions on Robotics and Automation 17(2), 229–241.

Durrant-Whyte, H. & Bailey, T. (2006), 'Simultaneous localisation and mapping (SLAM): Part I, the essential algorithms', IEEE Robotics and Automation Magazine 13(2).

El-Etriby, S., Al-Hamadi, A. & Michaelis, B. (2006), 'Dense depth map reconstruction by phase difference-based algorithm under influence of perspective distortion', Machine Graphics and Vision International Journal 15(3), 349–361.



El-Etriby, S., Al-Hamadi, A. & Michaelis, B. (2007), Dense stereo correspondence with slanted surface using phase-based algorithm, in 'IEEE International Symposium on Industrial Electronics', Vigo, Spain, pp. 1807–1813.

Faugeras, O. (1993), Three-Dimensional Computer Vision: A Geometric Viewpoint, MIT Press, Cambridge, MA.

Faugeras, O., Hotz, B., Mathieu, H., Vieville, T., Zhang, Z., Fua, P., Theron, E., Moll, L., Berry, G., Vuillemin, J., Bertin, P. & Proy, C. (1993), Real time correlation based stereo: algorithm implementations and applications, Technical Report RR-2013, INRIA.

Feynman, R. (1982), 'Simulating physics with computers', International Journal of Theoretical Physics 21(6), 467–488.

Forsyth, D. A. & Ponce, J. (2002), Computer Vision: A Modern Approach, Prentice Hall, Upper Saddle River, NJ, USA.

Gasteratos, A. & Sandini, G. (2002), Factors Affecting the Accuracy of an Active Vision Head, Vol. 2308 of Lecture Notes in Computer Science, Springer-Verlag, Berlin-Heidelberg, pp. 413–422.

Georgoulas, C., Kotoulas, L., Sirakoulis, G. C., Andreadis, I. & Gasteratos, A. (2008), 'Real-time disparity map computation module', Journal of Microprocessors and Microsystems 32(3), 159–170.

Gong, M., Yang, R., Wang, L. & Gong, M. (2007), 'A performance study on different cost aggregation approaches used in real-time stereo matching', International Journal of Computer Vision 75(2), 283–296.

Gong, M. & Yang, Y.-H. (2005a), 'Fast unambiguous stereo matching using reliability-based dynamic programming', IEEE Transactions on Pattern Analysis and Machine Intelligence 27(6), 998–1003.

Gong, M. & Yang, Y.-H. (2005b), Near real-time reliable stereo matching using programmable graphics hardware, in 'IEEE Computer Society Conference on Computer Vision and Pattern Recognition', Vol. 1, pp. 924–931.

Gonzalez, R. C. & Woods, R. E. (1992), Digital Image Processing, Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA.

Gu, Z., Su, X., Liu, Y. & Zhang, Q. (2008), 'Local stereo matching with adaptive support-weight, rank transform and disparity calibration', Pattern Recognition Letters 29(9), 1230–1235.

Guivant, J. & Nebot, E. (2001), 'Optimization of the simultaneous localization and map building algorithm for real time implementation', IEEE Transactions on Robotics and Automation 17(3), 242–257.

Gutierrez, S. & Marroquin, J. L. (2004), 'Robust approach for disparity estimation in stereo vision', Image and Vision Computing 22(3), 183–195.

Gutmann, J.-S., Fukuchi, M. & Fujita, M. (2005), A floor and obstacle height map for 3D navigation of a humanoid robot, in 'IEEE International Conference on Robotics and Automation', pp. 1066–1071.

Hariyama, M., Kobayashi, Y., Sasaki, H. & Kameyama, M. (2005), 'FPGA implementation of a stereo matching processor based on window-parallel-and-pixel-parallel architecture', IEICE Transactions on Fundamentals of Electronics, Communications and Computer Science 88(12), 3516–3522.

Hariyama, M., Sasaki, H. & Kameyama, M. (2005), 'Architecture of a stereo matching VLSI processor based on hierarchically parallel memory access', IEICE Transactions on Information and Systems E88-D(7), 1486–1491.



Hariyama, M., Takeuchi, T. & Kameyama, M. (2000), Reliable stereo matching for highly-safe intelligent vehicles and its VLSI implementation, in ‘IEEE Intelligent Vehicles Symposium’, pp. 128–133.
Hartley, R. & Zisserman, A. (2004), Multiple View Geometry in Computer Vision, second edn, Cambridge University Press.
Hirschmuller, H. (2005), Accurate and efficient stereo processing by semi-global matching and mutual information, in ‘IEEE Computer Society Conference on Computer Vision and Pattern Recognition’, Vol. 2, pp. 807–814.
Hirschmuller, H. (2006), Stereo vision in structured environments by consistent semi-global matching, in ‘IEEE Computer Society Conference on Computer Vision and Pattern Recognition’, Vol. 2, pp. 2386–2393.
Hirschmuller, H. & Scharstein, D. (2007), Evaluation of cost functions for stereo matching, in ‘IEEE Computer Society Conference on Computer Vision and Pattern Recognition’, Minneapolis, Minnesota, USA.
Hogue, A., German, A. & Jenkin, M. (2007), Underwater environment reconstruction using stereo and inertial data, in ‘IEEE International Conference on Systems, Man and Cybernetics’, Montreal, Canada, pp. 2372–2377.
Holmes, S. A., Klein, G. & Murray, D. W. (2009), ‘An O(N²) square root unscented Kalman filter for visual simultaneous localization and mapping’, IEEE Transactions on Pattern Analysis and Machine Intelligence 31(7), 1251–1263.
Hong, L. & Chen, G. (2004), Segment-based stereo matching using graph cuts, in ‘IEEE Conference on Computer Vision and Pattern Recognition’, Vol. 1, pp. 74–81.
Hosni, A., Bleyer, M., Gelautz, M. & Rhemann, C. (2009), Local stereo matching using geodesic support weights, in ‘IEEE International Conference on Image Processing’, pp. 2093–2096.
Hua, X., Yokomichi, M. & Kono, M. (2005), Stereo correspondence using color based on competitive-cooperative neural networks, in ‘International Conference on Parallel and Distributed Computing Applications and Technologies’, Dalian, China, pp. 856–860.
Huang, S., Wang, Z. & Dissanayake, G. (2008), ‘Sparse local submap joining filter for building large-scale maps’, IEEE Transactions on Robotics 24(5), 1121–1130.
Huang, X. & Dubois, E. (2004), Dense disparity estimation based on the continuous wavelet transform, in ‘Canadian Conference on Electrical and Computer Engineering’, Vol. 1, pp. 465–468.
Iocchi, L. & Konolige, K. (1998), A multiresolution stereo vision system for mobile robots, in ‘Italian AI Association Workshop on New Trends in Robotics Research’.
Jain, R., Kasturi, R. & Schunck, B. G. (1995), Machine Vision, McGraw-Hill, New York, USA.
Jeong, H. & Park, S. (2004), Generalized trellis stereo matching with systolic array, in ‘International Symposium on Parallel and Distributed Processing and Applications’, Vol. 3358, Springer-Verlag, pp. 263–267.
Jia, Y., Xu, Y., Liu, W., Yang, C., Zhu, Y., Zhang, X. & An, L. (2003), A miniature stereo vision machine for real-time dense depth mapping, in ‘International Conference on Computer Vision Systems’, Vol. 2626 of Lecture Notes in Computer Science, pp. 268–277.
Jobson, D. J., ur Rahman, Z. & Woodell, G. A. (1997), ‘A multiscale retinex for bridging the gap between color images and the human observation of scenes’, IEEE Transactions on Image Processing 6(7), 965–976.
Jung, H. (1994), ‘Visual navigation for a mobile robot using landmarks’, Advanced Robotics 9(4), 429–442.



Kalomiros, J. A. & Lygouras, J. (2008), ‘Hardware implementation of a stereo co-processor in a medium-scale field programmable gate array’, IET Computers and Digital Techniques 2(5), 336–346.
Kalomiros, J. & Lygouras, J. (2009), ‘Comparative study of local SAD and dynamic programming for stereo processing using dedicated hardware’, EURASIP Journal on Advances in Signal Processing 2009, 1–18.
Kelly, A. & Stentz, A. (1998), Stereo vision enhancements for low-cost outdoor autonomous vehicles, in ‘International Conference on Robotics and Automation, Workshop WS-7, Navigation of Outdoor Autonomous Vehicles’.
Khatib, O. (1996), ‘Motion coordination and reactive control of autonomous multi-manipulator system’, Journal of Robotic Systems 15(4), 300–319.
Khatib, O. (1999), ‘Robots in human environments: basic autonomous capabilities’, International Journal of Robotics Research 18(7), 684–696.
Kim, H. & Sohn, K. (2005), ‘3D reconstruction from stereo images for interactions between real and virtual objects’, Signal Processing: Image Communication 20(1), 61–75.
Kim, J. C., Lee, K. M., Choi, B. T. & Lee, S. U. (2005), A dense stereo matching using two-pass dynamic programming with generalized ground control points, in ‘IEEE Conference on Computer Vision and Pattern Recognition’, Vol. 2, pp. 1075–1082.
Klancar, G., Kristan, M. & Karba, R. (2004), ‘Wide-angle camera distortions and non-uniform illumination in mobile robot tracking’, Journal of Robotics and Autonomous Systems 46, 125–133.
Klaus, A., Sormann, M. & Karner, K. (2006), Segment-based stereo matching using belief propagation and a self-adapting dissimilarity measure, in ‘18th International Conference on Pattern Recognition’, Vol. 3, Hong Kong, China, pp. 15–18.
Klippenstein, J. & Zhang, H. (2007), Quantitative evaluation of feature extractors for visual SLAM, in ‘Fourth Canadian Conference on Computer and Robot Vision’, pp. 157–164.
Kohler, W. (1969), The task of Gestalt psychology, Princeton University Press, Princeton, NJ.
Konolige, K., Agrawal, M., Bolles, R. C., Cowan, C., Fischler, M. & Gerkey, B. P. (2006), Outdoor mapping and navigation using stereo vision, in ‘International Symposium on Experimental Robotics’, Vol. 39, Springer, Brazil, pp. 179–190.
Kotoulas, L., Gasteratos, A., Sirakoulis, G. C., Georgoulas, C. & Andreadis, I. (2005), Enhancement of fast acquired disparity maps using a 1-d cellular automaton filter, in ‘IASTED International Conference on Visualization, Imaging and Image Processing’, Benidorm, Spain, pp. 355–359.
Kotoulas, L., Georgoulas, C., Gasteratos, A., Sirakoulis, G. C. & Andreadis, I. (2005), A novel three stage technique for accurate disparity maps, in ‘EOS Conference on Industrial Imaging and Machine Vision’, Munich, Germany, pp. 13–14.
Kunchev, V., Jain, L., Ivancevic, V. & Finn, A. (2006), Path planning and obstacle avoidance for autonomous mobile robots: A review, in ‘International Conference on Knowledge-Based and Intelligent Information and Engineering Systems’, Vol. 4252 of Lecture Notes in Computer Science, Springer-Verlag, pp. 537–544.
Kyung Hyun, C., Minh Ngoc, N. & M. Asif Ali, R. (2008), ‘A real time collision avoidance algorithm for mobile robot based on elastic force’, International Journal of Mechanical, Industrial and Aerospace Engineering 2(4), 230–233.
Labayrade, R., Aubert, D. & Tarel, J.-P. (2002), Real time obstacle detection in stereovision on non flat road geometry through V-disparity representation, in ‘IEEE Intelligent Vehicle Symposium’, Vol. 2, Versailles, France, pp. 646–651.



Lee, S., Yi, J. & Kim, J. (2005), Real-time stereo vision on a reconfigurable system, in ‘International Workshop on Embedded Computer Systems: Architectures, Modeling, and Simulation’, Vol. 3553 of Lecture Notes in Computer Science, Springer, pp. 299–307.
Lei, C., Selzer, J. & Yang, Y.-H. (2006), ‘Region-tree based stereo using dynamic programming optimization’, IEEE Computer Society Conference on Computer Vision and Pattern Recognition 2, 2378–2385.
Lemaire, T., Berger, C., Jung, I. & Lacroix, S. (2007), ‘Vision-based SLAM: Stereo and monocular approaches’, International Journal of Computer Vision 74(3), 343–364.
Liu, C., Pei, W., Niyokindi, S., Song, J. & Wang, L. (2006), ‘Micro stereo matching based on wavelet transform and projective invariance’, Measurement Science and Technology 17(3), 565–571.
Lowe, D. G. (2004), ‘Distinctive image features from scale-invariant keypoints’, International Journal of Computer Vision 60(2), 91–110.
Maimone, M. W. & Shafer, S. A. (1996), A taxonomy for stereo computer vision experiments, in ‘ECCV Workshop on Performance Characteristics of Vision Algorithms’, pp. 59–79.
Manz, A., Liscano, R. & Green, D. (1993), A comparison of realtime obstacle avoidance methods for mobile robots, in ‘Experimental Robotics II’, Springer-Verlag, pp. 299–316.
Manzotti, R., Gasteratos, A., Metta, G. & Sandini, G. (2001), ‘Disparity estimation on log-polar images and vergence control’, Computer Vision and Image Understanding 83(2), 97–117.
Mardiris, V., Sirakoulis, G. C., Mizas, C., Karafyllidis, I. & Thanailakis, A. (2008), ‘A CAD system for modeling and simulation of computer networks using cellular automata’, IEEE Transactions on Systems, Man, and Cybernetics, Part C 38(2), 253–264.
Marr, D. & Poggio, T. (1976), ‘Cooperative computation of stereo disparity’, Science 194(4262), 283–287.
Masrani, D. K. & MacLean, W. J. (2006), A real-time large disparity range stereo-system using FPGAs, in ‘IEEE International Conference on Computer Vision Systems’, Vol. 3852, pp. 13–20.
Mayoral, R., Lera, G. & Perez-Ilzarbe, M. J. (2006), ‘Evaluation of correspondence errors for stereo’, Image and Vision Computing 24(12), 1288–1300.
Mead, C. (1990), ‘Neuromorphic electronic systems’, Proceedings of the IEEE 78(10), 1629–1636.
Mei, C., Sibley, G., Cummins, M., Newman, P. & Reid, I. (2009), A constant time efficient stereo SLAM system, in ‘British Machine Vision Conference’.
Metta, G., Gasteratos, A. & Sandini, G. (2004), ‘Learning to track colored objects with log-polar vision’, Mechatronics 14(9), 989–1006.
Mingxiang, L. & Yunde, J. (2006), ‘Trinocular cooperative stereo vision and occlusion detection’, IEEE International Conference on Robotics and Biomimetics, pp. 1129–1133.
Miyajima, Y. & Maruyama, T. (2003), A real-time stereo vision system with FPGA, in ‘International Conference on Field-Programmable Logic and Applications’, Vol. 2778 of Lecture Notes in Computer Science, Springer, pp. 448–457.
Montemerlo, M. (2003), FastSLAM: A Factored Solution to the Simultaneous Localization and Mapping Problem with Unknown Data Association, PhD thesis, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA.
Montemerlo, M. & Thrun, S. (2007), FastSLAM: A Scalable Method for the Simultaneous Localization and Mapping Problem in Robotics, Springer.
Moravec, H. (1987), Certainty grids for mobile robots, in ‘NASA/JPL Space Telerobotics Workshop’, Vol. 3, pp. 307–312.
Mordohai, P. & Medioni, G. G. (2006), ‘Stereo using monocular cues within the tensor voting framework’, IEEE Transactions on Pattern Analysis and Machine Intelligence 28(6), 968–982.



Moreno, F., Blanco, J. & Gonzalez, J. (2009), ‘Stereo vision specific models for particle filter-based SLAM’, Robotics and Autonomous Systems 57(9), 955–970.
Muhlmann, K., Maier, D., Hesser, J. & Manner, R. (2002), ‘Calculating dense disparity maps from color stereo images, an efficient implementation’, International Journal of Computer Vision 47(1-3), 79–88.
Murray, D. & Jennings, C. (1997), Stereo vision based mapping and navigation for mobile robots, in ‘IEEE International Conference on Robotics and Automation’, Vol. 2, pp. 1694–1699.
Murray, D. & Little, J. J. (2000), ‘Using real-time stereo vision for mobile robot navigation’, Autonomous Robots 8(2), 161–171.
Nalpantidis, L. & Gasteratos, A. (2010a), ‘Biologically and psychophysically inspired adaptive support weights algorithm for stereo correspondence’, Robotics and Autonomous Systems 58, 457–464.
Nalpantidis, L. & Gasteratos, A. (2010b), ‘Stereo vision for robotic applications in the presence of non-ideal lighting conditions’, Image and Vision Computing 28, 940–951.
Nalpantidis, L. & Kostavelis, I. (2009), ‘http://robotics.pme.duth.gr/reposit/stereoroutes.zip’. Group of Robotics and Cognitive Systems.
Nalpantidis, L., Kostavelis, I. & Gasteratos, A. (2009), Stereovision-based algorithm for obstacle avoidance, in ‘International Conference on Intelligent Robotics and Applications’, Vol. 5928 of Lecture Notes in Computer Science, Springer-Verlag, Singapore, pp. 195–204.
Nalpantidis, L., Sirakoulis, G. C. & Gasteratos, A. (2007), Review of stereo matching algorithms for 3D vision, in ‘16th International Symposium on Measurement and Control in Robotics’, Warsaw, Poland, pp. 116–124.
Nalpantidis, L., Sirakoulis, G. C. & Gasteratos, A. (2008a), A dense stereo correspondence algorithm for hardware implementation with enhanced disparity selection, in ‘5th Hellenic Conference on Artificial Intelligence’, Vol. 5138 of Lecture Notes in Computer Science, Springer-Verlag, Syros, Greece, pp. 365–370.
Nalpantidis, L., Sirakoulis, G. C. & Gasteratos, A. (2008b), ‘Review of stereo vision algorithms: from software to hardware’, International Journal of Optomechatronics 2(4), 435–462.
Nister, D., Naroditsky, O. & Bergen, J. R. (2006), ‘Visual odometry for ground vehicle applications’, Journal of Field Robotics 23(1), 3–20.
Ogale, A. S. (2009), ‘http://www.cs.umd.edu/users/ogale/download/code.html’.
Ogale, A. S. & Aloimonos, Y. (2005a), Robust contrast invariant stereo correspondence, in ‘IEEE International Conference on Robotics and Automation’, pp. 819–824.
Ogale, A. S. & Aloimonos, Y. (2005b), ‘Shape and the stereo correspondence problem’, International Journal of Computer Vision 65(3), 147–162.
Ogale, A. S. & Aloimonos, Y. (2007), ‘A roadmap to the integration of early visual modules’, International Journal of Computer Vision 72(1), 9–25.
Ohya, A., Kosaka, A. & Kak, A. (1998), ‘Vision-based navigation of mobile robot with obstacle avoidance by single camera vision and ultrasonic sensing’, IEEE Transactions on Robotics and Automation 14(6), 969–978.
Park, S. & Jeong, H. (2007), Real-time stereo vision FPGA chip with low error rate, in ‘International Conference on Multimedia and Ubiquitous Engineering’, pp. 751–756.
Pinoli, J. C. & Debayle, J. (2007), ‘Logarithmic adaptive neighborhood image processing (LANIP): Introduction, connections to human brightness perception, and application issues’, EURASIP Journal on Advances in Signal Processing 2007(1), 114–135.



Reignier, P. (1994), ‘Fuzzy logic techniques for mobile robot obstacle avoidance’, Robotics and Autonomous Systems 12(3-4), 143–153.
Ruigang, Y., Welch, G. & Bishop, G. (2002), ‘Real-time consensus-based scene reconstruction using commodity graphics hardware’, 10th Pacific Conference on Computer Graphics and Applications, pp. 225–234.
Russell, R. A., Taylor, G., Kleeman, L. & Purnamadjaja, A. H. (2004), ‘Multi-sensory synergies in humanoid robotics’, International Journal of Humanoid Robotics 1(2), 289–314.
Sabe, K., Fukuchi, M., Gutmann, J.-S., Ohashi, T., Kawamoto, K. & Yoshigahara, T. (2004), Obstacle avoidance and path planning for humanoid robots using stereo vision, in ‘IEEE International Conference on Robotics and Automation’, Vol. 1, pp. 592–597.
Salmen, J., Schlipsing, M., Edelbrunner, J., Hegemann, S. & Luke, S. (2009), Real-time stereo vision: Making more out of dynamic programming, in ‘International Conference on Computer Analysis of Images and Patterns’, pp. 1096–1103.
Santini, F., Nambisan, R. & Rucci, M. (2009), ‘Active 3D vision through gaze relocation in a humanoid robot’, International Journal of Humanoid Robotics 6(3), 481–503.
Scharstein, D. & Pal, C. (2007), Learning conditional random fields for stereo, in ‘IEEE Conference on Computer Vision and Pattern Recognition’, pp. 1–8.
Scharstein, D. & Szeliski, R. (2002), ‘A taxonomy and evaluation of dense two-frame stereo correspondence algorithms’, International Journal of Computer Vision 47(1-3), 7–42.
Scharstein, D. & Szeliski, R. (2003), High-accuracy stereo depth maps using structured light, in ‘IEEE Computer Society Conference on Computer Vision and Pattern Recognition’, Vol. 1, pp. 195–202.
Scharstein, D. & Szeliski, R. (2010), ‘http://vision.middlebury.edu/stereo/’.
Schirmacher, H., Li, M. & Seidel, H.-P. (2001), On-the-fly processing of generalized lumigraphs, in ‘EUROGRAPHICS’, pp. 165–173.
Scholl, B. J. (2001), ‘Objects and attention: the state of the art’, Cognition 80(1-2), 1–46.
Schreer, O. (1998), Stereo vision-based navigation in unknown indoor environment, in ‘5th European Conference on Computer Vision’, Vol. 1, pp. 203–217.
Shimonomura, K., Kushima, T. & Yagi, T. (2008), ‘Binocular robot vision emulating disparity computation in the primary visual cortex’, Neural Networks 21(2-3), 331–340.
Siciliano, B., Sciavicco, L., Villani, L. & Oriolo, G. (2008), Robotics: Modelling, Planning and Control, Springer Publishing Company, Incorporated.
Siegwart, R. & Nourbakhsh, I. R. (2004), Introduction to Autonomous Mobile Robots, MIT Press, Massachusetts.
Sim, R., Elinas, P. & Little, J. (2007), ‘A study of the Rao-Blackwellised particle filter for efficient and accurate vision-based SLAM’, International Journal of Computer Vision 74(3), 303–318.
Sim, R. & Little, J. J. (2009), ‘Autonomous vision-based robotic exploration and mapping using hybrid maps and particle filters’, Image and Vision Computing 27(1-2), 167–177. Canadian Robotic Vision 2005 and 2006.
Sirakoulis, G. C., Karafyllidis, I. & Thanailakis, A. (2003), ‘A CAD system for the construction and VLSI implementation of cellular automata algorithms using VHDL’, Microprocessors and Microsystems 27(8), 381–396.
Soquet, N., Aubert, D. & Hautiere, N. (2007), Road segmentation supervised by an extended V-disparity algorithm for autonomous navigation, in ‘IEEE Intelligent Vehicles Symposium’, Istanbul, Turkey, pp. 160–165.



Stentz, A., Fox, D. & Montemerlo, M. (2003), FastSLAM: A factored solution to the simultaneous localization and mapping problem with unknown data association, in ‘AAAI National Conference on Artificial Intelligence’, AAAI, pp. 593–598.
Sternberg, R. J. (2002), Cognitive Psychology, Wadsworth Publishing.
Strecha, C., Fransens, R. & Van Gool, L. J. (2006), Combined depth and outlier estimation in multiview stereo, in ‘IEEE Computer Society Conference on Computer Vision and Pattern Recognition’, Vol. 2, pp. 2394–2401.
Sun, J., Li, Y., Kang, S. B. & Shum, H.-Y. (2005), Symmetric stereo matching for occlusion handling, in ‘IEEE Computer Society Conference on Computer Vision and Pattern Recognition’, Vol. 2, pp. 399–406.
Sunyoto, H., van der Mark, W. & Gavrila, D. M. (2004), A comparative study of fast dense stereo vision algorithms, in ‘IEEE Intelligent Vehicles Symposium’, pp. 319–324.
Thevenaz, P., Blu, T. & Unser, M. (2000), ‘Interpolation revisited’, IEEE Transactions on Medical Imaging 19(7), 739–758.
Torr, P. H. S. & Criminisi, A. (2004), ‘Dense stereo using pivoted dynamic programming’, Image and Vision Computing 22(10), 795–806.
Ulam, S. (1952), Random processes and transformations, in ‘International Congress on Mathematics’, Vol. 2, Cambridge, USA, pp. 264–275.
Vandorpe, J., Van Brussel, H. & Xu, H. (1996), Exact dynamic map building for a mobile robot using geometrical primitives produced by a 2D range finder, in ‘IEEE International Conference on Robotics and Automation’, Minneapolis, USA, pp. 901–908.
Veksler, O. (2002), ‘Dense features for semi-dense stereo correspondence’, International Journal of Computer Vision 47(1-3), 247–260.
Veksler, O. (2003), Extracting dense features for visual correspondence with graph cuts, in ‘IEEE Computer Vision and Pattern Recognition’, Vol. 1, pp. 689–694.
Veksler, O. (2005), Stereo correspondence by dynamic programming on a tree, in ‘IEEE Computer Society Conference on Computer Vision and Pattern Recognition’, Vol. 2, pp. 384–390.
Veksler, O. (2006), Reducing search space for stereo correspondence with graph cuts, in ‘British Machine Vision Conference’, Vol. 2, pp. 709–718.
Von Neumann, J. (1966), Theory of Self-Reproducing Automata, University of Illinois Press, Urbana, Illinois.
Vonikakis, V. (2009), ‘http://electronics.ee.duth.gr/vonikakis.htm’.
Vonikakis, V., Andreadis, I. & Gasteratos, A. (2008), ‘Fast centre-surround contrast modification’, IET Image Processing 2(1), 19–34.
Wang, L., Liao, M., Gong, M., Yang, R. & Nister, D. (2006), High-quality real-time stereo using adaptive cost aggregation and dynamic programming, in ‘Third International Symposium on 3D Data Processing, Visualization, and Transmission’, pp. 798–805.
Wheatstone, C. (1838), ‘Contributions to the physiology of vision. Part the first: On some remarkable, and hitherto unobserved, phenomena of binocular vision’, Philosophical Transactions of the Royal Society of London, pp. 371–394.
Wiegand, T., Sullivan, G., Bjøntegaard, G. & Luthra, A. (2003), ‘Overview of the H.264/AVC video coding standard’, IEEE Transactions on Circuits and Systems for Video Technology 13(7), 560–576.
Wilburn, B., Smulski, M., Lee, K. & Horowitz, M. A. (2002), The light field video camera, in ‘Media Processors’, pp. 29–36.
Wolfram, S. (1986), Theory and Applications of Cellular Automata, World Scientific, Singapore.



Yang, J. C., Everett, M., Buehler, C. & McMillan, L. (2002), A real-time distributed light field camera, in ‘Eurographics Workshop on Rendering’, pp. 77–86.
Yang, Q., Wang, L. & Ahuja, N. (2010), A constant-space belief propagation algorithm for stereo matching, in ‘IEEE Conference on Computer Vision and Pattern Recognition’.
Yang, Q., Wang, L. & Yang, R. (2006), Real-time global stereo matching using hierarchical belief propagation, in ‘British Machine Vision Conference’, Vol. 3, pp. 989–998.
Yang, Q., Wang, L., Yang, R., Stewenius, H. & Nister, D. (2009), ‘Stereo matching with color-weighted correlation, hierarchical belief propagation and occlusion handling’, IEEE Transactions on Pattern Analysis and Machine Intelligence 31(3), 492–504.
Yi, J., Kim, J., Li, L., Morris, J., Lee, G. & Leclercq, P. (2004), Real-time three dimensional vision, in ‘Asia-Pacific Conference on Advances in Computer Systems Architecture’, Vol. 3189 of Lecture Notes in Computer Science, Springer, pp. 309–320.
Yin, P., Tourapis, H., Tourapis, A. & Boyce, J. (2003), Fast mode decision and motion estimation for JVT/H.264, in ‘IEEE International Conference on Image Processing’, Vol. 3, pp. 853–856.
Yoon, K.-J. & Kweon, I. S. (2006a), ‘Adaptive support-weight approach for correspondence search’, IEEE Transactions on Pattern Analysis and Machine Intelligence 28(4), 650–656.
Yoon, K.-J. & Kweon, I. S. (2006b), Correspondence search in the presence of specular highlights using specular-free two-band images, in ‘7th Asian Conference on Computer Vision’, Vol. 3852, Springer, Hyderabad, India, pp. 761–770.
Yoon, K.-J. & Kweon, I. S. (2006c), Stereo matching with symmetric cost functions, in ‘IEEE Computer Society Conference on Computer Vision and Pattern Recognition’, Vol. 2, pp. 2371–2377.
Yoon, S., Park, S.-K., Kang, S. & Kwak, Y. K. (2005), ‘Fast correlation-based stereo matching with the reduction of systematic errors’, Pattern Recognition Letters 26(14), 2221–2231.
Yu, T., Lin, R.-S., Super, B. & Tang, B. (2007), ‘Efficient message representations for belief propagation’, IEEE International Conference on Computer Vision, pp. 1–8.
Zach, C., Karner, K. & Bischof, H. (2004), Hierarchical disparity estimation with programmable 3D hardware, in ‘International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision’, pp. 275–282.
Zhao, J., Katupitiya, J. & Ward, J. (2007), Global correlation based ground plane estimation using V-disparity image, in ‘IEEE International Conference on Robotics and Automation’, Rome, Italy, pp. 529–534.
Zhu, Z., Oskiper, T., Samarasekera, S., Kumar, R. & Sawhney, H. S. (2007), ‘Ten-fold improvement in visual odometry using landmark matching’, IEEE International Conference on Computer Vision, pp. 1–8.
Zitnick, C. L. & Kanade, T. (2000), ‘A cooperative algorithm for stereo matching and occlusion detection’, IEEE Transactions on Pattern Analysis and Machine Intelligence 22(7), 675–684.
Zitnick, C. L. & Kang, S. (2007), ‘Stereo for image-based rendering using image over-segmentation’, International Journal of Computer Vision 75(1), 49–65.
Zitnick, C. L., Kang, S. B., Uyttendaele, M., Winder, S. & Szeliski, R. (2004), ‘High-quality video view interpolation using a layered representation’, ACM Transactions on Graphics 23(3), 600–608.


Abbreviations

AD Absolute Differences
ASIC Application-Specific Integrated Circuit
ASW Adaptive Support Weight
CA Cellular Automaton
CIE Commission Internationale de l'Éclairage (International Commission on Illumination)
CPU Central Processing Unit
CWT Continuous Wavelet Transform
DP Dynamic Programming
DSI Disparity Space Image
DSP Digital Signal Processor
EDA Electronic Design Automation
EKF Extended Kalman Filter
EM Expectation Maximization
FIS Fuzzy Inference System
FPGA Field-Programmable Gate Array
GGCP Generalized Ground Control Points
GPU Graphics Processing Unit
HFoV Horizontal Field of View
HSL Hue Saturation Luminosity/Lightness (Color model)
HSV Hue Saturation Value (Color model)
HVS Human Visual System
IR Infrared
LCDM Luminosity-Compensated Dissimilarity Measure
LoG Laplacian of Gaussian
LWPC Local Weighted Phase-Correlation
MF Membership Function
MRF Markov Random Field
NCC Normalized Cross Correlation
NMSE Normalized Mean Square Error
NN Neural Network
NURBS Non-Uniform Rational B-Splines
PC Personal Computer




RDP Reliability-based Dynamic Programming
RGB Red Green Blue (Color model)
SAD Sum of Absolute Differences
SD Squared Differences
SIFT Scale-Invariant Feature Transform
SLAM Simultaneous Localization and Mapping
SSD Sum of Squared Differences
SURF Speeded-Up Robust Features
VR Virtual Reality
WTA Winner Takes All
ZNCC Zero-mean Normalized Cross-Correlation
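
Several of the matching costs listed above (AD/SAD, SD/SSD, NCC/ZNCC) are simple window-based measures, and WTA denotes the rule that keeps, for each pixel, the candidate disparity with the best aggregated cost. As a minimal illustrative sketch only, not code from this thesis, the three basic costs can be written as follows, assuming NumPy and two equally sized, rectified grayscale patches:

    import numpy as np

    def sad(left, right):
        # Sum of Absolute Differences over a support window (lower is better).
        return float(np.abs(left.astype(float) - right.astype(float)).sum())

    def ssd(left, right):
        # Sum of Squared Differences over a support window (lower is better).
        d = left.astype(float) - right.astype(float)
        return float((d * d).sum())

    def zncc(left, right):
        # Zero-mean Normalized Cross-Correlation (higher is better); invariant
        # to gain and offset changes of intensity between the two windows.
        l = left.astype(float) - left.mean()
        r = right.astype(float) - right.mean()
        denom = np.sqrt((l * l).sum() * (r * r).sum())
        return float((l * r).sum() / denom) if denom > 0 else 0.0

In a local stereo algorithm such costs are evaluated over a window for every candidate disparity, and a WTA selection keeps the disparity with the lowest SAD/SSD or the highest ZNCC score.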


Thesis Publications

Journals:
1. L. Nalpantidis and A. Gasteratos. Stereovision-based fuzzy obstacle avoidance method. International Journal of Humanoid Robotics, in press.
2. L. Nalpantidis, A. Amanatiadis, G. C. Sirakoulis, and A. Gasteratos. An efficient hierarchical matching algorithm for processing uncalibrated stereo vision images and its hardware architecture. IET Image Processing, in press.
3. L. Nalpantidis and A. Gasteratos. Biologically and psychophysically inspired adaptive support weights algorithm for stereo correspondence. Robotics and Autonomous Systems, 58:457-464, 2010.
4. L. Nalpantidis and A. Gasteratos. Stereo vision for robotic applications in the presence of non-ideal lighting conditions. Image and Vision Computing, 28:940-951, 2010.
5. L. Nalpantidis, G. C. Sirakoulis, and A. Gasteratos. Review of stereo vision algorithms: from software to hardware. International Journal of Optomechatronics, 2(4):435-462, 2008.
Conferences:
1. L. Nalpantidis, G. C. Sirakoulis, A. Carbone, and A. Gasteratos. Computationally effective stereovision SLAM. In IEEE International Conference on Imaging Systems and Techniques, Thessaloniki, Greece, July 2010.
2. I. Kostavelis, L. Nalpantidis, and A. Gasteratos. Comparative presentation of real-time obstacle avoidance algorithms using solely stereo vision. In IARP/EURON International Workshop on Robotics for Risky Interventions and Environmental Surveillance-Maintenance, Sheffield, UK, January 2010.
3. L. Nalpantidis, I. Kostavelis, and A. Gasteratos. Stereovision-based algorithm for obstacle avoidance. In International Conference on Intelligent Robotics and Applications, volume 5928 of Lecture Notes in Computer Science, pages 195-204, Singapore, December 2009. Springer-Verlag.
4. L. Nalpantidis, D. Chrysostomou, and A. Gasteratos. Obtaining reliable depth maps for robotic applications with a quad-camera system. In International Conference on Intelligent Robotics and Applications, volume 5928 of Lecture Notes in Computer Science, pages 906-916, Singapore, December 2009. Springer-Verlag.




5. Y. Baudoin, D. Doroftei, G. De Cubber, S. A. Berrabah, C. Pinzon, F. Warlet, J. Gancet, E. Motard, M. Ilzkovitz, L. Nalpantidis, and A. Gasteratos. View-finder: Robotics assistance to firefighting services and crisis management. In IEEE International Workshop on Safety, Security, and Rescue Robotics, pages 1-6, Denver, Colorado, USA, November 2009.
6. I. Kostavelis, L. Nalpantidis, and A. Gasteratos. Real-time algorithm for obstacle avoidance. In Third Panhellenic Scientific Student Conference on Informatics, Corfu, Greece, September 2009.
7. L. Nalpantidis, A. Amanatiadis, G. C. Sirakoulis, N. Kyriakoulis, and A. Gasteratos. Dense disparity estimation using a hierarchical matching technique from uncalibrated stereo vision. In IEEE International Workshop on Imaging Systems and Techniques, pages 427-431, Shenzhen, China, May 2009.
8. G. De Cubber, D. Doroftei, L. Nalpantidis, G. C. Sirakoulis, and A. Gasteratos. Stereo-based terrain traversability analysis for robot navigation. In IARP/EURON Workshop on Robotics for Risky Interventions and Environmental Surveillance, Brussels, Belgium, 2009.
9. L. Nalpantidis, G. C. Sirakoulis, and A. Gasteratos. A dense stereo correspondence algorithm for hardware implementation with enhanced disparity selection. In 5th Hellenic Conference on Artificial Intelligence, volume 5138 of Lecture Notes in Computer Science, pages 365-370, Syros, Greece, 2008. Springer-Verlag.
10. G. De Cubber, L. Nalpantidis, G. C. Sirakoulis, and A. Gasteratos. Intelligent robots need intelligent vision: Visual 3D perception. In IARP/EURON Workshop on Robotics for Risky Interventions and Environmental Surveillance, Benicàssim, Spain, 2008.
11. L. Nalpantidis, G. C. Sirakoulis, and A. Gasteratos. Review of stereo matching algorithms for 3D vision. In 16th International Symposium on Measurement and Control in Robotics, pages 116-124, Warsaw, Poland, 2007.
