
Towards Autonomous Navigation for Robots with 3D Sensors

by

Jann Poppinga

A thesis submitted in partial fulfillment
of the requirements for the degree of

Doctor of Philosophy
in Computer Science

Approved Dissertation Committee
Prof. Dr. Andreas Birk
Prof. Dr. Andreas Nüchter
Prof. Dr.-Ing. Heiko Mosemann

Defended on September 17th, 2010

School of Engineering and Science


Abstract

Autonomous navigation is an important challenge in many application fields of mobile robotics. This thesis contributes to mobile robot autonomous navigation at different levels.

In chapter 2, different methods for 3D sensing are discussed. In particular, we evaluate a time-of-flight range camera in various situations. We develop a method to detect and, if desirable, heuristically correct the most disruptive error source: range is measured via phase shift, and because of the way the phase shift is measured, values beyond 360° are mapped back into the [0°, 360°[ interval. The derived range is consequently also affected. The error is inherent not only to this model, but to all range cameras measuring the time of flight in the same way. Our correction makes it possible to use the range camera on a mobile robot for high-frequency 3D sensing.

In chapter 3, we present several 3D point cloud datasets gathered with the time-of-flight camera presented in the preceding chapter and with other 3D sensors: an actuated laser range-finder, a stereo camera, and a 3D sonar. The datasets represent a wide variety of real-world and set-up scenes, indoors and outdoors, underwater and on land, and at small and large scale. These datasets are used in the experiments in the following chapters.

In chapter 4, we present a robust short-range obstacle detection algorithm that runs fast enough to actually make use of the range camera's high data rate. It is based on the Hough transform for planes in 3D point clouds. The bins are relatively coarse, which turned out to be sufficient for testing drivability. The method allows a mobile robot to reliably detect the drivability of the terrain it faces. With the same method and finer bins it is possible to classify the terrain, albeit not as reliably as checking drivability. Experiments with two types of sensors on data from indoors and outdoors demonstrate the algorithm's performance. The processing time typically lies between 5 and 50 ms, which is enough for real-time processing on a robot moving at reasonable speed.

In chapter 5, we develop the Patch Map data structure for memory-efficient 3D mapping based on planar surfaces extracted from 3D point cloud data. It is flexible enough to allow for different kinds of surface representations, different methods of collision detection, and different kinds of roadmap algorithms. The surface representations are planar polygons and trimeshes. We survey and benchmark different methods of generating planar patches from a point cloud segmented into planar regions. We also implement a point cloud based variant of the map data structure that allows for comparison with this standard data structure. Collision detection is implemented both in a way that exploits the fact that planar patches are polygons and based on two external collision detection libraries, Bullet and OPCODE. The implemented roadmap algorithms are the Rapidly-exploring Random Tree (RRT), the Probabilistic Roadmap Method (PRM), and variants of these, most notably variants of RRT and PRM that place vertices on the medial axes of the map without explicitly computing them. We benchmark all collision detection methods with all roadmap algorithms on synthetic data to find the most efficient ones. RRT is the best roadmap algorithm and Bullet the fastest collision detection. Thus, the Patch Map data structure shows its flexibility and usability in practice.

In chapter 6, we thoroughly test the Patch Map data structure developed in the preceding chapter on real-world data. The Patch Map consisting only of planar patches is 18 times smaller than the point clouds it is based on. We perform both roadmap generation with PRM and RRT and way finding from start to goal, based on RRT. We compare our approach to the established methods of trimeshes and point clouds and find that it performs an order of magnitude faster on 3D LRF data and also considerably better on sonar data. We also show that this speed advantage does not come at the cost of a loss of precision. To this end, we compare the total explorable space on the different map representations and find that they differ only marginally. In summary, the Patch Map is shown to be a viable alternative to standard methods that is much more economical with memory.
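As a brief illustration of the wrap-around effect mentioned above (a worked example under assumed values: a modulation frequency of 20 MHz, which is typical for this sensor class but not stated in this section), the measured phase only determines the range modulo the non-ambiguity interval:

\[
  d = \frac{\varphi}{2\pi} \cdot \frac{c}{2 f_{\mathrm{mod}}},
  \qquad
  d_{\max} = \frac{c}{2 f_{\mathrm{mod}}}
           = \frac{3\cdot 10^{8}\,\mathrm{m/s}}{2 \cdot 20\,\mathrm{MHz}}
           = 7.5\,\mathrm{m}.
\]

A target at 9 m therefore produces the same phase as one at 9 m - 7.5 m = 1.5 m and is reported at the shorter distance unless the wrap-around is detected and corrected.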



Preface

This thesis is based on my work in the Jacobs Robotics Group at Jacobs University Bremen. It would not have been possible without the work, help, collaboration, and inspiration of, with, and by (in alphabetical order) Rares Ambrus, Hamed Bastani, Andreas Birk, Heiko Bülow, Winai Chonnaparamutt, Ivan Delchev, Stefan Markov, Mohammed Nour, Yashodhan Nevatia, Kaustubh Pathak, Max Pfingsthorn, Ravi Rathnam, Sören Schwertfeger, Todor Stoyanov, and Narūnas Vaškevičius.

Furthermore, I would like to thank my wife Danni and my daughter Lina for keeping my spirits up during the stressful days of completing this thesis. Brigitte Dörr provided the invaluable help of proof-reading together with her cousin. Stefan May clarified open questions in optics.

Together with my co-authors, I laid the basis for this thesis in the following publications (the order corresponds to the order of their contribution to the thesis):

• Jann Poppinga and Andreas Birk. A Novel Approach to Wrap Around Error Correction for a Time-Of-Flight 3D Camera. In Luca Iocchi, Hitoshi Matsubara, Alfredo Weitzenfeld, and Changjiu Zhou, editors, RoboCup 2008: Robot WorldCup XII, Lecture Notes in Artificial Intelligence (LNAI). Springer, 2009, henceforth referred to as [Poppinga and Birk, 2009].

• Jann Poppinga, Andreas Birk, and Kaustubh Pathak. Hough based Terrain Classification for Realtime Detection of Drivable Ground. Journal of Field Robotics, 25(1-2):67–88, 2008, henceforth referred to as [Poppinga et al., 2008a].

• Andreas Birk, Todor Stoyanov, Yashodhan Nevatia, Rares Ambrus, Jann Poppinga, and Kaustubh Pathak. Terrain Classification for Autonomous Robot Mobility: from Safety, Security Rescue Robotics to Planetary Exploration. In Planetary Rovers Workshop, International Conference on Robotics and Automation (ICRA). IEEE, 2008.

• Andreas Birk, Jann Poppinga, Todor Stoyanov, and Yashodhan Nevatia. Planetary Exploration in USARsim: A Case Study including Real World Data from Mars. In Luca Iocchi, Hitoshi Matsubara, Alfredo Weitzenfeld, and Changjiu Zhou, editors, RoboCup 2008: Robot WorldCup XII, Lecture Notes in Artificial Intelligence (LNAI). Springer, 2009.

• Narunas Vaskevicius, Andreas Birk, Kaustubh Pathak, and Jann Poppinga. Fast Detection of Polygons in 3D Point Clouds from Noise-Prone Range Sensors. In International Workshop on Safety, Security, and Rescue Robotics (SSRR). IEEE Press, 2007, henceforth referred to as [Vaskevicius et al., 2007].

• Jann Poppinga, Narunas Vaskevicius, Andreas Birk, and Kaustubh Pathak. Fast Plane Detection and Polygonalization in Noisy 3D Range Images. In International Conference on Intelligent Robots and Systems (IROS), pages 3378–3383, Nice, France, 2008. IEEE Press, henceforth referred to as [Poppinga et al., 2008b].



• Jann Poppinga, Max Pfingsthorn, Soeren Schwertfeger, Kaustubh Pathak, and Andreas Birk. Optimized Octtree Datastructure and Access Methods for 3D Mapping. In IEEE Safety, Security, and Rescue Robotics (SSRR). IEEE Press, 2007, henceforth referred to as [Poppinga et al., 2007].

The patch map data structure builds on the work by Kaustubh Pathak, published in:

• Andreas Birk, Kaustubh Pathak, Narunas Vaskevicius, Max Pfingsthorn, Jann Poppinga, and Soeren Schwertfeger. Surface Representations for 3D Mapping: A Case for a Paradigm Shift. KI - German Journal on Artificial Intelligence, 2010.

• Kaustubh Pathak, Andreas Birk, Narunas Vaskevicius, Max Pfingsthorn, Soeren Schwertfeger, and Jann Poppinga. Online 3D SLAM by Registration of Large Planar Surface Segments and Closed Form Pose-Graph Relaxation. Journal of Field Robotics, Special Issue on 3D Mapping, 27(1):52–84, 2010.

• Kaustubh Pathak, Narunas Vaskevicius, Jann Poppinga, Max Pfingsthorn, Soeren Schwertfeger, and Andreas Birk. Fast 3D Mapping by Matching Planes Extracted from Range Sensor Point-Clouds. In International Conference on Intelligent Robots and Systems (IROS). IEEE Press, 2009.

• Andreas Birk, Narunas Vaskevicius, Kaustubh Pathak, Soeren Schwertfeger, Jann Poppinga, and Heiko Buelow. 3-D Perception and Modeling: Motion-Level Teleoperation and Intelligent Autonomous Functions. IEEE Robotics and Automation Magazine (RAM), December 2009.

• Kaustubh Pathak, Andreas Birk, Narunas Vaškevičius, and Jann Poppinga. Fast Registration Based on Noisy Planes with Unknown Correspondences for 3-D Mapping. IEEE Transactions on Robotics, 26(3):424–441, June 2010, henceforth referred to as [Pathak et al., 2010b].

Data collected by me and library functions implemented by me were used in object classification:

• Soeren Schwertfeger, Jann Poppinga, and Andreas Birk. Towards Object Classification using 3D Sensor Data. In ECSIS Symposium on Learning and Adaptive Behaviors for Robotic Systems (LAB-RS). IEEE, 2008.

In an unrelated educational project, I worked with humanoid robots:

• Andreas Birk, Jann Poppinga, and Max Pfingsthorn. Using different Humanoid Robots for Science Edutainment of Secondary School Pupils. In Luca Iocchi, Hitoshi Matsubara, Alfredo Weitzenfeld, and Changjiu Zhou, editors, RoboCup 2008: Robot WorldCup XII, Lecture Notes in Artificial Intelligence (LNAI). Springer, 2009.

Based on my work with the SwissRanger sensor, K. Pathak implemented a forward sensor model and sensor fusion, and achieved sub-pixel accuracy:

• Kaustubh Pathak, Andreas Birk, Soeren Schwertfeger, and Jann Poppinga. 3D Forward Sensor Modeling and Application to Occupancy Grid Based Sensor Fusion. In International Conference on Intelligent Robots and Systems (IROS), pages 2059–2064, San Diego, USA, 2007. IEEE Press.

• Kaustubh Pathak, Andreas Birk, and Jann Poppinga. Subpixel Depth Accuracy with a Time of Flight Sensor using Multimodal Gaussian Analysis. In International Conference on Intelligent Robots and Systems (IROS), Nice, France, 2008. IEEE Press.

Parts of the data were collected in the Jacobs Robotics Arena, documented in:

• Andreas Birk, Kaustubh Pathak, Jann Poppinga, Sören Schwertfeger, Max Pfingsthorn, and Heiko Bülow. The Jacobs Test Arena for Safety, Security, and Rescue Robotics (SSRR). In Workshop on Performance Evaluation and Benchmarking for Intelligent Robots and Systems, International Conference on Intelligent Robots and Systems (IROS). IEEE Press, 2007.

My general contributions to the Jacobs Robotics robot system are published as:

• Andreas Birk, Kaustubh Pathak, Jann Poppinga, Soeren Schwertfeger, and Winai Chonnaparamutt. Intelligent Behaviors in Outdoor Environments. In 13th International Conference on Robotics and Applications, Special Session on Outdoor Robotics - Taking Robots off road. IASTED, 2007.



Contents

1 Introduction  19
1.1 Obstacle avoidance  19
1.2 3D Mapping  21
1.2.1 3D mapping, not obstacle avoidance  21
1.2.2 Mapping: from 2D to 3D  22
1.2.3 3D mapping – state of the art  23
1.3 3D Path Planning  25
1.4 Overview  26

2 3D Sensing  27
2.1 3D laser range finders  27
2.2 Stereo cameras  28
2.3 Time-of-flight cameras  30
2.4 Sonar  31
2.5 Case study: The SwissRanger time-of-flight camera  31
2.5.1 Introduction  31
2.5.2 Parameters  33
2.5.3 Accuracy  34
2.5.4 Types of Errors  34
2.5.5 A novel solution to wrap-around: Adaptive Amplitude Threshold  41
2.5.6 Experiments and Results  44

3 Datasets  51
3.1 Datasets with range cameras  53
3.1.1 Arena  53
3.1.2 Outdoor 1-3  53
3.1.3 Planar/Round/Holes  54
3.2 Datasets with actuated LRF  54
3.2.1 Lab  56
3.2.2 Crashed Car Park  56
3.2.3 Dwelling  56
3.2.4 Hannover '09 Hall  56
3.2.5 Hannover '09 Arena  59
3.3 Dataset with sonar: Lesumsperrwerk  59
3.4 Summary  59

4 Near Field 3D Navigation with the Hough Transform  65
4.1 Approach and Implementation  65
4.1.1 The Hough transform  65
4.1.2 Plane Parameterization  66
4.1.3 Processing of the Hough Space for Classification  68
4.2 Experiments and results  69
4.2.1 Hough space examples  71
4.2.2 General Performance Analysis  71

5 Patch Map Data-Structure  79
5.1 Mapping with planar patches  79
5.2 Surface Patches  81
5.2.1 Planar Patches  81
5.2.2 Trimesh Patches  82
5.2.3 Other Patch Types  82
5.3 Generation of patches from point clouds  82
5.3.1 Trimesh Patches  82
5.3.2 Planar Patches  82
5.4 Patch Maps  98
5.4.1 Patch Map Capabilities  99
5.4.2 Patch Map Frameworks  99
5.5 Algorithms on patch maps  105
5.5.1 Evasion RRT  105
5.5.2 Medial Axis algorithms  107
5.5.3 Experiments  107
5.5.4 Results  109

6 3D Roadmaps for Unmanned Aerial Vehicles on Planar Patch Map  113
6.1 Experiments  113
6.1.1 Explored Volume  115
6.1.2 Implementation  115
6.1.3 Datasets  117
6.2 Results  117
6.2.1 Results of RRT  117
6.2.2 Results of PRM  120
6.2.3 Lesumsperrwerk dataset  127

7 Conclusion  131

A Addenda & Errata  133

Bibliography  134



List of Figures

1.1 Rugbot at a drill and at ELROB  21
1.2 Rugbot at RoboCup  22
1.3 An indicator for 3D mapping: Collapsed Building at Disaster City training site in Texas  23
2.1 Functional principle of stereo vision  29
2.2 Wave properties and their measurement  30
2.3 A simple scene  31
2.4 Sample scene shot with different (3D) cameras  32
2.5 A lab scene at different amplitude thresholds  33
2.6 Deviation at different distances  35
2.7 Inhomogeneous lighting on white homogeneous surface  36
2.8 SR errors caused by ambient light  36
2.9 Amplitude influences range measurement  37
2.10 Light scattering  38
2.11 A ghost image of the rod can be seen on both edges of the image (circled).  39
2.12 Errors caused by a moving object (AT=500).  40
2.13 Reflections, here on a floor, cause the measurement of false distances.  40
2.14 An object at various too-short distances to the camera, AT 0  41
2.15 Irregular distribution of near-IR light from the camera's illumination unit  43
2.16 Three test scenes comparing the proposed method to the standard method  44
2.17 Three test scenes comparing the proposed method to the standard AT method of the SwissRanger  45
2.18 Webcam images for scenes demonstrating the correction of errors other than wrap-around  46
2.19 SwissRanger images for scenes demonstrating the correction of errors other than wrap-around  47
2.20 Different error exclusion criteria  48
2.21 An example for correcting wrapped-around pixels  49
3.1 The autonomous version of a Rugbot with some important on-board sensors pointed out  51
3.2 Example range images from the Planar/Round/Holes dataset  52
3.3 A SwissRanger point cloud from the Arena dataset  53
3.4 Photos from the Arena dataset  54
3.5 Photos of the different types of scenes encountered in the Outdoor 1 dataset. The photos are taken by a webcam which is mounted right next to the 3D sensors.  54
3.6 The data returned by the SwissRanger (left) and the stereo camera (right) for the scenes from the Outdoor 1 dataset. Photos in figure 3.5.  55
3.7 Photos of the robotics lab with a locomotion test arena in form of a high bay rack.  56
3.8 ALRF range image from the lab dataset  57
3.9 The Crashed Car Park in Disaster City in Texas where one of the datasets used in the experiments was recorded.  57
3.10 Perspective view of two point clouds from the Dwelling dataset from Disaster City  58
3.11 The Hannover '09 Hall  59
3.12 Two views of a point cloud containing 78528 points from the Hannover '09 Arena dataset  60
3.13 An overview of the Lesumsperrwerk as seen from the river's surface.  61
3.14 Two views of a sonar point cloud from the Lesumsperrwerk dataset  61
4.1 The definition for the angles ρ_x and ρ_y  67
4.2 An example scene where the stereo camera delivers few data points; especially ground information is missing.  70
4.3 Two-dimensional depictions of the three-dimensional parameter (Hough) space for several example snapshots  72
4.4 Mean processing time and cardinality of point clouds  73
5.1 Applying a 3D transform to a 2D polygon  80
5.2 Different methods of projecting the points of a sub point cloud to the optimal plane fitted to them  85
5.3 Problems with the two types of projection  85
5.4 Differences between orthogonal projection and projection along the beam  86
5.5 Late projection can cause intersections when applied to ALRF data  87
5.6 An α-shape (grey) of a 2D point cloud (orange)  88
5.7 Some examples for convex polygons on a grid  89
5.8 An example iteration of the algorithm  89
5.9 Triangulation of convex grid polygons  91
5.10 Comparison of triangulation with/without the restriction to the area covered by the naive triangulation  92
5.11 Triangles with annotated spikiness  92
5.12 Time taken, number of polygons and spikiness for the different triangulation algorithms on the German Open '09 Arena dataset.  93
5.13 Time taken, number of polygons and spikiness for the different triangulation algorithms on the German Open '09 Hall dataset.  94
5.14 Time taken, number of polygons and spikiness for the different triangulation algorithms on the Planar/Round/Holes dataset.  96
5.15 An example of outlining  97
5.16 Class structure of the different variants of the patch map framework  98
5.17 One-dimensional case of bounding box based broad phase intersection test  102
5.18 A run of the kD-tree based collision detection algorithm detecting no collision.  104
5.19 Map used in preliminary experiments  108
6.1 Real-world example for an explored volume  116
6.2 Using explored volume to compare different map types  117
6.3 The lab model the roadmap experiments are run on.  118
6.4 Time taken by the RRT algorithm on the four different map types on the lab dataset.  121
6.5 Time taken by the RRT algorithm on the four different map types on the Crashed Car Park dataset.  122
6.6 A visualization of an RRT generated in 100 iterations in the Jacobs Robotics lab hybrid map.  123
6.7 Time taken by the PRM algorithm on the four different map types on the lab dataset.  124
6.8 Time taken by the PRM algorithm on the four different map types on the Crashed Car Park dataset.  125
6.9 Collision detection distance for PRM  126
6.10 Without using a bounding box, road-maps may be planned outside the valid area – even above water as in this case.  127
6.11 Fraction of runs where the goal was reached for RRT on the first two scans of the Lesumsperrwerk dataset  128
6.12 Time taken for different parameters on different scans of the Lesumsperrwerk data set  129
A.1 Unsimplifiable outline  134



List of Tables

2.1 Specification of the 3D Sensors  28
2.2 Technical Data  33
2.3 Reflectivity of different materials for light with 900 nm wavelength  38
2.4 Results on all six scenes  46
3.1 Number of points in the datasets  62
3.2 Spatial properties of the datasets  63
4.1 Stereo and TOF point clouds in the four datasets used in the experiments.  69
4.2 Percentages of snapshots excluded from Hough transform via preprocessing  70
4.3 Success rates and computation times for drivability detection.  72
4.4 Human generated ground truth labels for the stereo camera data of the different scenes  74
4.5 Human generated ground truth labels for the SwissRanger data of the different scenes  75
4.6 Classification rates and run times for stereo camera data processed at different angular resolutions of the Hough space  76
4.7 Classification rates and run times for SwissRanger data processed at different angular resolutions of the Hough space  77
5.1 A comparison of the sizes of different storage methods for surface patches  81
5.2 Required interfaces for patch map based algorithms  105
5.3 Median wall time taken to reach the ROIs from the starting points  109
5.4 Lower and upper quartile of wall time taken to reach the ROIs from the starting points  109
5.5 Median added length of paths from starting point to all ROIs  109
5.6 Lower and upper quartile of added length of paths from starting point to all ROIs  110
5.7 Median steep turns on paths from starting point to ROIs  110
5.8 Lower quartile of steep turns on paths from starting point to ROIs  110
5.9 Upper quartile of steep turns on paths from starting point to ROIs  111
5.10 Median curviness of paths from starting point to ROIs  111
5.11 Lower and upper quartile of curviness of paths from starting point to ROIs  111
6.1 The actual file sizes of the different map types of the Lab dataset  118
6.2 Explored volume on different maps for the RRT on the lab dataset  119
6.3 Explored volume on different maps for the RRT on the Crashed Car Park dataset  120
6.4 Explored volume on different maps for the PRM on the lab dataset  123
6.5 Explored volume on different maps for the PRM on the Crashed Car Park dataset  126



List of Algorithms

1 General form of the Hough transform algorithm  66
2 Hough transform applied for plane detection  66
3 The ground classification algorithm  68
4 The first phase of trimesh growing  83
5 The second phase of trimesh growing  84
6 Generating the α-shape A for point cloud P  88
7 The scanLine algorithm for range image triangulation  90
8 Construction of a link polygon  91
9 The original Douglas-Peucker algorithm for open polylines  98
10 Collision detection for a capsule in a point cloud  103
11 The Evasion RRT algorithm  106
12 Medial Axis PRM adapted to patch maps  107
13 The original RRT algorithm from [LaValle, 1998]  114
14 The original PRM algorithm from [Kavraki et al., 1996]  114



Chapter 1

Introduction

Robots have come a long way since the first modern robot, Unimate. Traditionally, robots were robot arms, fixed in place in a precisely known environment, almost always in manufacturing. More and more, robots are becoming mobile. The first mobile robots relied on special markers to navigate. Nowadays, mobile robots are used in practice in manufacturing, military and security applications, disaster mitigation, and in hospitals. In the last decade, there has been increasingly successful research on enabling mobile robots to navigate autonomously in unstructured environments. This thesis aims at improving mobile robots' autonomous navigation in this setting.

1.1 Obstacle avoidance

Mobile robots are used in a great many scenarios. These include urban search and rescue (USAR), exploration (e.g. underwater, in abandoned mines, or on Mars and other celestial bodies), combat, defusing explosives, and also entertainment. Traditionally, robots have been fully tele-operated by human operators. However, in any scenario, even partial autonomy can reduce the cognitive load of the robot operators [Parasuraman et al., 2000]. This increases the chances of success of any mission. For a robot to move at any degree of autonomy, it needs obstacle avoidance. Obviously, the speed of the computations for obstacle avoidance limits the speed with which the robot can move. Traditionally, obstacle avoidance has been conducted in 2D with 2D sensors. This applies to real robots [Simmons, 1996, Borenstein and Koren, 1989, Thrun, 2002, Lapierre et al., 2007, Blanco et al., 2007] (even Unmanned Aerial Vehicles (UAV) [Bouabdallah et al., 2007]), simulation [Kim and Khosla, 1991, Barate and Manzanera, 2008], and theory [Ogren and Leonard, 2005].

2D sensing is obviously not enough in a 3D world. 2D laser range finders mounted parallel to the floor fail to see ditches in the floor or obstacles above the scanning plane. A first approach to see beyond a strict 2D world is to extract features from a camera image [Hwang and Chang, 2007, Chen and Tsai, 2000, Ulrich and Nourbakhsh, 2000, LeCun et al., 2005]. Others rely on optic flow, either from a classical camera [Lewis, 2002] or a 1D camera [Zufferey and Floreano, 2005].

Accurate and complete sensing of the 3D world is easiest with real 3D sensors. There have been attempts from early on to make use of the third dimension. They relied on vision, namely binocular stereo [Matthies et al., 1995, Simmons et al., 1996], slider stereo¹ [Moravec, 1980], or structure from motion [Charnley and Blissett, 1989, Ohya et al., 1998]. [Moravec, 1980] tries to match all of the few detected 3D points in each image set with all points in the next. Due to the limited computing power at the time, this was very slow and also very error prone. Moravec assumes an even floor and performs path-planning in 2D. The structure from motion approach in [Charnley and Blissett, 1989] uses a corner detector which produces relatively sparse features. Based on these, the authors evaluate the flatness of the terrain as seen from an autonomous vehicle. However, they consider their inquiry in this direction a first tentative step, to cite:

"It is clear that we do not drive using Structure-from-Motion alone and we are not proposing that an autonomous vehicle should do so either."

Yet they do manage to extract point clouds and even to calculate the vehicle pose offline by matching features in consecutive point clouds. In doing so, they already made their caution a bit less justified than it was at the time.

¹ Slider stereo is a variant of binocular stereo. Only one camera is used, which is slid along a fixed track on the robot. Images are taken at certain evenly spaced spots and can be processed similarly to the ones in a binocular stereo system.

One of the first robotic systems with fully working 3D obstacle avoidance used binocular stereo [Matthies et al., 1995]. The authors assume a horizontal ground plane, though not a fixed altitude. By scanning columns in the disparity image, they detect deviations, which they classify as obstacles. [Simmons et al., 1996] uses a similar approach. They also assume a horizontal ground plane, but at a fixed height. From the points derived from the stereo camera, they generate a local elevation map that is used for obstacle avoidance. A drastically different approach is pursued in [Ohya et al., 1998]. It is aimed at office environments where many horizontal and vertical lines can be detected in camera images and where the assumption of a horizontal ground holds everywhere. Detected lines are matched to the expected lines given by an a priori wire-frame model of the environment for self-localization. Obstacle avoidance is done in 2D.

Stereo-vision has remained popular [Chao et al., 2009, Haddad et al., 1998, Schäfer et al., 2005a, Schäfer et al., 2005b, Okada et al., 2001, Sabe et al., 2004, Larson et al., 2006], but other 3D sensing methods have gained in use in obstacle avoidance: time-of-flight range cameras (light or laser based) [Poppinga et al., 2008a, Mihailidis et al., 2007], vision augmented with 2D laser range-finders (LRF) [Michels et al., 2005], object tracking [Michel et al., 2008], and most prominently LRFs. For obstacle avoidance, actuated LRFs are used [Kelly et al., 2006, Schafer et al., 2008], although in this case adequate scanning speed is a challenge [Wulf and Wagner, 2003]. When LRFs are fixed in a position not parallel to all driving directions, they are either inclined [Thrun et al., 2006] or perpendicular to the floor. The latter case is found in 3D mapping [Howard et al., 2004, Thrun et al., 2000, Hähnel et al., 2003], but is not useful for obstacle avoidance, since such scanners do not look ahead.

The full scope of the three-dimensional data is not used by most approaches. Typically, they assume a horizontal ground plane [Chao et al., 2009, Haddad et al., 1998] and enter the originally 3D data into a 2D map. Often the ground floor assumption is used not only to organize the data, but also to identify obstacles, which are defined as overly large deviations from it [Matthies et al., 1995, Matthies et al., 2002, Simmons et al., 1996, Schäfer et al., 2005b].

This simplification works in man-made environments, but fails once robots have to operate in an unstructured environment. In this case, one can identify terrain without searching for a ground plane at all [Schäfer et al., 2005a]. There are, however, a few approaches that do try to find a ground plane [Kelly et al., 2006, Simmons et al., 1996].

Recently, stereo-vision has been combined with monocular image classification for greater reliability [Hadsell et al., 2009]. This also combines the longer range of the latter with the 3D map-building potential of the former. Regions in the monocular image are classified using a neural network and the ground plane is identified with a post-processed Hough transform. The results are combined into a local 2D map.


The major problem of 3D obstacle avoidance is its high computational cost. [Simmons et al., 1996] runs at only 4 Hz, [Hadsell et al., 2009] at 8-10 Hz. [Kelly et al., 2006, Schäfer et al., 2005a] do not give speed figures for their systems. However, the speed achieved by obstacle avoidance algorithms is crucial: together with the sensor range and vehicle inertia, it puts a limit on the maximum safe speed. In chapter 4, we propose a method that is faster than the above solutions and hence allows for a higher speed.
for a higher speed.<br />

Figure 1.1: RUGBOT AT A DRILL AND AT ELROB – A Jacobs University robot at a drill dealing with a hazardous material road accident (upper row) and at the European Land Robotics Trials, ELROB-2006 (lower row).

1.2 3D Mapping

1.2.1 3D mapping, not obstacle avoidance

In many unstructured environments, e.g. in disaster scenarios or underwater, it is beneficial to use robots in addition to human workers. As humans are needed to operate the robots, any amount of autonomy in the robots helps reduce the operators' cognitive load [Murphy et al., 2001]. While obstacle avoidance can be enough to ease steering and reach short-distance goals, mapping is needed if a robot is to perform more complex tasks. Examples include returning to the starting point of a mission or even exploration.

While 2D mapping for ground vehicles in static environments with planar floors is considered a solved problem [Thrun, 2002], UAVs have often relied on obstacle avoidance [Bouabdallah et al., 2007, Pflimlin et al., 2006, Zufferey and Floreano, 2005], local maps [Meister et al., 2009], or non-metric maps [Courbon et al., 2009].


Figure 1.2: RUGBOT AT ROBOCUP – The RoboCup rescue competition features a very complex test environment (left), which includes several standardized test elements. The team demonstrated the combined usage of a tele-operated and a fully autonomous robot at the world championship 2006 (right).

1.2.2 Mapping: from 2D to 3D

The world is three-dimensional, so the most accurate representation must be a 3D map. However, a two-dimensional map is easier to handle for humans and less computationally intensive. Since the inherent simplification works most of the time, 2D mapping is popular even when 3D sensing is available (e.g. in [Thrun et al., 2003]).

Strictly two-dimensional mapping ceases to be a useful abstraction of the real world

1. when significant obstacles are invisible to the 2D sensors,

2. when the mapper is not the only user of the map,

3. when multiple levels one above another are to be mapped, or

4. when there is no dominant ground plane.

Condition 1 only occurs when insufficient sensors are used. A typical case is that of a 2D LRF scanning parallel to the floor and overlooking obstacles above and below the scanning plane. They cannot be entered into the map, yet they still pose problems to the robot and later to human users of the map. This situation can be addressed by using 3D sensors; the map might even remain 2D.

Three-dimensional data can be simplified if no intended later user of the map needs it (condition 2). This breaks down into two different sub-cases: either a three-dimensional map is desired as the most accurate representation for human consumers, or a robot with mobility differing from that of the mapping robot needs a map for path-planning. For example, a large mapping robot might not fit under a table, so it enters the table as an obstacle. A smaller robot might fit underneath, or a flying robot might be able to fly over it. They cannot fully use their mobility if they rely on a 2D map generated from 3D data which was simplified based on the large robot's properties.

Condition 3 can occur when mapping multi-story buildings. This case can be handled with separate 2D maps for each floor (e.g. in [Sakenas et al., 2007]). This approach is limited to intact buildings. It fails in highly unstructured environments such as outdoors or in partially collapsed buildings where there is no clear separation into stories.


Figure 1.3: An indicator for 3D mapping: Collapsed Building at Disaster City training site in Texas

Condition 4 is the strongest indicator for true 3D mapping (together with condition 2). In man-made environments with a well-defined ground-plane (mainly floor, pavement, road) in almost all areas, 2D scan-matching will work well. The ground-plane does not even have to be perfectly planar. 2D mapping can still work on slightly hilly terrain: since scan matching is probabilistic anyway, minor perturbations are compensated for. In collapsed buildings or very rough terrain, the robot's roll and pitch will vary so much that scan matching is no longer possible (see figure 1.3 for an example). One approach is to compensate for this by actuating a 2D LRF such that it is always level [Pellenz, 2007]. However, while roll and pitch may stay the same this way, the robot's altitude will change such that the LRF always hits different obstacles, rendering scan matching moot. Consequently, there is no choice but to use both 3D sensors and 3D mapping.

1.2.3 3D mapping – state of the art

Attempts at 3D sensing have a long history in mobile robotics [Yakimovsky and Cunningham, 1978, Moravec, 1980, Harris and Pike, 1988]. These early approaches rely on stereo vision, which, at the time, only delivered very sparse feature points. This greatly reduced the data load. Still, only short sequences of images were mapped. Yakimovsky and Cunningham [Yakimovsky and Cunningham, 1978] present a robot with a binocular stereo camera and an arm. The stereo camera determines the 3D position of certain image regions. The arm can be controlled by choosing a position on the screen. Harris and Pike [Harris and Pike, 1988] extract 3D information from an image series from a monocular camera.
Another, more recent approach is to assume a flat floor, match scans in 2D, and transform the 3D data accordingly [Howard et al., 2004, Thrun et al., 2000, Hähnel et al., 2003, Thrun et al., 2003]. Howard et al. [Howard et al., 2004] equipped a Segway base with a horizontal and a vertical laser range finder (LRF). They steered it around their campus on a 2 km tour. The data from the horizontal LRF was used for scan matching. They used the resulting positions to assemble a 6 million point 3D point cloud in real time. Thrun et al. [Thrun et al., 2000] use a Pioneer robot base with the same LRF configuration. They use the same process on several indoor datasets. In one instance, they transform the point cloud to 82,899 polygons which are then simplified to 8,299. In [Hähnel et al., 2003], this approach is refined by fitting planes in the complete point cloud to simplify the polygon set more drastically. When the authors try to extend it to underground mines [Thrun et al., 2003], they perform loop closing with 3D ICP (see below) to compensate for the uneven floor. Sabe et al. [Sabe et al., 2004] assume the presence of a flat floor, but not that it is horizontal. They first detect the ground plane with a 3D Hough transform and then use a 2D obstacle grid on the detected plane for path planning of a humanoid robot.
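To make the idea of Hough-based plane detection concrete, the following is a minimal sketch of a coarse (θ, φ, ρ) accumulator over an (N, 3) numpy point cloud; the parameterization, bin sizes, and function name are illustrative assumptions, not the specific choices of [Sabe et al., 2004] or of chapter 4.

    import numpy as np

    def hough_planes(points, n_theta=18, n_phi=36, rho_step=0.1, rho_max=10.0):
        """Vote every point into a (theta, phi, rho) accumulator; the strongest
        bin corresponds to the dominant plane in the cloud."""
        thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
        phis = np.linspace(0.0, 2.0 * np.pi, n_phi, endpoint=False)
        n_rho = int(2 * rho_max / rho_step)
        acc = np.zeros((n_theta, n_phi, n_rho), dtype=np.int32)
        for it, theta in enumerate(thetas):
            for ip, phi in enumerate(phis):
                # unit normal for this direction bin (full sphere, so each plane
                # also votes in the mirrored bin; fine for a sketch)
                n = np.array([np.sin(theta) * np.cos(phi),
                              np.sin(theta) * np.sin(phi),
                              np.cos(theta)])
                rho = points @ n                        # signed plane distances
                idx = np.floor((rho + rho_max) / rho_step).astype(int)
                ok = (idx >= 0) & (idx < n_rho)
                np.add.at(acc[it, ip], idx[ok], 1)      # cast the votes
        it, ip, ir = np.unravel_index(acc.argmax(), acc.shape)
        return thetas[it], phis[ip], -rho_max + (ir + 0.5) * rho_step, acc

In chapter 4, the Hough space is not merely searched for its strongest plane but processed further for drivability classification (section 4.1.3).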

Even now, real-time 3D mapping remains a challenge. Range cameras return tens of thousands of points per scan, 3D laser range-finders even hundreds of thousands. (Taking the higher frequency of the TOF cameras into account, the data rates are similar.) Stereo cameras return a varying number of points between these two figures. Operations entailed by mapping, like rigid transformations involving quite a few multiplications, have to be performed for each point individually.

The most popular approach for mapping with complete point clouds is the Iterative Closest Point algorithm (ICP) [Besl and McKay, 1992]. In each iteration, it correlates points in two point clouds with one another and then analytically calculates the relative pose that minimizes the average distance of the correlated points. However, ICP tends to get caught in local minima [Pathak et al., 2010b]. Also, pure ICP is expensive, so different optimizations have been proposed. [Nüchter et al., 2004] subsample the point clouds before applying an approximate ICP. [Saez and Escolano, 2004] also subsample point clouds; additionally, they assume a flat floor. [Cole and Newman, 2006] focus on the effectiveness of ICP and propose a different error criterion that necessitates an optimization step for relative pose determination. In [Newman et al., 2006] they detect loops with vision to reduce the computational load brought about by handling point clouds. [May et al., 2009] apply an ICP based approach to point clouds from a time-of-flight camera. Also for point cloud based approaches other than ICP, a main research focus is reducing the computational load. [Sun et al., 2001] propose a subsampling strategy.
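As an illustration of the basic ICP loop described above (a minimal brute-force sketch, not any of the optimized variants cited here; the helper names are hypothetical, and a real implementation would add kd-tree correspondence search, subsampling, and outlier rejection):

    import numpy as np

    def best_rigid_transform(src, dst):
        """Closed-form rotation/translation minimizing the mean squared
        distance between paired (N, 3) point sets (SVD/Kabsch)."""
        cs, cd = src.mean(axis=0), dst.mean(axis=0)
        H = (src - cs).T @ (dst - cd)
        U, _, Vt = np.linalg.svd(H)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:          # avoid reflections
            Vt[-1] *= -1
            R = Vt.T @ U.T
        return R, cd - R @ cs

    def icp(src, dst, iterations=30):
        """Align (N, 3) cloud src to (M, 3) cloud dst; returns accumulated R, t."""
        R_total, t_total = np.eye(3), np.zeros(3)
        cur = src.copy()
        for _ in range(iterations):
            # correlate: each source point with its closest destination point
            d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(-1)
            matched = dst[d2.argmin(axis=1)]
            # analytically compute the pose minimizing the average distance
            R, t = best_rigid_transform(cur, matched)
            cur = cur @ R.T + t
            R_total, t_total = R @ R_total, R @ t_total + t
        return R_total, t_total

The optimizations cited above mainly attack the two costly steps visible here: establishing the correspondences and the sheer number of points they have to be established for.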

In another approach to tackling the high computational load of 3D data, map representations often switch to less memory-intensive data structures earlier [Weingarten and Siegwart, 2006, Surmann et al., 2003, Viejo and Cazorla, 2007] or later [Rusu et al., 2008a] in the pipeline. [Weingarten and Siegwart, 2006] grow planes in single point clouds and use them for scan matching. Rusu, Marton, et al. register point clouds together, segment the resulting point cloud, and then fit cuboids into the segments [Rusu et al., 2008a]. Also in [Rusu et al., 2008b], Rusu, Sundaresan, et al. first register multiple point clouds and then derive a polygonal model, this time consisting of rectangles in a mesh. Watanabe et al. [Watanabe et al., 2005] detect line segments in the camera image and match them, though only in simulation. Surmann et al. [Surmann et al., 2003] operate line by line on a scan from an actuated LRF. They detect line segments in each scan line and fuse these into planar segments, often over-segmenting the actual surfaces. Several of these patches are then grouped together in bounding boxes. Likewise working on a single scan, [Unnikrishnan and Hebert, 2003] detect walls as object-aligned bounding boxes. In contrast, Silveira et al. [Silveira et al., 2006] produce planar patches very early. They combine this with the generation of 3D information from a sequence of intensity images. However, their patches are rather sparse and incomplete. [Viejo and Cazorla, 2007] use over-segmenting, incomplete planar patches derived analytically from local neighborhoods in the point cloud. These patches are later on used for Simultaneous Localization And Mapping (SLAM).

Milroy [Milroy et al., 1996] is the first to use the term "patch" to describe a representation of a surface. There and in the following work (e.g. [Gao et al., 2006]), patches are used in reverse engineering: mechanical parts like gear wheels, for which the original plans have been lost, are scanned with 3D laser scanners. Patches are detected in the point cloud and then used in computer-aided manufacturing (CAM) to reproduce the part. Gao et al. [Gao et al., 2006] even go so far as to compensate for wear.

As we have seen, many attempts at 3D mapping use patches in one form or another. In this vein, we propose a data structure for mapping that uses planar patches where possible and general trimeshes where necessary: the Patch Map. This makes it possible to reduce the data in complexity and volume, while still allowing for correct collision detection. In chapter 5 we will discuss the data structure in detail, and in chapter 6 we will demonstrate roadmap algorithms on it.
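For orientation, the following is a minimal sketch of what such a patch-map interface might look like; the class and field names are hypothetical illustrations, and the actual data structure, including the narrow-phase collision tests and the Bullet/OPCODE back-ends, is defined in chapter 5.

    from dataclasses import dataclass
    from typing import List, Tuple

    Point = Tuple[float, float, float]

    @dataclass
    class PlanarPatch:
        normal: Point                # unit normal of the fitted plane
        d: float                     # plane equation: dot(normal, x) = d
        polygon: List[Point]         # boundary polygon lying in that plane

    @dataclass
    class TrimeshPatch:
        vertices: List[Point]
        triangles: List[Tuple[int, int, int]]   # vertex indices

    class PatchMap:
        """Planar patches where possible, trimesh patches where necessary."""

        def __init__(self):
            self.patches = []

        def add(self, patch):
            self.patches.append(patch)

        def candidates(self, lo: Point, hi: Point):
            """Bounding-box broad phase: patches whose axis-aligned bounding
            box overlaps the query box [lo, hi]; a narrow-phase test (own
            polygon tests or an external library) would then run on these."""
            hits = []
            for p in self.patches:
                pts = p.polygon if isinstance(p, PlanarPatch) else p.vertices
                pmin = [min(c) for c in zip(*pts)]
                pmax = [max(c) for c in zip(*pts)]
                if all(pmax[i] >= lo[i] and pmin[i] <= hi[i] for i in range(3)):
                    hits.append(p)
            return hits

A point cloud based variant of the same interface is what the benchmarks in chapters 5 and 6 compare against.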


1.3 3D Path Planning

After constructing a 3D map, the next step is to plan a path in it. For online path planning, possible changes of the map should be taken into account: new data will be added, and the relative pose of previous scans can change due to data registration and relaxation after loop-closing. Therefore, constructing expensive data structures like a Voronoi diagram should be avoided. Even data structures that are computationally inexpensive to construct, such as occupancy grids, should be avoided to minimize memory consumption and to maintain flexibility for SLAM.

Probabilistic path planning is typically used for high-dimensional control spaces, but it is also useful for object maps: [Koyuncu and Inalhan, 2008] and [Andert and Adolf, 2009] use it on extruded polygons, [Pettersson and Doherty, 2006] on object-oriented bounding boxes (OBB). Probabilistic path planning algorithms only require sparse sampling, which makes applying them to object maps easier than generating a derived structure such as a Voronoi diagram. Two popular probabilistic path planning algorithms are the Rapidly-exploring Random Tree (RRT) proposed by LaValle [LaValle, 1998] and the Probabilistic Roadmap Method (PRM) proposed by Kavraki et al. [Kavraki et al., 1996]. An overview of uses of the RRT can be found in [Lavalle and Kuffner, 2000], along with an analysis of how the algorithm converges.
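To recall the basic RRT loop (a minimal workspace sketch in the spirit of [LaValle, 1998]; the collision predicate, step size, goal bias, and bounds are placeholder assumptions rather than the variants used later in this thesis):

    import random, math

    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def steer(a, b, step):
        """Move from a towards b by at most one step."""
        d = dist(a, b)
        return b if d <= step else tuple(x + step * (y - x) / d for x, y in zip(a, b))

    def rrt(start, goal, is_free, bounds, step=0.5, max_iter=2000, goal_bias=0.05):
        nodes = [start]                 # tree vertices
        parent = {0: None}              # tree edges by vertex index
        for _ in range(max_iter):
            # sample a random configuration, occasionally the goal itself
            q = goal if random.random() < goal_bias else \
                tuple(random.uniform(lo, hi) for lo, hi in bounds)
            # find the nearest existing vertex and extend towards the sample
            i = min(range(len(nodes)), key=lambda k: dist(nodes[k], q))
            q_new = steer(nodes[i], q, step)
            if is_free(nodes[i], q_new):        # edge must be collision-free
                nodes.append(q_new)
                parent[len(nodes) - 1] = i
                if dist(q_new, goal) < step:
                    return nodes, parent        # goal region reached
        return nodes, parent

The original formulation is restated later as algorithm 13, and chapter 5 introduces variants such as the Evasion RRT (algorithm 11) and the medial-axis based versions.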

Some examples of <strong>RRT</strong> usage include [Kagami et al., 2003]. The authors enter 3D surfaces detected<br />

by a stereo camera into an 3D mesh map with the marching cube algorithm and grow a twosided<br />

<strong>RRT</strong> in the configuration space of the 6-DOF arm of the standing humanoid robot. In contrast,<br />

[Koyuncu and Inalhan, 2008] grow the <strong>RRT</strong> for a helicopter in a simulated city environment in the<br />

workspace. They later on modify the derived trajectory (using B-splines) to fit the kinodynamic constraints<br />

of the vehicle. A similar approach is used in [Andert and Adolf, 2009]. In their work, a<br />

roadmap is constructed with PRM. Obstacles are represented as prisms, the path is smoothed with<br />

cubic splines. It is also intended for a helicopter, but it also uses both synthetic and real-world data.<br />

PRM was also used in [Pettersson and Doherty, 2006], albeit offline. They conducted real-world experiments<br />

using pre-computed maps consisting of object-oriented bounding boxes (OBB) for a helicopter.<br />

In [Wzorek and Doherty, 2006], the same group additionally uses an <strong>RRT</strong>. Another helicopter<br />

UAV is used in [Hrabar, 2008]. It uses a 3D occupancy grid for mapping (but no scan-matching or<br />

even SLAM). PRM is used to construct a roadmap on which a D* Lite graph search is performed<br />

in software and hardware simulation. An a priori roadmap on a height map was generated with the<br />

RRT algorithm for a fixed-wing UAV in [Saunders et al., 2005]. This is complemented by reactive obstacle

avoidance based on a 1D LRF, tested in simulation.

The popularity of the RRT and PRM can also be seen in the number of proposed variants. To maximize the

distance of the roadmap to obstacles, [Wilmarth et al., 1999] propose placing the nodes only

on the map's medial axes (“Medial Axis PRM”). They even avoid having to explicitly calculate the

medial axes and their approach works in any dimensionality. However, we tried it (section 5.5) and<br />

found that the gain in path quality does not justify the additional computation time.

A similar approach is presented in [Amato et al., 1998], called “Obstacle Based PRM”. The authors<br />

generate nodes close to obstacles. This follows the observation that planning is hardest near obstacles. In

our case, this approach seems promising at first, since it is relatively simple to generate nodes around<br />

patches. However, due to the nature of the data, patches often intersect and are close to each other<br />

in cluttered configurations. This problem is avoided with the Medial Axis PRM. LaValle himself coauthored<br />

[Kuffner and LaValle, 2000], proposing growing two <strong>RRT</strong>s at once – one from the starting<br />

point, one from the goal. This brings the problem of having to check for connectability. This entails<br />

many more collision checks. A different kind of improvement is proposed in [Branicky et al., 2001].<br />

Instead of using standard pseudo-random number generators in roadmap algorithms, the authors propose

using quasi-random (deterministic, low-discrepancy) sequences for a more even distribution of nodes. In [Andert and Adolf, 2009], this

sampling strategy is compared with the standard random number generator and a third sampling strategy.<br />

Since this work focuses on comparing different map types and different roadmap algorithms,

sampling strategies were out of scope.<br />

Another popular approach to 3D path planning is derived from the A* algorithm [Hart et al., 1968].<br />

For example, [Hwangbo et al., 2007] plans paths for a simulated fixed-wing UAV around polygonal<br />

obstacles with A* planning based on a mesh (with constraints), combined with a local planner. In<br />

[Petres et al., 2007] A* in a continuous domain is used for a simulated AUV in an anisotropic environment<br />

(underwater currents).<br />

Somewhat insular approaches include a graph search on the graph of cells of an occupancy grid for<br />

a UAV with later path refinement [Jun and D'Andrea, 2003] and an Evolutionary Algorithm using B-

splines on a height map for a UAV [Hasircioglu et al., 2008]. Both were tested in simulation only. Path<br />

planning with temporal logic is proposed by [Fainekos et al., 2005], also only tested in simple simulation.<br />

Mark Overmars and Jean-Claude Latombe, who co-authored [Kavraki et al., 1996], propose the

“Corridor Map Method”, which plans 2D paths in 3D environments [Overmars and Geraerts, 2007].<br />

It is parameterizable on the clearance it keeps from obstacles. It is only tested in simulation, albeit<br />

with sophisticated collision detection. The “Elastic Algorithm” is proposed by [Chen et al., 2006]. It<br />

tries to avoid obstacles by first planning a path as if it were a rubber band spanned between start and<br />

goal. This results in paths being very close to obstacles. Also, it is unclear how they intend to find<br />

the way out of spirals or similar configurations. Consequently, they only test on simple maps and also<br />

only in 2D simulation.<br />

1.4 Overview<br />

The rest of this thesis is structured as follows: In chapter 2, we will introduce different methods of<br />

gathering 3D data: stereo cameras, time-of-flight cameras, 3D laser range-finders, and sonars. We<br />

will also discuss the sensors used to gather the data used in this thesis. Furthermore, we investigate<br />

sources of errors and solutions in one specific sensor.<br />

In chapter 3, we present the datasets that have been collected with the sensors from the previous<br />

chapter. These datasets will be used in experiments throughout the thesis. We illustrate the nature of<br />

the data with photos of the sites where they were gathered and renderings of the actual data.<br />

In chapter 4, we demonstrate a method for reactive obstacle avoidance on a mobile land robot.<br />

The method is based on the Hough transform, a method for feature detection. In experiments we<br />

investigate the results achieved in practice.<br />

In chapter 5, we introduce a data structure for 3D mapping: the Patch Map. Its purpose is to<br />

reduce memory consumption of the typically very memory-expensive 3D data. We present several<br />

algorithms that can be performed on the Patch Map and we examine their performance in tentative<br />

experiments on synthetic data.<br />

In chapter 6, we perform extensive experiments with the Patch Map on voluminous real-world<br />

data. We compare the Patch Map to traditional mapping methods like point clouds and trimeshes both<br />

in terms of speed and in terms of accuracy. Chapter 7 concludes this thesis. It summarizes our results<br />

and unique contributions.<br />



Chapter 2<br />

3D Sensing<br />

Mobile robotics in unstructured or semi-structured environments inherently relies on range sensing.<br />

From early on, there have been attempts at 3D range sensing [Moravec, 1980], but only in the last

decade have computers become powerful enough to handle 3D data in real time. The most popular

approaches to 3D sensing in mobile robotics are laser range finders and stereoscopic cameras<br />

(stereo cameras for short). Recently, time-of-flight cameras have also become a viable option. In the<br />

underwater domain, 3D sonars are well-established for bathymetry, sea-life surveys, and fishing.

These proper 3D sensors are either expensive, slow, or both. Consequently, there is ongoing<br />

research in extracting 3D information from monocular cameras. The most common approach is structure<br />

from motion (see [Oliensis, 2000] for a survey). While allowing for lower costs, these approaches<br />

deliver data that is lacking in density and reliability. Since the main focus of this thesis is 3D data<br />

processing and not 3D sensing, we will concentrate on dedicated 3D sensors that reliably deliver 3D<br />

data.<br />

All these sensors collect data in different ways and produce raw data in different formats. In the<br />

end, however, all 3D data is transformed into a 3D point cloud, a set of Cartesian points. Point clouds<br />

serve as an abstraction that allows diverse geometric algorithms to work with data from different<br />

sensors. Typically, point clouds maintain a structure that mirrors the layout of the sensor. For example<br />

in a range camera, the points may be serialized row-wise.<br />
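As a minimal illustration of this abstraction, the following sketch shows one possible (hypothetical) point cloud container that keeps the row-wise sensor layout; it is illustrative and not the data structure used in the remainder of the thesis.

```cpp
#include <vector>

struct Point3D { double x, y, z; };

struct PointCloud {
    int width  = 0;                 // points per row (0 if unorganized)
    int height = 0;                 // number of rows
    std::vector<Point3D> points;    // serialized row-wise: index = row * width + col

    const Point3D& at(int row, int col) const { return points[row * width + col]; }
};
```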

In the following, we will discuss the 3D sensing methods mentioned above in principle and in<br />

concrete systems. Table 2.1 gives an overview of the basic properties of the sensors surveyed here.<br />

We will investigate a time-of-flight camera in detail and propose a remedy to the error source most<br />

inhibitive for 3D mapping.<br />

2.1 3D laser range finders<br />

Laser range finders (LRF) rely on the time of flight principle: They send out a laser beam and measure<br />

the time it takes for it to be reflected back. Based on the known constant speed of light c, the distance<br />

to the reflecting surface is calculated and reported. The advantages of laser beams over regular light<br />

are that they reduce scattering and interference, resulting in more reliable measurements and higher<br />

range. The downside is that laser beams are more difficult and energy-intensive to generate.<br />

Single laser range finders are used together with quickly actuated mirrors to scan in multiple<br />

directions. Typically, these directions all lie in one plane (2D LRF), but there are also systems available<br />

which distribute the beam three-dimensionally. Please refer to [Point of Beginning, 2006] for a<br />

survey. These systems are intended for geodetic use and hence provide high resolution and precision.


Table 2.1: SPECIFICATION OF THE 3D SENSORS – JR=Jacobs Robotics, NIR=near-infrared,

TOF=time-of-flight, PS=phase shift, FOV=field of view

                   | ALRF        | SwissRanger  | Stereo Camera     | Eclipse
Manufacturer       | SICK/JR     | MESA Img.    | Videre Design     | Tritech Int'l
Model              | S300        | SR-3000      | STOC              | Eclipse
Medium             | Laser       | NIR light    | light             | ultrasound
Principle          | TOF, direct | TOF, via PS  | Feature triangul. | TOF, direct
Range              | 0 – 20 m    | 0.6 – 7.5 m  | 0.686 – ∞ m       | 0.4 – 120 m
Horiz. FOV         | 270°        | 47°          | 65.5°             | 120°
Vert. FOV          | 180°        | 39°          | 51.5°             | 30°
Resolution         | 541 × 361   | 176 × 144    | 640 × 480         | 241 × 31
Approx. data freq. | 0.1 Hz      | 4.6 – 57 Hz  | 30 Hz             | 0.1 Hz

They are not suited for mobile robotics due to their high weight, power consumption, and slow

scanning speed.<br />

Mobile roboticists have hence taken to generating 3D data with 2D laser scanners. One popular<br />

approach is to rotate the device around an axis in steps, taking one scan at every step. The<br />

single scans can later be combined into a 3D scan since the angle at every step is known. The rotational

axis is typically either perpendicular to the ground (creating yawing motion) or parallel to<br />

the ground and perpendicular to the forward direction of the robot it is mounted on (creating pitching<br />

motion). The actuation may also be continuous, be it yawing [Wulf and Wagner, 2003] or pitching<br />

[Cole and Newman, 2006].<br />

Another approach to generating 3D data with a 2D LRF in mobile robotics is to mount it such that<br />

the scanning plane is not parallel to the ground. In this variant, the device is not actuated, but due to<br />

the motion of the robot, 3D data can be generated [Hähnel et al., 2003]. It is also possible to combine<br />

these two approaches: [Viejo and Cazorla, 2007] actuate the LRF while the robot is moving. Whether the

LRF is static or actuated, if the robot moves between 2D scans, the precision of the 3D data depends on

the precision of the robot's relative poses.

At Jacobs Robotics, we use a stepwise pitching SICK S300 LRF. The robot stops for each scan.

It has a horizontal field of view of 270° covered by 541 beams. The minimum and maximum pitching angles

and the angle step can be configured dynamically; typical values are +90° and −90°.
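The following minimal sketch illustrates how the 2D scans of such a pitching LRF can be merged into a 3D point cloud; the frame convention (scanning plane spanned by the x- and y-axes, pitching about the y-axis) is an assumption made for this sketch and may differ from the actual robot.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Point3D { double x, y, z; };

// Convert one 2D scan (ranges plus their bearing angles) taken at a given
// pitch angle into 3D points in the sensor base frame.
std::vector<Point3D> scanToPoints(const std::vector<double>& ranges,
                                  const std::vector<double>& bearings,
                                  double pitch) {
    std::vector<Point3D> points;
    for (std::size_t i = 0; i < ranges.size(); ++i) {
        if (ranges[i] <= 0.0) continue;                 // skip invalid beams
        // Beam end point in the 2D scanning plane of the LRF.
        double px = ranges[i] * std::cos(bearings[i]);
        double py = ranges[i] * std::sin(bearings[i]);
        // Rotate the scanning plane about the y-axis by the current pitch angle.
        points.push_back({px * std::cos(pitch), py, -px * std::sin(pitch)});
    }
    return points;
}
```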

2.2 Stereo cameras<br />

Stereo cameras are a system of two cameras mounted in a specific relative position. Visible light<br />

cameras are most common, but infrared can also be used [Hajebi and Zelek, 2007]. The position<br />

of a feature in both cameras' images is correlated, resulting in a three-dimensional position for that

feature.

In most systems, the two cameras are mounted next to each other since knowing the relative<br />

position of the cameras accurately is crucial for the calculation of the 3D positions. Early stereo<br />

cameras include the JPL stereo system [Yakimovsky and Cunningham, 1978]. Nowadays, one can<br />

buy stereo systems off the shelf.<br />

Figure 2.1 gives an overview of the functional principle of stereo vision. A feature of an object<br />


Figure 2.1: FUNCTIONAL PRINCIPLE OF STEREO VISION – Blue lines are beams, black lines are<br />

distances. After [Konolige, 1999].<br />

is perceived differently by two cameras with co-planar image planes. Assuming a point feature, the<br />

feature will project to a point on the image plane. The distance to the image center is d_l for the left

image and d_r for the right one. The distance between the two focal points is the baseline b. The

distance of the feature to the baseline, the range r, is calculated by

r = b · f / (d_l − d_r)

(f is the cameras’ focal length). Consequently, stereo vision is dependent on features being in the<br />

image.<br />
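As a small illustration of this triangulation, the following sketch reconstructs a 3D point from a pixel and its disparity for a rectified pair; the principal point (cx, cy) and the assumption that f is given in pixels are illustrative and not part of the Videre API.

```cpp
struct Point3D { double x, y, z; };

// Reconstruct a 3D point from pixel (u, v) and its disparity d = d_l - d_r,
// assuming a rectified pinhole pair. The caller must ensure disparity > 0.
Point3D triangulate(double u, double v, double disparity,
                    double baseline, double f, double cx, double cy) {
    double z = baseline * f / disparity;            // r = b f / (d_l - d_r)
    return { (u - cx) * z / f, (v - cy) * z / f, z };
}
```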

In commercially available systems, the above assumptions hold: The image planes are co-planar<br />

and the cameras are identical; in particular, the focal lengths are the same. Furthermore, the image planes

are precisely mounted next to one another such that the imagers' rows are collinear.

In other systems, freely placed but calibrated cameras are used. There may even be more than

two (e.g. in [Mulligan et al., 2002]). The approach “structure from motion” applies triangulation, on<br />

which stereo vision is built, to consecutive images from a single camera at different positions (see<br />

[Oliensis, 2000] for a survey).<br />

The stereo camera used in this work is the stereo-on-a-chip (STOC) camera from Videre Design<br />

LLC 1 [Videre-Design, 2006, Konolige and Beymer, 2006]. The cameras have a resolution of 640 ×<br />

480 which puts the upper theoretical limit of points per point cloud at 307,200. Since the range for<br />

a pixel can only be determined if it is part of a feature, the point clouds are typically considerably<br />

smaller. A distinctive feature of this camera is that the stereo computations are done in a dedicated<br />

hardware inside the camera, so that the processing load of the CPU is available for other purposes.<br />

1 http://www.videredesign.com/<br />


Figure 2.2: WAVE PROPERTIES AND THEIR MEASUREMENT – Phase ϕ, amplitude A, and intensity<br />

B can be derived from four samples τ i . After [Oggier et al., 2003] and [Lange, 2000].<br />

2.3 Time-of-flight cameras<br />

Time-of-flight (TOF) cameras use an active sensing approach to give a range measurement for each<br />

pixel in the image. They typically work with near-infrared light emitted by a modulated light source.<br />

The phase of the light coming in at each pixel of the imager is compared to the current phase at the<br />

light source. This allows the phase shift to be determined and from that, the time of flight of the<br />

light. As with the LRF, since the speed of light c is constant, the time of flight is proportional to the

distance.<br />

A common approach in TOF cameras is measuring the TOF indirectly by determining the phase<br />

shift ϕ of the emitted modulated light. To this end, the received light in each pixel is sampled four<br />

times at 0°, 90°, 180°, 270°, see figure 2.2. (These numbers refer to the phase of the illumination unit.)<br />

Using these measurements τ_i, the phase ϕ, amplitude A, and intensity B can be determined [Lange, 2000]:

ϕ = atan( (τ_3 − τ_1) / (τ_2 − τ_0) )                                  (2.1)

A = sqrt( (τ_3 − τ_1)² + (τ_2 − τ_0)² ) / 2                            (2.2)

B = (τ_0 + τ_1 + τ_2 + τ_3) / 4                                        (2.3)
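A minimal sketch of these equations, using the four-quadrant arctangent so that the phase directly falls into [0°, 360°[ (here expressed in radians); the function is illustrative and not the camera's API.

```cpp
#include <cmath>

struct Demodulated { double phase, amplitude, intensity; };

Demodulated demodulate(double t0, double t1, double t2, double t3) {
    const double kTwoPi = 6.283185307179586;
    double phase = std::atan2(t3 - t1, t2 - t0);               // eq. (2.1)
    if (phase < 0.0) phase += kTwoPi;                          // map into [0, 2*pi)
    double amplitude = std::sqrt((t3 - t1) * (t3 - t1) +
                                 (t2 - t0) * (t2 - t0)) / 2.0; // eq. (2.2)
    double intensity = (t0 + t1 + t2 + t3) / 4.0;              // eq. (2.3)
    return { phase, amplitude, intensity };
}
```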

Because of the limited range of the arctangent, ϕ will always lie in [0°, 360°[, even if the actual phase

shift was higher. This range is called the non-ambiguity range. Its metric dimension depends on the<br />

modulation frequency of the illumination unit. At 20 MHz, it corresponds to [0 m,7.5 m[.<br />
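The metric length of the non-ambiguity range follows from the modulation frequency: the light travels to the target and back, so one full phase cycle corresponds to half a modulation wavelength. A small sketch of this relation (the function name is illustrative):

```cpp
double nonAmbiguityRange(double modulationFrequencyHz) {
    const double c = 299792458.0;               // speed of light [m/s]
    return c / (2.0 * modulationFrequencyHz);   // 20 MHz -> about 7.5 m, 5 MHz -> about 30 m
}
```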

For further details of the functional principle of time-of-flight cameras, see [B. Buettgen et al., 2005,<br />

Lange, 2000, Oggier et al., 2003].<br />


(a) Distance image (b) Amplitude image (c) Color image (not by SR)<br />

Figure 2.3: A simple scene<br />

2.4 Sonar<br />

While the previous three sensing methods rely on light, sonar relies on sound. Sonars are similar to<br />

time-of-flight range cameras in that they send out a signal and measure the distance to any reflectors<br />

based on the time-of-flight of the reflected signal. Differences arise because of the difference between<br />

light and sound. Light is a transverse wave, i.e. oscillating perpendicular to the direction of propagation.<br />

Sound is a longitudinal wave, i.e. oscillating in the direction of propagation. This means<br />

that sonars need phased arrays to send and receive sound only in one specific direction. Phased arrays<br />

consist of a series of combined sender/receiver units each of which works with a phase that is<br />

incrementally shifted along the array. A short introduction to multi-beam sonar can be found in

[Mayer et al., 2002]. For a general introduction to sonar, refer to [Waite, 2002].

The sonar used at Jacobs Robotics is an Eclipse by Tritech International. It operates with ultrasound

at 240 kHz. The maximum range is variable and inversely proportional to the data rate. With

a maximum range of 100 m, the data rate is 7 Hz, with 5 m, it is 140 Hz. (This data rate does not refer<br />

to a complete 3D scan but rather to a single column.) What sets the Eclipse apart from the light-based<br />

sensors in terms of data structure is that it can receive multiple echoes per beam. This means that it

does not have a fixed maximum number of possible data points. It also means that the data has to be

post-processed to conform to algorithms assuming point clouds as delivered by light-based sensors. To this

end, only the echo with the highest intensity in each beam is retained.
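A minimal sketch of this post-processing step, assuming a hypothetical Echo structure rather than the actual Tritech data format:

```cpp
#include <vector>

struct Echo { double range; double intensity; };

// For each beam, keep only the echo with the highest intensity and reduce it
// to a single range value; -1 marks beams without any echo.
std::vector<double> strongestEchoPerBeam(const std::vector<std::vector<Echo>>& beams) {
    std::vector<double> ranges;
    ranges.reserve(beams.size());
    for (const auto& beam : beams) {
        double bestRange = -1.0, bestIntensity = -1.0;
        for (const auto& echo : beam) {
            if (echo.intensity > bestIntensity) {
                bestIntensity = echo.intensity;
                bestRange = echo.range;
            }
        }
        ranges.push_back(bestRange);
    }
    return ranges;
}
```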

2.5 Case study: The SwissRanger time-of-flight camera<br />

2.5.1 Introduction<br />

We studied two models of the SwissRanger time-of-flight camera: the SwissRanger SR-3000 and<br />

its predecessor SR-3100. They were originally developed by the “Centre Suisse d’Electronique et<br />

de Microtechnique” (CSEM), today they are maintained and sold by the spin-off MESA Imaging 2 .<br />

There are only minor variations between the two. Evaluations of the predecessor SR-2 can be found in<br />

[May et al., 2006] and [Gut, 2004]. The sensors connect to a computer via USB. They are delivered<br />

with an API and example applications for Windows, Linux and Mac OS X. For Windows, three<br />

applications are available: a calibration program, a viewer with various adjustable parameters and a<br />

2 http://www.mesa-imaging.ch/<br />


(a) Normal image<br />

(b) SwissRanger distance image<br />

(c) Disparity image of the stereo camera<br />

(d) Picture of the point cloud of the stereo camera<br />

Figure 2.4: SAMPLE SCENE SHOT WITH DIFFERENT (3D) CAMERAS – (a) standard camera;<br />

(b) SwissRanger SR-3000 (gray-values encode the measured distance); (c) Videre STOC stereo camera<br />

disparity image; (d) corresponding point cloud<br />

sample application. For Linux, only a viewer program written in Java is available 3 .<br />

In figure 2.3, there is an example for a simple scene. From left to right you can see the grayscale<br />

encoded distance image 2.3(a), the amplitude image as delivered by the SwissRanger 2.3(b) and a<br />

color picture taken by a webcam directly next to the SwissRanger 2.3(c). The sensor generates range<br />

images as well as grayscale images. The first correspond to the measured phase shift, the second to<br />

the amplitude of the signal. For a comparison to data delivered by a stereo camera, please refer to<br />

figure 2.4.<br />

In the images presented here, measured ranges are encoded in grayscale where brighter means<br />

nearer. White pixels indicate either that the measurement could not be performed correctly, or that the<br />

pixel’s measurement was discarded, mostly because too little modulated light was received. Furthermore,<br />

the ranges are Cartesian, i.e. the distance to the image plane, not the distances to the focal point

that are calculated as an intermediate step. In table 2.2 we present selected technical data given at the<br />

SwissRanger website and, where applicable, compare it to our own measurements.<br />

3 The choice of Java proved to be a disadvantage when running this program remotely on the robot with ssh and X forwarding.<br />

While our self-developed viewer (written in C++ using Qt) runs at a decent speed, the Java viewer is considerably<br />

slower as compared to local execution.<br />


Table 2.2: Technical Data

Parameter            | Unit  | Value     | Note                           | Verification
Resolution           | Pixel | 176 × 144 | –                              | ✓
Power supply         | V     | 12        | –                              | ✓
Power consumption    | W     | 12        | depends on integration time⁴   | 3 – 11.04
Modulation frequency | MHz   | 5 – 40    | only 8 possible values         | ✓
Update rate          | Hz    | ≤ 50      | depends on integration time    | 4.6 – 57

2.5.2 Parameters

In this section we will discuss the different parameters of the camera which can be set via the<br />

driver API. The information stems from an electronic manual [MESA Imaging AG, 2006], comments in

the header file of the driver, and our own experiments. Generally, the documentation is still work

in progress.

The most important parameter is the Amplitude Threshold (AT). When used with this parameter,

the camera only returns range values for pixels with a certain minimal amplitude and sets the range<br />

for the filtered out pixels to zero. This parameter is useful for eliminating various kinds of errors as<br />

discussed later on. Roughly speaking, a higher AT leads to fewer data points, but the data points are of

higher quality. It is, however, rather coarse in many applications, i.e. it produces either many false<br />

positives or false negatives. An example showing the effect of setting an AT can be found in figure 2.5.<br />

Figure 2.5: A LAB SCENE AT DIFFERENT AMPLITUDE THRESHOLDS – On the left, there is the scene<br />

shot with a normal camera, next to it the effects of the Amplitude Threshold (AT) on range images<br />

with a SR-3000. The AT used are 0, 30, and 75.<br />

While AT can serve as a basic remedy in many situations, its problem is to find a suited amplitude<br />

threshold for the current conditions in a particular environment. This can also be seen in figure 2.5.<br />

There is random noise on some objects. This most prominently occurs on dark objects like the black<br />

computer screens and on the glass bricks left of center on the upper edge of the image. If the AT is<br />

chosen too high, correct data points are discarded (rightmost image). If it is chosen too low, invalid<br />

points are kept (second image from the right).<br />

For the reader to get an impression of what different AT values mean in practice, we have annotated<br />

the example pictures in the following with the value that was used acquiring the images.<br />

Integration Time<br />

Another important parameter is the integration time. It is the time used for acquiring each frame. The<br />

perceivable brightness of the illumination unit's LEDs increases with the integration time. The

4 Please note that the power consumption depends on the last set integration time even if the SwissRanger is not used.<br />


integration time determines the frame rate and also the power consumption, as it influences the brightness

of the illumination LEDs. The quality of the images can increase with the integration time, but so does

the minimum measurable distance.

Auto-illumination<br />

An auto-illumination feature for the device can be used to automatically determine integration time<br />

and illumination intensity. The user can supply a certain range for the integration time and the best<br />

value within the boundaries is chosen. This setting yields good results, so it should be used unless<br />

there are reasons not to do so, e.g., the need to save energy or to achieve a certain frame rate.<br />

Modulation Frequency<br />

There is a trade off between range and accuracy. With higher modulation frequencies measurements<br />

get more exact but the non-ambiguity range will decrease. While the SR-3000 supported setting a wide<br />

range of modulation frequencies (5, 6.6, 10, 19, 20, 21, and 30 MHz), the SR-3100 only supports 19,<br />

20, and 21 MHz. Configuration is only supplied for 20 MHz for both models, making the use of any<br />

other frequency inadvisable. However, especially the low values would be interesting for application

domains that involve large distances like 3D mapping. A modulation frequency of 5 MHz corresponds<br />

to a non-ambiguity range of 30 m.<br />

2.5.3 Accuracy<br />

To measure accuracy, range images were captured of a planar board of 1.2 m × 2 m, which was placed

at 8 different distances from the SwissRanger: at 1 m, 2 m, ..., 7 m, and at 7.4 m, just short of the<br />

SwissRanger’s maximum distance of 7.5 m. At each distance, 100 samples were taken. Figure 2.6<br />

shows how the range measurements on the central 6 × 6 pixels were distributed. These pixels were<br />

chosen since they are least error-prone. Also, the small central section is guaranteed to lie on the

board, even at greater distances when its perceived size decreases. One can see that the uncertainty<br />

increases super-linearly with the distance.<br />
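As an illustration, the following sketch shows one simple way to quantify the spread of the central 6 × 6 pixels over a set of samples (here via mean and standard deviation rather than the quartiles of figure 2.6); the flat, row-major data layout is an assumption for this sketch.

```cpp
#include <cmath>
#include <vector>

struct Stats { double mean, stddev; };

// Gather the central window x window pixels of every sample and compute their
// mean and standard deviation. Each range image is a flat, row-major vector.
Stats centralPixelStats(const std::vector<std::vector<double>>& rangeImages,
                        int width = 176, int height = 144, int window = 6) {
    std::vector<double> values;
    for (const auto& image : rangeImages)
        for (int r = height / 2 - window / 2; r < height / 2 + window / 2; ++r)
            for (int c = width / 2 - window / 2; c < width / 2 + window / 2; ++c)
                values.push_back(image[r * width + c]);
    double sum = 0.0;
    for (double v : values) sum += v;
    const double mean = values.empty() ? 0.0 : sum / values.size();
    double var = 0.0;
    for (double v : values) var += (v - mean) * (v - mean);
    const double stddev = values.empty() ? 0.0 : std::sqrt(var / values.size());
    return { mean, stddev };
}
```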

2.5.4 Types of Errors<br />

When working with the SwissRanger, several kinds of errors occurred. Partially, this can be attributed<br />

to the CMOS technology being relatively recent. Each pixel contains quite some functionality (binning,<br />

phase correlation) which makes for a demanding manufacturing process. Some of the errors,<br />

however, are inherent in TOF measurement.<br />

Following [May et al., 2009], we divide the errors into random errors and systematic errors.<br />

While random errors have a null arithmetic mean when gathering enough samples, systematic errors<br />

do not.<br />

In the following, each type of error will be explained by means of an example. In these example<br />

pictures, measured distances are encoded by grayscale. Brighter means closer, darker means further<br />

away. For a white pixel, there was no useful information measured, mostly because too little light was<br />

received. Some of the range images are accompanied by a photo to better illustrate the scene. These<br />

are acquired not by the SwissRanger itself (it can only capture grayscale images), but by a camera<br />

right to the left of it, looking down a bit.<br />


Figure 2.6: DEVIATION AT DIFFERENT DISTANCES – Boxes are quartiles, whiskers are extrema.<br />

Random Errors<br />

Inhomogeneous Lighting The main portion of the modulated light emitted by the LEDs is centered<br />

in the middle of the sensor. So when using a relatively low exposure time, the corners of the picture<br />

do not get enough light for decent measurements. An example is given in figure 2.7(a). Please note<br />

that the scale for this image is extremely short; the distance between white and black is 40 cm. When<br />

only the pixels with highest amplitude are used, i.e., AT is accordingly adjusted, the circular shape<br />

of the lighting becomes visible (figure 2.7(b)). Additionally, the peripheral lighting inhomogeneity<br />

further worsens the limitation of the SwissRangers’ small field of view (47 ◦ × 39 ◦ ).<br />

Light scattering A further problem is that objects close to the camera can make objects near them on<br />

the picture appear closer to the camera than they are; especially if the far objects are dark and the material<br />

of the near object is bright, i.e., highly reflective for near-IR light. (see figure 2.10). The problem<br />

is multi-path reflection within the camera: Light not absorbed by the imager is reflected to the back of<br />

the lens and from there back to the imager. This is also described in [MESA Imaging AG, 2006] and<br />

in [May et al., 2006].<br />

Bright background light The SwissRanger is very sensitive to ambient light conditions. Especially<br />

sunlight requires some manual tuning of AT. According to the specifications, the SwissRanger can<br />

suppress background light of up to 400 W/m 2 /µm, which is half of the possible sunlight on the surface<br />

of the earth. When this limit is exceeded, random noise will appear (see figure 2.8). Noise from<br />

external light sources (lamps) has also been described by [May et al., 2006]. The common cause is<br />

that these light sources emit a considerable amount of near-infrared light which disturbs the sensing.<br />


(a) When using a short exposure time and AT=0, there is a significant amount of noise in the corners of the range image

(b) When increasing the AT to 500, any information from these regions is completely discarded and an almost perfect circular shape caused by the narrow cone of the modulated illumination can be seen

(c) Variance on each pixel across 100 samples at a distance of 1 m. The variance is between 5.606 × 10⁻⁶ (black) and 3.868 × 10⁻⁴ (white); a non-linear color scale is used (4th root)

Figure 2.7: INHOMOGENEOUS LIGHTING ON WHITE HOMOGENEOUS SURFACE – Though the

SwissRangers already have an extremely limited field of view, artifacts from the focused modulated

light sources are apparent.

(a) Photograph of a scene with some<br />

sunlight.<br />

(b) Random noise due to bright light,<br />

AT=0<br />

(c) Corrected image with AT=300<br />

Figure 2.8: SR ERRORS CAUSED BY AMBIENT LIGHT – The SwissRanger is very sensitive to ambient<br />

light conditions. When confronted with a scene including a bit of sunlight (left), a significant amount<br />

of noise occurs with AT=0 (center). Only when tuning AT, here to 300, the problem is solved (right).<br />


(a) Distance image<br />

(b) Real image of the scene<br />

Figure 2.9: AMPLITUDE INFLUENCES RANGE MEASUREMENT – The sticker on the chair appears<br />

correctly, the main part of the chair does not.<br />

Systematic errors<br />

Range ambiguity: wrap-around There is a maximum distance which can be measured. Obstacles<br />

which are further away will appear somewhere in the range between 0 and this maximum distance.<br />

This is inherent to measuring the time-of-flight indirectly via phase shift. A shift of 10° cannot be<br />

told apart from a phase shift of 370°, and both phase shifts will be associated with the same distance

information. We call this the wrap-around error. See figure 2.5 for an illustration of this problem: The<br />

background wall in the top center of the image is light gray, indicating very low reported distances.<br />

There are a few possible ways around the errors caused by wrap-around of the phase shift. The

simplest, which uses a built-in camera parameter directly, is to apply an amplitude threshold as

mentioned before. This works quite often, as the amount of light reflected back by an obstacle

decreases with the square of its distance to the light source. But when dealing with obstacles with different reflective

properties, this fails.<br />

Another possibility is to use two modulation frequencies in an alternating way. As each modulation<br />

frequency entails a specific non-ambiguity range, the pixels of alternating frames wrap around<br />

at different distances. Thus, the errors in one frame can be evened out by comparing it to another<br />

frame taken at a different modulation frequency. While frames retrieved by this method still contain<br />

some errors and are also not totally dense, this method basically works. The downside is that the<br />

frame rate is reduced, especially as two frames have to be dropped immediately after the change of<br />

the modulation frequency. Also, for modulation frequencies other than 20 MHz, the SwissRangers<br />

are not calibrated. A further disadvantage is that the vulnerability to motion blur is increased.<br />

In the following, we will try to remedy the wrap-around error with the built-in AT. In section 2.5.5,<br />

we will propose a better solution.<br />

Amplitude related range error For pixels with a very low amplitude, the reported range will be<br />

too low, see figure 2.9. This occurs in objects with poor reflectivity for near-infrared light. Table 2.3<br />

show typical reflectivity values for common materials. As a rough orientation based on these values,<br />

wood pallets (25%) offer enough reflectivity for the SwissRanger, asphalt (17%) is critical, and tires<br />

(2%) reflect decidedly too little. A remedy to this problem has been proposed in [May et al., 2009].<br />


Table 2.3: REFLECTIVITY OF DIFFERENT MATERIALS FOR LIGHT WITH 900 NM WAVELENGTH –<br />

Reproduced from [Lange, 2000].<br />

material                  | reflectivity
White paper               | up to 100%
Snow                      | 94%
White masonry             | 85%
Limestone, clay           | up to 75%
Newspaper                 | 69%
Deciduous trees           | typ. 60%
Coniferous trees          | typ. 30%
Carbonate sand (dry)      | 57%
Carbonate sand (wet)      | 41%
Rough clean wood pallet   | 25%
Smooth concrete           | 24%
Dry asphalt with pebbles  | 17%
Black rubber tire         | 2%

(a) Erroneous image due to a bright obstacle (left quarter of the image)

(b) Without the obstacle, no pixel appears too close

Figure 2.10: LIGHT SCATTERING – Stray light from the foreground object makes background obstacles<br />

appear closer than they are (AT 100).<br />


Figure 2.11: A ghost image of the rod can be seen on both edges of the image (circled).<br />

Ghost image Another form of irregular distortion appears when an object is close to the camera.

Then, a faint ghost image of it will often appear on the opposite side of the frame. The actual object

is depicted correctly though (see figure 2.11). Unfortunately, this error can only be convincingly<br />

illustrated with moving images: The ghost images will move synchronously with the actual object.<br />

Fixed pattern noise Even when calibrated correctly, the camera will show some distortions at the<br />

sides of the image when showing a plane.<br />

Movement Yet another source of distortions are moving objects. When sampling

them, there are always distortions at their edges, even with low exposure times (see figure 2.12).

This has already been noted in [MESA Imaging AG, 2006]. The error also occurs in slowly moving<br />

objects, albeit to a lesser degree.<br />

The cause is that the phase is measured via four samples. If some of the samples are taken while<br />

an object is present and others while it is not, nonsense values will be derived for the affected

pixels.<br />

Reflection The SwissRangers behave much like a regular camera and can hence be misled

by reflections. When a reflective surface appears in their view, the SwissRangers will not measure

the distance to where the reflection occurs, but the perceived distance to the objects visible on the<br />

reflective surface. The measured distance is then the distance from the camera to the reflective surface<br />

plus the distance from the surface to the object. An example is shown in figure 2.13.<br />


(a) Range image of a non-moving rod. (b) The rod moving down fast (about 10 m/s); the large blue and green bars are movement errors. (c) The same rod moving up slowly (about 0.25 m/s); the errors become smaller but are still present.

Figure 2.12: Errors caused by a moving object (AT=500).<br />

(a) Photograph of a scene with reflections<br />

due to a glossy floor.<br />

(b) Distance information with<br />

AT 500<br />

Figure 2.13: Reflections, here on a floor, cause the measurement of false distances.<br />


(a) 60 cm (b) 55 cm (c) 45 cm<br />

Figure 2.14: An object at various too short distances to the camera, AT 0<br />

Minimum distance Objects that are too close to the camera cause very large distortions (see figure

2.14). Please also note that the background shade changes too, indicating that the error at the<br />

center also has an impact on the range measurement outside the distorting object. This is due to light<br />

scattering as described above.<br />

2.5.5 A novel solution to wrap-around: Adaptive Amplitude Threshold<br />

Many of the errors presented above either have a null arithmetic mean or they can be tackled with<br />

calibration [Fuchs and Hirzinger, 2008, Fuchs and May, 2007]. Also, some errors can be tackled to a<br />

degree by setting an amplitude threshold: Insufficient received light, inhomogeneous lighting, bright<br />

background light, and light scattering. Depending on the scene, amplitude thresholds between 75 and<br />

500 have successfully been used. The downside is that the number of usable pixels also decreases

considerably.<br />

Furthermore, especially the wrap-around error is not easily handled this way. As we have seen in

section 2.5.4, it arises as the SwissRangers cannot detect whether a measurement is erroneous due to<br />

being beyond the maximum range. This is because it measures distance via phase shift of the light<br />

emitted from its own light source. Since phase shift is assumed to be in [0 ◦ , 360 ◦ [, a phase shift of<br />

370 ◦ is measured as 10 ◦ . Obviously, this also corrupts the distance measurement. An object beyond<br />

the maximum distance of 7.5 m (assuming a modulation frequency of 20 MHz), which is e.g. at 8 m, is<br />

reported as being at 0.5 m. Thus, the wrap-around error is a fundamental problem of any time-of-flight

measurement via phase shift.<br />

The standard solution to this problem is to set an AT, i.e., to drop pixels which are too dark. As<br />

will be shown in the experiment section, this solution is very crude and produces large portions of<br />

both false negatives and false positives.<br />

In the following section we first explain how we identify pixels with a wrapped-around measurement;

then we use this knowledge and the wrong measurement to extend the SwissRanger's range.

This approach has already been published in [Poppinga and Birk, 2009].

Identification of erroneous pixels<br />

The proposed approach is to sanity-check the reported distance by relating it to the amplitude. The<br />

basic idea is that a wrapped-around pixel p will be reported with a low range value r_p (measured from

the image plane), but its amplitude a_p is significantly lower than it would be for a correct short

range. Since the illumination is not uniform, the pixel's position in the image array, (x_p, y_p), is also


taken into account. These two factors are used to calculate a minimum expected amplitude for each<br />

pixel:<br />

a_p > aw_p · AAT / r_p² ,                                              (2.4)

where AAT is the advanced amplitude threshold and aw_p is the amplitude of pixel p when viewing a

white wall at roughly one meter, approximated by<br />

aw_p := Â − ((x_p − δx)² + (y_p − δy)²)                                (2.5)

(Â being the amplitude constant). The approximation is reasonably close, except for the very edges<br />

of the image (see figure 2.15). An alternative would be to use a look-up table storing empirical values<br />

for each pixel. However, we chose the more memory-efficient method of approximation.<br />

The reasoning for relating distance and amplitude inversely quadratically is that the measured amplitude<br />

is based solely on light emitted by the camera’s illumination unit, hence it decreases quadratically<br />

with the distance. The second observation is that the amplitude produced by the imager decreases<br />

towards the edges of the images (see figure 2.15(a)). This is an effect which is common to time-of-flight

cameras due to the usage of multiple near-infrared LEDs as light sources. They are grouped in a

pattern around the camera’s lens. Hence, the center of the image is better illuminated than the edges.<br />

To compensate for this effect, the amplitude is also normalized.<br />

In the SR-3000, according to our measurements, the center of the amplitude pattern is not in the<br />

center of the image. Instead, it is shifted upward. Consequently, the offset in y direction is shifted,<br />

namely δy = 61. In the x direction it is at half the image width, i.e., δx = 88, as one would expect.

In the SR-3100, the image is brightest in the image center. In experiments calibrating the parameters<br />

 (amplitude constant) and the AAT in various scenes, we determined the values 12, 000 and 0.2<br />

respectively as working very well.<br />
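A minimal sketch of this criterion; the default constants correspond to the values reported above for the SR-3000 (Â = 12,000, AAT = 0.2, δx = 88, δy = 61), while the function itself is illustrative and not part of the driver API.

```cpp
bool passesAAT(double amplitude, double range, int x, int y,
               double aHat = 12000.0, double aat = 0.2,
               double deltaX = 88.0, double deltaY = 61.0) {
    if (range <= 0.0) return false;   // no valid measurement at all
    // Expected amplitude of pixel (x, y) when viewing a white wall at ~1 m,
    // approximated as in eq. (2.5).
    double aw = aHat - ((x - deltaX) * (x - deltaX) + (y - deltaY) * (y - deltaY));
    // Eq. (2.4): accept the pixel only if its amplitude is plausible for the
    // reported range; otherwise treat it as wrapped around or invalid.
    return amplitude > aw * aat / (range * range);
}
```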

Additional Error Sources<br />

Material that reflects near-IR light poorly naturally appears very dark to the SwissRanger. Objects<br />

made of such materials can be filtered out alongside with the wrapped-around pixels. As the measurement<br />

of these objects tend to be noisy and sometimes also offset, filtering the resulting pixels out is<br />

desirable.<br />

Furthermore, bright sunlight can cause areas of random noise in the SwissRanger. This also<br />

applies to indirect perception through glass bricks or when reflected by other objects. Most of these<br />

pixels are filtered out by the proposed method, except for those which by coincidence have an acceptable

combination of amplitude and distance.<br />

As a third case, pixels that appear too close due to light scattering are excluded. As mentioned in

[CSEM, 2006], bright light from close obstacles can fail to get completely absorbed by the imager.<br />

The remaining light can get reflected back to the imager e.g. by the lens. This way, rather many pixels<br />

can get tainted, showing too close values. A large portion of these will be excluded by the presented<br />

criterion.<br />

Correction of wrong values<br />

Once the incorrect pixels according to the advanced amplitude threshold are identified, they do not<br />

have to be dropped. Since the cause of the error is known and constant, the error can be undone. We call

this process unwrapping.<br />


(a) Grayscale image of a white wall as captured by<br />

the SwissRanger<br />

(b) Measured intensity and its approximation by eq. 2.5<br />

Figure 2.15: Irregular distribution of near-IR light from the camera’s illumination unit<br />


(a) Rack and door (b) Corner window (c) Gates<br />

Figure 2.16: THREE TEST SCENES COMPARING THE PROPOSED METHOD TO THE STANDARD<br />

METHOD – Normal images of the scenes from a webcam next to the SwissRanger are shown, range<br />

images are in figure 2.17.<br />

To unwrap a pixel, the corresponding beam is considered. The length of the non-ambiguity range<br />

(7.5 m at a modulation frequency of 20 MHz) is added. The new end point of the beam is considered<br />

the actual measurement. If the pixel was correctly identified as wrapped-around once, it gives the<br />

correct range.<br />
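A minimal sketch of this unwrapping step, treating the reported range as the distance along the beam; the types are illustrative.

```cpp
struct RangePixel { double range; bool wrappedAround; };

// A correctly identified single wrap-around is undone by adding exactly one
// non-ambiguity range (7.5 m at 20 MHz); multiple wrap-arounds would need
// further terms.
double unwrap(const RangePixel& p, double nonAmbiguityRange = 7.5) {
    return p.wrappedAround ? p.range + nonAmbiguityRange : p.range;
}
```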

Given a spacious environment with reflective surfaces, pixels can also wrap around more than<br />

once. However, depending on the type of environment, obstacles further away than 15 m are only<br />

rarely encountered. In roomy environments it may be advisable to use a low AT in addition to the<br />

AAT to filter out multiply wrapped-around pixels which typically have a very low amplitude. Another<br />

problem with unwrapping is that pixels filtered out for other reasons (e.g. from obstacles with poor<br />

reflectivity) are misinterpreted as being far away. In practice, rather few pixels are affected. Still, it<br />

has to be considered for each application whether a dense and partially incorrect image with a doubled range

is preferable, or a sparse, short-ranged, but almost perfectly correct one.

2.5.6 Experiments and Results<br />

We applied the approach presented in the previous section to snapshots taken around our lab with the<br />

SwissRanger SR-3100. First, we only excluded wrong pixels. To compare it to the standard abilities of<br />

the device, we also used its default error exclusion technique, the AT (excluding pixels only based on<br />

amplitude, regardless of their distance). Depending on the situation, different AT values work best. These,

however, have to be manually chosen every time. Unfortunately, when using a widely applicable AT<br />

value, this standard technique excludes far too many pixels (60.3 % on average). As an alternative, we<br />

use a lower AT value which preserves more pixels (it only discards an average of 44.8 %). As can be<br />

expected, quite some of these are wrong. In figure 2.17, three example images are presented together<br />

with a complete distance image of the scene and a color image that was taken by a web-cam which is<br />

mounted on the robot right next to the SwissRanger. Refer to figure 2.16 for webcam images of the<br />

scenes.<br />

As mentioned earlier, not only wrapped around pixels can be filtered out. This is illustrated in<br />

figure 2.19. Here, the improvement in performance towards the conventional method is not as big as<br />

with the wrap-around, but still apparent. For a summary of all results, refer to table 2.4.<br />

The exclusion criterion (eq. 2.4) consists of two factors: wrap-around is detected by comparing

the reported amplitude to the inverse of the squared reported range on the one hand, and by making the threshold

dependent on the pixel's position in the image on the other. Figure 2.20 shows the contributions of the

two components to the overall result.<br />

The information which pixels are invalid due to wrap-around can be used to correct their value.<br />


(a) Unfiltered distance image: wrap-around in the corner and behind the door

(b) Unfiltered distance image: wrap-around in the corner, noise on the window

(c) Unfiltered distance image: mostly wrapped around, noise on windows

(d) Proposed method: few wrong pixels remain – 16.7 % d. p.

(e) Proposed method: no wrapped-around pixels remain – 21.7 % d. p.

(f) Proposed method: no wrap-around, very little noise – 56.3 % d. p.

(g) AT 160: some wrong pixels remain, correct ones are discarded – 46.3 % d. p.

(h) AT 160: some wrong pixels remain, correct ones are discarded – 43.3 % d. p.

(i) AT 160: almost all valid pixels removed while some wrap-around is kept – 92.8 % d. p.

(j) AT 240 – 60.3 % d. p. (k) AT 240 – 54.7 % d. p. (l) AT 240 – 2 valid pixels

Figure 2.17: THREE TEST SCENES COMPARING THE PROPOSED METHOD TO THE STANDARD AT

METHOD OF THE SWISSRANGER – Two AT values are presented: 240, high enough to exclude every

wrapped-around pixel, and 160, which is almost high enough but not as over-restrictive. For each

filtered image, the fraction of discarded pixels (d. p.) is given. Webcam images in figure 2.16.


(a) Close boxes (b) Dark material (c) Sun<br />

Figure 2.18: WEBCAM IMAGES FOR SCENES DEMONSTRATING THE CORRECTION OF ERRORS<br />

OTHER THAN WRAP-AROUND – The webcam is mounted directly next to the SwissRanger. The<br />

SwissRanger images can be found in figure 2.19.<br />

Table 2.4: Results on all six scenes – Discarded Pixels [%]

Scene          | Proposed method | AT 160 | AT 240
Rack and door  |            16.7 |   46.3 |   60.3
Corner window  |            21.7 |   43.3 |   54.7
Gates          |            56.3 |   92.8 |    100
Close boxes    |            38.1 |   35.0 |      –
Dark material  |            13.1 |   28.0 |   43.1
Sun            |            44.9 |   73.1 |   84.6
mean           |            31.8 |   53.1 |   68.5
median         |            29.9 |   44.8 |   60.3


(a) Unfiltered distance image: dark obstacles in the back appear too close

(b) No wrap-around, but noisy and too-close trousers due to dark material

(c) Sun causes random noise

(d) Unfiltered distance image with objects removed, for comparison

(e) Proposed method: dark pixels removed – 13.1 % d. p.

(f) Proposed method: very little noise remains – 44.9 % d. p.

(g) Proposed method: correct pixels of both back- and foreground kept – 38.1 % d. p.

(h) AT 160: dark objects discarded, on edges more than in the center – 28.0 % d. p.

(i) AT 160: no noise, but parts of floor and locker removed – 73.1 % d. p.

(j) AT 160: pixels in the front preserved, those in the back gone – 35.0 % d. p.

(k) AT 240 – 43.1 % d. p. (l) AT 240 – 84.6 % d. p.

Figure 2.19: SWISSRANGER IMAGES FOR SCENES DEMONSTRATING THE CORRECTION OF ERRORS

OTHER THAN WRAP-AROUND – From left column to right: light scattering, too low reflectivity,

and sunlight. Here, an AT of 160 is sufficient; most AT 240 images are still given for completeness.

We also give the percentage of discarded pixels (d. p.). Webcam images of the scenes are in figure 2.18.


(a) Full image<br />

(b) Intensity image<br />

(c) Proposed error correction (d) Conventional error correction: AT 160<br />

(e) Only relating distance and amplitude: wrap-around removed, but too many excluded pixels, esp. on the edges of the image

(f) Only using a position-dependent amplitude threshold: not increasing strictness towards the edges (as in (d) and (e)), but keeping wrap-around

Figure 2.20: DIFFERENT ERROR EXCLUSION CRITERIA – Conventional and proposed error correction<br />

and the latter’s components, exemplified on the scene in figure 2.16(b)<br />


(a) Webcam image<br />

(b) Distance image<br />

(c) Invalid pixels removed<br />

(d) Invalid pixels corrected<br />

Figure 2.21: An example for correcting wrapped-around pixels<br />

This is demonstrated in figure 2.21. One problem here is that some of the pixels classified as invalid<br />

are not wrapped around but excluded due to other reasons, e.g. because the surface they represent has<br />

poor reflectivity. These very few pixels hence get a wrong value in the corrected image.<br />



Chapter 3<br />

Datasets<br />

In this chapter, we will present 3D datasets collected with the sensors presented in chapter 2. The<br />

datasets are used in the experiments in the following chapters. They were either collected with Jacobs

Robotics’ mobile land robot, the “Rugbot” (short for “rugged robot”, see figure 3.1) or with standalone<br />

sensors. Apart from the 3D data, the Rugbot also collects data from its other sensors, most<br />

notably from the gyroscope and odometry from its motor encoders. However, this data is not used<br />

here.<br />

[Figure 3.1 labels: pan-tilt-zoom camera, laser range finder (LRF), inclined LRF, webcam, thermo camera, stereo camera, SwissRanger]

Figure 3.1: THE AUTONOMOUS VERSION OF A Rugbot WITH SOME IMPORTANT ON-BOARD SENSORS

POINTED OUT – The SwissRanger SR-3000 and the stereo camera deliver 3D data.

The datasets have been divided into three groups according to the sensor used: range cameras,<br />

actuated laser range-finder (ALRF), or sonar.<br />


(a) Planar: a box<br />

(b) Planar: a ramp<br />

(c) Round: a trashcan<br />

(d) Round: an umbrella<br />

(e) Holes: a box with a hole<br />

(f) Holes: a shelf<br />

Figure 3.2: EXAMPLE RANGE IMAGES FROM THE PLANAR/ROUND/HOLES DATASET – This dataset<br />

is characterized by hand-chosen simple scenes, mostly with one object, sometimes with several.<br />


Figure 3.3: A SWISSRANGER POINT CLOUD FROM THE ARENA DATASET – On the left a perspective<br />

view of the point cloud, on the right, the point cloud has been entered into a grid to give the reader<br />

an idea of its geometry. The viewing points and directions match. The point cloud has been rotated<br />

by 33° so that the recorded wall lines up with the grid. A grid cell is only shown if it has at least 20<br />

points in it, the cell side length is 50 mm.<br />

3.1 Datasets with range cameras<br />

As seen in figure 3.1, the Jacobs Robotics "Rugbot" is equipped with a stereo camera (see section 2.2)

and a SwissRanger time-of-flight camera (see section 2.3). We operated the Rugbot in different surroundings,<br />

recording data with both range cameras simultaneously. As the time to acquire each scan<br />

is very short, there is no need to stop the robot to take scans.<br />

3.1.1 Arena<br />

The Jacobs Robotics Group has a dedicated training arena for Urban Search and Rescue (USAR). In

this arena, we recorded a dataset with the Rugbot. We chose scenes that pose challenges to either the<br />

SwissRanger (bad reflectivity for near-infrared light) or to the stereo camera (few visible features).<br />

The Jacobs Robotics Rescue Arena can be seen in figure 3.4. The photos show mostly the area where

the dataset was recorded. One example of a point cloud can be found in figure 3.3.<br />

3.1.2 Outdoor 1-3<br />

On three occasions, we took the Rugbot out on campus. We recorded datasets consisting of the<br />

features found there, e.g. concrete and grass surfaces, small hills, building exteriors, and shrubbery.<br />

Photos of these features can be found in figure 3.5; the associated range data is depicted in figure 3.6.

Outdoor 1 was recorded in relatively open surroundings, Outdoor 2 and 3 were recorded closer to<br />

buildings. This means that in Outdoor 1, a lower ratio of valid points could be achieved with the<br />

SwissRanger.<br />


Figure 3.4: Photos from the Arena dataset<br />

(a) Grass (b) Hill (c) Bush<br />

Figure 3.5: Photos of the different types of scenes encountered in the Outdoor 1 dataset. The photos<br />

are taken by a webcam which is mounted right next to the 3D sensors.<br />

3.1.3 Planar/Round/Holes<br />

The Planar/Round/Holes dataset was collected with a stand-alone SwissRanger. Scenes have been<br />

chosen or even set up to present one or at most a few objects with certain characteristics. These are<br />

• dominating planar surfaces,<br />

• dominating spherical, conical, or cylindrical surfaces, or<br />

• non-convex planar surfaces or planar surfaces with holes in them.<br />

Consequently, the point clouds’ bounding box is even smaller than necessitated by the SwissRanger’s<br />

small range. Representative range images can be found in figure 3.2.<br />

3.2 Datasets with actuated LRF<br />

In addition to the Rugbot used in the above experiments, we have one whose only 3D sensor is the<br />

ALRF presented in section 2.1. We used it to record datasets in different settings. The pitching angle

goes from −90 ◦ to +90 ◦ on all datasets, but the step size differs. While the potential maximal size<br />

of the sample depends on the step size, the actual size also depends on how many points are invalid<br />

because the beam was reflected poorly or not at all (sky), or were excluded because

they hit the actuating mechanism.<br />


(a) Grass: SwissRanger range image.<br />

(b) Grass: Perspective view of point cloud returned<br />

by the stereo camera.<br />

(c) Hill: SwissRanger range image.<br />

(d) Hill: Perspective view of point cloud returned<br />

by the stereo camera.<br />

(e) Bush: SwissRanger range image.<br />

(f) Bush: Perspective view of point cloud returned<br />

by the stereo camera.<br />

Figure 3.6: The data returned by the SwissRanger (left) and the stereo camera (right) for the scenes<br />

from the Outdoor 1 dataset. Photos in figure 3.5.<br />


3.2.1 Lab<br />

The first dataset was recorded indoors: a circuit around our lab. It consists of 29 point clouds recorded with an actuated LRF. The step size of the pitching servo was 0.5°. This gives a 3D point cloud of potentially 541 × 361 = 195,301 points per sample. Photos of the site can be seen in figure 3.7. A range image from the ALRF can be found in figure 3.8.

Figure 3.7: Photos of the robotics lab with a locomotion test arena in the form of a high-bay rack.

3.2.2 Crashed Car Park<br />

Two datasets were recorded at the “Disaster City” USAR training site in Texas. The first of these was collected outdoors at a partially collapsed car park. The data was collected in front of the building and in the still partially accessible ground floor. It consists of 35 point clouds, also collected with the actuated LRF. The step size of the pitching servo was 0.5°, resulting in a 3D point cloud of potentially 541 × 361 = 195,301 points per sample. Photos can be found in figure 3.9.

3.2.3 Dwelling<br />

The second dataset from Disaster City was recorded partially outdoors and partially indoors. The site was a house with an attached carport used for a boat. It consists of 96 ALRF point clouds. The step size of the pitching servo was 0.5°, resulting in a 3D point cloud of potentially 541 × 91 = 49,231 points per sample.

3.2.4 Hannover ’09 Hall<br />

Two further ALRF datasets were recorded at the RoboCup German Open ’09, which was co-located with the Hannover Fair. RoboCup is a competition for robots divided into different leagues. In 2009, the Jacobs Robotics Group competed in the Rescue League, which simulates an Urban Search and Rescue (USAR) scenario. Both datasets gathered in Hannover are characterized not only by common obstacles close to the ground, but also by the unusually high ceiling (c. 12.5 m). The first Hannover dataset consists of 78 point clouds from a circuit of a fair hall. The step size of the pitching


Figure 3.8: ALRF RANGE IMAGE FROM THE LAB DATASET – Each 2D LRF scan corresponds to one<br />

row in the image. Since the FOV of the LRF is greater than 180°, the left and right edges of the image<br />

show parts of the environment behind the sensor upside down. The top left and right corners have no<br />

information because they correspond to beams reflected from the actuation mechanism.<br />

Figure 3.9: The Crashed Car Park in Disaster City in Texas where one of the datasets used in the<br />

experiments was recorded.<br />


(a) Outside view of a house with attached car-port: 35784 points<br />

(b) Entering the building: 41350 points<br />

Figure 3.10: Perspective view of two point clouds from the Dwelling dataset from Disaster City<br />


Figure 3.11: The Hannover ’09 Hall<br />

servo was 1°, resulting in a 3D point cloud of potentially 541 × 181 = 97,921 points per sample. A photographic overview of the site can be found in figure 3.11.

3.2.5 Hannover ’09 Arena<br />

The second Hannover dataset was recorded in the RoboCup Rescue Arena while the Rugbot moved<br />

through it. The arena consists of 1.20 m×1.20 m elements that are bounded by 1.20 m walls on up to<br />

three sides while being open at the top. At roughly 5 m above ground, there was a frame above the<br />

arena for lights and cameras. It can be seen alongside the very high ceiling. The dataset consists of 79<br />

point clouds. The step size of the pitching servo was 1°, resulting in a 3D point cloud of potentially 541 × 181 = 97,921 points per sample. One point cloud is illustrated in figure 3.12.

3.3 Dataset with sonar: Lesumsperrwerk<br />

The last dataset is an underwater set collected at the “Lesumsperrwerk”, a flood gate with a lock<br />

near our university (53° 9’ 36” N, 8° 38’ 49” E, http://www.lesumsperrwerk.de/). It was<br />

gathered with the Eclipse stand-alone 3D sonar (presented in section 2.4) operated from a wharf. The<br />

point clouds show the flood gates and the entrance to the lock, but not the banks or the ground of the<br />

river, nor any vessels. We operated the Eclipse at a maximum range of 120 m. Photos of the site can<br />

be found in figure 3.13. Two views of a point cloud are in figure 3.14.<br />

3.4 Summary<br />

An overview of all datasets can be found in tables 3.1 and 3.2. Table 3.1 presents statistics on the number of points in the datasets. It can be seen that the SwissRanger returns a high ratio of valid points, except when working outside in the open (Outdoor 1). The ALRF achieves a good ratio of 62–83%. The stereo camera delivers a low ratio of valid points. This is due to the fact that it relies on detecting features, so featureless areas will be completely discarded, except for their edges. In the worst case (Outdoor 1), the STOC only returned 4.3% valid points. The remaining ∼13,000 points are not enough for most applications, especially since many of these are erroneous. We could not give a ratio for the sonar since it returns varying numbers of points per beam, so there is no fixed maximum number of points.

Table 3.2 shows the spatial properties of the gathered data. One can see that the stereo camera delivers extremely spread out point clouds with very low density. This is due to the type of error occurring in feature matching: imprecisely matched features smear very far away from the camera’s focal point. The SwissRanger delivers the densest point clouds, followed by the ALRF and the sonar. For the SR and the ALRF,


(a) Bottom up view showing arena, frame, and ceiling<br />

(b) Top down view of arena, frame visible on top, ceiling out of view<br />

Figure 3.12: TWO VIEWS OF A POINT CLOUD CONTAINING 78528 POINTS FROM THE HANNOVER<br />

’09 ARENA DATASET – The dataset is characterized by the unusually high ceiling (ca. 12.5 m). There<br />

is also a frame for lighting fixtures.<br />


Figure 3.13: An overview of the Lesumsperrwerk as seen from the river’s surface.<br />

(a) Front view<br />

(b) Top view<br />

Figure 3.14: TWO VIEWS OF A SONAR POINT CLOUD FROM THE LESUMSPERRWERK DATASET –<br />

In the left and center, three blocks are visible which lie between the individual gates. On the right, one can see the entrance to the lock. In the front view, the individual scan lines are apparent. In the top view, one can see multiple echoes from the obstacles.


Table 3.1: NUMBER OF POINTS IN THE DATASETS – RMSE=root mean square error, VP=valid points,<br />

∅=mean<br />

Dataset Sensor #PC Σ#points ∅#points RMSE #points ratio VP<br />

Arena SR 453 10657341 23526.139 2347.125 0.928<br />

Arena STOC 341 7380368 21643.308 7991.060 0.070<br />

Outdoor 1 SR 733 11174009 15244.214 6124.177 0.601<br />

Outdoor 1 STOC 104 3343059 32144.798 20600.686 0.105<br />

Outdoor 2 SR 238 5910801 24835.298 1787.522 0.980<br />

Outdoor 2 STOC 237 3123425 13179.008 6795.924 0.043<br />

Outdoor 3 SR 6457 160074360 24790.825 1238.833 0.978<br />

Outdoor 3 STOC 365 16165611 44289.345 30140.943 0.144<br />

PRH SR 75 1805347 24071.293 3994.622 0.950<br />

CCP ALRF 35 4216448 120469.943 24045.993 0.617<br />

Dwelling ALRF 96 3753320 39097.083 2987.606 0.794<br />

H. Arena ALRF 79 5921144 74951.190 6164.248 0.765<br />

H. Hall ALRF 78 4968246 63695.462 6047.772 0.650<br />

Lab ALRF 29 4706218 162283.379 1454.341 0.831<br />

Lesum Sonar 18 3237337 179852.056 26547.651 –<br />

one can see how the density relates to how confined the environment was. For the SwissRanger, it ranges from very confined (Arena, 2307 points/m³) to open (Outdoor 1, 161 points/m³). The ALRF has been used in open (Hannover Arena and Hall), semi-open (Crashed Car Park and Dwelling) and confined (Lab) environments.

The ratio between median and maximum Euclidean distance from a point to the origin is a measure<br />

of how evenly distributed the points are in the area they cover. This value is highest for the SwissRanger and the sonar. For the SwissRanger, this can be attributed to its short range. For the sonar, on the other

hand, it is a property of the dataset: all points lie in one relatively coherent cluster. Also, it does not<br />

see the floor and there are no other short range objects visible. The ratio gives a good illustration of<br />

the difference between the Hannover datasets. While both were recorded at the same venue, the arena<br />

is very confined – the nearest wall is rarely more than one meter away. Yet, the high ceiling is still<br />

visible, making the covered volume almost as high as that of the Hall dataset.<br />
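For illustration, the spread measure ∅r̃/max r used in table 3.2 can be computed per point cloud as sketched below; this is a minimal example with synthetic data, not the evaluation code used to produce the tables.

import numpy as np

def spread_ratio(points):
    """Ratio of median to maximum Euclidean distance of the points to the origin.

    A value close to 1 means the points are distributed evenly over the covered
    range; a small value indicates that a few far-away points dominate the extent.
    """
    r = np.linalg.norm(np.asarray(points, dtype=float), axis=1)
    return float(np.median(r) / np.max(r))

# Synthetic example: a dense cluster near the sensor plus one far outlier
cloud = np.vstack([np.random.rand(1000, 3), [[100.0, 0.0, 0.0]]])
print(spread_ratio(cloud))   # small value, dominated by the single outlier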


Table 3.2: SPATIAL PROPERTIES OF THE DATASETS – All averages (denoted by ∅) are means. Volume and dimensions (depth × width × height) refer to the bounding box of all points. Each dimension was averaged independently. The maximum Euclidean range is given as r, the average ratio between median and maximum Euclidean range as ∅r̃/max r.
Dataset Sensor vol. [m³] dens. [m⁻³] ∅d [m] × ∅w [m] × ∅h [m] ∅ max r [m] ∅r̃/max r

Arena SR 10.199 2.307 2.456×1.644×1.253 3.111 0.597<br />

Arena STOC 21755493.302 0.000995 141.677×99.077×300.423 337.192 0.0914<br />

Outdoor 1 SR 94.548 161 6.672×4.509×2.692 6.932 0.192<br />

Outdoor 1 STOC 45727990.220 0.000703 344.375×140.337×604.775 688.080 0.0457<br />

Outdoor 2 SR 74.545 333 5.396×4.292×2.115 5.802 0.346<br />

Outdoor 2 STOC 102076615.894 0.000129 504.941×250.386×669.230 765.292 0.0103<br />

Outdoor 3 SR 70.536 351 6.347×4.063×2.112 6.706 0.158<br />

Outdoor 3 STOC 92873676.283 0.000477 302.231×307.474×609.771 691.367 0.0369<br />

PRH SR 28.044 858 2.814×2.685×2.000 4.114 0.572<br />

CCP ALRF 9940.652 12.1 28.799×37.422×8.705 22.780 0.0703<br />

Dwelling ALRF 1100.680 35.5 10.822×14.029×4.491 11.962 0.139<br />

H. Arena ALRF 22026.424 3.4 33.635×42.183×14.878 28.698 0.039<br />

H. Hall ALRF 23237.480 2.74 38.320×45.818×13.163 29.114 0.102<br />

Lab ALRF 343.592 472 7.835×7.277×5.450 8.178 0.111<br />

Lesum Sonar 1061865.304 0.169 175.580×103.155×58.643 117.874 0.661<br />



Chapter 4<br />

Near Field 3D Navigation with the Hough<br />

Transform<br />

In this chapter, we present a solution to obstacle avoidance. The term designates reactive behavior<br />

that does not use internal state like a map. Obstacle avoidance is useful both to assist a human robot<br />

operator and as a low-level component of full autonomy. In autonomous operation, the speed of obstacle recognition and avoidance limits the speed at which the robot can drive without risking collisions or getting stuck.

In environments made for humans, a robot can assume a flat floor and obstacles that are mostly detectable with a 2D range sensor. In unstructured environments, it must be prepared not only for obstacles outside the scope of the 2D sensors; the ground it is facing might not even be drivable.

As outlined in chapter 2, with the STOC stereo camera and the SwissRanger time-of-flight camera, two high-frequency 3D sensors are available. In the following, we propose an algorithm that allows a robot to make use of the potential that these sensors’ high speeds offer: high-speed obstacle avoidance based on 3D data.

4.1 Approach and Implementation<br />

4.1.1 The Hough transform<br />

The Hough transform is a feature detection heuristic, originally introduced for lines. It has been<br />

popularized and generalized by [Ballard, 1981] to all parameterized shapes like circles, squares and<br />

the like. Though it is typically used for 2D images, it can be extended to any dimension due to its general nature. In this chapter, it is used to detect infinite planes in 3D point clouds returned by 3D range sensors.

Generally, the Hough transform can be used to detect shapes in point clouds or raster data. Ideally, these shapes are described by as few parameters as possible. Originally, 2D lines were used,

which can be defined by their angle with the x-axis and the distance to the origin. A point in the<br />

parameter space then represents one of the shapes being searched for. Each point in the input data set<br />

then can be used as a vote for all parameter combinations of the shapes that pass through it.<br />

The parameter space is discretized by dividing it into equally sized bins. For each point p in<br />

the input dataset, counts in the bins corresponding to parameters of shapes passing through p are<br />

incremented. If a particular shape is present in the input data, the bin corresponding to its parameters receives a high count of so-called hits, i.e., of input points voting for it.


In the following, the general form of the algorithm for the detection of a geometric shape with m parameters in n-dimensional space is briefly recapitulated. The input data is given as a point cloud PC, i.e., a set of points in n-dimensional Cartesian space. The general Hough transform as illustrated in algorithm 1 iterates over all points p in PC and increments for each the bins that belong to parameter values of shapes passing through p. In doing so, the parameter v_m is quantized, i.e., it is calculated based on the values of all the other m − 1 parameters and a point from the input data.

Algorithm 1 GENERAL FORM OF THE HOUGH TRANSFORM ALGORITHM – Searching for an m-parameter shape in an n-dimensional point cloud PC. The m-dimensional parameter space is discretized as array PS.

for all points p ∈ PC do
    for all values v_1 for parameter P_1 do
        ...
            for all values v_(m−1) for parameter P_(m−1) do
                v_m ← calculateVm(p, v_1, ..., v_(m−1))
                PS[v_1]...[v_m]++
            end for
        ...
    end for
end for

4.1.2 Plane Parameterization<br />

Algorithm 2 HOUGH TRANSFORM APPLIED FOR PLANE DETECTION – Searching for a plane with its three parameters in a 3D point cloud PC. The 3D parameter space is discretized as array PS.

for all points p ∈ PC do
    for all angles ρ_x do
        for all angles ρ_y do
            n ← (−sin(ρ_x)·cos(ρ_y), −cos(ρ_x)·sin(ρ_y), cos(ρ_x)·cos(ρ_y))^T
            d ← n · p
            PS[ρ_x][ρ_y][d]++
        end for
    end for
end for

Here the shapes for the Hough transform are planes. The axes are chosen as shown in figure 4.1. The planes are characterized by the following three parameters:

Two angles: As shown in figure 4.1, the first angle ρ_x is the angle between the x-axis and the intersection of the given plane with the xz-plane. The second angle ρ_y is the angle between the y-axis and the intersection of the given plane with the yz-plane.

Signed distance to the origin: The signed distance d has the same absolute value as the distance, and its sign is the same as that of the plane’s z-axis intercept.

66


4.1 Approach and Implementation<br />

[Sketch of the coordinate frame: the x-axis points to the front, the z-axis points up, O is the origin; the angles ρ_x and ρ_y are indicated on the respective axes.]

Figure 4.1: THE DEFINITION FOR THE ANGLES ρ_x AND ρ_y – Figure taken from [Poppinga et al., 2008a].

So, given a point from the point cloud returned by one of the sensors and two angles, it is possible to compute the signed distance to the origin. In algorithm 2, the distance is used accordingly as the quantized parameter. In other words, the signed distance d corresponds to the quantized parameter v_m in algorithm 1. The parameterization produces a singularity: when one angle is 90°, the other one is undefined. As we will see below, this is not a problem in our application domain, since we do not need to consider angles that large.
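For concreteness, the voting step of algorithm 2 can be written out as in the sketch below. The bin ranges and resolutions are placeholders chosen for illustration; the values actually used in the experiments are given in section 4.1.3.

import numpy as np

def hough_planes(points, angles_deg, d_min=-1.0, d_max=2.0, d_res=0.1):
    """Accumulate votes for planes parameterized by (rho_x, rho_y, d).

    points:     (N, 3) array of 3D points in sensor coordinates.
    angles_deg: candidate angles in degrees, used for both rho_x and rho_y.
    Returns the accumulator PS of shape (len(angles), len(angles), #distance bins).
    """
    angles = np.radians(angles_deg)
    n_d = int(round((d_max - d_min) / d_res))
    PS = np.zeros((len(angles), len(angles), n_d), dtype=int)
    for p in points:
        for i, rx in enumerate(angles):
            for j, ry in enumerate(angles):
                # Normal vector of the candidate plane as in algorithm 2
                n = np.array([-np.sin(rx) * np.cos(ry),
                              -np.cos(rx) * np.sin(ry),
                               np.cos(rx) * np.cos(ry)])
                d = float(n @ p)                           # signed distance to the origin
                k = int(np.floor((d - d_min) / d_res))     # quantize the distance
                if 0 <= k < n_d:
                    PS[i, j, k] += 1
    return PS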

There are other ways to parameterize a plane, for example the normal vector and the distance to the origin, or a unit quaternion. We chose the parameterization with two angles and the signed distance because it has exactly as many parameters as a plane has degrees of freedom. Consequently, all parameters can be chosen independently of one another within their respective ranges. We also found this parameterization to be the most intuitive, especially when one of the angles ρ_D = 0 (with D ∈ {x, y}).

Ground classification<br />

The basic idea for the ground classification is simple: important classes of terrain are characterized by one plane. If the robot is facing no obstacles and even terrain, a plane representing the ground should be detectable. This ground plane has parameters that correspond to the pose of the sensor relative to the floor. The corresponding bin of the parameter space should have significantly more hits than any other bin.

Similarly, if the robot faces a ramp, an elevated plateau, or if it is standing at an edge to lower ground, the corresponding planes should be easily detectable. If no single plane is detected, it can be presumed that the robot is facing a kind of non-drivable terrain. One option is to detect walls and other obstacles perpendicular to the floor by looking at the corresponding bins. An easier alternative used here is to exclude the corresponding areas of the parameter space. So if points of the input data lie on a wall or other obstacles perpendicular to the floor, their votes are not stored, as the corresponding bins do not exist. The situation is then the same as for other non-drivable areas: no plane bin has extraordinarily many hits.

In the following, five classes of terrain are used: floor, ramp, canyon, plateau, and obstacle. For all except the last one, there is one characterizing plane. The “floor” is flat horizontal ground at the level of the robot. The “ramp” is inclined terrain in front of the robot, i.e., when traversing it, only the robot’s pitch changes passively. The “canyon” and the “plateau” are even ground below and above, respectively, the current position of the robot. The classes, especially ramp, canyon and plateau, can be further subdivided to take the physical capabilities of the robot into account. For our Rugbot, plateaus and canyons with a step of more than 0.2 m and ramps with a combined angle of


more than 35° are considered to be not passable. Other ramps, plateaus, and canyons are deemed to be passable. There is of course the option to make a more refined distinction, for example to adapt the robot’s driving speed or to compute a cost function for path planning.

4.1.3 Processing of the Hough Space for Classification<br />

Algorithm 3 THE GROUND CLASSIFICATION ALGORITHM – It uses three simple criteria organized in a decision-tree-like manner. #S is the cardinality of S, bin_max is the bin with the most hits.

if #bin_floor > t_h · #PC then
    return floor
else
    if (#{bin | #bin > t_m · #bin_max} < t_n) and (#bin_max > t_p · #PC) then
        return type(bin_max) ∈ {floor, plateau, canyon, ramp}
    else
        return obstacle
    end if
end if

The computation of the classification is based on three simple components. The main criterion is which bin has the most hits. Furthermore, a threshold is used to detect obstacles, i.e., the case when no bin has significantly more hits than the others. There are two thresholds t_h and t_p for this purpose to accommodate different terrain types. As the number of points in the input data can vary, the thresholds are relative to the cardinality of the processed point cloud. For perfect input data, these two criteria are sufficient. But as the snapshots from the sensors tend to be very noisy, a third heuristic is used.

This heuristic takes the shape of the distribution of the hits into account. It estimates how peaked the distribution is by determining the cardinality of {bin | #bin > t_m · #bin_max}, i.e., the set of bins with more hits than a certain fraction t_m of the number of hits of the top bin. This cardinality is then compared to a threshold t_n. The criteria are arranged in a decision-tree-like manner as shown in algorithm 3.

The four parameters t_h, t_p, t_m, and t_n seem to be quite uncritical, as discussed in more detail in the section presenting experimental results. They have been determined using a few test cases and performed very well on several thousand input snapshots from four large datasets recorded under various environmental conditions. The values used are: t_m = 2/3, t_h = 1/3, t_n = 7, and t_p = 1/5.
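Written out in code, the decision tree of algorithm 3 looks roughly as follows. The accumulator is assumed to be a dictionary mapping bin indices to hit counts, and bin_type() (mapping a bin to floor, ramp, plateau, or canyon) stands for the lookup implied by the bin layout; both are assumptions made for this sketch.

def classify(hits, floor_bin, bin_type, n_points,
             t_h=1/3, t_p=1/5, t_m=2/3, t_n=7):
    """Terrain classification following the decision tree of algorithm 3.

    hits:      dict mapping a bin index to its number of hits
    floor_bin: index of the bin corresponding to the expected floor plane
    bin_type:  function mapping a bin index to 'floor', 'ramp', 'plateau' or 'canyon'
    n_points:  cardinality of the processed point cloud
    """
    if hits.get(floor_bin, 0) > t_h * n_points:
        return "floor"
    top_bin = max(hits, key=hits.get)
    # Peakedness heuristic: count the bins that come close to the top bin
    near_top = sum(1 for h in hits.values() if h > t_m * hits[top_bin])
    if near_top < t_n and hits[top_bin] > t_p * n_points:
        return bin_type(top_bin)          # floor, plateau, canyon, or ramp
    return "obstacle"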

The parameters of the discretization of the Hough space are the minimum, maximum, and resolution on the three axes. The range on the x-axis (forward) is set to [−45°, 18°] (positive values running downhill), since the sensors only detect very few points on ramps that have a stronger downward inclination than roughly 18°. The range on the y-axis is set to [−45°, 45°]. A ramp with, say, 30° roll might not be drivable from the position it was detected from, yet it can be an indication for the robot to reposition itself to tackle the ramp. The range for the distance dimension is [−1 m, 2 m]. The distance resolution is set to 0.1 m. All these parameters were chosen based on the capabilities and properties of our robot.

The angular resolution of the discretization is the most interesting of the Hough space parameters.<br />

This parameter influences the run time as well as the potential accuracy of the algorithm. Different<br />

values were hence used for the detailed performance analysis presented in section 4.2.2.<br />


Table 4.1: Stereo and TOF point clouds in the four datasets used in the experiments.<br />

dataset description point-clouds (PC) aver. points per PC<br />

stereo camera<br />

Arena inside, rescue arena 408 5058<br />

Outdoor 1 outside, university campus 318 71744<br />

Outdoor 3 outside, university campus 414 39762<br />

TOF<br />

Arena inside, rescue arena 449 23515<br />

Outdoor 1 outside, university campus 470 16725<br />

Outdoor 2 outside, university campus 203 25171<br />

Outdoor 3 outside, university campus 5461 24790<br />

4.2 Experiments and results<br />

The approach was intensively tested with the Arena and Outdoor 1–3 real-world datasets presented in chapter 3. They contain more than 7,500 snapshots of range data gathered by the SwissRanger and the STOC stereo camera (chapter 2) in a large variety of real-world situations. The data includes indoor and outdoor scenarios, different lighting conditions, open and cluttered environments, and other challenges. Due to limitations in the de-serialization, the datasets were involuntarily randomly subsampled to c. 85% of the point clouds, hence the different numbers in table 4.1 and in table 3.1. Also, the first and last point cloud in every scene were dropped to exclude potentially erroneous data.

Note the large variation in the mean number of points per point cloud in the different datasets, especially for the stereo camera. The stereo disparity computation relies heavily on the texture and feature information in the picture. Outdoor photographs can be particularly rich in such features; refer for example to figures 3.6 and 4.2. However, the ground in outdoor scenes and walls in indoor scenes are often featureless, and consequently the related data is often missing in stereo snapshots, as illustrated in figure 4.2.

Both the stereo camera and the SwissRanger indicate points in their range images where information is missing. A preprocessing step is hence used, which estimates whether there are sufficiently many meaningful data points in a snapshot. Table 4.2 shows the amount of snapshots ruled out due to insufficient data. The stereo camera data from the Outdoor 2 dataset was excluded completely due to its very low ratio of valid points (0.043) and high number of outliers, which are typically erroneous points (∅r̃/max r = 0.0103), cf. tables 3.1 and 3.2. The total number of snapshots used as actual input for classification is still about 6,800. Please note that despite the high amount of excluded data, the stereo camera is useful as a supplementary sensor to the SwissRanger, which tends to fail in scenarios with direct exposure to very bright sunlight, a condition that does not affect the stereo camera.

The four parameters t_m = 2/3, t_h = 1/3, t_n = 7, and t_p = 1/5 were determined once based on the analysis of a few example snapshots considered to be typical. The parameters were not altered during the experiments. In the following subsection, some examples of Hough spaces are presented to illustrate the working principles of the classification.


(a) The webcam image of the scene.<br />

(b) 3D point cloud returned by the stereo camera. The<br />

ground does not show enough features, so it does not have<br />

depth information.<br />

Figure 4.2: An example scene where the stereo camera delivers few data points; especially ground<br />

information is missing.<br />

Table 4.2: PERCENTAGES OF SNAPSHOTS EXCLUDED FROM HOUGH TRANSFORM VIA PREPRO-<br />

CESSING – Especially the stereo camera data suffers if there are few features in a scene (figure 4.2),<br />

but it can complement the SwissRanger SR-3000, which can fail in different environment situations,<br />

e.g., in scenes with direct exposure to very bright sunlight.<br />

dataset point-clouds excluded data<br />

stereo camera<br />

Arena 408 92%<br />

Outdoor 1 318 75%<br />

Outdoor 3 414 70%<br />

TOF<br />

Arena 449 1%<br />

Outdoor 1 470 2%<br />

Outdoor 2 203 2%<br />

Outdoor 3 5461 0%<br />


4.2.1 Hough space examples<br />

The real world data is obviously subject to noise and additionally, planes in the world can lie in an<br />

area in the parameter space just on the boundary between two bins. It can hence not be expected that<br />

even a perfect plane will produce hits in a single bin only. In this subsection, the working principles<br />

of our approach are illustrated by discussing a few typical examples. This is followed by a global<br />

performance analysis based on the 6,800 snapshots in the next subsection 4.2.2.<br />

For the discussion in this subsection, the parameter space of the Hough transform of a snapshot is depicted as a two-dimensional histogram. In figure 4.3, there are three histograms depicting the bins of the parameter space for more or less typical snapshots. The origin of the histogram is in the top left corner, the down-pointing axis contains the distances, and the right-pointing axis contains both ρ_x and ρ_y. This is accomplished by first fixing ρ_y and iterating ρ_x, then increasing ρ_y and iterating ρ_x again, and so on. This is depicted graphically in sub-figure 4.3(a). The bin which corresponds to the floor is indicated by a yellow frame.

The magnitude of the bins is represented by shade, where white corresponds to all magnitudes above the ratio t_m of the maximum magnitude, t_m = 2/3. All the other shades are scaled uniformly, and thus the different histograms are better comparable than with a scheme where just the top bin is white. The following discussion motivates why a threshold on only the absolute number of hits is not sufficient and why the shape of the distribution also has to be taken into account.

In the floor histograms, a single bin sticks out, even with some slightly obstructed ground. But it<br />

can be noticed that the histogram of the ramp differs from the floor-histograms. First of all, the bins<br />

with many hits are concentrated on the right side of the histogram. This is obviously the case because<br />

planes with a greater inclination get more hits, proportionally to their similarity to the actual ramp.<br />

The second difference is that the strip of significantly filled bins is lower. This is because the ramp is<br />

at a greater distance to the camera than the floor.<br />

In the histogram of the random step field, the bins with a relatively high magnitude are the most evenly distributed among the three histograms. In the ramp histogram, there are also a lot of bins represented in white, but this is because the assignment of shade to magnitude is done in an absolute way. So in the ramp histogram, bins with relatively medium magnitude are painted in white. In the random-step-field histogram, there are 14 white bins, in the ramp histogram there are 35.

Common to all diagrams is that the areas with the higher magnitudes are distributed in a diagonal shape. This is caused by the following. Suppose there are just a lot of points from the floor. The planes with the most points in them are very similar to the floor. For a plane which is not very similar to the floor to encompass many points, the best way is to intersect the floor plane at an angle as small as possible. With growing distance to the floor, the angle has to become greater in order for the plane to intersect the floor plane in the area of the data points. In the region of the parameter space where the angles are just big enough, the values are the highest for every distance.

For the ramp, the parameter space is a warped version of the one for the floor, because the region<br />

of the data points is smaller, so there are fewer possibilities to intersect with it. This is also the reason<br />

why the region of non-zero bins is so small in this parameter space. Furthermore, the area of the highest overall magnitude is higher (on the ρ_x-axis), due to the ramp being tilted.

4.2.2 General Performance Analysis<br />

The results presented in this subsection are based on those approximately 6,800 snapshots from the four datasets recorded in different environment conditions which had meaningful range data. The data was labeled by a human to provide ground truth references to measure the accuracy of the classification.


(a) Layout of the bins in the depictions of parameter spaces below<br />

(b) Plain floor<br />

(c) Slightly obstructed floor<br />

(d) Ramp<br />

(e) Random step field<br />

Figure 4.3: TWO-DIMENSIONAL DEPICTIONS OF THE THREE-DIMENSIONAL PARAMETER (HOUGH) SPACE FOR SEVERAL EXAMPLE SNAPSHOTS – Distances are on the y-axis, angles are on the x-axis, where ρ_y iterates just once and ρ_x iterates repeatedly. Hits in the bins are represented by gray-scale; the darker, the fewer hits.

Table 4.3: Success rates and computation times for drivability detection.<br />

dataset success rate false negative false positive time [ms]<br />

stereo camera<br />

Arena 1.000 0.000 0.000 4<br />

Outdoor 1 0.987 0.00 0.013 53<br />

Outdoor 3 0.977 0.016 0.007 29<br />

TOF<br />

Arena 0.831 0.169 0.000 11<br />

Outdoor 1 1.000 0.000 0.000 8<br />

Outdoor 2 1.000 0.000 0.000 12<br />

Outdoor 3 0.830 0.031 0.139 12<br />


[Scatter plot: mean classification time in ms (y-axis, 0–55 ms) over the mean number of 3D points per point cloud (x-axis, 0–80,000) for the combinations Arena/STOC, Arena/SR, Outdoor 1/SR, Outdoor 2/SR, Outdoor 3/SR, Outdoor 3/STOC, and Outdoor 1/STOC.]

Figure 4.4: MEAN PROCESSING TIME AND CARDINALITY OF POINT CLOUDS – The mean time per<br />

classification directly depends on the mean number of 3D points per snapshot in each dataset.<br />


Table 4.4: Human generated ground truth labels for the stereo camera data of the different scenes<br />

dataset scene description human # of aver. # points<br />

set label PC per PC<br />

Arena 1 lab floor with black plastic floor 32 5058<br />

2 bush1 very near obstacle 30 22151<br />

3 bush2 very near obstacle 1 71646<br />

4 lawn floor 2 11367<br />

Outdoor 1 5 hill with grass ramp 47 107173<br />

6 tree1 very near obstacle 1 15267<br />

7 grass, background sky floor 1 32686<br />

8 tree2 very near obstacle 30 27139<br />

9 tree3 very near obstacle 41 25141<br />

10 railing very near obstacle 2 77342<br />

Outdoor 3 11 concrete slope ramp 27 10770<br />

12 wall very near obstacle 23 113368<br />

As it would be very tedious to label the several thousand snapshots one by one, the four datasets were broken down into “scene sets”. Each scene set consists of some larger sequence of snapshots from one sensor to which the same label can be applied. The properties of the different scenes are discussed later on in the context of fine-grain classification.

The first and foremost result is with respect to coarse classification, namely the binary distinction between drivable and non-drivable terrain. For this purpose an angular resolution of 45° is used, i.e., only two bins per axis. As shown in table 4.3, the approach can robustly detect drivable ground in a very fast manner. The success rates of classifying the input data correctly, i.e., of assigning the same label as in the human-generated ground truth data, range between 83% and 100%. Cases where the algorithm classifies terrain as non-drivable in contradiction to the ground truth label are counted as false negatives; when drivability is detected although the ground truth label points to non-drivability, a false positive is counted.

As mentioned before, the stereo camera has the drawback that it does not deliver data in featureless<br />

environments, but it allows an almost perfect classification ranging between 98% and 100%. The<br />

SwissRanger has a very low percentage of excluded data (see table 4.2), but it behaves poorly in<br />

strongly sunlit situations. In these cases, the snapshot is significantly distorted but not automatically<br />

recognizable as such. Such situations occurred during the recording of the outdoor data of the Arena<br />

and Outdoor 3 datasets causing the lower success rates for the SwissRanger in these two cases. The<br />

two sensors can hence supplement each other very well.<br />

The processing can be done very fast, namely in the order of 5 to 50 ms. Please note that these<br />

run times are for the full approach, i.e., the run times include the preprocessing (which is always<br />

successful for this data) as well as the computation of the Hough transformation and the execution of<br />

the decision tree. As illustrated in figure 4.4, the variations in processing time are due to the variations<br />

in the number of points per snapshot.<br />

Any standard 2D obstacle sensor would have done significantly worse in the test scenarios. It would especially have failed to detect perpendicular obstacles as well as rubble on the floor. The approach can hence serve as a serious alternative to 2D obstacle detection for motion control and mapping in real-world environments.

The next question is of course to what extent the approach is capable of more fine-grain classification of terrain types.


Table 4.5: Human generated ground truth labels for the SwissRanger data of the different scenes<br />

dataset scene description human # of aver. # points<br />

set label PC per PC<br />

13 lab floor, background barrels floor 55 23484<br />

14 lab floor with black plastic floor 33 23256<br />

Arena 15 boxes very near obstacle 43 25344<br />

16 lab floor, dark cave floor 75 25344<br />

17 wooden ramp with plastic cover ramp 113 19691<br />

18 red-cloth hanging down obstacle 124 25344<br />

19 bush1 very near obstacle 32 18121<br />

20 bush2 very near obstacle 49 24604<br />

21 car very near obstacle 44 17396<br />

Outdoor 1 22 concrete ground1 floor 68 11372<br />

23 lawn floor 70 13812<br />

24 hill with grass and earth ramp 59 7235<br />

25 concrete with rubble obstacle 90 20375<br />

26 tree1 very near obstacle 50 23512<br />

27 concrete ground2 floor 28 24726<br />

Outdoor 2 28 grass, background far wall floor 35 25342<br />

29 mix of grass and concrete floor 50 25004<br />

30 grass, background close wall floor 86 25344<br />

31 grass, background sky floor 13 25343<br />

32 grass, background building1 floor 233 25343<br />

33 grass, background building2 floor 892 25250<br />

34 stone ramp ramp 150 25287<br />

35 hill1 ramp 169 19853<br />

Outdoor 3 36 tree2 very near obstacle 728 25344<br />

37 tree3 very near obstacle 904 22625<br />

38 railing very near obstacle 616 25344<br />

39 grass, background building3 floor 538 25343<br />

40 concrete slope ramp 548 25202<br />

41 grass hill ramp 987 25343<br />

42 wall very near obstacle 655 25344<br />


Table 4.6: CLASSIFICATION RATES AND RUN TIMES FOR STEREO CAMERA DATA PROCESSED AT<br />

DIFFERENT ANGULAR RESOLUTIONS OF THE HOUGH SPACE – Though drivability can be robustly<br />

detected with stereo, finer classification performs rather badly with this sensor.<br />

45° 15° 9°<br />

dataset scene human class. time class. time class. time<br />

set label rate [ms] rate [ms] rate [ms]<br />

Arena 1 floor 1.00 4 0.00 20 0.00 51<br />

2 obstacle 1.00 16 1.00 92 1.00 232<br />

3 obstacle 1.00 60 1.00 320 1.00 850<br />

4 floor 1.00 15 1.00 60 1.00 160<br />

Outdoor 1 5 ramp 0.00 78 0.00 431 0.00 1089<br />

6 obstacle 0.00 10 0.00 50 0.00 130<br />

7 floor 0.00 1 0.00 30 0.00 70<br />

8 obstacle 1.00 22 1.00 109 1.00 279<br />

9 obstacle 1.00 19 1.00 100 1.00 255<br />

10 obstacle 0.00 35 0.50 180 0.50 455<br />

Outdoor 3 11 ramp 0.00 8 0.00 42 0.00 109<br />

12 obstacle 1.00 84 1.00 457 1.00 1163<br />

mean 0.58 29 0.54 158 0.54 404<br />

Tables 4.4 and 4.5 show the different scene sets and their general properties. Plateaus and canyons onto which our robot can drive, like a curb between a lawn patch and a concrete ground, had to be included in the label “floor” to make the manual labeling of the data feasible.

It can be expected that the angular resolution of the Hough space discretization has an influence on the finer classification. Hence, experiments with 9°, 15°, and 45° were conducted. Table 4.6 shows the results for the stereo camera data. It can be noticed that the success rates of 54% to 58% are rather poor, especially when compared to drivability detection, where 98% to 100% are achieved. The main reason seems to be the high rate of noise in the stereo data. This is also supported by the fact that the classification rates are hardly influenced by the angular resolution of the discretization of the Hough space.

The situation is a bit different for more fine-grain classification with the SwissRanger, which provides less noisy data. As shown in table 4.7, a higher angular resolution allows at least somewhat finer classification for this sensor, with success rates of up to 70%. But for the SwissRanger, too, the limitations in finer classification seem to lie in the noise level of the data.

It is expected that the high-quality data from a 3D laser scanner would allow a very fine-grain classification with the presented approach. Corresponding experiments are left for future work. This type of classification could for example be used for detailed semantic map annotation. But data acquisition with a 3D laser scanner is very slow, namely in the order of several seconds per single snapshot. This means that the robot has to be stopped and that the update rates are very low.

A stereo camera or a SwissRanger, in contrast, allows very fast data acquisition on a moving robot. The detection of whether the robot can drive over the terrain covered by the sensor is extremely fast and very robust with the presented approach. It is hence an interesting alternative for obstacle detection in non-trivial environments, which can be used for reactive locomotion control as well as mapping.


Table 4.7: CLASSIFICATION RATES AND RUN TIMES FOR SWISSRANGER DATA PROCESSED AT<br />

DIFFERENT ANGULAR RESOLUTIONS OF THE HOUGH SPACE – Unlike with the stereo camera, a<br />

higher angular resolution improves finer classification.<br />

45° 15° 9°<br />

dataset scene human class. time class. time class. time<br />

set label rate [ms] rate [ms] rate [ms]<br />

13 floor 0.98 12 0.96 59 0.96 148<br />

14 floor 1.00 12 0.97 60 0.97 148<br />

Arena 15 obstacle 0.79 10 1.00 64 1.00 160<br />

16 floor 0.00 13 0.00 64 0.00 161<br />

17 ramp 0.00 10 0.00 49 0.30 124<br />

18 obstacle 0.00 12 1.00 64 1.00 160<br />

19 obstacle 0.00 9 0.00 44 0.00 113<br />

20 obstacle 0.00 12 0.00 61 0.24 155<br />

21 obstacle 0.00 8 0.00 44 0.50 109<br />

Outdoor 1 22 floor 1.00 5 1.00 29 1.00 72<br />

23 floor 1.00 8 1.00 34 1.00 88<br />

24 ramp 0.00 3 0.22 18 0.00 45<br />

25 obstacle 1.00 10 1.00 51 1.00 128<br />

26 obstacle 1.00 11 1.00 60 1.00 146<br />

27 floor 1.00 13 1.00 64 1.00 160<br />

Outdoor 2 28 floor 1.00 13 1.00 65 1.00 158<br />

29 floor 1.00 13 1.00 65 0.98 162<br />

30 floor 1.00 11 1.00 63 1.00 160<br />

31 floor 1.00 10 1.00 62 1.00 158<br />

32 floor 1.00 12 0.00 64 0.00 160<br />

33 floor 0.00 12 0.00 64 0.00 160<br />

34 ramp 0.00 14 0.00 64 1.00 160<br />

35 ramp 0.00 10 0.00 51 0.00 128<br />

Outdoor 3 36 obstacle 0.00 12 1.00 63 1.00 160<br />

37 obstacle 0.00 11 1.00 57 1.00 143<br />

38 obstacle 0.00 13 1.00 64 1.00 169<br />

39 floor 1.00 13 0.00 65 0.00 163<br />

40 ramp 0.00 12 0.00 66 0.00 160<br />

41 ramp 0.00 12 0.00 61 0.00 161<br />

42 obstacle 1.00 12 1.00 63 1.00 157<br />

mean 0.55 12 0.64 63 0.70 158<br />



Chapter 5<br />

Patch Map Data-Structure<br />

This chapter describes the 3D mapping data structure conceived for this work. It gives an overview of the different variants implemented for this thesis. They differ in the algorithms used and hence in their capabilities.

5.1 Mapping with planar patches<br />

Planar patches are two-dimensional polygons (possibly with holes) on infinite planes. They result from applying a plane extraction algorithm to a 3D point cloud. In this work, [Poppinga et al., 2008b] is used, but any algorithm returning a set of planes, each with a set of associated points (e.g. [Okada et al., 2001]), will work.

The patches’ polygons are defined by their 2D vertices V = v_0, . . . , v_n, by the 3D position of the plane they lie on, and by the position and orientation of a 2D coordinate system on that plane. We define the latter two by an affine transform T consisting of a rotational and a translational part. It is applied to the two-dimensional polygons, which are assumed to lie in the xy-plane. Using a complete transform looks redundant at first glance, since a plane can be transformed into any other plane in infinitely many ways (it is invariant with respect to rotation around the normal and to translation within itself). One potential alternative would be to use a standard plane definition (e.g. plane normal n and distance to the origin d) and a canonical 2D coordinate system on the planes. This would reduce the amount of data that needs to be stored. However, this brings about a number of problems: Polygons do not necessarily stay in their initial position. Typically they are in sensor coordinates initially. They might be transformed to local robot coordinates with a transform T_s→l and then again to global coordinates with a transform T_l→g. With a canonical 2D coordinate system, each vertex in V would have to be transformed whenever a planar patch is transformed within its plane, and it is quite likely that such an in-plane motion is part of either T_s→l or T_l→g. Also, there are necessarily discontinuities when defining the orientation of the 2D coordinate system. (As an example, one 2D axis might be defined as the cross product of n and a fixed base vector b, e.g. one of the 3D axes. When n is similar to b, the 2D axis is not properly defined.) This would mean in practice that the vertices would have to be modified in each transform. Using a full transform, the polygon can stay the same and only the plane transform needs to be modified.

The two approaches to defining a 2D coordinate system on the plane are equivalent with respect to<br />

outline merging. Outline merging is the combination of two outlines into one, typically as the union<br />

of the two polygons. It is desirable if the same real-world surface appears in two subsequent scans.<br />

After initial processing, it will be represented by two overlapping planar patches in the map. These<br />


[Sketch: a polygon in the xy-plane and the corresponding polygon in 3D after the transform has been applied.]

Figure 5.1: APPLYING A 3D TRANSFORM TO A 2D POLYGON – The transform is represented in<br />

grey; the transform’s translation as a grey arrow, the transform’s rotation axis as a grey cylinder.<br />

two should be merged to further losslessly simplify the data and to represent the environment more truthfully. Two polygons can only be joined if they lie on the same plane. In practice, two polygons that are to be joined will not lie on identical, but only on very similar planes. In order to meaningfully join the polygons, they have to be transformed to lie in the same plane. Both with a full transform and with a canonical coordinate system, this means that all vertices have to be modified. In the case of the full transform, even if the two planes can be transformed to be identical, the 2D coordinate systems will most likely not be. So, at least one polygon’s vertices have to be transformed to the 2D coordinate system of the other¹. In the case of a canonical coordinate system, the vertices have to be modified with high probability during the transform unifying the planes.

Another advantage of using a full transform is that this is the convention used to note the robot’s<br />

global pose and often also for the sensor’s pose relative to the robot’s reference frame. When an<br />

outline is to be transformed to the global reference frame, it is simply a matter of concatenating three<br />

transforms.<br />
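A minimal sketch of this representation is given below, for illustration only: the actual implementation uses CGAL polygon types and its own transform classes, whereas this sketch uses plain 4×4 homogeneous matrices. The point to note is that re-posing a patch only touches the stored transform, never the 2D vertices.

import numpy as np

class PlanarPatch:
    """A 2D polygon in the xy-plane plus an affine transform placing it in 3D."""

    def __init__(self, vertices_2d, T_plane):
        self.vertices = np.asarray(vertices_2d, dtype=float)  # (N, 2), untouched by re-posing
        self.T = np.asarray(T_plane, dtype=float)              # 4x4 homogeneous transform

    def transformed(self, T_other):
        """Re-pose the patch (e.g. sensor->local or local->global).

        Only the plane transform is concatenated; the 2D vertices stay unchanged."""
        return PlanarPatch(self.vertices, T_other @ self.T)

    def vertices_3d(self):
        """3D positions of the outline vertices (needed e.g. for rendering)."""
        n = len(self.vertices)
        hom = np.column_stack([self.vertices, np.zeros(n), np.ones(n)])  # embed at z = 0
        return (self.T @ hom.T).T[:, :3]

# Moving a patch to the global frame is a concatenation of transforms:
# patch_global = patch.transformed(T_local_to_global @ T_sensor_to_local)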

The reason for using any kind of combined 2D/3D definition as opposed to a pure 3D polygon is, first, that basic operations on polygons are useful in this context, e.g. joining, simplification, or growing. These could not be defined as easily on truly three-dimensional polygons. The other reason is cleanliness: In our approach, all polygon vertices lie in the same plane by definition. In a polygon with three-dimensional vertices, this is not guaranteed. And even if one attempted to enforce it, the vertices could never lie exactly in one plane, due to numerical limitations. Our solution avoids having to remedy the resulting errors.

The advantages of using planar patches include simplification, entailing reduction in memory size,<br />

and more economical processing. In a largely man-made environment, many surfaces are planar, especially<br />

the larger ones (e.g. a potted plant is not, but the much larger wall and floor are). Representing<br />

¹ This seeming advantage vanishes once we consider that, in practice, often more than two outlines are merged. In the case of n outlines, the vertices of n − 1 of them have to be modified.


                        Patch storage method
Data set                Plane + outl.    Triangulated outline       Triangulated region
                        size [B]         size [B]    increase       size [B]    increase
Lab                     151              280         1.94406        3913        25.2947
Car Park                207              445         2.225          3062        15.8319
Dwelling                199              444         2.17544        3191        18.0892
Hannover ’09 Arena      191              398         2.11261        4600        23.8958
Hannover ’09 Hall       222              492         2.19919        2902        14.253

Table 5.1: A COMPARISON OF THE SIZES OF DIFFERENT STORAGE METHODS FOR SURFACE<br />

PATCHES – For a fair comparison, all methods describe the same original data points. The increase<br />

given is the increase in size as compared to storing plane and outline. Both the size and the increase<br />

given are the median of all surface patches in the data set. ’Triangulated region’ refers to a Delaunay<br />

triangulation of the sub point cloud to which the plane was fitted.<br />

planar surfaces with point clouds is redundant for most applications (for archaeological applications it may be important to keep track of each of a wall’s tiny dents and bumps; for search and rescue, surveillance, or exploration it is not). Storing the infinite plane and its perceived outline is much more

efficient. Table 5.1 compares the different sizes for different variants on a number of 3D LRF datasets.<br />

These datasets are presented in detail in chapter 3.<br />

In many robotics scenarios, a map is not only to be generated online, but paths are simultaneously<br />

being planned on it. For this requirement, path planning using collision detection is a simple, effective,<br />

and suitable solution. It is simple since relatively user-friendly, highly optimized collision detection<br />

libraries are available for games and other virtual reality applications. It is suitable for dynamic<br />

use, since it allows for inclusion of the polygons as they are, without having to calculate derivative<br />

structures like a Voronoi diagram. It is effective since derivation of surfaces is only done once. When<br />

using point clouds directly, there are two possible approaches: calculating an implicit surface or using<br />

just the points for collision detection. In the first approach, the implicit surface has to be calculated on<br />

each collision check [Klein and Zachmann, 2004, Ho et al., 2001] which adds processing overhead.<br />

In the second, it has to be assumed that point clouds are dense enough not to let the collision object<br />

pass between them. And even if that holds true, the collision detection will produce incorrect results<br />

for most cases (non-cuboidal or non-aligned collision shape). This assumption is not needed when<br />

planes are fitted into the point cloud.<br />

5.2 Surface Patches<br />

Surface Patches describe the surfaces of objects in an environment. They are typically derived from<br />

range sensor data, exclusively so in this work. Two types of patches are used: planar patches and<br />

trimesh patches.<br />

5.2.1 Planar Patches<br />

As discussed in section 5.1, the planes are stored as an affine transform on the xy-plane. CGAL is used<br />

for polygons. CGAL (Computational Geometry Algorithms Library, http://www.cgal.org/)<br />

is a peer-reviewed open-source library offering algorithms and data structures in several areas of geometry.<br />

In this vein, CGAL offers a data structure for polygons with optional holes and also operations<br />


on it (e.g. union and outlining) [Giezeman and Wesselink, 2008]. Other operations are easily defined using standard algorithms (simplification with Douglas–Peucker [Douglas and Peucker, 1973]; the Douglas–Peucker algorithm and its results are presented in algorithm 9) or CGAL library functions and documentation (polygon growing, [Cacciola, 2008]).
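For reference, the standard Douglas–Peucker simplification works as sketched below; this is the generic textbook formulation, not necessarily the exact variant used in algorithm 9.

import numpy as np

def douglas_peucker(points, eps):
    """Generic Douglas-Peucker polyline simplification.

    points: (N, 2) array of 2D vertices of an open polyline.
    eps:    maximum allowed deviation of any dropped vertex from the result.
    """
    pts = np.asarray(points, dtype=float)
    if len(pts) < 3:
        return pts
    a, b = pts[0], pts[-1]
    ab = b - a
    rel = pts[1:-1] - a
    if np.allclose(ab, 0.0):
        dist = np.linalg.norm(rel, axis=1)                 # degenerate chord
    else:
        # Perpendicular distance of intermediate vertices to the chord a-b
        dist = np.abs(ab[0] * rel[:, 1] - ab[1] * rel[:, 0]) / np.linalg.norm(ab)
    i = int(np.argmax(dist)) + 1                           # index of the farthest vertex
    if dist[i - 1] > eps:
        left = douglas_peucker(pts[:i + 1], eps)           # keep the farthest vertex,
        right = douglas_peucker(pts[i:], eps)              # recurse on both halves
        return np.vstack([left[:-1], right])
    return np.vstack([a, b])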

5.2.2 Trimesh Patches<br />

Trimesh patches are triangular meshes, possibly with additional information like color or texture.<br />

Trimesh patches are used in two ways: Firstly, maps consisting completely of trimeshes are used<br />

as a comparison to the proposed method of using planar patches. Alternatively, hybrid maps containing<br />

planar patches for planar elements and trimesh patches for non-planar ones are also used for<br />

comparison.<br />

5.2.3 Other Patch Types<br />

It is possible to extend the presented framework with other types of patches. These could be patches<br />

defined on higher order surfaces instead of planes. By way of example, an umbrella can be roughly<br />

described as a part of a sphere. Possible candidates for higher-order surfaces are quadrics (a survey of methods for detecting these can be found in [Petitjean, 2002]) or primitives of collision detection engines like spheres, cones, and cylinders.

5.3 Generation of patches from point clouds<br />

3D range sensors deliver data in a raw format (e.g. range image for range cameras or sets of 2D laser<br />

scans for actuated laser range finders, ALRF), which is generally converted into a 3D point cloud, i.e.<br />

a set of 3D points. Especially in the case of a range camera, the scanning order is usually retained such<br />

that points in the point cloud correspond to the camera’s pixels e.g. row-wise. From point clouds, the<br />

types of patches presented are derived with different levels of ease.<br />

5.3.1 Trimesh Patches<br />

Trimesh patches are included mainly to allow for a comparison of the patch map with traditional<br />

methods. Therefore, the vast literature on trimesh simplification was not consulted.<br />

Trimeshes are grown on each point cloud individually and in two phases. In the first phase, the optimal plane under the least-mean-square criterion is computed for each pixel. The triangles of the triangulation induced by the scan geometry are classified as growable and un-growable. A triangle is growable iff all its points have similar optimal planes. Then trimeshes are grown from different starting points using only growable triangles. Note that the algorithm is independent of the location of the starting points (algorithm 4). It does rely on two thresholds, θ_ang for angular similarity and θ_d for distance similarity. The second phase grows trimeshes from all remaining points over closely located points (algorithm 5). It uses the threshold θ_s for the maximum side length of a triangle.
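The per-pixel plane fit of the first phase amounts to a small principal component analysis over the 3×3 pixel neighborhood, as sketched below. The sketch assumes a dense range image without invalid pixels; the actual implementation additionally has to skip missing measurements.

import numpy as np

def per_pixel_planes(points_grid):
    """Fit a local plane (n, d) to every pixel of a dense range image.

    points_grid: (H, W, 3) array of 3D points ordered like the scan grid.
    Returns normals (H, W, 3) and signed distances (H, W); the image border is skipped.
    """
    H, W, _ = points_grid.shape
    normals = np.zeros((H, W, 3))
    dists = np.zeros((H, W))
    for r in range(1, H - 1):
        for c in range(1, W - 1):
            nb = points_grid[r - 1:r + 2, c - 1:c + 2].reshape(9, 3)
            centroid = nb.mean(axis=0)
            A = (nb - centroid).T @ (nb - centroid)   # 3x3 scatter matrix
            w, V = np.linalg.eigh(A)                  # eigenvalues in ascending order
            n = V[:, 0]                               # eigenvector of the smallest eigenvalue
            d = float(n @ centroid)
            if d < 0:                                 # enforce a consistent sign
                n, d = -n, -d
            normals[r, c], dists[r, c] = n, d
    return normals, dists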

5.3.2 Planar Patches<br />

Approximately planar regions in the point cloud are detected as described in [Poppinga et al., 2008b].

These regions are originally sub point clouds. Then, an outlining 2D polygon is derived. This 2D<br />

polygon and the infinite plane in which it lies define the planar patch (which is a 3D entity).<br />


Algorithm 4 THE FIRST PHASE OF TRIMESH GROWING – returns the set of trimeshes M; the basic triangulation (esp. the neighborhood function for triangles N()) is defined by the grid structure of the scan; Π ⊂ N × N: set of input pixels, p : Π → R³: pixel to point map, π±1(P): eight-neighborhood of pixel P

1: for P ∈ Π do
2:   p_c ← (1/9) Σ_{P′ ∈ π±1(P)} p(P′)
3:   A ← Σ_{P′ ∈ π±1(P)} (p_c − p(P′)) (p_c − p(P′))^T
4:   n(P) ← eigenvector of A corresponding to the smallest eigenvalue
5:   d(P) ← n(P) · p_c
6:   if d(P) < 0 then
7:     d(P) ← −d(P), n(P) ← −n(P)
8:   end if
9: end for
10: for P = (r, c) ∈ Π do
11:   triangle T_0 = ((r, c), (r + 1, c), (r, c + 1))
12:   trimesh m
13:   queue Q.push(T_0)
14:   while not Q.empty() do
15:     T ← Q.pop()
16:     if not used(T) then
17:       if max{ acos( n(t_0) · n(t_1) / (|n(t_0)| |n(t_1)|) ) | t_0, t_1 ∈ T } < θ_ang and max{ |d(t_0) − d(t_1)| : t_0, t_1 ∈ T } < θ_d then
18:         m.add(T)
19:         used(T) ← true
20:         Q.push(N(T))
21:       end if
22:     end if
23:   end while
24:   M ← M ∪ m
25: end for
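To make the first phase more concrete, the per-pixel plane estimation of lines 1–9 can be sketched as follows. This is a minimal illustrative sketch in Python/numpy, assuming a dense H × W × 3 array of valid points; it is not the implementation used in the thesis.

import numpy as np

def per_pixel_planes(points):
    """Estimate, for every interior pixel, the least-squares plane of its 3x3
    neighborhood (lines 1-9 of algorithm 4). points: H x W x 3 array of 3D
    points; returns per-pixel normals n and plane offsets d."""
    H, W, _ = points.shape
    n = np.zeros((H, W, 3))
    d = np.zeros((H, W))
    for r in range(1, H - 1):
        for c in range(1, W - 1):
            nb = points[r - 1:r + 2, c - 1:c + 2].reshape(9, 3)  # 3x3 block around (r, c)
            p_c = nb.mean(axis=0)                                # centroid
            A = (nb - p_c).T @ (nb - p_c)                        # scatter matrix
            w, v = np.linalg.eigh(A)                             # eigenvalues ascending
            normal = v[:, 0]                                     # eigenvector of smallest eigenvalue
            offset = normal @ p_c
            if offset < 0:                                       # orient so that d(P) >= 0
                normal, offset = -normal, -offset
            n[r, c], d[r, c] = normal, offset
    return n, d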


Algorithm 5 THE SECOND PHASE OF TRIMESH GROWING – returns the set of trimeshes M; the basic triangulation (esp. the neighborhood function for triangles N()) is defined by the grid structure of the scan; Π ⊂ N × N: set of input pixels; p : Π → R³: pixel to point map; π±1(P): eight-neighborhood of pixel P; τ_0(T), τ_1(T): the two sides of a triangle that only differ in one dimension in pixel space

1: for P = (r, c) ∈ Π do
2:   triangle T_0 = ((r, c), (r + 1, c), (r, c + 1))
3:   trimesh m
4:   queue Q.push(T_0)
5:   while not Q.empty() do
6:     T ← Q.pop()
7:     if not used(T) then
8:       if τ_0(T) < θ_s and τ_1(T) < θ_s then
9:         m.add(T)
10:        used(T) ← true
11:        Q.push(N(T))
12:      end if
13:    end if
14:  end while
15:  M ← M ∪ m
16: end for
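Both phases share the same flood-fill structure over the scan-induced triangulation and differ only in the admissibility test (plane similarity in algorithm 4, side length in algorithm 5). A hedged sketch of that shared region growing, with hypothetical helper functions neighbors() and admissible(), could look as follows.

from collections import deque

def grow_trimeshes(triangles, neighbors, admissible):
    """Region growing shared by both phases of trimesh growing: flood-fill over
    the scan-induced triangulation, keeping only triangles that pass the
    phase-specific test (plane similarity in phase one, side length in phase two).
    triangles: iterable of triangle ids, neighbors(t): adjacent triangle ids,
    admissible(t): growability test; both helpers are assumed to exist."""
    used = set()
    meshes = []
    for seed in triangles:
        if seed in used or not admissible(seed):
            continue
        mesh, queue = [], deque([seed])
        while queue:
            t = queue.popleft()
            if t in used or not admissible(t):
                continue
            used.add(t)
            mesh.append(t)
            queue.extend(neighbors(t))        # corresponds to Q.push(N(T))
        meshes.append(mesh)
    return meshes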

Projection for outlining

Plane fitting yields a sub point cloud and the infinite plane on which these points all lie (within an error bound). So while these points are originally 3D, they can be considered points in a 2D coordinate system in a specified pose in 3D. We would like to find a small polygon containing the 2D point set. We call this process outlining. It exploits the information that a relatively dense sub point cloud lies on a plane in order to compress it.

Outlining presents us with a fundamental choice: we can either outline the sub point cloud in the pixel domain of the sensor and then project only the points of the outline onto the optimal plane (late projection), or we can project all points from the sub point cloud onto the plane and then find an outline in the 2D floating point domain (early projection). Obviously, early projection brings the burden of projecting a considerably larger number of points. Also, the 2D grid containing a contiguous region of pixels is computationally much easier to handle. However, late projection leads to other problems, as described below. In addition to being less obvious, they are less straight-forward to handle.

There are two methods for projecting the 3D points onto the plane: orthogonally or along the sensor beam to which they correspond (projection along the beam, see figure 5.2; for range cameras, a beam through the center of the pixel frustum is assumed). Both methods come with drawbacks, most prominently when the plane they project to is at an acute angle with the beams from which the projected points originate (figure 5.3). Orthogonal projection can change the order of points such that neighboring pixels correspond to points that do not neighbor (i.e. they are not visible to each other). This manifests itself in two problems for late projection. Firstly, points that lie inside the outline before projection may end up outside the constructed outline after late orthogonal projection. The polygon would then cease to be a proper outline of the sub point cloud. Secondly, the projection may create self-intersections of the outline.


Figure 5.2: Different methods of projecting the points of a sub point cloud onto the optimal plane fitted to them – (a) orthogonal projection, (b) legend (detected point, projected point, optimal plane, beam), (c) projection along the beam

Figure 5.3: PROBLEMS WITH THE TWO TYPES OF PROJECTION – (a) Inconvenient behavior of orthogonal projection: the order of points changes. (b) Distortion by projection along the beam: the size of regions grows enormously. Gray bars are sensor beams, the black line is the fitted plane, crosses are distance measurements, circles are points projected to the fitted plane.


Figure 5.4: Differences between orthogonal projection and projection along the beam – (a) huge erroneous region due to projection along the beam, (b) the same scene with orthogonal projection

Projection along the beam causes distortions because the plane fitting which forms the sub point cloud uses orthogonal projection. Points which lie close together when projected orthogonally can spread out when projected along the beam (figure 5.4; also compare the projected point positions in figure 5.3(a) to those in figure 5.3(b)). This problem haunts both early and late projection.

There is an additional problem case with late projection when used with an ALRF. The straightforward way to enter ALRF scans into a grid for processing is to enter the first LRF scan as the first row, the second scan as the second row, and so on. But, as opposed to range cameras, the plane on which all points of one row lie typically extends to behind the origin. Imagine a laser scanner with an opening angle of 240°. Let the beam going forward be at 0°. If the 0° beam points upward in the first scan, all beams beyond 90° or -90° will point downward. In the last scan the converse is true. Consequently, while the center of the top row describes the top of the scene, the ends will describe the bottom of the scene (the reader is referred to figure 3.8 on page 57 for an ALRF range image). The scan lines intersect at 90° and -90°. Now if a sub point cloud has points corresponding to pixels both from the center columns of the image and from the left and right edges, after projection the outline will self-intersect (see figure 5.5). This affects both orthogonal projection and projection along the beam.

In conclusion, early orthogonal projection is the only projection method that can be used without having to solve additional problems.
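As an illustration of the chosen method, early orthogonal projection of a sub point cloud onto its fitted plane can be sketched as follows (Python/numpy, a minimal sketch; the plane is assumed to be given as a normal n and offset d with n · x = d).

import numpy as np

def project_to_plane_2d(points, normal, d):
    """Early orthogonal projection: drop each 3D point of a sub point cloud onto
    the fitted plane n.x = d and express it in an arbitrary 2D basis (u, v) of
    that plane. points: N x 3 array, normal: plane normal, d: plane offset."""
    n = np.asarray(normal, float)
    n = n / np.linalg.norm(n)
    # pick any vector not parallel to n to build an orthonormal in-plane basis
    a = np.array([1.0, 0.0, 0.0]) if abs(n[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    u = np.cross(n, a)
    u /= np.linalg.norm(u)
    v = np.cross(n, u)
    pts = np.asarray(points, float)
    foot = pts - ((pts @ n) - d)[:, None] * n      # orthogonal foot point on the plane
    return np.stack([foot @ u, foot @ v], axis=1)  # 2D coordinates used for outlining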

Outlining

Once all points of an approximately planar sub point cloud have been projected to a 2D coordinate system, 2D outlining algorithms can be used to find a descriptive polygon. Outlines are of interest in their own right: they provide a compact representation of planar patches and they are used in subsequent processing steps, most importantly patch maps. For display purposes however, be it via X3D, in an OpenGL context, or similar, planar regions have to be triangulated to be displayed. This can either be done indirectly, by first generating an outline and then triangulating that outline, or by directly generating a trimesh out of the point set. For this reason, we benchmarked several direct and indirect triangulation algorithms on a variety of datasets.


Figure 5.5: LATE PROJECTION CAN CAUSE INTERSECTIONS WHEN APPLIED TO ALRF DATA – (a) An ALRF scan consisting of 3 scan lines (color-coded), opening angle 240°, horizontal step 30°; the ALRF is depicted as a gray box, coinciding scan points at 90° and -90° have been shifted for visibility. (b) The ALRF scan inserted into a grid. Suppose the three left-most columns are identified as one region; its best outline solely based on the grid is depicted. (c) The vertices of the outline are projected back into 3D (an orthogonal view of the plane they lie in is depicted). Due to how the scan was entered into the grid, an intersection occurs.

The compared algorithms are:
• convex hull with Graham's line scan, a standard algorithm (yielding the smallest convex polygon containing all points of a point cloud), implementation by CGAL [Hert and Schirra, 2008],
• RI, line-fitting based on the plane-fitting presented in [Poppinga et al., 2008b], implemented by Narunas Vaskevicius (this algorithm is benchmarked because it was part of the Jacobs Robotics library, but it is not discussed in detail here),
• naive triangulation based on the range image nature of the point clouds,
• scanLineV1,
• scanLineV2,
• α-shapes, and
• chain code.

Only chain code, α-shapes, and convex hull use indirect triangulation. For the triangulation step, CGAL [Yvinec, 2008] was used. Since the triangulation relies on the polygons being simple, but the chain code algorithm uses late projection, chain code could only be used on true range images and not on scans from an actuated LRF. When run on ALRF scans, it often produces polygons with many involved self-intersections, more than the heuristic for removing them can handle.

The α-shapes algorithm is a standard algorithm from geometry for finding a polygon for a set of points [Edelsbrunner and Mücke, 1994]. The α-shape of a 2D point cloud P is the set of all α-exposed edges of the Delaunay triangulation. An edge is α-exposed if there is no circle of radius α containing the edge and another point from P. The α-shape of P is a generalization of the convex hull, parameterized on α: the ∞-shape of P is the convex hull. For smaller values, the α-shape may be non-convex, contain holes, or even consist of multiple components. See figure 5.6 for an illustration. For pseudo-code of the algorithm, please refer to algorithm 6. We used the implementation available in CGAL [Da, 2008].

One natural limitation of the α-shapes algorithm is its dependency on the parameter α. Too great values will close gaps and holes, rendering the representation of the environment less truthful. On the other hand, small values might produce more than one polygon. Thankfully, CGAL offers the possibility to compute the optimal α-value to achieve a given number of polygons. In our case, we obviously would like to have only one component.


Figure 5.6: AN α-SHAPE (GREY) OF A 2D POINT CLOUD (ORANGE) – The radius of the circles is α. Taken from [Da, 2008].

Algorithm 6 GENERATING THE α-SHAPE A FOR POINT CLOUD P

D ← Delaunay-triangulation(P)
for all edges e ∈ D do
  c_e ← circumcircle(e)
  if radius r_{c_e} < α and ∀p ∈ P : p ∈ c_e ⇒ p ∈ e then
    C ← C ∪ {e}
  end if
end for
A ← {e ∈ C | e outer edge}
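A literal transcription of algorithm 6 is sketched below in Python, assuming, as the pseudo-code suggests, that the circumcircle of an edge is its diametral circle; extracting only the outer boundary edges (the last line of the algorithm) is omitted. In practice the CGAL implementation is used.

import numpy as np
from scipy.spatial import Delaunay

def alpha_exposed_edges(P, alpha):
    """Literal transcription of algorithm 6: keep the Delaunay edges whose
    circle is smaller than alpha and contains no further point of P.
    P: N x 2 array of 2D points; returns the kept edges as index pairs."""
    tri = Delaunay(P)
    edges = set()
    for simplex in tri.simplices:
        for i in range(3):
            edges.add(tuple(sorted((simplex[i], simplex[(i + 1) % 3]))))
    C = []
    for a, b in edges:
        center = (P[a] + P[b]) / 2.0                  # diametral circle of the edge
        radius = np.linalg.norm(P[a] - P[b]) / 2.0
        if radius >= alpha:
            continue
        others = np.delete(np.arange(len(P)), [a, b])
        if np.all(np.linalg.norm(P[others] - center, axis=1) > radius):
            C.append((a, b))
    return C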


Figure 5.7: SOME EXAMPLES FOR CONVEX POLYGONS ON A GRID

Figure 5.8: AN EXAMPLE ITERATION OF THE ALGORITHM – The polygons in P are depicted on top of the pixels which are part of the region of the range image currently being processed. (a) Before processing the last line: three polygons in P. (b) After processing the last line: two polygons transferred to R, four in P, and one link polygon (dashed) in R.


The scanLine algorithms use late projection. Their approach is to find a set of convex polygons covering the same area as the set of triangles that is produced by connecting all triplets of pixels neighboring in the distance image. This set is supposed to have a triangular decomposition with as few triangles as possible.

All convex polygons on a grid are (possibly degenerate) octagons (see figure 5.7 for some examples). It can furthermore be observed that whether a polygon is convex can be decided while scanning it row by row. So an algorithm can be conceived which reads the input regions row by row and gradually builds convex polygons, starting a new polygon whenever the current one would become non-convex (algorithm 7). This obviously has the advantage of linear runtime. The algorithm uses the notion of a link polygon; its construction is given in algorithm 8. An example iteration of the algorithm is shown in figure 5.8. It exemplifies most of the cases from the algorithm. The stretches in 5.8(b) correspond, from left to right, to lines 11 (sub-case line 15), 9, 11 (sub-case line 13), and 3.

Triangulation of convex polygons on a grid is mostly trivial. The basic mode of triangulation can be seen in the leftmost polygon in figure 5.9. Depending on how degenerate the octagon is, some of the triangles can be left out [ibid.]. The only caveat is keeping watertightness. Two vertices of a link polygon typically lie on edges of the polygons adjacent to the link polygon. This is not allowed in trimeshes, so a vertex has to be inserted into the adjacent polygon, which affects its triangulation. These vertices are marked 'L' in sub-figure 5.8(b).


Algorithm 7 THE SCANLINE ALGORITHM FOR RANGE IMAGE TRIANGULATION – In the current line i, build a list of polygons P_i from the previous line's list P_{i−1}, sorted from left to right. The result is the set of convex polygons R. Versions 1 and 2 differ in how they define "convex" in lines 9 and 13.

1: for all lines i do
2:   for all continuous stretches of pixels s do
3:     if |P_{i−1}| = 0 then
4:       P_i ← P_i ∪ s
5:     else
6:       p_j ← left-most element of P_{i−1}
7:       s′ ← last stretch added to p_j
8:       if p_j and s share one or more columns then
9:         if p_j + s convex then
10:          P_i ← P_i ∪ (p_j + s)
11:        else
12:          R ← R ∪ {p_j}
13:          if s′ + s convex then
14:            P_i ← P_i ∪ (s′ + s)
15:          else
16:            R ← R ∪ L(s′, s)
17:            P_i ← P_i ∪ s
18:          end if
19:        end if
20:        P_{i−1} ← P_{i−1} \ p_j
21:      else
22:        if p_j < s then
23:          R ← R ∪ p_j
24:          P_{i−1} ← P_{i−1} \ p_j
25:          goto line 3
26:        else
27:          P_i ← P_i ∪ s
28:        end if
29:      end if
30:    end if
31:  end for
32: end for
33: for p ∈ P_i do
34:   R ← R ∪ {p}
35: end for


Algorithm 8 CONSTRUCTION OF A LINK POLYGON – Given two stretches of pixels s_i, s_{i+1} on consecutive lines i, i+1 with s_j = p_{j,1}, ..., p_{j,n_j}, the link polygon L(s_i, s_{i+1}) = (TL, BL, BR, TR) is derived as follows.

if p_{i,1} < p_{i+1,1} then
  TL ← (i, p_{i+1,1} − 1)
  BL ← (i + 1, p_{i+1,1})
else if p_{i,1} = p_{i+1,1} then
  TL ← (i, p_{i,1})
  BL ← (i + 1, p_{i+1,1})
else
  TL ← (i, p_{i,1})
  BL ← (i + 1, p_{i,1} − 1)
end if
if p_{i,n_i} < p_{i+1,n_{i+1}} then
  TR ← (i, p_{i,n_i})
  BR ← (i + 1, p_{i,n_i} + 1)
else if p_{i,n_i} = p_{i+1,n_{i+1}} then
  TR ← (i, p_{i,n_i})
  BR ← (i + 1, p_{i,n_{i+1}})
else
  TR ← (i, p_{i,n_{i+1}} + 1)
  BR ← (i + 1, p_{i,n_{i+1}})
end if

Figure 5.9: TRIANGULATION OF CONVEX GRID POLYGONS – The numbers on the leftmost polygon indicate all possible occurring triangles. On the other polygons, some of these have been left out based on a simple decision tree using at most two tests for equality per triangle. Note that all vertices are points on the grid.


Figure 5.10: COMPARISON OF TRIANGULATION WITH/WITHOUT THE RESTRICTION TO THE AREA COVERED BY THE NAIVE TRIANGULATION – (a) Triangulation with scanLineV1: 5 triangles, accurate coverage. (b) Triangulation with scanLineV2: 1 triangle, less accurate coverage. Please keep in mind that this is an extreme corner case, since the data is usually much too noisy to allow outlines like the depicted one to continue over more than two rows.

Figure 5.11: TRIANGLES WITH ANNOTATED SPIKYNESS – example triangles annotated with spikyness values ranging from 0.0 to 5.0


The major advantage of the scan line algorithm is its high speed. In fact, 64 % of the processing time is spent projecting the points onto the ideal plane of their region and on related pre-processing. A further 30 % is needed for the scan line algorithm and a mere 6 % for the actual polygonization. One disadvantage of the algorithm is the very sharp polygons it outputs. Also, the restriction to the area covered by the naive triangulation can produce an excess of triangles (figure 5.10).

In the first working version of the algorithm, scanLineV1, the edges are restricted to vertical and horizontal lines and to diagonals of squares of four pixels. This is relaxed in scanLineV2, which allows for more types of diagonal lines. This reduces the number of triangles. On the other hand, the accuracy is reduced (see figure 5.10).

For benchmarking, we define the spikyness of a triangle. The spikyness of a triangle T with inner angles α_0, α_1, α_2 is defined as Σ_{i=0}^{2} (π/3 − α_i)². As an orientation: the spikyness of a right isosceles triangle is ∼0.411, and for an equilateral triangle it is 0. See figure 5.11 for examples.
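For reference, the spikyness measure can be computed directly from the vertex coordinates; the following small Python function is a sketch that reproduces the values quoted above.

import math

def spikyness(a, b, c):
    """Spikyness of a triangle with vertices a, b, c: the sum over the inner
    angles alpha_i of (pi/3 - alpha_i)^2; 0 for an equilateral triangle."""
    def angle(p, q, r):                         # inner angle at vertex p
        v1 = (q[0] - p[0], q[1] - p[1])
        v2 = (r[0] - p[0], r[1] - p[1])
        dot = v1[0] * v2[0] + v1[1] * v2[1]
        return math.acos(max(-1.0, min(1.0, dot / (math.hypot(*v1) * math.hypot(*v2)))))
    angles = [angle(a, b, c), angle(b, a, c), angle(c, a, b)]
    return sum((math.pi / 3 - al) ** 2 for al in angles)

# spikyness((0, 0), (1, 0), (0, 1)) is roughly 0.411, as quoted above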


Figure 5.12: Time taken, number of polygons, and spikyness for the different triangulation algorithms on the German Open '09 Arena dataset.


Figure 5.13: Time taken, number of polygons, and spikyness for the different triangulation algorithms on the German Open '09 Hall dataset.


The experiments were conducted on three of the datasets presented in chapter 3; the reader is referred there for an in-depth analysis. Two of the datasets were recorded at the RoboCup German Open '09 at the Hannover Fair. One is a run through the RoboCup Rescue arena consisting of 79 point clouds (results in figure 5.12). The other is a circuit of the fair hall in which the arena was located (figure 5.13); it consists of 78 point clouds. These datasets were recorded with an actuated laser range finder. The remaining dataset was recorded with a range camera. It consists of hand-chosen scenes in our laboratory, picked according to one of three criteria: being dominantly planar; being dominantly spherical, conical, or cylindrical; or containing planar regions with holes in them. The dataset is hence called 'Planar/Round/Holes'. It contains 75 point clouds. Its results are in figure 5.14.

The two scanLine algorithms and the convex hull are the fastest contenders. Naive triangulation is in the mid-field, while α-shapes and RI are slowest, with a slight advantage for RI. Where chain code could be used, it performed between naive triangulation and RI.

In terms of the number of polygons, the naive triangulation was by far the worst (as could be expected). Then follow, from worse to better, scanLineV1, scanLineV2, and α-shapes. Convex hull and RI perform best.

The spikyness depends on the dataset. A common feature is that the scanLine algorithms produce quite spiky triangles (with a slight advantage for scanLineV2), while α-shapes and RI produce less spiky triangles (with a slight advantage for RI). On the point clouds collected with an ALRF, the naive triangulation performs as well as RI, and the convex hull is worst. On the point clouds collected with a range camera, naive triangulation was slightly better than RI, chain code was slightly worse than α-shapes, and convex hull was slightly better than scanLineV2.

These differences can be attributed to the properties of the different ranging sensors. Laser beams are still reflected at steep angles, whereas the near-IR light from the range camera's light source is not reflected with sufficient brightness beyond a certain angle. This means that on areas at steep angles to the sensor, the ALRF will produce points which are stretched over a large area, while the range camera will produce no points at all. Additionally, even without this distortion, the range camera's pixels are more evenly distributed than the ALRF's beams. With the naive triangulation, this results in triangles which are close to right triangles (spikyness 0.5) for the range camera. On the ALRF's more irregular pattern, triangles get spikier. Something similar holds for the convex hull: a more stretched polygon, as produced by the ALRF's more stretched-out point sets, will be triangulated into spikier triangles.

In conclusion, RI performed best, closely followed by α-shapes. Since, of these, only α-shapes produces the outlines which are needed henceforth, the α-shape algorithm will be used. In a context where only trimeshes are needed, RI is preferred. The convex hull was actually fastest, but it is too coarse an approximation for mapping purposes. It might be interesting, though, for display in a system with different levels of detail, as a coarse approximation to be used at great distances.

Outline simplification

Outlines constructed by outlining algorithms typically contain many more vertices than necessary for subsequent processing. We call the process of removing excess vertices while maintaining resemblance to the original shape outline simplification. There is a rich literature on algorithms for this task. A survey can be found in [Heckbert and Garland, 1997]. Also, Mustafa gives an introduction to the field in his Ph.D. thesis [Mustafa, 2004]. There are three groups of algorithms: global ones using constrained Delaunay triangulation, local general ones, and local ones working on monotone subsections. A further distinguishing mark is whether the vertices of the simplification are supposed to be a subset of the input polygon (strong simplification) or whether they can be placed freely (weak simplification). He mentions that the creation of an optimal weak simplification without self-intersection and within a bound ε is NP-hard, and states that no approximation algorithm is known (p. 57).


Figure 5.14: Time taken, number of polygons, and spikyness for the different triangulation algorithms on the Planar/Round/Holes dataset.


Figure 5.15: An example of outlining – (a) the outline of a region as found by the α-shapes algorithm: 141 vertices; (b) the same outline simplified by the Douglas-Peucker algorithm: 9 vertices

However, [Mustafa, 2004] only covers the min-#-problem: finding an approximation within a bound ε in some error measure with as few nodes as possible. There is also the min-ε-problem: finding an approximation with a fixed number of nodes with minimal error ε (cf. e.g. [Imai and Iri, 1988]).

A very early and still very popular strong min-# simplification algorithm is the one by Douglas and Peucker [Douglas and Peucker, 1973]. It is known as Ramer's algorithm in vision and as the sandwich algorithm in computational geometry [Hershberger and Snoeyink, 1997].

While this classic algorithm works locally on general polylines, the most widely researched class is that of monotone polylines (i.e. one of the coordinates grows monotonically across all vertices) [Varadarajan, 1996, Goodrich, 1995, Aronov et al., 2004]. This is because they can be approximated by a piecewise linear function. It is, however, not a real limitation, since each polyline can be broken into a series of monotone polylines.

A weak simplification algorithm on polylines was proposed by Imai and Iri [Imai and Iri, 1988]. In later work [Guibas et al., 1991], an algorithm with a lower runtime complexity was proposed. The runtime complexity that actually applies depends on certain definitions and the type of object used. In [Imai and Iri, 1988], piecewise linear functions can be approximated in linear time, while the run time for general polygonal curves varies depending on the definition of the error criterion. For the latter case, the run time can be O(n² log n) or O(n log n). In [Guibas et al., 1991], it depends on the definition of visiting the points "in order".

Simplification algorithms can either work online, as the nodes arrive, or offline, on the entire polyline. In some domains, such as sampling time series of data in medicine or the natural sciences, it is desirable to process data as it comes in. This is equivalent to exclusively using a sliding window in an offline algorithm. This is, however, strongly discouraged by the pathological examples these types of algorithms can yield, as given in [Shatkay, 1995]. An improved variant is proposed in [Keogh et al., 2003]. Since we use an algorithm with separate steps (plane extraction, outlining, outline simplification, triangulation), we can use offline algorithms.

Since the Douglas-Peucker algorithm has proven itself so extensively, there was no reason not to use it. Its only small drawback is that it assumes an open polyline as input and not a closed polyline, i.e. a polygon. However, using the polyline between the first and last vertex as arbitrarily defined by CGAL, the results look very convincing. This way, the first and last vertex will always stay a part of the outline.
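For completeness, the classic recursive formulation of the Douglas-Peucker simplification can be sketched as follows; this is a generic textbook sketch and not necessarily identical to the variant referenced as algorithm 9. eps is the distance tolerance, and the first and last vertex are always retained.

import math

def douglas_peucker(polyline, eps):
    """Recursive Douglas-Peucker simplification of an open polyline (list of
    (x, y) vertices); the first and last vertex are always kept."""
    def point_line_dist(p, a, b):
        (px, py), (ax, ay), (bx, by) = p, a, b
        num = abs((bx - ax) * (ay - py) - (ax - px) * (by - ay))
        den = math.hypot(bx - ax, by - ay)
        return num / den if den > 0 else math.hypot(px - ax, py - ay)
    if len(polyline) < 3:
        return list(polyline)
    dists = [point_line_dist(p, polyline[0], polyline[-1]) for p in polyline[1:-1]]
    i_max = max(range(len(dists)), key=dists.__getitem__) + 1
    if dists[i_max - 1] <= eps:
        return [polyline[0], polyline[-1]]       # all intermediate vertices within tolerance
    left = douglas_peucker(polyline[:i_max + 1], eps)
    right = douglas_peucker(polyline[i_max:], eps)
    return left[:-1] + right                     # splice at the kept vertex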



5.4 Patch Maps

5.4.1 Patch Map Capabilities

Potential capabilities of a patch map framework are defined by simple interfaces. This allows for flexibility: not all implementations have to offer all kinds of queries, e.g. for collision or proximity. Algorithms specify which exact capabilities they need to run and thus, indirectly, which patch map implementations they can run on.

PatchMapInterface The PatchMapInterface represents the basic capabilities a patch map framework must offer. These are the adding of patches and read-only sequential access to all patches. A patch map framework must be able to store both planar and trimesh patches. Whether it gives each special treatment or, e.g., simply triangulates the planar patches is up to the individual implementation.

ProximityQueryPatchMap The ProximityQueryPatchMap interface represents the ability to return the patch which is closest to a specified point.

Un/SpecificPatchStepIntersector The UnspecificPatchStepIntersector interface represents the ability to test for collision freedom when traversing from one point to another. In this context, "unspecific" means that in the case of a collision the colliding patch is not returned. This interface does not precisely define behavior; this is done by its children UnspecificPatchLineIntersector and UnspecificPatchCapsuleIntersector. Still, it can be used for constructing a simple RRT. In case of a collision, the SpecificPatchStepIntersector also returns the patch with which the step collides and the collision point on the patch.

Un/SpecificPatchLineIntersector The UnspecificPatchLineIntersector interface represents the ability to test for collisions on a line between two points. Since this means that it does not consider object size, it can only be used as an approximation. In case of a collision, the SpecificPatchLineIntersector also returns the patch with which the line collides and the collision point on the patch.

Un/SpecificPatchCapsuleIntersector A capsule is the convex hull of two spheres of identical radius. In other words, it is the volume a sphere sweeps when cast from one point to another. The UnspecificPatchCapsuleIntersector interface represents the capability to test for collisions of a sphere on a path between two points. Considering the sphere to be the bounding sphere of a vehicle, this test is useful for motion planning. In case of a collision, the SpecificPatchCapsuleIntersector also returns the patch with which the capsule collides and the collision point on the patch.
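To illustrate how these capabilities can be expressed as interfaces, a minimal sketch in Python follows. The method names are hypothetical stand-ins; the actual framework is implemented in C++.

from abc import ABC, abstractmethod

class PatchMapInterface(ABC):
    """Basic capabilities every patch map framework must offer."""
    @abstractmethod
    def add_patch(self, patch): ...
    @abstractmethod
    def patches(self):
        """Read-only sequential access to all stored patches."""

class ProximityQueryPatchMap(PatchMapInterface):
    @abstractmethod
    def closest_patch(self, point):
        """Return the patch closest to the given point."""

class UnspecificPatchCapsuleIntersector(PatchMapInterface):
    @abstractmethod
    def capsule_collides(self, p1, p2, radius):
        """True iff a sphere of the given radius swept from p1 to p2 hits a patch."""

class SpecificPatchCapsuleIntersector(UnspecificPatchCapsuleIntersector):
    @abstractmethod
    def capsule_collision(self, p1, p2, radius):
        """Return (patch, collision point) of the hit, or None if collision-free."""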

5.4.2 Patch Map Frameworks

In the following, we present different patch map frameworks. Here, this term refers to a data-structure for storing patches together with the associated algorithms. The central problem patch maps try to solve is collision detection.

Collision detection means efficiently testing whether two or more geometric primitives or complexes intersect. It typically consists of two phases: a broad phase performs a cheap check on all existing objects and returns a list of objects potentially in collision; on these, a more expensive and more accurate check is performed in the narrow phase.

The main motivation and the most prominent case for using a patch map framework are planar patches. Planar patches are relatively simple, which tempts one to implement from scratch the subset of collision detection needed. Additionally, collision detection libraries typically do not handle non-convex polygons as primitives, making it necessary to increase the complexity of the data (from polygons to trimeshes). A first attempt yielded the SimplePatchMap. Adding a simple broad phase to the collision detection led to the BoundingBox6DPatchMap.

During implementation, it became clear that it would be very hard to fulfil all requirements on the patch map with naive approaches. And even in case of success, they would probably still be lacking in terms of speed and stability. So we decided to use an external collision detection library. We considered several popular libraries. In general, it can be said that we chose the library which left the least amount of programming work to be done; that is, we reviewed the supplied collision primitives and algorithms. Specifically, we discarded RAPID [Gottschalk, 1997] because it only offers a two-body collision check [Gottschalk, 1998] (i.e. whether two given collision objects collide) and not the n-body check we need (i.e. whether a given collision object collides with a collection of collision objects [the map]). In other words, it does not implement a broad phase collision detection. It also lacks primitives other than triangles. We could not make SOLID [van den Bergen, 2004] work (we tried FreeSolid version 2.1.1 on Ubuntu Linux 9.10; installation worked flawlessly, but some of the provided header files included other header files that were supposed to be among the installed files but were not). Since there were many alternatives available, we avoided spending too much time on it. Our first working library was OPCODE, chosen for its small code size and high speed [Terdiman, 2003]. It sufficed for early experiments with ray casting. For a proper roadmap, we need to cast bounding volumes, so we turned to more elaborate libraries. We chose Bullet [Coumans, 2009] because
• it is the popular free physics engine among game developers [ibid.],
• the author of OPCODE, whose work there we liked, contributed to it,
• it has a lively community to help with potential programming issues, and
• one of our colleagues has had bad experiences with ODE [Smith, 2006], a potential alternative.

Using a full physics engine might seem like overkill for the problem, but firstly this additional functionality does not inhibit its performance as a collision detection library, and secondly a physics engine might prove useful in the future, e.g. for drivability tests for ground-based robots.

SimplePatchMap

The SimplePatchMap is the reference implementation for most interfaces. It does not have a broad phase; it simply iterates over an array of all patches. In the narrow phase, it can natively collide planar patches: it first checks for collision with the infinite plane of the patch. If there is a collision, the intersection point is transformed to the 2D coordinate system of the polygon and checked for inclusion in it with the algorithm from CGAL [Giezeman and Wesselink, 2008].
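The narrow-phase test for a planar patch can be sketched as follows (Python/numpy); to_2d() and polygon_contains() are hypothetical stand-ins for the transformation into the patch's 2D frame and the point-in-polygon test (done with CGAL in the thesis).

import numpy as np

def segment_hits_planar_patch(a, b, normal, d, to_2d, polygon_contains):
    """Narrow-phase test in the spirit of the SimplePatchMap: intersect the
    segment a-b with the patch's infinite plane n.x = d, then check the hit
    point against the patch's 2D outline."""
    a, b, n = np.asarray(a, float), np.asarray(b, float), np.asarray(normal, float)
    da, db = n @ a - d, n @ b - d          # signed distances of the endpoints
    if da * db > 0:                        # both endpoints on the same side: no hit
        return None
    if da == db:                           # segment parallel to the plane
        return None
    t = da / (da - db)
    hit = a + t * (b - a)
    return hit if polygon_contains(to_2d(hit)) else None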

BoundingBox6DPatchMap

The BoundingBox6DPatchMap uses a variant of the sweep and prune algorithm [Baraff, 1992] to add a broad phase to the SimplePatchMap's narrow phase. Patches' bounding boxes are entered into a six-dimensional kD-tree (the first three dimensions are the minima, the last three dimensions are the maxima). To get the patches potentially in collision with a line from point a to b, a query on the kD-tree for all points in the partially open hyper-cuboid Q is used. Here, Q is defined by its minimum and maximum points min_Q and max_Q. On min_Q, the lower three dimensions are unbounded, while the upper three are equal to the minimum of the bounding box of a and b: min_Q^[1-3] = −∞, min_Q^[4-6] = min_BB{a,b}. Conversely, max_Q^[1-3] = max_BB{a,b} and max_Q^[4-6] = ∞. We illustrate this in one dimension in figure 5.17.
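The containment test that this query evaluates can be written down directly; the following brute-force Python sketch makes the relationship between the 6D points and the hyper-cuboid Q explicit (the actual map of course answers the query through the kD-tree instead of scanning all patches).

import numpy as np

def broad_phase_candidates(box_minmax, a, b):
    """Brute-force equivalent of the hyper-cuboid query: a patch with bounding
    box [pmin, pmax], stored as the 6D point (pmin, pmax), is a candidate iff
    pmin <= max(BB{a,b}) and pmax >= min(BB{a,b}) componentwise.
    box_minmax: N x 6 array of (pmin, pmax) rows; returns candidate indices."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    q_min, q_max = np.minimum(a, b), np.maximum(a, b)   # bounding box of the segment
    pmin, pmax = box_minmax[:, :3], box_minmax[:, 3:]
    inside = np.all(pmin <= q_max, axis=1) & np.all(pmax >= q_min, axis=1)
    return np.nonzero(inside)[0]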

While at first glance this interpretation may look like a mere re-ordering of inequality checks, it allows for optimization. Framing the interference check as a containment check on a kD-tree allows us to use its efficient look-up. As kD-trees are popular data-structures, there are heavily optimized implementations available. The effort for the BoundingBox6DPatchMap is reduced to one insertion per patch and one look-up per broad-phase collision check. This also simplifies the implementation.

OpcodePatchMap

The OpcodePatchMap relies on the OPCODE library for collision detection [Terdiman, 2003]. Since OPCODE does not store polygonal information, only bounding boxes, the patches are additionally stored in an array. Planar patches are triangulated for use with OPCODE. It offers lines and spheres as collision objects, but neither swept objects nor capsules. Hence, it only served for prototyping using line collisions with the UnspecificPatchLineIntersector. To achieve more useful results, Bullet was adopted for collision detection (for the choice of the collision detection library, also see above).

BulletPatchMap

The BulletPatchMap uses the Bullet Physics Engine for collision detection [Coumans, 2009]. Bullet uses trimeshes, so planar patches have to be triangulated to be used in Bullet. The trimeshes are generated directly in a format that Bullet can use. In the future, it might be worth trying to decompose the patches' polygons into convex polygons, because these are also supported by Bullet. Since such a decomposition yields far fewer components than a triangulation, there is potential for higher speed.

PointCloudMap

The PointCloudMap is a collection of point clouds. Although no patches are present, some patch map interfaces are implemented to allow for a comparison of the patch maps with a traditional map type. Proper patches (i.e. planar or trimesh patches) cannot be entered into this map.

The first approach was to render points as spheres with radius 0 or a very small radius as Bullet collision objects. However, point clouds can get quite large, especially if they originate from an ALRF (typically hundreds of thousands of points). Entering each point as its own collision object is quite challenging for the collision frameworks. Alternatively, one could enter the entire point cloud as one compound object. If the sensor covers a wide spectrum of ranges or has a very large opening angle, the bounding volume of the point cloud would be overly large and hence not very useful. A typical example is a robot standing in the middle of a room: floor, ceiling, walls, and objects are all covered by its ALRF. Should it now enter the complete point cloud as a compound collision object, the resulting bounding volume would encompass the whole room. Depending on how the collision detection framework handles compound objects, this could render the broad phase of collision detection useless. In that case, all collisions between the robot and the features of the room would have to be determined in the typically expensive narrow phase.

We used a simple middle-ground algorithm. The point cloud was entered into a kD-tree, which was then balanced. For each node N at a certain depth d, all points under N were used to generate a sub point cloud, and these sub point clouds were then entered as compound objects into Bullet.


Figure 5.17: ONE DIMENSIONAL CASE OF BOUNDING BOX BASED BROAD PHASE INTERSECTION TEST – It uses higher-dimensional point bounding boxes and open hyper-cuboid queries. (a) One-dimensional bounding boxes P_0 to P_3 and two points a, b from the collision query: "Does the line from a to b intersect any bounding boxes?" (b) Corresponding higher-dimension point bounding boxes and the open hyper-cuboid for the collision query (unbounded towards −∞ on the x_min axis, towards ∞ on the x_max axis). Since the points for P_1 and P_3 lie inside the hyper-cuboid, the broad phase would now pass the corresponding patches to the narrow-phase collision detection.


Due to the properties of the kD-tree, the point cloud is repeatedly broken up at the median coordinate in one dimension, cycling through all three. This is a very simple way to break up the big bounding volume of the point cloud into relatively well-chosen sub bounding volumes. We accepted some over-segmentation, but limited under-segmentation. A value used for d in many experiments with ALRF point clouds of up to 200,000 points is 9. This means that one point cloud was split up into 512 sub point clouds. Please note that setting d to 0 or to a very high value allowed us to also test the two naive approaches ruled out above.
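A sketch of this sub-point-cloud generation by repeated median splits (Python/numpy, an illustrative sketch rather than the kD-tree based implementation); with depth d = 9 it produces the 512 sub point clouds mentioned above.

import numpy as np

def median_split(points, depth, axis=0):
    """Recursive median split of a point cloud, cycling through the axes, until
    the given depth is reached; depth 9 yields up to 2^9 = 512 sub point clouds."""
    if depth == 0 or len(points) <= 1:
        return [points]
    med = np.median(points[:, axis])
    left = points[points[:, axis] <= med]
    right = points[points[:, axis] > med]
    if len(left) == 0 or len(right) == 0:    # degenerate split: stop early
        return [points]
    nxt = (axis + 1) % 3
    return median_split(left, depth - 1, nxt) + median_split(right, depth - 1, nxt)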

However, since Bullet requires its objects to be of a certain minimum size [Coumans, 2010], this approach did not deliver correct results. Since points are not commonly used as collision objects, none of the surveyed collision detection libraries supports them. As the failed attempt with Bullet shows, simulating points with objects of size zero offers dubious chances of success. Hence, we implemented a simple collision detection algorithm ourselves.

Our collision algorithm is given in pseudo-code as algorithm 10. It assumes that point clouds densely sample surfaces. Specifically, it is assumed that the radius will always be big enough that a sphere cannot pass between points which lie on the same surface. For an optimized nearest-neighbor search, the points are stored in a CGAL kD-tree [Tangelder and Fabri, 2008], which offers such a facility. An example of how the algorithm works can be found in figure 5.18. For it to always terminate, l² > 0 has to hold at all times. This is guaranteed by the termination criterion in line 7. A theoretical limitation is the algorithm's dependence on the point cloud structure: if a successful test is performed in a very dense area with many points close to the capsule, it will be very slow. However, this is a rare case, since in denser areas tests are generally less likely to be successful.

Algorithm 10 COLLISION DETECTION FOR A CAPSULE IN A POINT CLOUD – Given: point cloud P, capsule end points p_1, p_2, capsule radius r.

1: if min_{p∈P} ||p − p_1|| ≤ r or min_{p∈P} ||p − p_2|| ≤ r then
2:   return true
3: end if
4: p_i ← p_1
5: s ← (p_2 − p_1)/||p_2 − p_1||
6: while p_i between p_1, p_2 do
7:   if min_{p∈P} ||p − p_i|| ≤ r then
8:     return true
9:   end if
10:  l ← sqrt( min_{p∈P} ||p − p_i||² − r² )
11:  p_i ← p_i + s·l
12: end while
13: return false
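A sketch of algorithm 10 in Python, using a kD-tree for the nearest-neighbor queries (scipy here, CGAL in the thesis) and assuming the capsule spine has non-zero length:

import numpy as np
from scipy.spatial import cKDTree

def capsule_collides(points, p1, p2, r):
    """Sketch of algorithm 10: test a capsule (spine p1-p2, radius r) against a
    point cloud using nearest-neighbor queries on a kD-tree."""
    tree = cKDTree(points)
    p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
    if tree.query(p1)[0] <= r or tree.query(p2)[0] <= r:
        return True
    length = np.linalg.norm(p2 - p1)
    s = (p2 - p1) / length
    p, travelled = p1.copy(), 0.0
    while travelled <= length:                  # p still between p1 and p2
        dist = tree.query(p)[0]
        if dist <= r:
            return True
        step = np.sqrt(dist * dist - r * r)     # largest safe advance along the spine
        p += s * step
        travelled += step
    return False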

The PointCloudMap only supports the unspecific collision queries (UnspecificPatchCapsuleIntersector). This is because the two most commonly used algorithms do not use the "specific" interfaces, so the effort would not have justified the reward. Also, for the other algorithms, it does not make sense to consider a complete point cloud as a patch; consequently, the point clouds would have to be subdivided in a way that makes sense.

Furthermore, the PointCloudMap does not support proximity queries. This is because they rely on the notion of the "distance to a patch". Apart from the difficulty of identifying an equivalent to patches, as outlined above, it is also challenging to define the distance to a patch in a way that carries across the implications from planar patches (and, to a lesser extent, trimeshes). This is because there, the distance is defined along the plane normal for planar patches and along the local normal for trimeshes. Producing comparable behavior in the PointCloudMap is not inconceivable, but out of the scope of this work.


Figure 5.18: A RUN OF THE KD-TREE BASED COLLISION DETECTION ALGORITHM DETECTING NO COLLISION – The blue area is to be checked for any of the black points. The green arc and the red marking of a point represent the result of a nearest-neighbor search. (a) Nearest-neighbor searches are conducted at each end of the capsule. Since neither of the nearest neighbors is within the capsule, the algorithm continues. (b) The algorithm starts at one end of the capsule and progresses to the other one. When the nearest neighbor to the query point is further away than the capsule radius, it continues to query at a point closer to the capsule's other end. (c) The new query point is the intersection of the capsule spine and a sphere with radius equal to the distance to the nearest neighbor and center in the old query point. (e) The current query point is dark green, the query for the next iteration is light green. (g) Not pictured: if a nearest-neighbor search finds a point closer than the capsule radius, the algorithm terminates, reporting a collision. (i) When the new query would have to start behind the second end of the capsule, the algorithm returns, reporting that there is no collision.


Algorithm      Proximity   Specific Step   Unspecific Step
RRT            –           –               √
Evasion RRT    –           √               –
MA RRT         √           –               √
PRM            –           –               √
MA PRM         √           –               √

Table 5.2: REQUIRED INTERFACES FOR PATCH MAP BASED ALGORITHMS – Proximity is short for ProximityQueryPatchMap, Specific Step for SpecificPatchStepIntersector, and Unspecific Step for UnspecificPatchStepIntersector


5.5 Algorithms on patch maps

We implemented a number of algorithms on patch map frameworks. They do not use the patch map frameworks directly, but via the different capabilities, i.e. interfaces. Which capabilities an algorithm needs determines which patch map frameworks it can run on. These algorithms include

• the Rapidly Exploring Random Tree (RRT) [LaValle, 1998],
• the Evasion RRT, a variant of the RRT,
• the Probabilistic Roadmap Method (PRM) [Kavraki et al., 1996],
• the Medial Axis PRM (MA PRM) presented in [Wilmarth et al., 1999], adapted to 3D patch maps, and
• the Medial Axis RRT (MA RRT), inspired by [Wilmarth et al., 1999].

Table 5.2 shows which algorithms require which interfaces. The RRT algorithm successively adds collision-free edges towards random points (shortened to a maximum length) to a tree starting at a specified root. The PRM algorithm works similarly, only that it grows in potentially disconnected components on the entire map and does not limit the length of edges. The RRT and PRM algorithms were chosen for further, in-depth experiments on real data, which are presented in chapter 6. To make that chapter more coherent, we present these two algorithms in detail in section 6.1.
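To make the use of the interfaces concrete, a minimal RRT loop that only relies on an unspecific step test could look as follows; sample(), patch_map.collides(), and the tree representation are hypothetical stand-ins, and the full algorithms are presented in section 6.1.

import math

def dist(a, b):
    return math.sqrt(sum((a[i] - b[i]) ** 2 for i in range(3)))

def build_rrt(x_init, patch_map, K, l_step, sample):
    """Minimal RRT on a patch map that only offers an unspecific step test
    (patch_map.collides(a, b) -> bool). sample() draws a random 3D point;
    the tree is returned as a dict mapping each vertex to its parent."""
    tree = {tuple(x_init): None}
    for _ in range(K):
        x_rand = sample()
        x_near = min(tree, key=lambda v: dist(v, x_rand))   # nearest vertex in the tree
        d = dist(x_near, x_rand)
        if d == 0:
            continue
        step = min(l_step, d) / d
        x_new = tuple(x_near[i] + step * (x_rand[i] - x_near[i]) for i in range(3))
        if not patch_map.collides(x_near, x_new):           # unspecific step intersector
            tree[x_new] = x_near
    return tree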

5.5.1 Evasion RRT

The Evasion RRT algorithm can be found in algorithm 11. The difference to the standard RRT is that upon collision, it tries to place a non-colliding vertex in the vicinity of the first tried vertex. The position will be next to the planar patch, in the planar patch's plane. The rationale is that it is desirable to have many vertices near collision objects. A limitation is that it is currently only defined for planar patches.


Algorithm 11 THE EVASION RRT ALGORITHM – Building roadmap T on map M. The COLLISION*(M, x_1, x_2) function family checks the map for patches between x_1 and x_2. If p is a planar patch, n_p is its normal.

1: proc BUILD EVASION RRT(x_init, M)
2:   T.init(x_init)
3:   for k := 1 to K do
4:     x_rand ← RANDOM POINT()
5:     EXTEND(T, M, x_rand)
6:   end for
7:   return T
8:
9: proc EXTEND(T, M, x)
10:  x_near ← NEAREST NEIGHBOR(x, T)
11:  u_new ← x − x_near
12:  u_new ← (l_step / |u_new|) · u_new
13:  x_new ← x_near + u_new
14:  while COLLISION(M, x_near, x_new) and p = p_old do
15:    p ← COLLISION PATCH(M, x_near, x_new)
16:    x_c ← COLLISION POINT(M, x_near, x_new)
17:    d_ev ← u_new × n_p
18:    d_ev ← (l_ev / |d_ev|) · d_ev
19:    x_new ← x_new + d_ev
20:  end while
21:  T.add vertex(x_new)
22:  T.add edge(x_near, x_new, u_new)


5.5.2 Medial Axis algorithms

Given a point set P and a point q ∉ P, the distance d_P(q) of q to P is the distance of the closest point in P to q:

d_P(q) := min_{p∈P} ||p − q||

The medial axis P_MA of two point sets P_1, P_2 is the set of all points with equal distance to P_1 and P_2:

P_MA := {p | d_{P_1}(p) = d_{P_2}(p)}

The Medial Axis algorithms attempt to create smooth paths that keep maximum distance to the obstacles by placing vertices only on the medial axes. They do so by relocating random points to the nearest medial axis and thus avoid calculating the medial axes explicitly. In [Wilmarth et al., 1999], a volumetric method like an occupancy mesh is assumed. This means that a big part of their algorithm is dedicated to finding the border between free space and occupied space. In a patch map, there is no occupied space. Still, the patches correlate to this border: they cannot be crossed. Thus, adapting the algorithm to patch maps actually simplifies it (algorithm 12).

Algorithm 12 MEDIAL AXIS PRM ADAPTED TO PATCH MAPS – Building roadmap T on map M. The COLLISION(M, x_1, x_2) function checks the map for patches between x_1 and x_2.

1: T.init(x_init)
2: for k ← 1 to K do
3:   x_rand ← RANDOM POINT()
4:   p ← CLOSEST PATCH(M, x_rand)
5:   c ← CLOSEST POINT ON PATCH(M, x_rand)
6:   determine s such that x′_rand = x_rand + s(x_rand − c) has two closest patches
7:   T.add vertex(x′_rand)
8: end for
9: for v vertex of T do
10:  V_near ← N NEAREST NEIGHBORS(T, v, N)
11:  for v_near ∈ V_near, in order of increasing ||v − v_near|| do
12:    if not SAME COMPONENT(T, v, v_near) and not COLLISION(M, v, v_near) then
13:      T.add edge(v, v_near)
14:    end if
15:  end for
16: end for
17: return T
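Line 6 is the only non-trivial step. Its retraction of a sampled point onto the (approximate) medial axis can be sketched as follows, assuming closest_patch() and closest_point() map queries and using a simple stepping search instead of an exact solution.

import numpy as np

def retract_to_medial_axis(x, closest_patch, closest_point, step=0.05, max_dist=10.0):
    """Approximate realization of line 6 of algorithm 12: move x away from its
    closest patch along the direction (x - c) until the identity of the closest
    patch changes; the returned point then lies (approximately) on a medial axis.
    closest_patch(x) and closest_point(x) are assumed map queries."""
    x = np.asarray(x, float)
    p0 = closest_patch(x)
    c = np.asarray(closest_point(x), float)
    d = x - c
    norm = np.linalg.norm(d)
    if norm == 0:
        return x                                 # x lies on the patch; leave it unchanged
    d /= norm
    s = 0.0
    while s < max_dist:
        s += step
        if closest_patch(x + s * d) is not p0:   # closest patch changed: medial axis crossed
            return x + s * d
    return x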

5.5.3 Experiments

Thorough experiments with the roadmap algorithms will be discussed in chapter 6. Using all mapping frameworks and all algorithms there would have been overkill. To decide which mapping framework and which algorithms to use, we conducted preliminary benchmarking experiments on all combinations. Another goal was to direct the implementation effort: certain functionality was needed for the thorough experiments (most importantly capsule collisions). To save time, full functionality was only to be implemented on the mapping framework that came out best in the preliminary experiments (which use line collision checks).


Figure 5.19: MAP USED IN PRELIMINARY EXPERIMENTS – A 3D map was generated from this floor plan. Walls are black, doors are light grey, regions of interest (ROIs) are red, the starting point is blue.

A 3D patch map was generated from the floor plan in figure 5.19. Each wall was transformed into a planar patch of fixed height, with a door if specified. A floor and a ceiling were added.

For each patch map framework (simple, 6D bounding box, OPCODE, and Bullet), one map was generated. The five algorithms presented in the previous section were run on each map. The only exception is OPCODE: since it only checks whether there was a collision, but offers no straight-forward way to detect with which part of the map, three of the roadmap algorithms could not be run on the OPCODE map. Two other limitations should be noted: only Bullet offers collision checks for capsules, so all tests were run using ray collision checks. Also, the simple implementation and the implementation based on 6D bounding boxes only offer support for planar patches.

All roadmap algorithms use different methods. For example, the Medial Axis PRM can be expected to generate fewer vertices than the RRT in a given amount of time, but, on the other hand, these vertices can be expected to be more useful. A common task was defined to even out these differences: the algorithms had to reach all four regions of interest (ROIs) in the map in figure 5.19 from a common starting point in the center, after which the run was terminated. The experiments were executed 50 times with different random seeds. The tables below show the medians and both quartiles of all successful runs.

The Medial Axis PRM was given special treatment. It was implemented as closely as possible to [Wilmarth et al., 1999], which uses two phases: a preliminary phase to place the vertices on the medial axes and a final phase to connect the vertices with the PRM. For the evaluation, reaching all ROIs had to be guaranteed. However, placing vertices is relatively expensive. Since using enough iterations to reach all ROIs every time would be prohibitive, the optimal (within 5%) number of vertices reaching all ROIs was determined with a variant of binary search. The results presented below are from the runs with this number. The premise was that, should the MA PRM prove itself, a one-phase anytime implementation could be written (placed vertices would be connected right away, but edges could be removed later on). However, it turned out that the MA PRM could not always reach all ROIs within the maximum number of iterations (required by limits in system resources). Out of 50 runs, it reached all ROIs 28 times for the OPCODE, simple, and 6D bounding box based implementations, and 30 times for the Bullet map. The numbers on the performance only refer to successful runs, further skewing the outcome in favor of the MA PRM. As will turn out, it still does not perform well.


time [s]    Simple     BB6D       OPCODE     Bullet
RRT         0.010057   0.006137   0.002435   0.004721
MA RRT      0.124055   0.060409   –          0.055298
Ev. RRT     0.18627    0.156209   –          0.099351
PRM         0.013578   0.010544   0.001309   0.008801
MA PRM      1.96683    1.20087    –          0.678831

Table 5.3: MEDIAN WALL TIME TAKEN TO REACH THE ROIS FROM THE STARTING POINTS

time [factor]   Simple       BB6D         OPCODE       Bullet
RRT             .547/1.874   .570/1.837   .716/1.544   .570/1.730
MA RRT          .513/1.484   .550/1.643   –            .520/1.545
Ev. RRT         .339/2.141   .305/1.909   –            .337/2.124
PRM             .525/2.621   .520/2.830   .770/1.388   .452/3.123
MA PRM          .253/1.307   .239/1.243   –            .354/1.558

Table 5.4: LOWER AND UPPER QUARTILE OF WALL TIME TAKEN TO REACH THE ROIS FROM THE STARTING POINTS – The given values are the ratios of the quartiles to the medians.

5.5.4 Results

All values except the time should be the same on all map implementations. After all, the random number generator was seeded with the same seed, so the same collisions should be checked. However, collision detection uses many floating point operations; numerical stability is a goal, but it is not always achieved.

Table 5.3 shows the median wall time taken. We can see that the RRT is the fastest algorithm, with the PRM a respectable runner-up. All other algorithms are one or even two (MA PRM) orders of magnitude slower. In terms of maps, Bullet and OPCODE are roughly the same for the RRT, but OPCODE shows some surprising weakness with the PRM. The naive CGAL based implementations come relatively close to Bullet – factor 2.35 in the worst case (Simple/MA PRM), factor 1.09 in the best case (BB6D/MA RRT). Examination of the quartiles given in table 5.4 shows that the PRM has the largest variance. The MA PRM, while slowest, shows the smallest upward variance. It also has the largest downward variance, but this is also a good thing. Among the mapping frameworks, Bullet and the naive implementations show the most variance, while OPCODE is the most stable.

Reaching the ROIs quickly is commendable; however, producing paths with favorable properties is also important. A basic, yet important property is the length (medians in table 5.5). MA RRT produces the shortest paths; the ones produced by the RRT are only marginally longer. The PRM based algorithms in particular produce long paths. For the basic PRM, this can be blamed on the unlimited edge length, which produces longer detours.

path length [m]   Simple    BB6D      OPCODE    Bullet
RRT               23.1608   23.1608   21.5803   23.1608
MA RRT            22.166    22.3942   –         22.1411
Ev. RRT           25.7439   26.621    –         26.621
PRM               41.3277   41.3277   40.2935   41.3277
MA PRM            66.0655   64.6227   –         68.2961

Table 5.5: MEDIAN ADDED LENGTH OF PATHS FROM STARTING POINT TO ALL ROIS


path length [factor]   Simple       BB6D         OPCODE       Bullet
RRT                    .904/1.060   .904/1.060   .926/1.158   .904/1.060
MA RRT                 .947/1.042   .935/1.034   –            .943/1.047
Ev. RRT                .923/1.104   .932/1.066   –            .924/1.066
PRM                    .838/1.212   .838/1.212   .843/1.291   .838/1.212
MA PRM                 .797/1.094   .815/1.167   –            .771/1.158

Table 5.6: LOWER AND UPPER QUARTILE OF ADDED LENGTH OF PATHS FROM STARTING POINT TO ALL ROIS – The given values are the ratios of the quartiles to the medians.

steep/total turns/ratio   Simple          BB6D            OPCODE         Bullet
RRT                       3/20/0.150      3/20/0.150      2/18/0.111     3/20/0.150
MA RRT                    3/18/0.167      3/18/0.167      –              2/18/0.111
Ev. RRT                   3/23/0.130      3/24/0.125      –              3/24/0.125
PRM                       7/14/0.500      7/14/0.500      4/8/0.500      7/14/0.500
MA PRM                    36/160/0.225    36/167/0.216    –              38/189/0.201

Table 5.7: MEDIAN STEEP TURNS ON PATHS FROM STARTING POINT TO ROIS – Steep is defined to mean greater than 90°. The numbers given are the median number of steep turns in paths / the median total number of turns in paths / their ratio.

edge length, which produces longer detours. The MA PRM, on the other hand, is designed to use detours along the medial axes of the map to maximize the distance to obstacles. It achieves this better than the MA RRT, which immediately connects nodes moved to the medial axes. In table 5.6, the lower and upper quartiles of the path lengths are given as a factor of the respective median. We can see that the RRT-based algorithms are rather stable, while the PRM-based ones show slightly more variance.

Paths in 3D are to be used by UAVs (Unmanned Aerial Vehicles) or AUVs (Autonomous Underwater Vehicles). Both types of vehicles suffer from kinematic and kinodynamic constraints, so steep turns are to be avoided. This is considered in the following tables. Table 5.7 shows the median number of steep turns on the paths from the starting point to the ROIs and the total number of turns. A steep turn is a turn of over 90°. PRM produces paths with few total turns; however, many of these (50%) are steep. As with the length, this can be blamed on the unlimited edge length. MA PRM, on the other hand, produces proportionally less than half as many steep turns. This is a property of the algorithm. (Unfortunately, it also produces many more vertices than any of the other algorithms.) However, the RRT-based algorithms did significantly better still. Considering the variances, the lower quartiles are in table 5.8, the upper ones in table 5.9. Evasion RRT further shows its potential to create the smoothest paths overall. PRM

steep/total turns/ratio   Simple         BB6D           OPCODE        Bullet
RRT                       2/17/0.118     2/17/0.118     1/16/0.062    2/17/0.118
MA RRT                    2/17/0.118     1/17/0.059     –             2/17/0.118
Ev. RRT                   2/21/0.095     2/22/0.091     –             2/22/0.091
PRM                       4/9/0.444      4/9/0.444      3/5/0.600     4/9/0.444
MA PRM                    22/91/0.242    22/95/0.232    –             24/95/0.253

Table 5.8: LOWER QUARTILE OF STEEP TURNS ON PATHS FROM STARTING POINT TO ROIS – Steep is defined to mean greater than 90°. The numbers given are the lower quartile of the number of steep turns in paths / the lower quartile of the total number of turns in paths / their ratio.


steep/total turns/ratio   Simple          BB6D            OPCODE         Bullet
RRT                       4/22/0.182      4/22/0.182      3/21/0.143     4/22/0.182
MA RRT                    4/20/0.200      4/20/0.200      –              4/20/0.200
Ev. RRT                   5/25/0.200      5/25/0.200      –              5/25/0.200
PRM                       9/18/0.500      9/18/0.500      7/11/0.636     9/18/0.500
MA PRM                    46/242/0.190    48/242/0.198    –              52/238/0.218

Table 5.9: UPPER QUARTILE OF STEEP TURNS ON PATHS FROM STARTING POINT TO ROIS – Steep is defined to mean greater than 90°. The numbers given are the upper quartile of the number of steep turns in paths / the upper quartile of the total number of turns in paths / their ratio.

curviness [deg]   Simple     BB6D       OPCODE     Bullet
RRT               78.0185    78.0185    71.383     78.0185
MA RRT            78.5776    77.6607    –          78.8892
Ev. RRT           80.0567    78.5778    –          77.8766
PRM               119.903    119.903    135.332    119.903
MA PRM            86.8547    86.8317    –          86.5707

Table 5.10: MEDIAN CURVINESS OF PATHS FROM STARTING POINT TO ROIS – Curviness is the upper quartile of the turn angles on the paths from the starting point to the ROIs.

has almost no variance. RRT, MA RRT, and MA PRM show little enough variance to be considered reliable.

Table 5.10 confirms these findings. It shows the curviness, the upper quartile of the angles of all turns on the paths to the ROIs. The paths generated by the RRT-based planners are similarly smooth, while MA PRM trails behind and PRM is the worst by far. The variance of the curviness is so small that all algorithms can be considered stable in this regard (table 5.11). The lower quartile is only once lower than 83% of the median: 79.1% for MA RRT/Simple. Only for the two algorithms on the OPCODE implementation does the upper quartile exceed 110% of the median: 115.3% and 115.8% for RRT and PRM respectively.
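For reference, the path-quality measures used in tables 5.7 to 5.11 can be computed directly from a piecewise-linear path. The C++ sketch below derives the turn angles between consecutive segments, counts steep turns (greater than 90°), and takes the upper quartile of the turn angles as the curviness; the waypoints and the quantile convention are illustrative assumptions, not values from the experiments.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstdio>
#include <vector>

using Vec3 = std::array<double, 3>;

static Vec3 diff(const Vec3& a, const Vec3& b) { return {b[0]-a[0], b[1]-a[1], b[2]-a[2]}; }
static double dot(const Vec3& a, const Vec3& b) { return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]; }
static double norm(const Vec3& a) { return std::sqrt(dot(a, a)); }

int main() {
    const double PI = std::acos(-1.0);
    // A piecewise-linear path (waypoints); values are made up for illustration.
    std::vector<Vec3> path = {{0,0,0}, {1,0,0}, {1,1,0}, {0,1,1}, {0,0,1}};

    std::vector<double> turnsDeg;                       // angle between consecutive segments
    for (std::size_t i = 1; i + 1 < path.size(); ++i) {
        Vec3 u = diff(path[i-1], path[i]);
        Vec3 v = diff(path[i], path[i+1]);
        double c = dot(u, v) / (norm(u) * norm(v));
        c = std::max(-1.0, std::min(1.0, c));           // guard against rounding
        turnsDeg.push_back(std::acos(c) * 180.0 / PI);
    }

    // Steep turns are turns of more than 90 degrees (as in tables 5.7-5.9).
    const std::size_t steep = std::count_if(turnsDeg.begin(), turnsDeg.end(),
                                            [](double a) { return a > 90.0; });

    // Curviness as used in tables 5.10/5.11: the upper quartile of all turn angles.
    std::sort(turnsDeg.begin(), turnsDeg.end());
    const double curviness = turnsDeg.empty() ? 0.0
        : turnsDeg[static_cast<std::size_t>(0.75 * (turnsDeg.size() - 1))];

    std::printf("%zu/%zu steep/total turns, curviness %.1f deg\n",
                steep, turnsDeg.size(), curviness);
    return 0;
}
```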

In conclusion, RRT stands out as the winner, especially concerning efficiency. However, PRM is a widely recognized standard algorithm, so it is worth further consideration. In terms of mapping frameworks, we found that Bullet, in addition to being functionally the most complete, is also the fastest implementation.

curviness [factor]   Simple        BB6D          OPCODE        Bullet
RRT                  .850/1.093    .850/1.093    .858/1.153    .850/1.093
MA RRT               .791/1.048    .838/1.060    –             .832/1.065
Ev. RRT              .913/1.042    .937/1.060    –             .946/1.063
PRM                  .876/1.080    .876/1.080    .905/1.158    .876/1.080
MA PRM               .973/1.028    .968/1.029    –             .960/1.042

Table 5.11: LOWER AND UPPER QUARTILE OF CURVINESS OF PATHS FROM STARTING POINT TO ROIS – Curviness is the upper quartile of the turn angles on the paths from the starting point to the ROIs. The given values are the ratios of the quartiles to the medians.



Chapter 6

3D Roadmaps for Unmanned Aerial Vehicles on Planar Patch Map

In the previous chapter we introduced the patch map framework, which represents the real world as a collection of patches. In this chapter, we set out to thoroughly test its usability for path planning. Based on the experiments conducted in section 5.5.3, we choose the Rapidly-exploring Random Tree (RRT) and the Probabilistic Roadmap Method (PRM).

While RRT and PRM were originally intended for motion problems in high-dimensional spaces with non-holonomic constraints [LaValle and Kufner, 2001], they are also suitable for problems of low dimensionality [Koyuncu and Inalhan, 2008, Andert and Adolf, 2009, Pettersson and Doherty, 2006]. In our case, we do not use the configuration space [Lozano-Perez, 1983] and a point robot as originally proposed, but operate directly on the workspace. We check for collisions with a bounding sphere. This means that we do not need to handle robot orientation. The three remaining degrees of freedom correspond to the three spatial dimensions. These restrictions mean that generated paths will most likely be unusable for fixed-wing Unmanned Aerial Vehicles (UAVs). They are, however, usable by AUVs and Vertical Take-off and Landing (VTOL) UAVs. On the positive side, this approach allows us to use available, highly optimized, professional-grade collision detection frameworks.

6.1 Experiments

We implemented the RRT and PRM algorithms for the patch map. For now, we plan piecewise-linear paths to evaluate patch maps in path planning. That is sufficient to evaluate performance, correctness, and path lengths. As we saw in the literature review (section 1.3), the piecewise-linear path can be smoothed according to the vehicles' motion constraints in a separate later phase (e.g. [Andert and Adolf, 2009, Pettersson and Doherty, 2006, Hrabar, 2008]).

The RRT algorithm from [LaValle, 1998] is reproduced as algorithm 13. When applied to the patch map, RANDOM_STATE() (line 4) is implemented as a uniformly distributed random point from the bounding box of all patches. The NEAREST_NEIGHBOR() method (line 10) uses an octree. NEW_STATE() in line 11 is implemented as an interference check.

Algorithm 14 reproduces the original PRM algorithm. The method N_NEAREST_NEIGHBORS() (line 4) is taken from CGAL [Tangelder and Fabri, 2008]. SAME_COMPONENT() (line 7) is implemented with an associative array which associates each node with its component.


Algorithm 13 The original RRT algorithm from [LaValle, 1998]
 1: proc BUILD_RRT(x_init)
 2:   T.init(x_init)
 3:   for k ← 1 to K do
 4:     x_rand ← RANDOM_STATE()
 5:     EXTEND(T, x_rand)
 6:   end for
 7:   return T
 8:
 9: proc EXTEND(T, x)
10:   x_near ← NEAREST_NEIGHBOR(x, T)
11:   if NEW_STATE(x, x_near, x_new, u_new) then
12:     T.add_vertex(x_new)
13:     T.add_edge(x_near, x_new, u_new)
14:     if x_new = x then
15:       return Reached
16:     else
17:       return Advanced
18:     end if
19:   end if
20:   return Trapped

Algorithm 14 THE ORIGINAL PRM ALGORITHM FROM [KAVRAKI ET AL., 1996] – Notation adapted to that used in the RRT algorithm
 1: T.init()
 2: for k ← 1 to K do
 3:   x_rand ← RANDOM_STATE()
 4:   X_near ← N_NEAREST_NEIGHBORS(T, x_rand, N)
 5:   T.add_vertex(x_rand)
 6:   for x_near ∈ X_near, in order of increasing ‖x_rand − x_near‖ do
 7:     if not SAME_COMPONENT(T, x_rand, x_near) and not COLLISION(x_rand, x_near) then
 8:       T.add_edge(x_rand, x_near)
 9:     end if
10:   end for
11: end for
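The SAME_COMPONENT() bookkeeping described above can be realized exactly as stated: an associative array maps each node to a component id, and adding an edge between two components relabels one of them. The C++ sketch below is a minimal stand-in with plain integer node ids; it is not the thesis implementation, which stores the roadmap in a Boost adjacency_list.

```cpp
#include <cstdio>
#include <unordered_map>
#include <utility>
#include <vector>

class Components {
public:
    void addVertex(int v) { comp_[v] = v; members_[v] = {v}; }

    bool sameComponent(int a, int b) const { return comp_.at(a) == comp_.at(b); }

    void addEdge(int a, int b) {
        int ca = comp_[a], cb = comp_[b];
        if (ca == cb) return;
        if (members_[ca].size() < members_[cb].size()) std::swap(ca, cb);
        for (int v : members_[cb]) comp_[v] = ca;        // relabel the smaller component
        members_[ca].insert(members_[ca].end(),
                            members_[cb].begin(), members_[cb].end());
        members_.erase(cb);
    }

private:
    std::unordered_map<int, int> comp_;                  // node -> component id
    std::unordered_map<int, std::vector<int>> members_;  // component id -> nodes
};

int main() {
    Components c;
    for (int v = 0; v < 4; ++v) c.addVertex(v);
    c.addEdge(0, 1);
    c.addEdge(2, 3);
    std::printf("0,2 same? %d\n", c.sameComponent(0, 2)); // 0: still separate components
    c.addEdge(1, 2);
    std::printf("0,3 same? %d\n", c.sameComponent(0, 3)); // 1: components were merged
    return 0;
}
```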


Data structure and nearest neighbor search

The basic data structure is an "adjacency list" type graph from the Boost Graph Library [Siek et al., 2001] with 3D Cartesian points as vertices. For use with the RRT, it is supplemented by an octree à la [Poppinga et al., 2007], into which the vertices are additionally entered. This achieves a very quick nearest neighbor search. Since this octree implementation cannot store data on the nodes, and since more than one vertex might fall into one cell, an additional data structure is used: a hashtable stores a mapping from an octree cell to a list of vertices. Because of the way the roadmap algorithms used here are constructed, nodes are quite sparse. Only very rarely do two or more points fall within one cell of the octree, so the list of vertices typically contains only one element.

For the PRM, a CGAL kD-tree [Tangelder and Fabri, 2008] is used instead of an octree. This provides an n-nearest-neighbors search.
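A minimal sketch of the cell-to-vertex-list hashtable described above is given below. The octree of [Poppinga et al., 2007] is not reproduced; cells are derived from an assumed fixed cell size, and candidates are gathered from the 3×3×3 neighborhood, which only approximates the octree-based nearest-neighbor query of the actual implementation.

```cpp
#include <array>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <unordered_map>
#include <vector>

using Vec3 = std::array<double, 3>;

struct CellIndex {
    std::unordered_map<std::uint64_t, std::vector<int>> cells;  // cell key -> vertex ids
    double cellSize = 0.5;                                      // [m], assumed value

    std::uint64_t key(int ix, int iy, int iz) const {
        // pack three offset 21-bit indices into one 64-bit key
        auto u = [](int i) { return static_cast<std::uint64_t>(i + (1 << 20)) & 0x1FFFFF; };
        return (u(ix) << 42) | (u(iy) << 21) | u(iz);
    }
    int cellOf(double x) const { return static_cast<int>(std::floor(x / cellSize)); }

    void insert(int id, const Vec3& p) {
        cells[key(cellOf(p[0]), cellOf(p[1]), cellOf(p[2]))].push_back(id);
    }
    // Vertices in the cell of p and its neighbors; usually only a handful,
    // since roadmap nodes are sparse (rarely more than one per cell).
    std::vector<int> candidates(const Vec3& p) const {
        std::vector<int> out;
        const int cx = cellOf(p[0]), cy = cellOf(p[1]), cz = cellOf(p[2]);
        for (int dx = -1; dx <= 1; ++dx)
            for (int dy = -1; dy <= 1; ++dy)
                for (int dz = -1; dz <= 1; ++dz) {
                    auto it = cells.find(key(cx + dx, cy + dy, cz + dz));
                    if (it != cells.end())
                        out.insert(out.end(), it->second.begin(), it->second.end());
                }
        return out;
    }
};

int main() {
    CellIndex index;
    index.insert(0, {0.1, 0.2, 0.3});
    index.insert(1, {5.0, 5.0, 5.0});
    std::printf("candidates near origin: %zu\n", index.candidates({0.0, 0.0, 0.0}).size());
    return 0;
}
```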

NEW_STATE() and COLLISION()

The NEW_STATE() and COLLISION() functions are where the collision detection takes place. They are implemented by each mapping framework used. In our experiments, we used the Bullet physics library [Coumans, 2009] and our point cloud map (see section 5.4.2). Since Bullet's collision primitives do not include non-convex polygons with holes, the planar patches were triangulated using the Delaunay triangulation from CGAL [Yvinec, 2008].

The function NEW_STATE(x, x_near, x_new, u_new) explores one possible extension of the roadmap. First, it defines u_new, a new edge between x_near and x. Then it shortens u_new to at most a fixed length, such that it ends in x_new. Then it checks for collisions: the return value depends on whether a sphere cast from x_near to x_new collides with any of the trimeshes (or point clouds in the case of the PC map).

The simpler COLLISION(x, y) simply returns whether there is a collision when a sphere is cast from x to y.
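The contract of NEW_STATE() can be summarized in a short sketch. The code below grows from x_near towards x by at most a fixed step and delegates the swept-sphere test to a callback; the callback is a placeholder for the Bullet-based or point-cloud-based check, not an actual Bullet call, and the step length is an assumed parameter.

```cpp
#include <array>
#include <cmath>
#include <cstdio>
#include <functional>

using Vec3 = std::array<double, 3>;
enum class ExtendResult { Reached, Advanced, Trapped };

// Sketch: shorten the edge from x_near towards x to at most maxStep, then ask
// the map backend whether the sphere swept along that edge is collision-free.
ExtendResult newState(const Vec3& x, const Vec3& xNear, double maxStep,
                      const std::function<bool(const Vec3&, const Vec3&)>& sphereCastFree,
                      Vec3& xNew) {
    Vec3 d = {x[0] - xNear[0], x[1] - xNear[1], x[2] - xNear[2]};
    const double len = std::sqrt(d[0]*d[0] + d[1]*d[1] + d[2]*d[2]);
    if (len < 1e-12) { xNew = xNear; return ExtendResult::Trapped; }   // degenerate sample
    const double s = (len > maxStep) ? maxStep / len : 1.0;            // limit the step length
    xNew = {xNear[0] + s*d[0], xNear[1] + s*d[1], xNear[2] + s*d[2]};
    if (!sphereCastFree(xNear, xNew)) return ExtendResult::Trapped;
    return (s == 1.0) ? ExtendResult::Reached : ExtendResult::Advanced;
}

int main() {
    auto alwaysFree = [](const Vec3&, const Vec3&) { return true; };   // dummy map backend
    Vec3 xNew;
    ExtendResult r = newState({10, 0, 0}, {0, 0, 0}, 2.0, alwaysFree, xNew);
    std::printf("result %d, x_new = (%.1f, %.1f, %.1f)\n",
                static_cast<int>(r), xNew[0], xNew[1], xNew[2]);
    return 0;
}
```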

6.1.1 Explored Volume

Apart from the time taken, we also consider the explored volume of the algorithms for each map. Each roadmap was entered into an octree specific to the map and algorithm. Each node was entered as a sphere, each edge as a cylinder, with the radius also used in building the roadmap. All octrees had the same dimensions: the smallest octree covering all maps. It had a grid size of 1024³, resulting in a cell side length of 38.684 mm. A real-world example can be found in figure 6.1, an illustrative example in figure 6.2.
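The explored-volume bookkeeping can be sketched as follows: every edge is rasterized as a sphere-swept segment into a set of voxels, and the table entries below are obtained by counting voxels present in one set but not in the other. The grid handling and the example coordinates are assumptions of this sketch; the thesis uses one common 1024³ octree with 38.684 mm cells for all maps.

```cpp
#include <array>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <unordered_set>

using Vec3 = std::array<double, 3>;
using VoxelSet = std::unordered_set<std::uint64_t>;

static double distPointSegment(const Vec3& p, const Vec3& a, const Vec3& b) {
    Vec3 ab = {b[0]-a[0], b[1]-a[1], b[2]-a[2]}, ap = {p[0]-a[0], p[1]-a[1], p[2]-a[2]};
    const double ab2 = ab[0]*ab[0] + ab[1]*ab[1] + ab[2]*ab[2];
    double t = ab2 > 0.0 ? (ap[0]*ab[0] + ap[1]*ab[1] + ap[2]*ab[2]) / ab2 : 0.0;
    t = std::fmax(0.0, std::fmin(1.0, t));
    Vec3 c = {a[0]+t*ab[0]-p[0], a[1]+t*ab[1]-p[1], a[2]+t*ab[2]-p[2]};
    return std::sqrt(c[0]*c[0] + c[1]*c[1] + c[2]*c[2]);
}

static std::uint64_t key(int x, int y, int z) {
    auto u = [](int i) { return static_cast<std::uint64_t>(i + (1 << 19)) & 0xFFFFF; };
    return (u(x) << 40) | (u(y) << 20) | u(z);
}

// Mark all cells in the capsule's bounding box whose center lies within the radius.
// A lone node is the degenerate case a == b.
void rasterizeEdge(VoxelSet& grid, const Vec3& a, const Vec3& b, double r, double cell) {
    int lo[3], hi[3];
    for (int k = 0; k < 3; ++k) {
        lo[k] = static_cast<int>(std::floor((std::fmin(a[k], b[k]) - r) / cell));
        hi[k] = static_cast<int>(std::floor((std::fmax(a[k], b[k]) + r) / cell));
    }
    for (int x = lo[0]; x <= hi[0]; ++x)
        for (int y = lo[1]; y <= hi[1]; ++y)
            for (int z = lo[2]; z <= hi[2]; ++z) {
                Vec3 c = {(x + 0.5) * cell, (y + 0.5) * cell, (z + 0.5) * cell};
                if (distPointSegment(c, a, b) <= r) grid.insert(key(x, y, z));
            }
}

// Percentage of volume explored in 'a' but not in 'b' (one entry of tables 6.2-6.5).
double notInOther(const VoxelSet& a, const VoxelSet& b) {
    std::size_t missing = 0;
    for (auto v : a) missing += (b.count(v) == 0);
    return a.empty() ? 0.0 : 100.0 * static_cast<double>(missing) / static_cast<double>(a.size());
}

int main() {
    VoxelSet planes, hybrid;
    rasterizeEdge(planes, {0,0,0}, {1.0,0,0}, 0.2, 0.038684);
    rasterizeEdge(hybrid, {0,0,0}, {0.9,0,0}, 0.2, 0.038684);
    std::printf("explored on planes but not hybrid: %.2f %%\n", notInOther(planes, hybrid));
    return 0;
}
```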

In the case of the PRM algorithm, the collision detection for the point cloud map did not work reliably. This was probably caused by the very short edges that occur. We implemented our own collision detection based on CGAL's kD-tree [Tangelder and Fabri, 2008], see section 5.4.2. While the times given for the purpose of speed comparison are for the Bullet-based maps, the explored volume was determined using our method of collision detection. Incidentally, our algorithm was roughly twice as fast as Bullet in this special case.
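One way to realize the sphere cast against a point cloud is a point-to-segment distance test: an edge is blocked if any map point lies closer to the swept segment than the robot radius. The sketch below shows only this geometric core; in the actual implementation the candidate points would come from the CGAL kD-tree mentioned above rather than from a scan over the whole cloud, and the points and radius here are made up.

```cpp
#include <array>
#include <cmath>
#include <cstdio>
#include <vector>

using Vec3 = std::array<double, 3>;

static double distPointSegment(const Vec3& p, const Vec3& a, const Vec3& b) {
    Vec3 ab = {b[0]-a[0], b[1]-a[1], b[2]-a[2]}, ap = {p[0]-a[0], p[1]-a[1], p[2]-a[2]};
    const double ab2 = ab[0]*ab[0] + ab[1]*ab[1] + ab[2]*ab[2];
    double t = ab2 > 0.0 ? (ap[0]*ab[0] + ap[1]*ab[1] + ap[2]*ab[2]) / ab2 : 0.0;
    t = std::fmax(0.0, std::fmin(1.0, t));
    Vec3 c = {a[0]+t*ab[0]-p[0], a[1]+t*ab[1]-p[1], a[2]+t*ab[2]-p[2]};
    return std::sqrt(c[0]*c[0] + c[1]*c[1] + c[2]*c[2]);
}

// True if a sphere of the given radius cast from x to y hits any map point.
bool collision(const std::vector<Vec3>& cloud, const Vec3& x, const Vec3& y, double radius) {
    for (const Vec3& p : cloud)
        if (distPointSegment(p, x, y) < radius) return true;
    return false;
}

int main() {
    std::vector<Vec3> cloud = {{0.5, 0.1, 0.0}, {3.0, 3.0, 3.0}};   // made-up map points
    std::printf("edge blocked: %d\n", collision(cloud, {0,0,0}, {1,0,0}, 0.2));
    return 0;
}
```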

6.1.2 Implementation

The algorithms were implemented under GNU/Linux in C++ based on the Jacobs Intelligent Robotics Library and the libraries mentioned above. They ran on a PC with an Intel Core i7 920 CPU with 2.67 GHz, 8 MB cache, and 6 GB of memory (not all of which could be used, but it was not needed)


Figure 6.1: REAL-WORLD EXAMPLE FOR AN EXPLORED VOLUME – A cross-section parallel to the XY-plane, close to the floor. Black pixels correspond to voxels that have been explored by one or more instances of the RRT on the Lab map.


Figure 6.2: USING EXPLORED VOLUME TO COMPARE DIFFERENT MAP TYPES – Synthetic examples. (a) Volume explored on the patch map (points added for illustration); (b) volume explored on the point cloud map (patches added for illustration); (c) comparison: blue – explored on the plane map but not on the point cloud map; green – explored on the point cloud map but not on the plane map; black – explored on both maps.

under Ubuntu Linux 9.04 32-bit in a single thread. The program was started four times, once for each map. First the map was read, then the path-planning algorithm was executed from scratch 96 times. Time was measured with the POSIX method gettimeofday(). Times exclude loading the maps and entering roadmaps into the octree. While the latter is linearly dependent on the number of nodes and edges in the roadmap, the former varies depending on the type of map. For the lab maps, loading the plane map took 6.0 s, the hybrid map 6.8 s, the trimesh map 21.3 s, and the point cloud map 22.8 s.
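Wall-clock timing with gettimeofday() as used above amounts to a few lines; the loop in the sketch below is only a placeholder for one planner run.

```cpp
#include <cstdio>
#include <sys/time.h>

// Wall-clock time in seconds via the POSIX gettimeofday() call.
static double nowSeconds() {
    timeval tv;
    gettimeofday(&tv, nullptr);
    return tv.tv_sec + tv.tv_usec * 1e-6;
}

int main() {
    const double t0 = nowSeconds();
    volatile double sink = 0.0;
    for (int i = 0; i < 1000000; ++i) sink += i * 1e-9;   // stand-in for one planner run
    std::printf("one run took %.6f s\n", nowSeconds() - t0);
    return 0;
}
```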

6.1.3 Datasets

We used the Lab, Crashed Car Park, and Lesumsperrwerk datasets. For an extensive presentation, please refer to chapter 3. These datasets were processed with the plane extraction algorithm from [Poppinga et al., 2008b], which returned for each point cloud a list of planar patches and a remaining point cloud of points which were found not to belong to planar areas. To get global poses, the lists of planar patches were registered with the technique developed in [Pathak et al., 2010a, Pathak et al., 2010b]. With these, maps in the different patch map frameworks were generated. The "planes" maps use just the planar patches; for the "hybrid" maps, the remainder point clouds were transformed to trimeshes and used alongside the planar patches; for the "trimesh" maps, the complete point clouds were triangulated; and for the "pc" maps, just the complete point clouds were used. For preliminary comparison, the actual file sizes of the maps and other properties can be found in table 6.1. Aside from the geometrical features described above, the files contain uncertainty information (a Hessian matrix) for each planar patch as well as a color, string ID, transparency, and group number for each patch. As an example of the maps used, perspective views of the planes map of the Lab dataset can be found in figure 6.3.

6.2 Results

6.2.1 Results of RRT

The path planner was run from four different starting locations (in the four corners of the room for the lab dataset, spread across the site for the crashed car park dataset) with 24 different seeds for the


Map type   File size   Patches: planar   Patches: trimesh   Patches: pc   Triangles
hybrid     8.7 MB      5906              4142               –             345533
pc         54.0 MB     –                 –                  29            –
planes     3.0 MB      5906              0                  –             86174
trimesh    106.4 MB    0                 17273              –             5600612

Table 6.1: THE ACTUAL FILE SIZES OF THE DIFFERENT MAP TYPES OF THE LAB DATASET – The median number of points in a polygon of a planar patch is 11. The median number of points in a point cloud is 163,439.

Figure 6.3: THE LAB MODEL THE ROADMAP EXPERIMENTS ARE RUN ON – (a) Perspective wire-frame view; (b) perspective surface view; (c) top surface view; (d) inside view with many details, including multiple levels. The robot location is shown by a red circle at the height of the sensor. An X3D model and video files for this dataset can be found at http://robotics.jacobs-university.de/projects/3Dmap.


Map       Volume [m³]   not in planes [%]   not in hybrid [%]   not in trimesh [%]   not in pc [%]
planes    9714.015      –                   0.557               1.739                0.986
hybrid    9693.171      0.343               –                   1.551                0.803
trimesh   9564.215      0.200               0.223               –                    0.347
pc        9651.058      0.340               0.370               1.244                –

Table 6.2: EXPLORED VOLUME ON DIFFERENT MAPS FOR THE RRT ON THE LAB DATASET – The four columns on the right give the volume that was explored on the map in the row but not on the map in the column, as a percentage of the total volume explored on the map in the row (which is given in the second column).

random number generator. Figure 6.4 shows how long the RRT algorithm took on the four different map types in the lab dataset. The performance on the Crashed Car Park dataset is shown in figure 6.5.

The first thing to notice is that the trimesh and point cloud maps are more than an order of magnitude slower than the plane and hybrid maps. For lower numbers of iterations (as they would be used in a real-world planning scenario), exploration on the trimesh map is faster than on the point cloud map. For higher numbers, we can see that the point cloud map offers higher speed and also slower growth than the trimesh map. Also, the trimesh map shows a much higher variance in speed than any other map. This is most likely due to it being over-restrictive (see below). But even its lower quartile is still in the area of the PC map's median. The fact that the increase in time needed is less than linear can be explained by the structure of the maps. For both datasets, a large majority of collision objects is concentrated in the center of the map. Outside of that, there are only a few collision objects. These typically represent objects that are outside the mapped area but visible under certain circumstances, e.g. through a window or through a door left ajar. The planning starts inside the main structure. Initially, there are many collision objects to check in each expansion. But when the RRT has nodes outside the main structure on more and more sides, more and more expansions of the RRT can be done with few or no collision objects to check.

For 500 iterations, a useful number in a real-world scenario, growing an RRT on both the trimesh and the PC map needs well over 4 seconds. This allows for the generation of one roadmap on start-up. If, however, frequent re-planning is to be used, this run-time is too high. In contrast, on the hybrid and plane maps 500 iterations take less than 0.4 seconds. This allows for frequent re-planning on a powerful computer or infrequent planning on a slower one.

Our proposed method of using planes and polygons as an abstraction for a number of points obviously decreases the number of collision objects. A speed advantage should hence surprise no one. But we also have to address the question of whether our method is a valid simplification or a distortion of the original sensor data. To this end we compare the explored volume of each method. Since it was stored in an octree of identical dimensions for each map, we can subtract the explored volumes from one another to see how much volume was covered on one map but not on another (congruence, Table 6.2).

We can see that the overall congruence is high. The biggest differences arise with the trimesh. In the second column we see that the explored volume is lowest on the trimesh map. The fifth column affirms the interpretation that the trimesh is over-restrictive: all maps allow for exploration of quite some volume that is not explored on the trimesh map. This shows that the naive trimesh-growing algorithm presented in section 6.1 in some instances closes gaps between points that should have


Map       Volume [m³]   not in planes [%]   not in hybrid [%]   not in trimesh [%]   not in pc [%]
planes    106379.8199   –                   2.760               4.145                3.458
hybrid    106062.0166   2.468               –                   3.739                3.156
trimesh   105153.1485   3.027               2.907               –                    3.104
pc        106073.6549   3.179               3.167               3.945                –

Table 6.3: EXPLORED VOLUME ON DIFFERENT MAPS FOR THE RRT ON THE CRASHED CAR PARK DATASET – The four columns on the right give the volume that was explored on the map in the row but not on the map in the column, as a percentage of the total volume explored on the map in the row (which is given in the second column).

been left open¹. It is not surprising that the naive trimesh growing performs sub-optimally, since a very naive algorithm for generation and no simplification were used. The inclusion of the trimesh map in the experiments is still justified to allow for a comparison of its speed as a gross approximation. The small incorrectness in exploration can be assumed not to have a decisive impact on the speed.

Concerning the other maps, the incongruence is low enough to warrant calling the plane map, and even more so the hybrid map, a valid simplification of the original point cloud data for all practical purposes. The small incorrectness can be compensated for by low-level obstacle avoidance.

When interpreting the percentages in Table 6.2, it has to be taken into account that they result from a limited number of runs of a probabilistic algorithm. That means that even on identical maps, a comparison of the explored volumes calculated from two different sets of random seeds would yield small chance incongruences. This is evidenced by the percentage of volume explored on the hybrid map which was not explored on the plane map (0.343%). Since the plane map is a subset of the hybrid map, this percentage should ideally be 0. To minimize this effect, in the calculation of the explored volume, 120 random seeds were used instead of 24, totalling 480 RRTs grown on each map. The number of iterations K was 10,000 to ensure high coverage.

The same standard has to be applied when evaluating the numbers for the Crashed Car Park dataset, which can be found in Table 6.3. Although the plane map is a subset of the hybrid map and the incongruence should be 0, it is 2.468%. By this measure, none of the other numbers is high enough to call the proposed method of patch-based mapping inaccurate. Inspection of the difference octree revealed that the high numbers encountered in this map are explained by areas which are hard to reach from the starting points yet highly complex. Specifically, this refers to the underside of a large pile of rubble.

6.2.2 Results of PRM

The PRM was run with 24 different random seeds on the maps. The time taken can be found in figure 6.7 for the lab dataset and in figure 6.8 for the Crashed Car Park dataset. Again it can be noticed that the plane and hybrid maps are considerably faster than the trimesh and PC maps. However, this effect becomes less pronounced as more iterations are performed. This is caused by the decreasing mean distance covered in collision checks as the PRM progresses. As vertices become denser, the nearest neighbors are found at ever shorter distances. Empirical data on the distances covered in collision detection is depicted in figure 6.9. In this case they were collected on the hybrid map of the lab

¹ The possibility that the other maps leave gaps open that should have been closed can be excluded, since we are using a map with a big loop: all parts of walls, ceiling, and floor that were seen from afar (with big gaps between the points) have also been seen from up close, and the point clouds resulting from all sightings overlap.


Figure 6.4: TIME TAKEN BY THE RRT ALGORITHM ON THE FOUR DIFFERENT MAP TYPES ON THE LAB DATASET – Two panels with iteration ranges up to 1,000 and up to 10,000; curves are shown for the planes, hybrid, pc, and trimesh maps. The number of iterations (K in algorithm 13) is on the x-axis, the time taken for one RRT on the y-axis. The lines represent the median time for one run; the error bars extend to the lower and upper quartile. Please note the logarithmic scale on the y-axis. The RRT was grown from four different starting locations with 24 different random seeds at each.


Figure 6.5: TIME TAKEN BY THE RRT ALGORITHM ON THE FOUR DIFFERENT MAP TYPES ON THE CRASHED CAR PARK DATASET – Two panels with iteration ranges up to 1,000 and up to 10,000; curves are shown for the pc, trimesh, hybrid, and planes maps. The number of iterations (K in algorithm 13) is on the x-axis, the time taken for one RRT on the y-axis. The lines represent the median time for one run; the error bars extend to the lower and upper quartile. Please note the logarithmic scale on the y-axis. The RRT was grown from four different starting locations with 24 different random seeds at each.


Figure 6.6: A visualization of an RRT generated in 100 iterations in the Jacobs Robotics lab hybrid map.

dataset with random seed 1. They are typical of all datasets, maps, and random seeds. The shorter the distance covered in the collision check, the less likely it is that a collision object is encountered. The fewer collision objects encountered, the less pronounced the differences between the map types, which differ in the type of collision object used.

The congruences are in tables 6.4 and 6.5. As with the RRT, the incongruences are higher on the Crashed Car Park map. However, the overall congruence is high enough to call the simplification true to the original data.

Map       Volume [m³]   not in planes [%]   not in hybrid [%]   not in trimesh [%]   not in pc [%]
planes    11859.6544    –                   0.160               0.737                0.387
hybrid    11841.7095    0.008               –                   0.593                0.251
trimesh   11776.3023    0.035               0.041               –                    0.056
pc        11818.7659    0.042               0.057               0.416                –

Table 6.4: EXPLORED VOLUME ON DIFFERENT MAPS FOR THE PRM ON THE LAB DATASET – The four columns on the right give the volume that was explored on the map in the row but not on the map in the column, as a percentage of the total volume explored on the map in the row (which is given in the second column).


Figure 6.7: TIME TAKEN BY THE PRM ALGORITHM ON THE FOUR DIFFERENT MAP TYPES ON THE LAB DATASET – Two panels with iteration ranges up to 1,000 and up to 10,000; curves are shown for the pc, trimesh, hybrid, and planes maps. The number of iterations (K in algorithm 14) is on the x-axis, the time taken for one PRM on the y-axis. The lines represent the median time for one run; the error bars extend to the lower and upper quartile. Please note the logarithmic scale on the y-axis. The PRM was grown with 24 different random seeds.


Figure 6.8: TIME TAKEN BY THE PRM ALGORITHM ON THE FOUR DIFFERENT MAP TYPES ON THE CRASHED CAR PARK DATASET – Two panels with iteration ranges up to 1,000 and up to 10,000; curves are shown for the pc, trimesh, hybrid, and planes maps. The number of iterations (K in algorithm 14) is on the x-axis, the time taken for one PRM on the y-axis. The lines represent the median time for one run; the error bars extend to the lower and upper quartile. Please note the logarithmic scale on the y-axis. The PRM was grown with 24 different random seeds.


Figure 6.9: COLLISION DETECTION DISTANCE FOR PRM – The graph shows the distance checked in collision detection (collision check distance [mm] over collision check number) over the course of 10,000 iterations with random seed 1 on the hybrid map of the lab dataset. There are typically 6 collision checks per iteration. Please note the logarithmic scale on the y-axis.

Map       Volume [m³]   not in planes [%]   not in hybrid [%]   not in trimesh [%]   not in pc [%]
planes    115177.2955   –                   0.134               0.332                0.261
hybrid    115091.7210   0.060               –                   0.216                0.179
trimesh   114948.2374   0.134               0.091               –                    0.173
pc        115064.0317   0.162               0.155               0.274                –

Table 6.5: EXPLORED VOLUME ON DIFFERENT MAPS FOR THE PRM ON THE CRASHED CAR PARK DATASET – The four columns on the right give the volume that was explored on the map in the row but not on the map in the column, as a percentage of the total volume explored on the map in the row (which is given in the second column).


Figure 6.10: WITHOUT USING A BOUNDING BOX, ROADMAPS MAY BE PLANNED OUTSIDE THE VALID AREA – EVEN ABOVE WATER, AS IN THIS CASE – Yellow sphere (background): starting point; green sphere: goal point. Edges are shown as two cylinders each, a more opaque one to highlight the graph structure and a more transparent one with the actual radius.

6.2.3 Lesumsperrwerk dataset

To demonstrate the versatility of the roadmap algorithms on patch maps, we now present results on data from another kind of sensor: a 3D sonar. This sensor covers a much smaller fraction of the surrounding surfaces. This is due to the sonar's small field of view (FOV), the low reflectivity of the river bed and water surface, and the fact that no complete panorama can be collected – for practical purposes, a river is unbounded in two directions. Due to this limitation, instead of building a complete roadmap, a path from a given starting point to a goal was searched. The algorithms were adapted such that after each addition of a node, it was checked whether the goal was reachable from there directly. If so, the algorithm terminated successfully.

Paths were searched for all combinations of three starting points and three end points, with 24 different random seeds each. The starting and goal points were chosen such that a path had to be planned from one of three positions upstream of the flood gate to one of three positions downstream of it. Because the data is incomplete, a bounding box around the actually collected data had to be imposed to prevent the algorithms from finding invalid paths through areas where no data was collected (figure 6.10 illustrates what can go wrong without a bounding box).

When planning a path from a starting point to a goal, the general direction in which the path must be planned is clear. We therefore tried to speed up the process by biasing the distribution of the random points towards the goal. Instead of the usual uniform distribution, we used a normal distribution around the goal. Each dimension was handled separately; the variance was the distance from the goal to the farther border of the bounding box, multiplied by a bias factor. Success rates of experiments with this variant are shown in figure 6.11. It can be seen that smaller bias factors contribute negatively to the success of the algorithm. Only when the factors get so large that they are irrelevant does the success resemble that of the unbiased algorithm.
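The goal-biased sampling can be sketched in a few lines with the standard library's normal distribution. Per the description above, the variance in each dimension is the distance from the goal to the farther bounding-box border times the bias factor; taking the square root to obtain the standard deviation and clamping samples back into the bounding box are assumptions of this sketch, as are the example coordinates.

```cpp
#include <array>
#include <cmath>
#include <cstdio>
#include <random>

using Vec3 = std::array<double, 3>;

Vec3 sampleBiased(std::mt19937& rng, const Vec3& goal,
                  const Vec3& boxMin, const Vec3& boxMax, double biasFactor) {
    Vec3 s;
    for (int k = 0; k < 3; ++k) {
        // distance from the goal to the farther bounding-box border in this dimension
        const double farther = std::fmax(goal[k] - boxMin[k], boxMax[k] - goal[k]);
        const double sigma = std::sqrt(farther * biasFactor);   // variance -> std. deviation
        std::normal_distribution<double> dist(goal[k], sigma);
        s[k] = std::fmin(boxMax[k], std::fmax(boxMin[k], dist(rng)));
    }
    return s;
}

int main() {
    std::mt19937 rng(1);                                         // e.g. random seed 1
    Vec3 goal = {40000.0, 5000.0, -2000.0};                      // [mm], made-up values
    Vec3 lo = {0.0, -10000.0, -8000.0}, hi = {80000.0, 20000.0, 0.0};
    Vec3 s = sampleBiased(rng, goal, lo, hi, 4000.0);
    std::printf("biased sample: %.0f %.0f %.0f\n", s[0], s[1], s[2]);
    return 0;
}
```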

When the RRT is run on the incomplete data with a bounding box, from a number of starting


Figure 6.11: FRACTION OF RUNS WHERE THE GOAL WAS REACHED FOR RRT ON THE FIRST TWO SCANS OF THE LESUMSPERRWERK DATASET – Six panels plot the fraction of successful runs over the number of iterations for the pc and planes maps: (a) bias factor 1,000; (b) bias factor 4,000; (c) bias factor 20,000; (d) bias factor 50,000; (e) bias factor 200,000; (f) no bias. The radius was 5733 mm, the maximum step size 28666 mm.


Figure 6.12: TIME TAKEN FOR DIFFERENT PARAMETERS ON DIFFERENT SCANS OF THE LESUMSPERRWERK DATASET – With object-oriented bounding boxes. Four panels plot the time per node [s] over the number of iterations for the pc and planes maps: (a) first scan, radius 2 m, maximum step 4 m; (b) first scan, radius 4 m, maximum step 8 m; (c) first two scans, radius 5733 mm, maximum step 28666 mm; (d) scan 8, radius 4 m, maximum step 16 m.

points to a number of goal points, but without the goal bias, the plane map still allows for faster planning. This can be seen in figure 6.12, which contains results for maps generated from different scans of the dataset and with different radii. Regardless of these variations, the plane map is always faster.



Chapter 7

Conclusion

Autonomous navigation is an important challenge in many application fields of mobile robotics. It relieves the human operator of an essentially dull task and allows a gradual transition to the degree of robot autonomy most suited to a particular task. This thesis made contributions to mobile robot autonomous navigation at different levels, from sensing to path planning.

In chapter 2, different methods for 3D sensing were discussed. These were 3D laser range-finders, stereo vision, time-of-flight range cameras, and 3D sonars. In a case study, we evaluated a time-of-flight range camera in various situations. We developed a method to detect and, if desirable, heuristically correct the most inhibitive error source. Range is detected via phase shift. Because of the way phase shift is measured, values beyond 360° will be mapped back into the [0°, 360°[ interval. The derived range is consequently also affected. The error is inherent not only to this model, but to all range cameras measuring the time-of-flight in the same way. Our correction makes it possible to use the range camera on a mobile robot for high-frequency 3D sensing.

In chapter 3, we presented several 3D point cloud datasets gathered with the time-of-flight camera presented in the preceding chapter and other 3D sensors: an actuated laser range-finder, a stereo camera, and a 3D sonar. The datasets represent a wide variety of real-world and set-up scenes, indoors and outdoors, underwater and land, and small scale and large scale. These datasets were used in the experiments in the following chapters.

In chapter 4, we presented a robust short-range obstacle detection algorithm that runs fast enough to actually make use of the range camera's high update frequency. It is based on the Hough transform for planes in 3D point clouds. The bins of the discretized Hough space were relatively coarse, which turned out to be enough for testing for drivability. The method allows a mobile robot to reliably detect the drivability of the terrain it faces. With the same method and finer bins it is possible to classify the terrain, albeit not as reliably as checking drivability. Experiments with two types of sensors on data from indoors and outdoors demonstrated the algorithm's performance. The processing time typically lies between 5 and 50 ms. This is enough for real-time processing on a robot moving at reasonable speed.

In chapter 5, we developed the Patch Map data-structure for memory-efficient 3D mapping based on planar surfaces extracted from 3D point cloud data. It is flexible enough to allow for different kinds of surface representations, different methods of collision detection, and different kinds of roadmap algorithms. Surface representations are planar polygons and trimeshes. The method is extendable to other surface representations like quadrics or collision detection library primitives. We surveyed and benchmarked different methods of generating planar patches from a point cloud segmented into planar regions. The α-shape algorithm provided the best results. We also implemented a point cloud based


variant of the map data-structure that allows for comparison with this standard data-structure. Collision detection was implemented in a way that exploits the fact that planar patches are polygons, and also based on two external collision detection libraries, Bullet and OPCODE. The implemented roadmap algorithms were the Rapidly-exploring Random Tree (RRT), the Probabilistic Roadmap Method (PRM), and variants of these, most notably variants of RRT and PRM that place vertices on the medial axes of the map without explicitly computing them. We benchmarked all collision detection methods with all roadmap algorithms on synthetic data to find out the most efficient ones. The test scenario was finding paths from a starting point to four regions of interest. RRT was the best roadmap algorithm. The tested criteria were time, path length, and path smoothness. Bullet was the fastest collision detection method. Thus, the Patch Map data-structure showed its flexibility and usability in practice.

In chapter 6, we thoroughly tested the Patch Map data-structure developed in the preceding chapter on real-world data to investigate size reduction, performance, and fidelity. In terms of size, the Patch Map consisting only of planar patches was 18 times smaller than the point clouds it was based on. We performed both roadmap generation with PRM and RRT and wayfinding from start to goal based on RRT. We compared our approach to the established methods of trimeshes and point clouds and found that it performs an order of magnitude faster on 3D LRF data and also considerably better on sonar data. We also showed that this speed advantage does not come at the cost of a loss of precision. To this end, we compared the total explorable space on the different map representations and found that they differ only marginally. In summary, the Patch Map has proven to be a viable alternative to standard methods that is much more economical with memory.

Appendix A

Addenda & Errata

These addenda have not been reviewed by the referees.

• Section 2.5.5, Identification of erroneous pixels: It is interesting to note that in active sensing, three quadratic relations impact the number of photons per pixel (the amplitude). Let r be the distance to an obstacle. The number of photons from the illumination unit per fixed area on an obstacle is inversely proportional to r². We assume the pin-hole camera model. Since the obstacle typically reflects diffusely, the number of photons received from a fixed area on an obstacle within a fixed image area (i.e. a pixel) is also inversely proportional to r². But then again, the area mapped to a pixel is directly proportional to r². The last two effects cancel each other out, so a simple inverse proportionality to r² remains (see the summary after this list).

• Section 3.4: In a point cloud with equally distributed points filling a sphere, the value of the median distance over the maximum distance, r̃/max r, would be 2^(−1/3), or roughly 0.8 (see the summary after this list).

• Section 5.1: An advantage of using a canonical coordinate system is smaller memory size: it only needs four floating point numbers (three for n, one for d) as opposed to six or seven for a full transform (three for the translational part and three or four for the rotational part, depending on whether the full traditional form is stored (e.g. unit quaternion or axis and angle) or a compressed one (a unit quaternion has only three DOF; the axis can be stored as a unit vector, which has only two)). However, this advantage vanishes once we consider that the vertices need to be stored as well, each requiring two further floating point numbers. Assuming only ten vertices, memory requirements compare as 10·2 + 4 = 24 for a canonical coordinate system versus 10·2 + 7 = 27 for a full transform. The latter is only 12.5% more. However, most planar patches have a lot more vertices, so the difference becomes negligible.

• Section 5.3.2, Generation of Planar Patches: One step towards fewer intersections when using late projection with ALRF scans is this: in the range image, the column containing the ±90° points should be invalidated (with the possible exception of the central pixel).

• Section 5.3.2, Generation of Planar Patches: Another important source of intersections generated by late projection on ALRF scans is that the LRF center is not identical with the rotation origin. An example of an unsimplifiable outline can be found in figure A.1.

• Section 5.4.2, BulletPatchMap: A possible alternative to triangulating planar patches is decomposing them into convex polygons. CGAL provides functions for that, and convex polygons are


Figure A.1: UNSIMPLIFIABLE OUTLINE – (a) Outline on the range image; (b) outline projected to the optimal region, cropped to the bounding box of the projected points.

a primitive of Bullet. For a given planar patch, the number of convex polygons would obviously be much lower than the number of triangles. Theoretically, this would imply a speed-up. However, it is not clear whether this speed-up would really materialize, since triangle meshes are much more commonly used than sets of convex polygons, so there will be quite some optimizations aimed at the former case.
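The first two addenda can be condensed into two short derivations. The following is a sketch based only on the statements above; r̃ denotes the median distance of the points from the sphere's center, a notation introduced here for clarity.

```latex
% Photons per pixel in active sensing: three quadratic effects, two of which cancel.
N_{\mathrm{pixel}}(r) \;\propto\;
  \underbrace{\frac{1}{r^2}}_{\text{illumination}}
  \cdot \underbrace{\frac{1}{r^2}}_{\text{diffuse reflection}}
  \cdot \underbrace{r^2}_{\text{area seen by a pixel}}
  \;=\; \frac{1}{r^2}

% Median distance of uniformly distributed points filling a sphere of radius r_max:
\left(\frac{\tilde{r}}{r_{\max}}\right)^{3} = \frac{1}{2}
  \quad\Longrightarrow\quad
  \frac{\tilde{r}}{r_{\max}} = 2^{-1/3} \approx 0.794
```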

References<br />

For the readers’ (and the author’s) convenience, hyperlinks to electronic versions of most references<br />

are supplied with the electronic version of this thesis. The anchor is =⇒.<br />

[Amato et al., 1998] Amato, N., Bayazit, O. B., Dale, L., Jones, H., and Vallejo, D. (1998). OBPRM:<br />

An Obstacle-Based PRM for 3D Workspaces. In Agarwal, P., Kavraki, L., and Mason, M., editors,<br />

Robotics : The Algorithmic Perspective. The Third Workshop on the Algorithmic Foundations of<br />

Robotics, pages 156–168. A.K. Peters.<br />

[Andert and Adolf, 2009] Andert, F. and Adolf, F. (2009). Online world modeling and path planning<br />

for an unmanned helicopter. Autonomous Robots. =⇒.<br />

[Aronov et al., 2004] Aronov, B., Asano, T., Katoh, N., Mehlhorn, K., and Tokuyama, T. (2004).<br />

Polyline fitting of planar points under min-sum criteria. In Fleischer, R. and Trippen, G., editors,<br />

Algorithms and Computation: 15th International Symposium ISAAC 2004, volume 3341 of LNCS,<br />

pages 77–88. Springer.<br />

[B. Buettgen et al., 2005] B. Buettgen et al. (2005). CCD/CMOS lock-in pixel for range imaging:<br />

Challenges, limitations and state-of-the-art. ETH 1st RIM Days.<br />

[Ballard, 1981] Ballard, D. H. (1981). Generalizing the hough transform to detect arbitrary shapes.<br />

Pattern Recognition, 13(3):111–122.<br />

[Baraff, 1992] Baraff, D. (1992). Dynamic Simulation of Non-Penetrating Rigid Bodies. PhD thesis, Computer Science Department, Cornell University.

[Barate and Manzanera, 2008] Barate, R. and Manzanera, A. (2008). Automatic Design of Vision-<br />

Based Obstacle Avoidance Controllers Using Genetic Programming, volume 4926/2008, pages<br />

25–36. Springer Berlin / Heidelberg. =⇒.<br />

[Besl and McKay, 1992] Besl, P. J. and McKay, N. D. (1992). A method for registration of 3-d<br />

shapes. IEEE Trans. on Pattern Analysis and Machine Intelligence, 14(2):239–256.<br />

[Birk et al., 2007a] Birk, A., Pathak, K., Poppinga, J., Schwertfeger, S., and Chonnaparamutt, W. (2007a). Intelligent Behaviors in Outdoor Environments. In 13th International Conference on Robotics and Applications, Special Session on Outdoor Robotics - Taking Robots off road. IASTED.

[Birk et al., 2007b] Birk, A., Pathak, K., Poppinga, J., Schwertfeger, S., Pfingsthorn, M., and Blow, H. (2007b). The Jacobs Test Arena for Safety, Security, and Rescue Robotics (SSRR). In WS on Performance Evaluation and Benchmarking for Intelligent Robots and Systems, Intern. Conf. on Intelligent Robots and Systems (IROS). IEEE Press.

[Birk et al., 2010] Birk, A., Pathak, K., Vaskevicius, N., Pfingsthorn, M., Poppinga, J., and Schwertfeger, S. (2010). Surface Representations for 3D Mapping: A Case for a Paradigm Shift. KI - German Journal on Artificial Intelligence.

[Birk et al., 2009a] Birk, A., Poppinga, J., and Pfingsthorn, M. (2009a). Using different Humanoid Robots for Science Edutainment of Secondary School Pupils. In Iocchi, L., Matsubara, H., Weitzenfeld, A., and Zhou, C., editors, RoboCup 2008: Robot WorldCup XII, Lecture Notes in Artificial Intelligence (LNAI). Springer.

[Birk et al., 2009b] Birk, A., Poppinga, J., Stoyanov, T., and Nevatia, Y. (2009b). Planetary Exploration in USARsim: A Case Study including Real World Data from Mars. In Iocchi, L., Matsubara, H., Weitzenfeld, A., and Zhou, C., editors, RoboCup 2008: Robot WorldCup XII, Lecture Notes in Artificial Intelligence (LNAI). Springer.

[Birk et al., 2008] Birk, A., Stoyanov, T., Nevatia, Y., Ambrus, R., Poppinga, J., and Pathak, K. (2008). Terrain Classification for Autonomous Robot Mobility: from Safety, Security Rescue Robotics to Planetary Exploration. In Planetary Rovers Workshop, International Conference on Robotics and Automation (ICRA). IEEE.

[Birk et al., 2009c] Birk, A., Vaskevicius, N., Pathak, K., Schwertfeger, S., Poppinga, J., and Buelow, H. (2009c). 3-D Perception and Modeling: Motion-Level Teleoperation and Intelligent Autonomous Functions. IEEE Robotics and Automation Magazine (RAM), December.

[Blanco et al., 2007] Blanco, J.-L., González, J., and Fernndez-Madrigal, J.-A. (2007). Extending obstacle<br />

avoidance methods through multiple parameter-space transformations. Autonomous Robots,<br />

24(1):29–48. =⇒.<br />

[Borenstein and Koren, 1989] Borenstein, J. and Koren, Y. (1989). Real-time obstacle avoidance for<br />

fact mobile robots. Systems, Man and Cybernetics, IEEE Transactions on, 19(5):1179–1187. =⇒.<br />

[Bouabdallah et al., 2007] Bouabdallah, S., Becker, M., and Perrot, V. D. (2007). Computer obstacle<br />

avoidance on quadrotors. In Proceedings of the XII International Symposium on Dynamic Problems<br />

of Mechanics. =⇒.<br />

[Branicky et al., 2001] Branicky, M. S., Lavalle, S. M., Olson, K., and Yang, L. (2001). Quasirandomized<br />

path planning. In Robotics and Automation, 2001. Proceedings 2001 ICRA. IEEE<br />

International Conference on, volume 2, pages 1481–1487. =⇒.<br />

[Cacciola, 2008] Cacciola, F. (2008). 2d straight skeleton and polygon offsetting. In Board, C. E.,<br />

editor, CGAL User and Reference Manual. 3.4 edition.<br />

[Chao et al., 2009] Chao, C.-H., Hsueh, B.-Y., Hsiao, M.-Y., Tsai, S.-H., and Li, T.-H. S. (2009).<br />

Fuzzy target tracking and obstacle avoidance of mobile robots with a stereo vision system. International<br />

Journal of Fuzzy Systems, 11(3). =⇒.<br />

[Charnley and Blissett, 1989] Charnley, D. and Blissett, R. (1989). Surface reconstruction from outdoor<br />

image sequences. Image Vision Comput., 7(1):10–16. =⇒.<br />

[Chen and Tsai, 2000] Chen, K.-H. and Tsai, W.-H. (2000). Vision-based obstacle detection and<br />

avoidance for autonomous land vehicle navigation in outdoor roads. Automation in Construction,<br />

10:1–25. =⇒.<br />

[Chen et al., 2006] Chen, Y., Wu, K.-Q., Wang, D., and Chen, G. (2006). Elastic Algorithm: A New<br />

Path Planning Algorithm About Auto-navigation in 3D Virtual Scene, pages 1156–1165. Springer<br />

Berlin / Heidelberg. =⇒.<br />

[Cole and Newman, 2006] Cole, D. and Newman, P. (2006). Using laser range data for 3D SLAM<br />

in outdoor environments. In Robotics and Automation, 2006. ICRA 2006. Proceedings 2006 IEEE<br />

International Conference on, pages 1556–1563. =⇒.<br />

[Coumans, 2009] Coumans, E. (2009). Physics Simulation Forum. http://bulletphysics.<br />

com/. retrieved on September 1st, 2009.<br />

[Coumans, 2010] Coumans, E. (2010). Physics simulation forum – do these limitations still apply?<br />

http://bulletphysics.org/Bullet/phpBB3/viewtopic.php?f=9&t=3413, last accessed May 6th, 2010.<br />

[Courbon et al., 2009] Courbon, J., Mezouar, Y., Guenard, N., and Martinet, P. (2009). Visual navigation<br />

of a quadrotor aerial vehicle. In Intelligent Robots and Systems, 2009. IROS 2009. IEEE/RSJ<br />

International Conference on, pages 5315–5320. =⇒.<br />

[CSEM, 2006] CSEM (2006). The SwissRanger, Manual V1.02. 8048 Zurich, Switzerland.<br />

[Da, 2008] Da, T. K. F. (2008). 2d alpha shapes. In Board, C. E., editor, CGAL User and Reference<br />

Manual. 3.4 edition.<br />

[Douglas and Peucker, 1973] Douglas, D. H. and Peucker, T. K. (1973). Algorithms for the reduction<br />

of number of points required to represent a line or its caricature. The Canadian Carthographer,<br />

10(2):112–122.<br />

[Edelsbrunner and Mücke, 1994] Edelsbrunner, H. and Mücke, E. (1994). Three-dimensional alpha<br />

shapes. ACM Transactions on Graphics, 3(1):43–72. =⇒.<br />

[Fainekos et al., 2005] Fainekos, G., Kress-Gazit, H., and Pappas, G. (2005). Temporal logic motion<br />

planning for mobile robots. In Robotics and Automation, 2005. ICRA 2005. Proceedings of the<br />

2005 IEEE International Conference on, pages 2020–2025. =⇒.<br />

[Fuchs and Hirzinger, 2008] Fuchs, S. and Hirzinger, G. (2008). Extrinsic and depth calibration of<br />

tof-cameras. In Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference<br />

on, pages 1–6. =⇒.<br />

[Fuchs and May, 2007] Fuchs, S. and May, S. (2007). Calibration and registration for precise surface<br />

reconstruction with ToF cameras. In Proceedings of the Dynamic 3D Imaging Workshop in<br />

Conjunction with DAGM (Dyn3D), volume I, Heidelberg, Germany. =⇒.<br />

[Gao et al., 2006] Gao, J., Chen, X., Zheng, D., Yilmaz, O., and Gindy, N. (2006). Adaptive restoration<br />

of complex geometry parts through reverse engineering application. Advances in Engineering<br />

Software, 37:592–600. =⇒.<br />

[Giezeman and Wesselink, 2008] Giezeman, G.-J. and Wesselink, W. (2008). 2d polygons. In Board,<br />

C. E., editor, CGAL User and Reference Manual. 3.4 edition.<br />

[Goodrich, 1995] Goodrich, M. T. (1995). Efficient piecewise-linear function approximation using<br />

the uniform matrix. Discrete Comput Geom, (14):445–462.<br />

[Gottschalk, 1997] Gottschalk, S. (1997). RAPID – Robust and Accurate Polygon Interference Detection. http://www.cs.unc.edu/~geom/OBB/OBBT.html. Retrieved on July 6th, 2009.

[Gottschalk, 1998] Gottschalk, S. (1998). RAPID 2.01. http://www.cs.unc.edu/~geom/OBB/request.html. Retrieved on July 6th, 2009.

[Guibas et al., 1991] Guibas, L. J., Hershberger, J. E., Mitchell, J. S. B., and Snoeyink, J. S. (1991). Approximating polygons and subdivisions with minimum link paths. In ISAAC: 2nd International Symposium on Algorithms and Computation (formerly SIGAL International Symposium on Algorithms), organized by the Special Interest Group on Algorithms (SIGAL) of the Information Processing Society of Japan (IPSJ) and the Technical Group on Theoretical Foundation of Computing of the Institute of Electronics, Information and Communication Engineers (IEICE).

[Gut, 2004] Gut, O. (2004). Untersuchungen des 3D-Sensors SwissRanger [Investigations of the SwissRanger 3D sensor]. Master's thesis.

[Haddad et al., 1998] Haddad, H., Khatib, M., Lacroix, S., and Chatila, R. (1998). Reactive navigation in outdoor environments using potential fields. In Robotics and Automation, 1998. Proceedings. 1998 IEEE International Conference on, volume 2, pages 1232–1237.

[Hadsell et al., 2009] Hadsell, R., Sermanet, P., Scoffier, M., Erkan, A., Kavukcuoglu, K., Muller, U., and LeCun, Y. (2009). Learning long-range vision for autonomous off-road driving. Journal of Field Robotics, 26(2):120–144.

[Hähnel et al., 2003] Hähnel, D., Burgard, W., and Thrun, S. (2003). Learning Compact 3D Models of Indoor and Outdoor Environments with a Mobile Robot. Robotics and Autonomous Systems, 44(1):15–27.

[Hajebi and Zelek, 2007] Hajebi, K. and Zelek, J. (2007). Dense surface from infrared stereo. In Applications of Computer Vision, 2007. WACV '07. IEEE Workshop on, pages 21–26.

[Harris and Pike, 1988] Harris, C. G. and Pike, J. M. (1988). 3D positional integration from image sequences. Image Vision Comput., 6(2):87–90.

[Hart et al., 1968] Hart, P., Nilsson, N., and Raphael, B. (1968). A formal basis for the heuristic determination of minimum cost paths. Systems Science and Cybernetics, IEEE Transactions on, 4(2):100–107.

[Hasircioglu et al., 2008] Hasircioglu, I., Topcuoglu, H. R., and Ermis, M. (2008). 3-D path planning for the navigation of unmanned aerial vehicles by using evolutionary algorithms. In GECCO '08: Proceedings of the 10th annual conference on Genetic and evolutionary computation, pages 1499–1506, New York, NY, USA. ACM.

[Heckbert and Garland, 1997] Heckbert, P. S. and Garland, M. (1997). Survey of polygonal surface simplification algorithms. Technical report, Pittsburgh.

[Hershberger and Snoeyink, 1997] Hershberger, J. and Snoeyink, J. (1997). Cartographic Line Simplification and Polygon CSG Formulae in O(n log* n) Time. Springer Berlin/Heidelberg.

[Hert and Schirra, 2008] Hert, S. and Schirra, S. (2008). 2D convex hulls and extreme points. In Board, C. E., editor, CGAL User and Reference Manual. 3.4 edition.



[Ho et al., 2001] Ho, S., Sarma, S., and Adachi, Y. (2001). Real-time interference analysis between a tool and an environment. Computer-Aided Design, 33(13):935–947.

[Howard et al., 2004] Howard, A., Wolf, D. F., and Sukhatme, G. S. (2004). Towards 3D Mapping in Large Urban Environments. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Sendai, Japan.

[Hrabar, 2008] Hrabar, S. (2008). 3D path planning and stereo-based obstacle avoidance for rotorcraft UAVs. In Intelligent Robots and Systems, 2008. IROS 2008. IEEE/RSJ International Conference on, pages 807–814.

[Hwang and Chang, 2007] Hwang, C.-L. and Chang, L.-J. (2007). Trajectory tracking and obstacle avoidance of car-like mobile robots in an intelligent space using mixed H2/H∞ decentralized control. Mechatronics, IEEE/ASME Transactions on, 12(3):345–352.

[Hwangbo et al., 2007] Hwangbo, M., Kuffner, J., and Kanade, T. (2007). Efficient two-phase 3D motion planning for small fixed-wing UAVs. In Robotics and Automation, 2007 IEEE International Conference on, pages 1035–1041.

[Imai and Iri, 1988] Imai, H. and Iri, M. (1988). Polygonal approximations of a curve – formulations and algorithms. In Toussaint, G. T., editor, Computational Morphology, pages 71–86. Elsevier Science Publishers.

[Jun and D'Andrea, 2003] Jun, M. and D'Andrea, R. (2003). Path planning for unmanned aerial vehicles in uncertain and adversarial environments. In Butenko, S., Murphey, R., and Pardalos, P., editors, Cooperative Control: Models, Applications and Algorithms. Kluwer Academic Publishers.

[Kagami et al., 2003] Kagami, S., Kuffner, J. J., Nishiwaki, K., Okada, K., Inoue, H., and Inaba, M. (2003). Humanoid arm motion planning using stereo vision and RRT search. In Intelligent Robots and Systems, 2003. (IROS 2003). Proceedings. 2003 IEEE/RSJ International Conference on, volume 3, pages 2167–2172, Washington, DC, USA. IEEE Computer Society.

[Kavraki et al., 1996] Kavraki, L. E., Svestka, P., Latombe, J.-C., and Overmars, M. (1996). Probabilistic roadmaps for path planning in high dimensional configuration spaces. IEEE Trans. on Robotics and Automation, 12:566–580.

[Kelly et al., 2006] Kelly, A., Amidi, O., Bode, M., Happold, M., Herman, H., Pilarski, T., Rander, P., Stentz, A., Vallidis, N., and Warner, R. (2006). Toward Reliable Off Road Autonomous Vehicles Operating in Challenging Environments, volume 21/2006 of Springer Tracts in Advanced Robotics, pages 599–608. Springer Berlin / Heidelberg.

[Keogh et al., 2003] Keogh, E., Chu, S., Hart, D., and Pazzani, M. (2003). Segmenting time series: A survey and novel approach. In Data Mining In Time Series Databases, chapter 1, pages 1–22. World Scientific Publishing Company.

[Kim and Khosla, 1991] Kim, J.-O. and Khosla, P. (1991). Real-time obstacle avoidance using harmonic potential functions. In Robotics and Automation, 1991. Proceedings., 1991 IEEE International Conference on, volume 1, pages 790–796.

[Klein and Zachmann, 2004] Klein, J. and Zachmann, G. (2004). Point cloud collision detection. Computer Graphics Forum, 23(3).



[Konolige, 1999] Konolige, K. (1999). Stereo geometry. Last visited July 22nd, 2010.

[Konolige and Beymer, 2006] Konolige, K. and Beymer, D. (2006). SRI Small Vision System, user's manual, software version 4.2.

[Koyuncu and Inalhan, 2008] Koyuncu, E. and Inalhan, G. (2008). A probabilistic B-spline motion planning algorithm for unmanned helicopters flying in dense 3D environments. In Intelligent Robots and Systems, 2008. IROS 2008. IEEE/RSJ International Conference on, pages 815–821.

[Kuffner and LaValle, 2000] Kuffner, J. J. and LaValle, S. (2000). RRT-Connect: An efficient approach to single-query path planning. In Robotics and Automation, 2000. Proceedings. ICRA '00. IEEE International Conference on, volume 2, pages 995–1001.

[Lange, 2000] Lange, R. (2000). 3D time-of-flight distance measurement with custom solid-state image sensors in CMOS/CCD-technology. PhD thesis, Department of Electrical Engineering and Computer Science.

[Lapierre et al., 2007] Lapierre, L., Zapata, R., and Lepinay, P. (2007). Combined Path-following and Obstacle Avoidance Control of a Wheeled Robot. The International Journal of Robotics Research, 26(4):361–375.

[Larson et al., 2006] Larson, J., Bruch, M., and Ebken, J. (2006). Autonomous navigation and obstacle avoidance for unmanned surface vehicles. In SPIE Proc. 6230: Unmanned Systems Technology VIII, pages 17–20.

[LaValle, 1998] LaValle, S. (1998). Rapidly-Exploring Random Trees. Computer Science Department, Iowa State University, October.

[LaValle and Kufner, 2001] LaValle, S. and Kuffner, J. (2001). Rapidly-Exploring Random Trees: Progress and Prospects. In Donald, B., Lynch, K., and Rus, D., editors, Algorithmic and Computational Robotics: New Directions, pages 45–59. A.K. Peters.

[Lavalle and Kuffner, 2000] Lavalle, S. M. and Kuffner, J. J. (2000). Rapidly-exploring random trees: Progress and prospects. In Algorithmic and Computational Robotics: New Directions, pages 293–308.

[LeCun et al., 2005] LeCun, Y., Muller, U., Ben, J., Cosatto, E., and Flepp, B. (2005). Off-road obstacle avoidance through end-to-end learning. In Advances in Neural Information Processing Systems (NIPS 2005). MIT Press.

[Lewis, 2002] Lewis, M. (2002). Detecting surface features during locomotion using optic flow. In Robotics and Automation, 2002. Proceedings. ICRA '02. IEEE International Conference on, volume 1, pages 305–310.

[Lozano-Perez, 1983] Lozano-Perez, T. (1983). Spatial planning: A configuration space approach. Computers, IEEE Transactions on, C-32(2):108–120.

[Matthies et al., 1995] Matthies, L., Kelly, A., Litwin, T., and Tharp, G. (1995). Obstacle detection for unmanned ground vehicles: a progress report. In Proceedings of IEEE Intelligent Vehicles '95 Conference, pages 66–71.



[Matthies et al., 2002] Matthies, L., Xiong, Y., Hogg, R., Zhu, D., Rankin, A., Kennedy, B., Hebert, M., Maclachlan, R., Won, C., Frost, T., Sukhatme, G., McHenry, M., and Goldberg, S. (2002). A portable, autonomous, urban reconnaissance robot. Robotics and Autonomous Systems, 40(2-3):163–172.

[May et al., 2009] May, S., Droeschel, D., Holz, D., Fuchs, S., Malis, E., Nüchter, A., and Hertzberg, J. (2009). Three-dimensional mapping with time-of-flight cameras. Journal of Field Robotics, 26(11-12):934–965.

[May et al., 2006] May, S., Werner, B., Surmann, H., and Pervolz, K. (2006). 3D time-of-flight cameras for mobile robotics. In Intelligent Robots and Systems, 2006 IEEE/RSJ International Conference on, pages 790–795.

[Mayer et al., 2002] Mayer, L., Li, Y., and Melvin, G. (2002). 3D visualization for pelagic fisheries research and assessment. ICES J. Mar. Sci., 59(1):216–225.

[Meister et al., 2009] Meister, O., Frietsch, N., Ascher, C., and Trommer, G. (2009). Adaptive path planning for VTOL-UAVs. Aerospace and Electronic Systems Magazine, IEEE, 24(7):36–41.

[MESA Imaging AG, 2006] MESA Imaging AG (2006). SwissRanger SR-3000, Manual V1.02. MESA Imaging AG, Zürich, Switzerland.

[Michel et al., 2008] Michel, P., Chestnutt, J., Kagami, S., Nishiwaki, K., Kuffner, J., and Kanade, T. (2008). GPU-accelerated real-time 3D tracking for humanoid autonomy. In Proceedings of the JSME Robotics and Mechatronics Conference (ROBOMEC'08).

[Michels et al., 2005] Michels, J., Saxena, A., and Ng, A. Y. (2005). High speed obstacle avoidance using monocular vision and reinforcement learning. In ICML '05: Proceedings of the 22nd international conference on Machine learning, pages 593–600, New York, NY, USA. ACM.

[Mihailidis et al., 2007] Mihailidis, A., Elinas, P., Boger, J., and Hoey, J. (2007). An intelligent powered wheelchair to enable mobility of cognitively impaired older adults: An anticollision system. Neural Systems and Rehabilitation Engineering, IEEE Transactions on, 15(1):136–143.

[Milroy et al., 1996] Milroy, M. J., Weir, D. J., Bradley, C., and Vickers, G. W. (1996). Reverse engineering employing a 3D laser scanner: A case study. International Journal of Advanced Manufacturing Technology, 12(2):111–121.

[Moravec, 1980] Moravec, H. (1980). Obstacle avoidance and navigation in the real world by a seeing robot rover. Technical report, Stanford University. Available as Stanford AIM-340, CS-80-813 and republished as a Carnegie Mellon University Robotics Institute Technical Report to increase availability.

[Mulligan et al., 2002] Mulligan, J., Isler, V., and Daniilidis, K. (2002). Trinocular stereo: A real-time algorithm and its evaluation. International Journal of Computer Vision, 47:51–61.

[Murphy et al., 2001] Murphy, R. R., Casper, J., and Micire, M. (2001). Potential Tasks and Research Issues for Mobile Robots in RoboCup Rescue. In Stone, P., Balch, T., and Kraetzschmar, G., editors, RoboCup-2000: Robot Soccer World Cup IV, volume 2019 of Lecture Notes in Artificial Intelligence (LNAI), pages 339–344. Springer Verlag.



[Mustafa, 2004] Mustafa, N. H. (2004). Simplification, Estimation and Classification of Geometric Objects. PhD thesis, Department of Computer Science, Duke University.

[Newman et al., 2006] Newman, P., Cole, D., and Ho, K. (2006). Outdoor SLAM using visual appearance and laser ranging. In Robotics and Automation, 2006. ICRA 2006. Proceedings 2006 IEEE International Conference on, pages 1180–1187.

[Nüchter et al., 2004] Nüchter, A., Surmann, H., Lingemann, K., Hertzberg, J., and Thrun, S. (2004). 6D SLAM with an application in autonomous mine mapping. In Robotics and Automation, 2004. Proceedings. ICRA '04. 2004 IEEE International Conference on, volume 2, pages 1998–2003.

[Oggier et al., 2003] Oggier, T., Lehmann, M., Kaufmann, R., Schweizer, M., Richter, M., Metzler, P., Lang, G., Lustenberger, F., and Blanc, N. (2003). An all-solid-state optical range camera for 3D real-time imaging with sub-centimeter depth resolution (SwissRanger). In Proceedings of SPIE, volume SPIE-5249, pages 534–545.

[Ogren and Leonard, 2005] Ogren, P. and Leonard, N. (2005). A convergent dynamic window approach to obstacle avoidance. Robotics, IEEE Transactions on, 21(2):188–195.

[Ohya et al., 1998] Ohya, A., Kosaka, A., and Kak, A. (1998). Vision-based navigation by a mobile robot with obstacle avoidance using single-camera vision and ultrasonic sensing. IEEE Transactions on Robotics and Automation.

[Okada et al., 2001] Okada, K., Kagami, S., Inaba, M., and Inoue, H. (2001). Plane segment finder: Algorithm, implementation and applications. In Robotics and Automation, 2001. Proceedings 2001 ICRA. IEEE International Conference on, volume 2, pages 2120–2125, Washington, DC, USA. IEEE Computer Society.

[Oliensis, 2000] Oliensis, J. (2000). A critique of structure-from-motion algorithms. Computer Vision and Image Understanding, 80(2):172–214.

[Overmars and Geraerts, 2007] Overmars, M. H. and Geraerts, R. (2007). The corridor map method: a general framework for real-time high-quality path planning. Computer Animation and Virtual Worlds, 18:107–119.

[Parasuraman et al., 2000] Parasuraman, R., Sheridan, T. B., and Wickens, C. D. (2000). A model for types and levels of human interaction with automation. IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans, 30(3):286–297.

[Pathak et al., 2008] Pathak, K., Birk, A., and Poppinga, J. (2008). Subpixel Depth Accuracy with a Time of Flight Sensor using Multimodal Gaussian Analysis. In International Conference on Intelligent Robots and Systems (IROS), Nice, France. IEEE Press.

[Pathak et al., 2007a] Pathak, K., Birk, A., Poppinga, J., and Schwertfeger, S. (2007a). 3D forward sensor modeling and application to occupancy grid based sensor fusion. In IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, San Diego.

[Pathak et al., 2007b] Pathak, K., Birk, A., Schwertfeger, S., and Poppinga, J. (2007b). 3D Forward Sensor Modeling and Application to Occupancy Grid Based Sensor Fusion. In International Conference on Intelligent Robots and Systems (IROS), pages 2059–2064, San Diego, USA. IEEE Press.



[Pathak et al., 2010a] Pathak, K., Birk, A., Vaskevicius, N., Pfingsthorn, M., Schwertfeger, S., and Poppinga, J. (2010a). Online 3D SLAM by Registration of Large Planar Surface Segments and Closed Form Pose-Graph Relaxation. Journal of Field Robotics, Special Issue on 3D Mapping, 27(1):52–84.

[Pathak et al., 2010b] Pathak, K., Birk, A., Vaškevičius, N., and Poppinga, J. (2010b). Fast registration based on noisy planes with unknown correspondences for 3-D mapping. Robotics, IEEE Transactions on, 26(3):424–441.

[Pathak et al., 2009] Pathak, K., Vaskevicius, N., Poppinga, J., Pfingsthorn, M., Schwertfeger, S., and Birk, A. (2009). Fast 3D Mapping by Matching Planes Extracted from Range Sensor Point-Clouds. In International Conference on Intelligent Robots and Systems (IROS). IEEE Press.

[Pellenz, 2007] Pellenz, J. (2007). Rescue robot sensor design: An active sensing approach. In Proceedings of the Fourth International Workshop on Synthetic Simulation and Robotics to Mitigate Earthquake Disaster (SRMED 2007).

[Petitjean, 2002] Petitjean, S. (2002). A survey of methods for recovering quadrics in triangle meshes. ACM Comput. Surv., 34(2):211–262.

[Petres et al., 2007] Petres, C., Pailhas, Y., Patron, P., Petillot, Y., Evans, J., and Lane, D. (2007). Path planning for autonomous underwater vehicles. Robotics, IEEE Transactions on, 23(2):331–341.

[Pettersson and Doherty, 2006] Pettersson, P. O. and Doherty, P. (2006). Probabilistic roadmap based path planning for an autonomous unmanned helicopter. Journal of Intelligent and Fuzzy Systems, 17:395–405.

[Pflimlin et al., 2006] Pflimlin, J., Soueres, P., and Hamel, T. (2006). Waypoint navigation control of a VTOL UAV amidst obstacles. In Intelligent Robots and Systems, 2006 IEEE/RSJ International Conference on, pages 3544–3549.

[Point of Beginning, 2006] Point of Beginning (2006). 2006 3D Laser Scanner Hardware Survey.

[Poppinga and Birk, 2009] Poppinga, J. and Birk, A. (2009). A Novel Approach to Wrap Around Error Correction for a Time-Of-Flight 3D Camera. In Iocchi, L., Matsubara, H., Weitzenfeld, A., and Zhou, C., editors, RoboCup 2008: Robot WorldCup XII, Lecture Notes in Artificial Intelligence (LNAI). Springer.

[Poppinga et al., 2008a] Poppinga, J., Birk, A., and Pathak, K. (2008a). Hough based Terrain Classification for Realtime Detection of Drivable Ground. Journal of Field Robotics, 25(1-2):67–88.

[Poppinga et al., 2007] Poppinga, J., Pfingsthorn, M., Schwertfeger, S., Pathak, K., and Birk, A. (2007). Optimized Octtree Datastructure and Access Methods for 3D Mapping. In IEEE Safety, Security, and Rescue Robotics (SSRR). IEEE Press.

[Poppinga et al., 2008b] Poppinga, J., Vaskevicius, N., Birk, A., and Pathak, K. (2008b). Fast Plane Detection and Polygonalization in noisy 3D Range Images. In International Conference on Intelligent Robots and Systems (IROS), pages 3378–3383, Nice, France. IEEE Press.



[Rusu et al., 2008a] Rusu, R. B., Marton, Z. C., Blodow, N., Dolha, M., and Beetz, M. (2008a). Towards 3D point cloud based object maps for household environments. Robotics and Autonomous Systems, 56:927–941.

[Rusu et al., 2008b] Rusu, R. B., Sundaresan, A., Morisset, B., Agrawal, M., and Beetz, M. (2008b). Leaving flatland: Realtime 3D stereo semantic reconstruction. In ICIRA '08: Proceedings of the First International Conference on Intelligent Robotics and Applications, pages 921–932, Berlin, Heidelberg. Springer-Verlag.

[Sabe et al., 2004] Sabe, K., Fukuchi, M., Gutmann, J.-S., Ohashi, T., Kawamoto, K., and Yoshigahara, T. (2004). Obstacle avoidance and path planning for humanoid robots using stereo vision. In Robotics and Automation, 2004. Proceedings. ICRA '04. 2004 IEEE International Conference on, volume 1, pages 592–597, Washington, DC, USA. IEEE Computer Society.

[Saez and Escolano, 2004] Saez, J. and Escolano, F. (2004). A global 3D map-building approach using stereo vision. In Robotics and Automation, 2004. Proceedings. ICRA '04. 2004 IEEE International Conference on, volume 2, pages 1197–1202.

[Sakenas et al., 2007] Sakenas, V., Kosuchinas, O., Pfingsthorn, M., and Birk, A. (2007). Extraction of Semantic Floor Plans from 3D Point Cloud Maps. In International Workshop on Safety, Security, and Rescue Robotics (SSRR). IEEE Press.

[Saunders et al., 2005] Saunders, J. B., Call, B., Curtis, A., and Beard, R. W. (2005). Static and dynamic obstacle avoidance in miniature air vehicles. AIAA Infotech at Aerospace Conference.

[Schäfer et al., 2005a] Schäfer, B. H., Proetzsch, M., and Berns, K. (2005a). Stereo-vision-based obstacle avoidance in rough outdoor terrain. In International Symposium on Motor Control and Robotics.

[Schafer et al., 2008] Schafer, H., Hach, A., Proetzsch, M., and Berns, K. (2008). 3D obstacle detection and avoidance in vegetated off-road terrain. In Robotics and Automation, 2008. ICRA 2008. IEEE International Conference on, pages 923–928.

[Schäfer et al., 2005b] Schäfer, H. B., Luksch, T., and Berns, K. (2005b). Obstacle detection and avoidance for mobile outdoor robotics. In EOS Conference on Industrial Imaging and Machine Vision.

[Schwertfeger et al., 2008] Schwertfeger, S., Poppinga, J., and Birk, A. (2008). Towards Object Classification using 3D Sensor Data. In ECSIS Symposium on Learning and Adaptive Behaviors for Robotic Systems (LAB-RS). IEEE.

[Shatkay, 1995] Shatkay, H. (1995). Approximate queries and representations for large data sequences. Technical report, Providence, Rhode Island, USA. (The technical report, not the 1996 paper of the same title.)

[Siek et al., 2001] Siek, J., Lee, L.-Q., and Lumsdaine, A. (2001). Boost Graph Library. http://www.boost.org/doc/libs/1_39_0/libs/graph/doc/index.html. Retrieved on September 10th, 2009.



[Silveira et al., 2006] Silveira, G., Malis, E., and Rives, P. (2006). Real-time robust detection of planar regions in a pair of images. In Intelligent Robots and Systems, 2006 IEEE/RSJ International Conference on, pages 49–54, Washington, DC, USA. IEEE Computer Society.

[Simmons, 1996] Simmons, R. (1996). The curvature-velocity method for local obstacle avoidance. In Robotics and Automation, 1996. Proceedings., 1996 IEEE International Conference on, volume 4, pages 3375–3382.

[Simmons et al., 1996] Simmons, R., Henriksen, L., Chrisman, L., and Whelan, G. (1996). Obstacle avoidance and safeguarding for a lunar rover. In Proc. AIAA Forum on Advanced Developments in Space Robotics, Madison, WI.

[Smith, 2006] Smith, R. (2006). Open Dynamics Engine. http://www.ode.org/. Retrieved on July 6th, 2009.

[Sun et al., 2001] Sun, W., Bradley, C., Zhang, Y. F., and Loh, H. T. (2001). Cloud data modelling employing a unified, non-redundant triangular mesh. Computer-Aided Design, 33(2):183–193.

[Surmann et al., 2003] Surmann, H., Nuechter, A., and Hertzberg, J. (2003). An autonomous mobile robot with a 3D laser range finder for 3D exploration and digitalization of indoor environments. Robotics and Autonomous Systems, 45(3-4):181–198.

[Tangelder and Fabri, 2008] Tangelder, H. and Fabri, A. (2008). dD spatial searching. In Board, C. E., editor, CGAL User and Reference Manual. 3.4 edition.

[Terdiman, 2003] Terdiman, P. (2003). OPCODE – Optimized Collision Detection. http://www.codercorner.com/Opcode.htm. Retrieved on July 6th, 2009.

[Thrun, 2002] Thrun, S. (2002). Robotic Mapping: A Survey. In Lakemeyer, G. and Nebel, B., editors, Exploring Artificial Intelligence in the New Millennium. Morgan Kaufmann.

[Thrun et al., 2000] Thrun, S., Burgard, W., and Fox, D. (2000). A Real-Time Algorithm for Mobile Robot Mapping With Applications to Multi-Robot and 3D Mapping. In ICRA, pages 321–328.

[Thrun et al., 2003] Thrun, S., Hahnel, D., Ferguson, D., Montemerlo, M., Triebel, R., Burgard, W., Baker, C., Omohundro, Z., Thayer, S., and Whittaker, W. (2003). A system for volumetric robotic mapping of abandoned mines. In Robotics and Automation, 2003. Proceedings. ICRA '03. IEEE International Conference on, volume 3, pages 4270–4275.

[Thrun et al., 2006] Thrun, S., Montemerlo, M., Dahlkamp, H., Stavens, D., Aron, A., Diebel, J., Fong, P., Gale, J., Halpenny, M., Hoffmann, G., Lau, K., Oakley, C., Palatucci, M., Pratt, V., Stang, P., Strohband, S., Dupont, C., Jendrossek, L.-E., Koelen, C., Markey, C., Rummel, C., Niekerk, J. v., Jensen, E., Alessandrini, P., Bradski, G., Davies, B., Ettinger, S., Kaehler, A., Nefian, A., and Mahoney, P. (2006). Stanley: The robot that won the DARPA Grand Challenge. Journal of Field Robotics, 23(9):661–692.

[Ulrich and Nourbakhsh, 2000] Ulrich, I. and Nourbakhsh, I. (2000). Appearance-based obstacle detection with monocular color vision. In Proceedings of the Seventeenth National Conference on Artificial Intelligence, pages 866–871, Menlo Park, California. The AAAI Press.



[Unnikrishnan and Hebert, 2003] Unnikrishnan, R. and Hebert, M. (2003). Robust extraction of multiple structures from non-uniformly sampled data. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), volume 2, pages 1322–1329. IEEE Press.

[van den Bergen, 2004] van den Bergen, G. (2004). SOLID – Software Library for Interference Detection. http://www.win.tue.nl/~gino/solid/. Retrieved on July 27th, 2010.

[Varadarajan, 1996] Varadarajan, K. R. (1996). Approximating monotone polygonal curves using the uniform metric. In Proceedings of the twelfth annual symposium on Computational geometry, pages 311–318, Philadelphia, Pennsylvania, USA.

[Vaskevicius et al., 2007] Vaskevicius, N., Birk, A., Pathak, K., and Poppinga, J. (2007). Fast Detection of Polygons in 3D Point Clouds from Noise-Prone Range Sensors. In International Workshop on Safety, Security, and Rescue Robotics (SSRR). IEEE Press.

[Videre-Design, 2006] Videre-Design (2006). Stereo-on-a-chip (STOC) stereo head, user manual 1.1.

[Viejo and Cazorla, 2007] Viejo, D. and Cazorla, M. (2007). 3D plane-based egomotion for SLAM on semi-structured environment. In IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), pages 2761–2766.

[Waite, 2002] Waite, A. D. (2002). Sonar for Practising Engineers, Third Edition. John Wiley & Sons, Ltd.

[Watanabe et al., 2005] Watanabe, Y., Johnson, E. N., and Calise, A. J. (2005). Vision-based approach to obstacle avoidance. In 2005 AIAA Guidance, Navigation, and Control Conference and Exhibit, pages 1–10, San Francisco, CA, USA.

[Weingarten and Siegwart, 2006] Weingarten, J. and Siegwart, R. (2006). 3D SLAM using planar segments. In Intelligent Robots and Systems, 2006 IEEE/RSJ International Conference on, pages 3062–3067.

[Wilmarth et al., 1999] Wilmarth, S. A., Amato, N. M., and Stiller, P. F. (1999). Motion planning for a rigid body using random networks on the medial axis of the free space. In SCG '99: Proceedings of the fifteenth annual symposium on Computational geometry, pages 173–180, New York, NY, USA. ACM.

[Wulf and Wagner, 2003] Wulf, O. and Wagner, B. (2003). Fast 3D-Scanning Methods for Laser Measurement Systems. In International Conference on Control Systems and Computer Science (CSCS14).

[Wzorek and Doherty, 2006] Wzorek, M. and Doherty, P. (2006). Reconfigurable path planning for an autonomous unmanned aerial vehicle. In Hybrid Information Technology, 2006. ICHIT '06. International Conference on, volume 2, pages 242–249.

[Yakimovsky and Cunningham, 1978] Yakimovsky, Y. and Cunningham, R. (1978). A system for extracting three-dimensional measurements from a stereo pair of TV cameras. Computer Graphics and Image Processing, 7:195–210.

[Yvinec, 2008] Yvinec, M. (2008). 2D triangulations. In Board, C. E., editor, CGAL User and Reference Manual. 3.4 edition.



[Zufferey and Floreano, 2005] Zufferey, J.-C. and Floreano, D. (2005). Toward 30-gram autonomous indoor aircraft: Vision-based obstacle avoidance and altitude control. In Robotics and Automation, 2005. ICRA 2005. Proceedings of the 2005 IEEE International Conference on, pages 2594–2599.
