
Towards Autonomous Navigation for Robots with 3D Sensors

by

Jann Poppinga

A thesis submitted in partial fulfillment
of the requirements for the degree of

Doctor of Philosophy
in Computer Science

Approved Dissertation Committee
Prof. Dr. Andreas Birk
Prof. Dr. Andreas Nüchter
Prof. Dr.-Ing. Heiko Mosemann

Defended on September 17th, 2010

School of Engineering and Science


Abstract

Autonomous navigation is an important challenge in many application fields of mobile robotics. This thesis contributes to mobile robot autonomous navigation at different levels.

In chapter 2, different methods for 3D sensing are discussed. In particular, we evaluate a time-of-flight range camera in various situations. We develop a method to detect and, if desirable, heuristically correct the most disruptive error source: range is measured via phase shift, and because of the way the phase shift is measured, values beyond 360° are mapped back into the [0°, 360°[ interval. The derived range is consequently also affected. The error is inherent not only to this model, but to all range cameras measuring the time of flight in the same way. Our correction makes it possible to use the range camera on a mobile robot for high-frequency 3D sensing.

In chapter 3, we present several 3D point cloud datasets gathered with the time-of-flight camera presented in the preceding chapter and with other 3D sensors: an actuated laser range-finder, a stereo camera, and a 3D sonar. The datasets represent a wide variety of real-world and set-up scenes, indoors and outdoors, underwater and on land, and at small and large scale. These datasets are used in the experiments in the following chapters.

In chapter 4, we present a robust short-range obstacle detection algorithm that runs fast enough to actually make use of the range camera's high data rate. It is based on the Hough transform for planes in 3D point clouds. The bins are relatively coarse, which turned out to be sufficient for testing drivability. The method allows a mobile robot to reliably detect the drivability of the terrain it faces. With the same method and finer bins it is possible to classify the terrain, albeit not as reliably as checking drivability. Experiments with two types of sensors on data from indoors and outdoors demonstrate the algorithm's performance. The processing time typically lies between 5 and 50 ms, which is enough for real-time processing on a robot moving at reasonable speed.

In chapter 5, we develop the Patch Map data structure for memory-efficient 3D mapping based on planar surfaces extracted from 3D point cloud data. It is flexible enough to allow for different kinds of surface representations, different methods of collision detection, and different kinds of roadmap algorithms. The surface representations are planar polygons and trimeshes. We survey and benchmark different methods of generating planar patches from a point cloud segmented into planar regions. We also implement a point cloud based variant of the map data structure that allows for comparison with this standard data structure. Collision detection is implemented both in a way that exploits the fact that planar patches are polygons and based on two external collision detection libraries, Bullet and OPCODE. The implemented roadmap algorithms are the Rapidly-exploring Random Tree (RRT), the Probabilistic Roadmap Method (PRM), and variants of these, most notably variants of RRT and PRM that place vertices on the medial axes of the map without explicitly computing them. We benchmark all collision detection methods with all roadmap algorithms on synthetic data to find the most efficient ones. RRT is the best roadmap algorithm and Bullet the fastest collision detection. Thus, the Patch Map data structure shows its flexibility and usability in practice.

In chapter 6, we thoroughly test the Patch Map data structure developed in the preceding chapter on real-world data. The Patch Map consisting only of planar patches is 18 times smaller than the point clouds it is based on. We perform both roadmap generation with PRM and RRT and way finding from start to goal, based on RRT. We compare our approach to the established methods of trimeshes and point clouds and find that it performs an order of magnitude faster on 3D LRF data and also considerably better on sonar data. We also show that this speed advantage does not come at the cost of a loss of precision. To this end, we compare the total explorable space on the different map representations and find that they differ only marginally. In summary, the Patch Map is shown to be a viable alternative to standard methods that is much more economical with memory.
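As a brief illustration of the wrap-around effect mentioned above (a worked example under assumed values: a modulation frequency of 20 MHz, which is typical for this sensor class but not stated in this section), the measured phase only determines the range modulo the non-ambiguity interval:

\[
  d = \frac{\varphi}{2\pi} \cdot \frac{c}{2 f_{\mathrm{mod}}},
  \qquad
  d_{\max} = \frac{c}{2 f_{\mathrm{mod}}}
           = \frac{3\cdot 10^{8}\,\mathrm{m/s}}{2 \cdot 20\,\mathrm{MHz}}
           = 7.5\,\mathrm{m}.
\]

A target at 9 m therefore produces the same phase as one at 9 m - 7.5 m = 1.5 m and is reported at the shorter distance unless the wrap-around is detected and corrected.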



Preface

This thesis is based on my work in the Jacobs Robotics Group at Jacobs University Bremen. It would not have been possible without the work, help, collaboration, and inspiration of, with, and by (in alphabetical order) Rares Ambrus, Hamed Bastani, Andreas Birk, Heiko Bülow, Winai Chonnaparamutt, Ivan Delchev, Stefan Markov, Mohammed Nour, Yashodhan Nevatia, Kaustubh Pathak, Max Pfingsthorn, Ravi Rathnam, Sören Schwertfeger, Todor Stoyanov, and Narūnas Vaškevičius.

Furthermore, I would like to thank my wife Danni and my daughter Lina for keeping my spirits up during the stressful days of completing this thesis. Brigitte Dörr provided the invaluable help of proof-reading together with her cousin. Stefan May clarified open questions in optics.

Together with my co-authors, I laid the basis for this thesis in the following publications (the order corresponds to the order of their contribution to the thesis):

• Jann Poppinga and Andreas Birk. A Novel Approach to Wrap Around Error Correction for a Time-Of-Flight 3D Camera. In Luca Iocchi, Hitoshi Matsubara, Alfredo Weitzenfeld, and Changjiu Zhou, editors, RoboCup 2008: Robot WorldCup XII, Lecture Notes in Artificial Intelligence (LNAI). Springer, 2009, henceforth referred to as [Poppinga and Birk, 2009].

• Jann Poppinga, Andreas Birk, and Kaustubh Pathak. Hough based Terrain Classification for Realtime Detection of Drivable Ground. Journal of Field Robotics, 25(1-2):67–88, 2008, henceforth referred to as [Poppinga et al., 2008a].

• Andreas Birk, Todor Stoyanov, Yashodhan Nevatia, Rares Ambrus, Jann Poppinga, and Kaustubh Pathak. Terrain Classification for Autonomous Robot Mobility: from Safety, Security Rescue Robotics to Planetary Exploration. In Planetary Rovers Workshop, International Conference on Robotics and Automation (ICRA). IEEE, 2008.

• Andreas Birk, Jann Poppinga, Todor Stoyanov, and Yashodhan Nevatia. Planetary Exploration in USARsim: A Case Study including Real World Data from Mars. In Luca Iocchi, Hitoshi Matsubara, Alfredo Weitzenfeld, and Changjiu Zhou, editors, RoboCup 2008: Robot WorldCup XII, Lecture Notes in Artificial Intelligence (LNAI). Springer, 2009.

• Narunas Vaskevicius, Andreas Birk, Kaustubh Pathak, and Jann Poppinga. Fast Detection of Polygons in 3D Point Clouds from Noise-Prone Range Sensors. In International Workshop on Safety, Security, and Rescue Robotics (SSRR). IEEE Press, 2007, henceforth referred to as [Vaskevicius et al., 2007].

• Jann Poppinga, Narunas Vaskevicius, Andreas Birk, and Kaustubh Pathak. Fast Plane Detection and Polygonalization in Noisy 3D Range Images. In International Conference on Intelligent Robots and Systems (IROS), pages 3378–3383, Nice, France, 2008. IEEE Press, henceforth referred to as [Poppinga et al., 2008b].



• Jann Poppinga, Max Pfingsthorn, Soeren Schwertfeger, Kaustubh Pathak, and Andreas Birk. Optimized Octtree Datastructure and Access Methods for 3D Mapping. In IEEE Safety, Security, and Rescue Robotics (SSRR). IEEE Press, 2007, henceforth referred to as [Poppinga et al., 2007].

The patch map data structure builds on the work by Kaustubh Pathak, published in:

• Andreas Birk, Kaustubh Pathak, Narunas Vaskevicius, Max Pfingsthorn, Jann Poppinga, and Soeren Schwertfeger. Surface Representations for 3D Mapping: A Case for a Paradigm Shift. KI - German Journal on Artificial Intelligence, 2010.

• Kaustubh Pathak, Andreas Birk, Narunas Vaskevicius, Max Pfingsthorn, Soeren Schwertfeger, and Jann Poppinga. Online 3D SLAM by Registration of Large Planar Surface Segments and Closed Form Pose-Graph Relaxation. Journal of Field Robotics, Special Issue on 3D Mapping, 27(1):52–84, 2010.

• Kaustubh Pathak, Narunas Vaskevicius, Jann Poppinga, Max Pfingsthorn, Soeren Schwertfeger, and Andreas Birk. Fast 3D Mapping by Matching Planes Extracted from Range Sensor Point-Clouds. In International Conference on Intelligent Robots and Systems (IROS). IEEE Press, 2009.

• Andreas Birk, Narunas Vaskevicius, Kaustubh Pathak, Soeren Schwertfeger, Jann Poppinga, and Heiko Buelow. 3-D Perception and Modeling: Motion-Level Teleoperation and Intelligent Autonomous Functions. IEEE Robotics and Automation Magazine (RAM), December 2009.

• Kaustubh Pathak, Andreas Birk, Narunas Vaškevičius, and Jann Poppinga. Fast Registration Based on Noisy Planes with Unknown Correspondences for 3-D Mapping. IEEE Transactions on Robotics, 26(3):424–441, June 2010, henceforth referred to as [Pathak et al., 2010b].

Data collected by me and library functions implemented by me were used in object classification:

• Soeren Schwertfeger, Jann Poppinga, and Andreas Birk. Towards Object Classification using 3D Sensor Data. In ECSIS Symposium on Learning and Adaptive Behaviors for Robotic Systems (LAB-RS). IEEE, 2008.

In an unrelated educational project, I worked with humanoid robots:

• Andreas Birk, Jann Poppinga, and Max Pfingsthorn. Using different Humanoid Robots for Science Edutainment of Secondary School Pupils. In Luca Iocchi, Hitoshi Matsubara, Alfredo Weitzenfeld, and Changjiu Zhou, editors, RoboCup 2008: Robot WorldCup XII, Lecture Notes in Artificial Intelligence (LNAI). Springer, 2009.

Based on my work with the SwissRanger sensor, K. Pathak implemented a forward sensor model and sensor fusion, and achieved sub-pixel accuracy:

• Kaustubh Pathak, Andreas Birk, Soeren Schwertfeger, and Jann Poppinga. 3D Forward Sensor Modeling and Application to Occupancy Grid Based Sensor Fusion. In International Conference on Intelligent Robots and Systems (IROS), pages 2059–2064, San Diego, USA, 2007. IEEE Press.

• Kaustubh Pathak, Andreas Birk, and Jann Poppinga. Subpixel Depth Accuracy with a Time of Flight Sensor using Multimodal Gaussian Analysis. In International Conference on Intelligent Robots and Systems (IROS), Nice, France, 2008. IEEE Press.

Parts of the data were collected in the Jacobs Robotics Arena, documented in:

• Andreas Birk, Kaustubh Pathak, Jann Poppinga, Sören Schwertfeger, Max Pfingsthorn, and Heiko Bülow. The Jacobs Test Arena for Safety, Security, and Rescue Robotics (SSRR). In Workshop on Performance Evaluation and Benchmarking for Intelligent Robots and Systems, International Conference on Intelligent Robots and Systems (IROS). IEEE Press, 2007.

My general contributions to the Jacobs Robotics robot system are published as:

• Andreas Birk, Kaustubh Pathak, Jann Poppinga, Soeren Schwertfeger, and Winai Chonnaparamutt. Intelligent Behaviors in Outdoor Environments. In 13th International Conference on Robotics and Applications, Special Session on Outdoor Robotics - Taking Robots off road. IASTED, 2007.



Contents

1 Introduction  19
1.1 Obstacle avoidance  19
1.2 3D Mapping  21
1.2.1 3D mapping, not obstacle avoidance  21
1.2.2 Mapping: from 2D to 3D  22
1.2.3 3D mapping – state of the art  23
1.3 3D Path Planning  25
1.4 Overview  26

2 3D Sensing  27
2.1 3D laser range finders  27
2.2 Stereo cameras  28
2.3 Time-of-flight cameras  30
2.4 Sonar  31
2.5 Case study: The SwissRanger time-of-flight camera  31
2.5.1 Introduction  31
2.5.2 Parameters  33
2.5.3 Accuracy  34
2.5.4 Types of Errors  34
2.5.5 A novel solution to wrap-around: Adaptive Amplitude Threshold  41
2.5.6 Experiments and Results  44

3 Datasets  51
3.1 Datasets with range cameras  53
3.1.1 Arena  53
3.1.2 Outdoor 1-3  53
3.1.3 Planar/Round/Holes  54
3.2 Datasets with actuated LRF  54
3.2.1 Lab  56
3.2.2 Crashed Car Park  56
3.2.3 Dwelling  56
3.2.4 Hannover '09 Hall  56
3.2.5 Hannover '09 Arena  59
3.3 Dataset with sonar: Lesumsperrwerk  59
3.4 Summary  59

4 Near Field 3D Navigation with the Hough Transform  65
4.1 Approach and Implementation  65
4.1.1 The Hough transform  65
4.1.2 Plane Parameterization  66
4.1.3 Processing of the Hough Space for Classification  68
4.2 Experiments and results  69
4.2.1 Hough space examples  71
4.2.2 General Performance Analysis  71

5 Patch Map Data-Structure  79
5.1 Mapping with planar patches  79
5.2 Surface Patches  81
5.2.1 Planar Patches  81
5.2.2 Trimesh Patches  82
5.2.3 Other Patch Types  82
5.3 Generation of patches from point clouds  82
5.3.1 Trimesh Patches  82
5.3.2 Planar Patches  82
5.4 Patch Maps  98
5.4.1 Patch Map Capabilities  99
5.4.2 Patch Map Frameworks  99
5.5 Algorithms on patch maps  105
5.5.1 Evasion RRT  105
5.5.2 Medial Axis algorithms  107
5.5.3 Experiments  107
5.5.4 Results  109

6 3D Roadmaps for Unmanned Aerial Vehicles on Planar Patch Map  113
6.1 Experiments  113
6.1.1 Explored Volume  115
6.1.2 Implementation  115
6.1.3 Datasets  117
6.2 Results  117
6.2.1 Results of RRT  117
6.2.2 Results of PRM  120
6.2.3 Lesumsperrwerk dataset  127

7 Conclusion  131

A Addenda & Errata  133

Bibliography  134



List of Figures

1.1 Rugbot at a drill and at ELROB  21
1.2 Rugbot at RoboCup  22
1.3 An indicator for 3D mapping: Collapsed Building at Disaster City training site in Texas  23
2.1 Functional principle of stereo vision  29
2.2 Wave properties and their measurement  30
2.3 A simple scene  31
2.4 Sample scene shot with different (3D) cameras  32
2.5 A lab scene at different amplitude thresholds  33
2.6 Deviation at different distances  35
2.7 Inhomogeneous lighting on white homogeneous surface  36
2.8 SR errors caused by ambient light  36
2.9 Amplitude influences range measurement  37
2.10 Light scattering  38
2.11 A ghost image of the rod can be seen on both edges of the image (circled).  39
2.12 Errors caused by a moving object (AT=500).  40
2.13 Reflections, here on a floor, cause the measurement of false distances.  40
2.14 An object at various too-short distances to the camera, AT 0  41
2.15 Irregular distribution of near-IR light from the camera's illumination unit  43
2.16 Three test scenes comparing the proposed method to the standard method  44
2.17 Three test scenes comparing the proposed method to the standard AT method of the SwissRanger  45
2.18 Webcam images for scenes demonstrating the correction of errors other than wrap-around  46
2.19 SwissRanger images for scenes demonstrating the correction of errors other than wrap-around  47
2.20 Different error exclusion criteria  48
2.21 An example for correcting wrapped-around pixels  49
3.1 The autonomous version of a Rugbot with some important on-board sensors pointed out  51
3.2 Example range images from the Planar/Round/Holes dataset  52
3.3 A SwissRanger point cloud from the Arena dataset  53
3.4 Photos from the Arena dataset  54
3.5 Photos of the different types of scenes encountered in the Outdoor 1 dataset. The photos are taken by a webcam which is mounted right next to the 3D sensors.  54
3.6 The data returned by the SwissRanger (left) and the stereo camera (right) for the scenes from the Outdoor 1 dataset. Photos in figure 3.5.  55
3.7 Photos of the robotics lab with a locomotion test arena in form of a high bay rack.  56
3.8 ALRF range image from the lab dataset  57
3.9 The Crashed Car Park in Disaster City in Texas where one of the datasets used in the experiments was recorded.  57
3.10 Perspective view of two point clouds from the Dwelling dataset from Disaster City  58
3.11 The Hannover '09 Hall  59
3.12 Two views of a point cloud containing 78528 points from the Hannover '09 Arena dataset  60
3.13 An overview of the Lesumsperrwerk as seen from the river's surface.  61
3.14 Two views of a sonar point cloud from the Lesumsperrwerk dataset  61
4.1 The definition for the angles ρ_x and ρ_y  67
4.2 An example scene where the stereo camera delivers few data points; especially ground information is missing.  70
4.3 Two-dimensional depictions of the three-dimensional parameter (Hough) space for several example snapshots  72
4.4 Mean processing time and cardinality of point clouds  73
5.1 Applying a 3D transform to a 2D polygon  80
5.2 Different methods of projecting the points of a sub point cloud to the optimal plane fitted to them  85
5.3 Problems with the two types of projection  85
5.4 Differences between orthogonal projection and projection along the beam  86
5.5 Late projection can cause intersections when applied to ALRF data  87
5.6 An α-shape (grey) of a 2D point cloud (orange)  88
5.7 Some examples for convex polygons on a grid  89
5.8 An example iteration of the algorithm  89
5.9 Triangulation of convex grid polygons  91
5.10 Comparison of triangulation with/without the restriction to the area covered by the naive triangulation  92
5.11 Triangles with annotated spikiness  92
5.12 Time taken, number of polygons and spikiness for the different triangulation algorithms on the German Open '09 Arena dataset.  93
5.13 Time taken, number of polygons and spikiness for the different triangulation algorithms on the German Open '09 Hall dataset.  94
5.14 Time taken, number of polygons and spikiness for the different triangulation algorithms on the Planar/Round/Holes dataset.  96
5.15 An example of outlining  97
5.16 Class structure of the different variants of the patch map framework  98
5.17 One-dimensional case of bounding box based broad phase intersection test  102
5.18 A run of the kD-tree based collision detection algorithm detecting no collision.  104
5.19 Map used in preliminary experiments  108
6.1 Real-world example for an explored volume  116
6.2 Using explored volume to compare different map types  117
6.3 The lab model the roadmap experiments are run on.  118
6.4 Time taken by the RRT algorithm on the four different map types on the lab dataset.  121
6.5 Time taken by the RRT algorithm on the four different map types on the Crashed Car Park dataset.  122
6.6 A visualization of an RRT generated in 100 iterations in the Jacobs Robotics lab hybrid map.  123
6.7 Time taken by the PRM algorithm on the four different map types on the lab dataset.  124
6.8 Time taken by the PRM algorithm on the four different map types on the Crashed Car Park dataset.  125
6.9 Collision detection distance for PRM  126
6.10 Without using a bounding box, road-maps may be planned outside the valid area – even above water as in this case.  127
6.11 Fraction of runs where the goal was reached for RRT on the first two scans of the Lesumsperrwerk dataset  128
6.12 Time taken for different parameters on different scans of the Lesumsperrwerk data set  129
A.1 Unsimplifiable outline  134



List of Tables

2.1 Specification of the 3D Sensors  28
2.2 Technical Data  33
2.3 Reflectivity of different materials for light with 900 nm wavelength  38
2.4 Results on all six scenes  46
3.1 Number of points in the datasets  62
3.2 Spatial properties of the datasets  63
4.1 Stereo and TOF point clouds in the four datasets used in the experiments.  69
4.2 Percentages of snapshots excluded from Hough transform via preprocessing  70
4.3 Success rates and computation times for drivability detection.  72
4.4 Human generated ground truth labels for the stereo camera data of the different scenes  74
4.5 Human generated ground truth labels for the SwissRanger data of the different scenes  75
4.6 Classification rates and run times for stereo camera data processed at different angular resolutions of the Hough space  76
4.7 Classification rates and run times for SwissRanger data processed at different angular resolutions of the Hough space  77
5.1 A comparison of the sizes of different storage methods for surface patches  81
5.2 Required interfaces for patch map based algorithms  105
5.3 Median wall time taken to reach the ROIs from the starting points  109
5.4 Lower and upper quartile of wall time taken to reach the ROIs from the starting points  109
5.5 Median added length of paths from starting point to all ROIs  109
5.6 Lower and upper quartile of added length of paths from starting point to all ROIs  110
5.7 Median steep turns on paths from starting point to ROIs  110
5.8 Lower quartile of steep turns on paths from starting point to ROIs  110
5.9 Upper quartile of steep turns on paths from starting point to ROIs  111
5.10 Median curviness of paths from starting point to ROIs  111
5.11 Lower and upper quartile of curviness of paths from starting point to ROIs  111
6.1 The actual file sizes of the different map types of the Lab dataset  118
6.2 Explored volume on different maps for the RRT on the lab dataset  119
6.3 Explored volume on different maps for the RRT on the Crashed Car Park dataset  120
6.4 Explored volume on different maps for the PRM on the lab dataset  123
6.5 Explored volume on different maps for the PRM on the Crashed Car Park dataset  126



List of Algorithms

1 General form of the Hough transform algorithm  66
2 Hough transform applied for plane detection  66
3 The ground classification algorithm  68
4 The first phase of trimesh growing  83
5 The second phase of trimesh growing  84
6 Generating the α-shape A for point cloud P  88
7 The scanLine algorithm for range image triangulation  90
8 Construction of a link polygon  91
9 The original Douglas-Peucker algorithm for open polylines  98
10 Collision detection for a capsule in a point cloud  103
11 The Evasion RRT algorithm  106
12 Medial Axis PRM adapted to patch maps  107
13 The original RRT algorithm from [LaValle, 1998]  114
14 The original PRM algorithm from [Kavraki et al., 1996]  114



Chapter 1

Introduction

Robots have come a long way since the first modern robot, Unimate. Traditionally, robots were robot arms, fixed in place in a precisely known environment, almost always in manufacturing. More and more, robots are becoming mobile. The first mobile robots relied on special markers to navigate. Nowadays, mobile robots are used in practice in manufacturing, military and security applications, disaster mitigation, and in hospitals. In the last decade, there has been increasingly successful research on enabling mobile robots to navigate autonomously in unstructured environments. This thesis aims at improving mobile robots' autonomous navigation in this setting.

1.1 Obstacle avoidance

Mobile robots are used in a great many scenarios. These include urban search and rescue (USAR), exploration (e.g. underwater, in abandoned mines, or on Mars and other celestial bodies), combat, defusing explosives, and also entertainment. Traditionally, robots have been fully tele-operated by human operators. However, in any scenario, even partial autonomy can reduce the cognitive load of the robot operators [Parasuraman et al., 2000]. This increases the chances of success of any mission. For a robot to move at any degree of autonomy, it needs obstacle avoidance. Obviously, the speed of the computations for obstacle avoidance limits the speed with which the robot can move. Traditionally, obstacle avoidance has been conducted in 2D with 2D sensors. This applies to real robots [Simmons, 1996, Borenstein and Koren, 1989, Thrun, 2002, Lapierre et al., 2007, Blanco et al., 2007] (even Unmanned Aerial Vehicles (UAV) [Bouabdallah et al., 2007]), simulation [Kim and Khosla, 1991, Barate and Manzanera, 2008], and theory [Ogren and Leonard, 2005].

2D sensing is obviously not enough in a 3D world. 2D laser range finders mounted parallel to the floor fail to see ditches in the floor or obstacles above the scanning plane. A first approach to see beyond a strict 2D world is to extract features from a camera image [Hwang and Chang, 2007, Chen and Tsai, 2000, Ulrich and Nourbakhsh, 2000, LeCun et al., 2005]. Others rely on optic flow, either from a classical camera [Lewis, 2002] or a 1D camera [Zufferey and Floreano, 2005].

Accurate and complete sensing of the 3D world is easiest with real 3D sensors. There have been attempts from early on to make use of the third dimension. They relied on vision, namely binocular stereo [Matthies et al., 1995, Simmons et al., 1996], slider stereo¹ [Moravec, 1980], or structure from motion [Charnley and Blissett, 1989, Ohya et al., 1998]. [Moravec, 1980] tries to match all of the few detected 3D points in each image set with all points in the next. Due to the limited computing power at the time, this was very slow and also very error prone. Moravec assumes an even floor and performs path-planning in 2D. The structure from motion approach in [Charnley and Blissett, 1989] uses a corner detector which produces relatively sparse features. Based on these, the authors evaluate the flatness of the terrain as seen from an autonomous vehicle. However, they consider their inquiry in this direction a first tentative step, to cite:

"It is clear that we do not drive using Structure-from-Motion alone and we are not proposing that an autonomous vehicle should do so either."

Yet they do manage to extract point clouds and even to calculate the vehicle pose offline by matching features in consecutive point clouds. In doing so, they already made their caution a bit less justified than it was at the time.

¹ Slider stereo is a variant of binocular stereo. Only one camera is used, which is slid along a fixed track on the robot. Images are taken at certain evenly spaced spots and can be processed similarly to the ones in a binocular stereo system.

One of the first robotic systems with fully working 3D obstacle avoidance used binocular stereo [Matthies et al., 1995]. The authors assume a horizontal ground plane, though not a fixed altitude. By scanning columns in the disparity image, they detect deviations, which they classify as obstacles. [Simmons et al., 1996] uses a similar approach. They also assume a horizontal ground plane, but at a fixed height. From the points derived from the stereo camera, they generate a local elevation map that is used for obstacle avoidance. A drastically different approach is pursued in [Ohya et al., 1998]. It is aimed at office environments where many horizontal and vertical lines can be detected in camera images and where the assumption of a horizontal ground holds everywhere. Detected lines are matched to the expected lines given by an a priori wire-frame model of the environment for self-localization. Obstacle avoidance is done in 2D.

Stereo-vision has remained popular [Chao et al., 2009, Haddad et al., 1998, Schäfer et al., 2005a, Schäfer et al., 2005b, Okada et al., 2001, Sabe et al., 2004, Larson et al., 2006], but other 3D sensing methods have gained in use in obstacle avoidance: time-of-flight range cameras (light or laser based) [Poppinga et al., 2008a, Mihailidis et al., 2007], vision augmented with 2D laser range-finders (LRF) [Michels et al., 2005], object tracking [Michel et al., 2008], and most prominently LRFs. For obstacle avoidance, actuated LRFs are used [Kelly et al., 2006, Schafer et al., 2008], although in this case adequate scanning speed is a challenge [Wulf and Wagner, 2003]. When LRFs are fixed in a position not parallel to all driving directions, they are either inclined [Thrun et al., 2006] or perpendicular to the floor. The latter case is found in 3D mapping [Howard et al., 2004, Thrun et al., 2000, Hähnel et al., 2003], but is not useful for obstacle avoidance, since such scanners do not look ahead.

The full scope of the three-dimensional data is not used by most approaches. Typically, they assume a horizontal ground plane [Chao et al., 2009, Haddad et al., 1998] and enter the originally 3D data into a 2D map. Often the ground floor assumption is used not only to organize the data, but also to identify obstacles, which are defined as overly large deviations from it [Matthies et al., 1995, Matthies et al., 2002, Simmons et al., 1996, Schäfer et al., 2005b].

This simplification works in man-made environments, but fails once robots have to operate in an unstructured environment. In this case, one can identify terrain without searching for a ground plane at all [Schäfer et al., 2005a]. There are, however, a few approaches that do try to find a ground plane [Kelly et al., 2006, Simmons et al., 1996].

Recently, stereo-vision has been combined with monocular image classification for greater reliability [Hadsell et al., 2009]. This also combines the longer range of the latter with the 3D map-building potential of the former. Regions in the monocular image are classified using a neural network and the ground plane is identified with a post-processed Hough transform. The results are combined into a local 2D map.


The major problem of 3D obstacle avoidance is its high computational cost. [Simmons et al., 1996] runs at only 4 Hz, [Hadsell et al., 2009] at 8-10 Hz. [Kelly et al., 2006, Schäfer et al., 2005a] do not give speed figures for their systems. However, the speed achieved by obstacle avoidance algorithms is crucial: together with the sensor range and vehicle inertia, it puts a limit on the maximum safe speed. In chapter 4, we propose a method that is faster than the above solutions and hence allows for a higher speed.
for a higher speed.<br />

Figure 1.1: RUGBOT AT A DRILL AND AT ELROB – A Jacobs University robot at a drill dealing with a hazardous material road accident (upper row) and at the European Land Robotics Trials, ELROB-2006 (lower row).

1.2 3D Mapping

1.2.1 3D mapping, not obstacle avoidance

In many unstructured environments, e.g. in disaster scenarios or underwater, it is beneficial to use robots in addition to human workers. As humans are needed to operate the robots, any amount of autonomy in the robots helps reduce the operators' cognitive load [Murphy et al., 2001]. While obstacle avoidance can be enough to ease steering and reach short-distance goals, mapping is needed if a robot is to perform more complex tasks. Examples include returning to the starting point of a mission or even exploration.

While 2D mapping for ground vehicles in static environments with planar floors is considered a solved problem [Thrun, 2002], UAVs have often relied on obstacle avoidance [Bouabdallah et al., 2007, Pflimlin et al., 2006, Zufferey and Floreano, 2005], local maps [Meister et al., 2009], or non-metric maps [Courbon et al., 2009].


Figure 1.2: RUGBOT AT ROBOCUP – The RoboCup rescue competition features a very complex test environment (left), which includes several standardized test elements. The team demonstrated the combined usage of a tele-operated and a fully autonomous robot at the world championship 2006 (right).

1.2.2 Mapping: from 2D to 3D

The world is three-dimensional, so the most accurate representation must be a 3D map. However, a two-dimensional map is easier to handle for humans and less computationally intensive. Since the inherent simplification works most of the time, 2D mapping is popular even when 3D sensing is available (e.g. in [Thrun et al., 2003]).

Strictly two-dimensional mapping ceases to be a useful abstraction of the real world

1. when significant obstacles are invisible to the 2D sensors,

2. when the mapper is not the only user of the map,

3. when multiple levels one above another are to be mapped, or

4. when there is no dominant ground plane.

Condition 1 only occurs when insufficient sensors are used. A typical case is that of a 2D LRF scanning parallel to the floor and overlooking obstacles above and below the scanning plane. They cannot be entered into the map, yet they still pose problems to the robot and later to human users of the map. This situation can be addressed by using 3D sensors; the map might even remain 2D.

Three-dimensional data can be simplified if no intended later user of the map needs it (condition 2). This breaks down into two different sub-cases: either a three-dimensional map is desired as the most accurate representation for human consumers, or a robot with mobility differing from that of the mapping robot needs a map for path-planning. For example, a large mapping robot might not fit under a table, so it enters the table as an obstacle. A smaller robot might fit underneath, or a flying robot might be able to fly over it. They cannot fully use their mobility if they rely on a 2D map generated from 3D data which was simplified based on the large robot's properties.

Condition 3 can occur when mapping multi-story buildings. This case can be handled with separate 2D maps for each floor (e.g. in [Sakenas et al., 2007]). This approach is limited to intact buildings. It fails in highly unstructured environments such as outdoors or in partially collapsed buildings where there is no clear separation into stories.


Figure 1.3: An indicator for 3D mapping: Collapsed Building at Disaster City training site in Texas

Condition 4 is the strongest indicator for true 3D mapping (together with condition 2). In man-made environments with a well-defined ground-plane (mainly floor, pavement, road) in almost all areas, 2D scan-matching will work well. The ground-plane does not even have to be perfectly planar. 2D mapping can still work on slightly hilly terrain: since scan matching is probabilistic anyway, minor perturbations are compensated for. In collapsed buildings or very rough terrain, the robot's roll and pitch will vary so much that scan matching is no longer possible (see figure 1.3 for an example). One approach is to compensate for this by actuating a 2D LRF such that it is always level [Pellenz, 2007]. However, while roll and pitch may stay the same this way, the robot's altitude will change such that the LRF always hits different obstacles, rendering scan matching moot. Consequently, there is no choice but to use both 3D sensors and 3D mapping.

1.2.3 3D mapping – state of the art

Attempts at 3D sensing have a long history in mobile robotics [Yakimovsky and Cunningham, 1978, Moravec, 1980, Harris and Pike, 1988]. These early approaches rely on stereo vision, which, at the time, only delivered very sparse feature points. This greatly reduced the data load. Still, only short sequences of images were mapped. Yakimovsky and Cunningham [Yakimovsky and Cunningham, 1978] present a robot with a binocular stereo camera and an arm. The stereo camera determines the 3D position of certain image regions. The arm can be controlled by choosing a position on the screen. Harris and Pike [Harris and Pike, 1988] extract 3D information from an image series from a monocular camera.
Another, more recent approach is to assume a flat floor, match scans in 2D, and transform the 3D data accordingly [Howard et al., 2004, Thrun et al., 2000, Hähnel et al., 2003, Thrun et al., 2003]. Howard et al. [Howard et al., 2004] equipped a Segway base with a horizontal and a vertical laser range finder (LRF). They steered it around their campus on a 2 km tour. The data from the horizontal LRF was used for scan matching. They used the resulting positions to assemble a 6 million point 3D point cloud in real time. Thrun et al. [Thrun et al., 2000] use a Pioneer robot base with the same LRF configuration. They use the same process on several indoor datasets. In one instance, they transform the point cloud to 82,899 polygons which are then simplified to 8,299. In [Hähnel et al., 2003], this approach is refined by fitting planes in the complete point cloud to simplify the polygon set more drastically. When the authors try to extend it to underground mines [Thrun et al., 2003], they perform loop closing with 3D ICP (see below) to compensate for the uneven floor. Sabe et al. [Sabe et al., 2004] assume the presence of a flat floor, but not that it is horizontal. They first detect the ground plane with a 3D Hough transform and then use a 2D obstacle grid on the detected plane for path planning of a humanoid robot.
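To make the idea of Hough-based plane detection concrete, the following is a minimal sketch of a coarse (θ, φ, ρ) accumulator over an (N, 3) numpy point cloud; the parameterization, bin sizes, and function name are illustrative assumptions, not the specific choices of [Sabe et al., 2004] or of chapter 4.

    import numpy as np

    def hough_planes(points, n_theta=18, n_phi=36, rho_step=0.1, rho_max=10.0):
        """Vote every point into a (theta, phi, rho) accumulator; the strongest
        bin corresponds to the dominant plane in the cloud."""
        thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
        phis = np.linspace(0.0, 2.0 * np.pi, n_phi, endpoint=False)
        n_rho = int(2 * rho_max / rho_step)
        acc = np.zeros((n_theta, n_phi, n_rho), dtype=np.int32)
        for it, theta in enumerate(thetas):
            for ip, phi in enumerate(phis):
                # unit normal for this direction bin (full sphere, so each plane
                # also votes in the mirrored bin; fine for a sketch)
                n = np.array([np.sin(theta) * np.cos(phi),
                              np.sin(theta) * np.sin(phi),
                              np.cos(theta)])
                rho = points @ n                        # signed plane distances
                idx = np.floor((rho + rho_max) / rho_step).astype(int)
                ok = (idx >= 0) & (idx < n_rho)
                np.add.at(acc[it, ip], idx[ok], 1)      # cast the votes
        it, ip, ir = np.unravel_index(acc.argmax(), acc.shape)
        return thetas[it], phis[ip], -rho_max + (ir + 0.5) * rho_step, acc

In chapter 4, the Hough space is not merely searched for its strongest plane but processed further for drivability classification (section 4.1.3).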

Even now, real-time 3D mapping remains a challenge. Range cameras return tens of thousands of points per scan, 3D laser range-finders even hundreds of thousands. (Taking the higher frequency of the TOF cameras into account, the data rates are similar.) Stereo cameras return a varying number of points between these two figures. Operations entailed by mapping, like rigid transformations involving quite a few multiplications, have to be performed for each point individually.

The most popular approach for mapping with complete point clouds is the Iterative Closest Point algorithm (ICP) [Besl and McKay, 1992]. In each iteration, it correlates points in two point clouds with one another and then analytically calculates the relative pose that minimizes the average distance of the correlated points. However, ICP tends to get caught in local minima [Pathak et al., 2010b]. Also, pure ICP is expensive, so different optimizations have been proposed. [Nüchter et al., 2004] subsample the point clouds before applying an approximate ICP. [Saez and Escolano, 2004] also subsample point clouds; additionally, they assume a flat floor. [Cole and Newman, 2006] focus on the effectiveness of ICP and propose a different error criterion that necessitates an optimization step for relative pose determination. In [Newman et al., 2006] they detect loops with vision to reduce the computational load brought about by handling point clouds. [May et al., 2009] apply an ICP based approach to point clouds from a time-of-flight camera. Also for point cloud based approaches other than ICP, a main research focus is reducing the computational load. [Sun et al., 2001] propose a subsampling strategy.
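As an illustration of the basic ICP loop described above (a minimal brute-force sketch, not any of the optimized variants cited here; the helper names are hypothetical, and a real implementation would add kd-tree correspondence search, subsampling, and outlier rejection):

    import numpy as np

    def best_rigid_transform(src, dst):
        """Closed-form rotation/translation minimizing the mean squared
        distance between paired (N, 3) point sets (SVD/Kabsch)."""
        cs, cd = src.mean(axis=0), dst.mean(axis=0)
        H = (src - cs).T @ (dst - cd)
        U, _, Vt = np.linalg.svd(H)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:          # avoid reflections
            Vt[-1] *= -1
            R = Vt.T @ U.T
        return R, cd - R @ cs

    def icp(src, dst, iterations=30):
        """Align (N, 3) cloud src to (M, 3) cloud dst; returns accumulated R, t."""
        R_total, t_total = np.eye(3), np.zeros(3)
        cur = src.copy()
        for _ in range(iterations):
            # correlate: each source point with its closest destination point
            d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(-1)
            matched = dst[d2.argmin(axis=1)]
            # analytically compute the pose minimizing the average distance
            R, t = best_rigid_transform(cur, matched)
            cur = cur @ R.T + t
            R_total, t_total = R @ R_total, R @ t_total + t
        return R_total, t_total

The optimizations cited above mainly attack the two costly steps visible here: establishing the correspondences and the sheer number of points they have to be established for.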

In another approach to tackling the high computational load of 3D data, map representations often switch to less memory-intensive data structures earlier [Weingarten and Siegwart, 2006, Surmann et al., 2003, Viejo and Cazorla, 2007] or later [Rusu et al., 2008a] in the pipeline. [Weingarten and Siegwart, 2006] grow planes in single point clouds and use them for scan matching. Rusu, Marton, et al. register point clouds together, segment the resulting point cloud, and then fit cuboids into the segments [Rusu et al., 2008a]. Also in [Rusu et al., 2008b], Rusu, Sundaresan, et al. first register multiple point clouds and then derive a polygonal model, this time consisting of rectangles in a mesh. Watanabe et al. [Watanabe et al., 2005] detect line segments in the camera image and match them, though only in simulation. Surmann et al. [Surmann et al., 2003] operate line by line on a scan from an actuated LRF. They detect line segments in each scan line and fuse these into planar segments, often over-segmenting the actual surfaces. Several of these patches are then grouped together in bounding boxes. Likewise working on a single scan, [Unnikrishnan and Hebert, 2003] detect walls as object-aligned bounding boxes. In contrast, Silveira et al. [Silveira et al., 2006] produce planar patches very early. They combine this with the generation of 3D information from a sequence of intensity images. However, their patches are rather sparse and incomplete. [Viejo and Cazorla, 2007] use over-segmenting, incomplete planar patches derived analytically from local neighborhoods in the point cloud. These patches are later on used for Simultaneous Localization And Mapping (SLAM).

Milroy [Milroy et al., 1996] is the first to use the term "patch" to describe a representation of a surface. There and in the following work (e.g. [Gao et al., 2006]), patches are used in reverse engineering: mechanical parts like gear wheels, for which the original plans have been lost, are scanned with 3D laser scanners. Patches are detected in the point cloud and then used in computer-aided manufacturing (CAM) to reproduce the part. Gao et al. [Gao et al., 2006] even go so far as to compensate for wear.

As we have seen, many attempts at 3D mapping use patches in one form or another. In this vein, we propose a data structure for mapping that uses planar patches where possible and general trimeshes where necessary: the Patch Map. This makes it possible to reduce the data in complexity and volume, while still allowing for correct collision detection. In chapter 5 we will discuss the data structure in detail, and in chapter 6 we will demonstrate roadmap algorithms on it.
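For orientation, the following is a minimal sketch of what such a patch-map interface might look like; the class and field names are hypothetical illustrations, and the actual data structure, including the narrow-phase collision tests and the Bullet/OPCODE back-ends, is defined in chapter 5.

    from dataclasses import dataclass
    from typing import List, Tuple

    Point = Tuple[float, float, float]

    @dataclass
    class PlanarPatch:
        normal: Point                # unit normal of the fitted plane
        d: float                     # plane equation: dot(normal, x) = d
        polygon: List[Point]         # boundary polygon lying in that plane

    @dataclass
    class TrimeshPatch:
        vertices: List[Point]
        triangles: List[Tuple[int, int, int]]   # vertex indices

    class PatchMap:
        """Planar patches where possible, trimesh patches where necessary."""

        def __init__(self):
            self.patches = []

        def add(self, patch):
            self.patches.append(patch)

        def candidates(self, lo: Point, hi: Point):
            """Bounding-box broad phase: patches whose axis-aligned bounding
            box overlaps the query box [lo, hi]; a narrow-phase test (own
            polygon tests or an external library) would then run on these."""
            hits = []
            for p in self.patches:
                pts = p.polygon if isinstance(p, PlanarPatch) else p.vertices
                pmin = [min(c) for c in zip(*pts)]
                pmax = [max(c) for c in zip(*pts)]
                if all(pmax[i] >= lo[i] and pmin[i] <= hi[i] for i in range(3)):
                    hits.append(p)
            return hits

A point cloud based variant of the same interface is what the benchmarks in chapters 5 and 6 compare against.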


1.3 3D Path Planning

After constructing a 3D map, the next step is to plan a path in it. For online path planning, possible changes of the map should be taken into account: new data will be added, and the relative pose of previous scans can change due to data registration and relaxation after loop-closing. Therefore, constructing expensive data structures like a Voronoi diagram should be avoided. Even data structures that are computationally inexpensive to construct, such as occupancy grids, should be avoided to minimize memory consumption and to maintain flexibility for SLAM.

Probabilistic path planning is typically used for high-dimensional control spaces, but it is also useful for object maps: [Koyuncu and Inalhan, 2008] and [Andert and Adolf, 2009] use it on extruded polygons, [Pettersson and Doherty, 2006] on object-oriented bounding boxes (OBB). Probabilistic path planning algorithms only require sparse sampling, which makes applying them to object maps easier than generating a derived structure such as a Voronoi diagram. Two popular probabilistic path planning algorithms are the Rapidly-exploring Random Tree (RRT) proposed by LaValle [LaValle, 1998] and the Probabilistic Roadmap Method (PRM) proposed by Kavraki et al. [Kavraki et al., 1996]. An overview of uses of the RRT can be found in [Lavalle and Kuffner, 2000], along with an analysis of how the algorithm converges.
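To recall the basic RRT loop (a minimal workspace sketch in the spirit of [LaValle, 1998]; the collision predicate, step size, goal bias, and bounds are placeholder assumptions rather than the variants used later in this thesis):

    import random, math

    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def steer(a, b, step):
        """Move from a towards b by at most one step."""
        d = dist(a, b)
        return b if d <= step else tuple(x + step * (y - x) / d for x, y in zip(a, b))

    def rrt(start, goal, is_free, bounds, step=0.5, max_iter=2000, goal_bias=0.05):
        nodes = [start]                 # tree vertices
        parent = {0: None}              # tree edges by vertex index
        for _ in range(max_iter):
            # sample a random configuration, occasionally the goal itself
            q = goal if random.random() < goal_bias else \
                tuple(random.uniform(lo, hi) for lo, hi in bounds)
            # find the nearest existing vertex and extend towards the sample
            i = min(range(len(nodes)), key=lambda k: dist(nodes[k], q))
            q_new = steer(nodes[i], q, step)
            if is_free(nodes[i], q_new):        # edge must be collision-free
                nodes.append(q_new)
                parent[len(nodes) - 1] = i
                if dist(q_new, goal) < step:
                    return nodes, parent        # goal region reached
        return nodes, parent

The original formulation is restated later as algorithm 13, and chapter 5 introduces variants such as the Evasion RRT (algorithm 11) and the medial-axis based versions.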

Some examples of <strong>RRT</strong> usage include [Kagami et al., 2003]. The authors enter 3D surfaces detected<br />

by a stereo camera into an 3D mesh map with the marching cube algorithm and grow a twosided<br />

<strong>RRT</strong> in the configuration space of the 6-DOF arm of the standing humanoid robot. In contrast,<br />

[Koyuncu and Inalhan, 2008] grow the <strong>RRT</strong> for a helicopter in a simulated city environment in the<br />

workspace. They later on modify the derived trajectory (using B-splines) to fit the kinodynamic constraints<br />

of the vehicle. A similar approach is used in [Andert and Adolf, 2009]. In their work, a<br />

roadmap is constructed with PRM. Obstacles are represented as prisms, the path is smoothed with<br />

cubic splines. It is also intended for a helicopter, but it also uses both synthetic and real-world data.<br />

PRM was also used in [Pettersson and Doherty, 2006], albeit offline. They conducted real-world experiments<br />

using pre-computed maps consisting of object-oriented bounding boxes (OBB) for a helicopter.<br />

In [Wzorek and Doherty, 2006], the same group additionally uses an <strong>RRT</strong>. Another helicopter<br />

UAV is used in [Hrabar, 2008]. It uses a 3D occupancy grid for mapping (but no scan-matching or<br />

even SLAM). PRM is used to construct a roadmap on which a D* Lite graph search is performed<br />

in software and hardware simulation. An a priori roadmap on a height map was generated with the<br />

RRT algorithm for a fixed-wing UAV in [Saunders et al., 2005]. This is complemented by reactive obstacle

avoidance based on a 1D LRF, tested in simulation.

The popularity of the RRT and PRM can also be seen in the number of proposed variants. To maximize the

distance of the roadmap to obstacles, [Wilmarth et al., 1999] propose placing the nodes only

on the map's medial axes (“Medial Axis PRM”). They even avoid having to explicitly calculate the

medial axes and their approach works in any dimensionality. However, we tried it (section 5.5) and<br />

found that the gain in path quality does not justify the additional computation time.

A similar approach is presented in [Amato et al., 1998], called “Obstacle Based PRM”. The authors<br />

generate nodes close to obstacles. This follows the observation that planning is hardest near obstacles. In

our case, this approach seems promising at first, since it is relatively simple to generate nodes around<br />

patches. However, due to the nature of the data, patches often intersect and are close to each other<br />

in cluttered configurations. This problem is avoided with the Medial Axis PRM. LaValle himself coauthored<br />

[Kuffner and LaValle, 2000], proposing growing two <strong>RRT</strong>s at once – one from the starting<br />

point, one from the goal. This brings the problem of having to check for connectability. This entails<br />

many more collision checks. A different kind of improvement is proposed in [Branicky et al., 2001].<br />

Instead of using standard pseudo-random number generators in roadmap algorithms, the authors propose

using quasi-random (deterministic, low-discrepancy) sequences for a more even distribution of nodes. In [Andert and Adolf, 2009], this

sampling strategy is compared with the standard random number generator and a third sampling strategy.<br />

Since this work focuses on comparing different map types and different roadmap algorithms,

sampling strategies were out of scope.<br />

Another popular approach to 3D path planning is derived from the A* algorithm [Hart et al., 1968].<br />

For example, [Hwangbo et al., 2007] plans paths for a simulated fixed-wing UAV around polygonal<br />

obstacles with A* planning based on a mesh (with constraints), combined with a local planner. In<br />

[Petres et al., 2007] A* in a continuous domain is used for a simulated AUV in an anisotropic environment<br />

(underwater currents).<br />

Somewhat insular approaches include a graph search on the graph of cells of an occupancy grid for<br />

a UAV with later path refinement [Jun and D'Andrea, 2003] and an Evolutionary Algorithm using B-

splines on a height map for a UAV [Hasircioglu et al., 2008]. Both were tested in simulation only. Path<br />

planning with temporal logic is proposed by [Fainekos et al., 2005], also only tested in simple simulation.<br />

Mark Overmars and Jean-Claude Latombe, who co-authored [Kavraki et al., 1996], propose the

“Corridor Map Method”, which plans 2D paths in 3D environments [Overmars and Geraerts, 2007].<br />

It is parameterizable on the clearance it keeps from obstacles. It is only tested in simulation, albeit<br />

with sophisticated collision detection. The “Elastic Algorithm” is proposed by [Chen et al., 2006]. It<br />

tries to avoid obstacles by first planning a path as if it were a rubber band spanned between start and<br />

goal. This results in paths being very close to obstacles. Also, it is unclear how they intend to find<br />

the way out of spirals or similar configurations. Consequently, they only test on simple maps and also<br />

only in 2D simulation.<br />

1.4 Overview<br />

The rest of this thesis is structured as follows: In chapter 2, we will introduce different methods of<br />

gathering 3D data: stereo cameras, time-of-flight cameras, 3D laser range-finders, and sonars. We<br />

will also discuss the sensors used to gather the data used in this thesis. Furthermore, we investigate<br />

sources of errors and solutions in one specific sensor.<br />

In chapter 3, we present the datasets that have been collected with the sensors from the previous<br />

chapter. These datasets will be used in experiments throughout the thesis. We illustrate the nature of<br />

the data with photos of the sites where they were gathered and renderings of the actual data.<br />

In chapter 4, we demonstrate a method for reactive obstacle avoidance on a mobile land robot.<br />

The method is based on the Hough transform, a method for feature detection. In experiments we<br />

investigate the results achieved in practice.<br />

In chapter 5, we introduce a data structure for 3D mapping: the Patch Map. Its purpose is to<br />

reduce memory consumption of the typically very memory-expensive 3D data. We present several<br />

algorithms that can be performed on the Patch Map and we examine their performance in tentative<br />

experiments on synthetic data.<br />

In chapter 6, we perform extensive experiments with the Patch Map on voluminous real-world<br />

data. We compare the Patch Map to traditional mapping methods like point clouds and trimeshes both<br />

in terms of speed and in terms of accuracy. Chapter 7 concludes this thesis. It summarizes our results<br />

and unique contributions.<br />



Chapter 2<br />

3D Sensing<br />

Mobile robotics in unstructured or semi-structured environments inherently relies on range sensing.<br />

From early on, there have been attempts at 3D range sensing [Moravec, 1980], but only in the last

decade have computers become powerful enough to handle 3D data in real time. The most popular

approaches to 3D sensing in mobile robotics are laser range finders and stereoscopic cameras<br />

(stereo cameras for short). Recently, time-of-flight cameras have also become a viable option. In the<br />

underwater domain, 3D sonars are well-established for bathymetry, sea-life surveys, and fishing.

These proper 3D sensors are either expensive, slow, or both. Consequently, there is ongoing<br />

research in extracting 3D information from monocular cameras. The most common approach is structure<br />

from motion (see [Oliensis, 2000] for a survey). While allowing for lower costs, these approaches<br />

deliver data that is lacking in density and reliability. Since the main focus of this thesis is 3D data<br />

processing and not 3D sensing, we will concentrate on dedicated 3D sensors that reliably deliver 3D<br />

data.<br />

All these sensors collect data in different ways and produce raw data in different formats. In the<br />

end, however, all 3D data is transformed into a 3D point cloud, a set of Cartesian points. Point clouds<br />

serve as an abstraction that allows diverse geometric algorithms to work with data from different<br />

sensors. Typically, point clouds maintain a structure that mirrors the layout of the sensor. For example<br />

in a range camera, the points may be serialized row-wise.<br />
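As a minimal illustration of this abstraction, the following sketch shows one possible (hypothetical) point cloud container that keeps the row-wise sensor layout; it is illustrative and not the data structure used in the remainder of the thesis.

```cpp
#include <vector>

struct Point3D { double x, y, z; };

struct PointCloud {
    int width  = 0;                 // points per row (0 if unorganized)
    int height = 0;                 // number of rows
    std::vector<Point3D> points;    // serialized row-wise: index = row * width + col

    const Point3D& at(int row, int col) const { return points[row * width + col]; }
};
```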

In the following, we will discuss the 3D sensing methods mentioned above in principle and in<br />

concrete systems. Table 2.1 gives an overview of the basic properties of the sensors surveyed here.<br />

We will investigate a time-of-flight camera in detail and propose a remedy to the error source most<br />

inhibitive for 3D mapping.<br />

2.1 3D laser range finders<br />

Laser range finders (LRF) rely on the time of flight principle: They send out a laser beam and measure<br />

the time it takes for it to be reflected back. Based on the known constant speed of light c, the distance<br />

to the reflecting surface is calculated and reported. The advantages of laser beams over regular light<br />

are that they reduce scattering and interference, resulting in more reliable measurements and higher<br />

range. The downside is that laser beams are more difficult and energy-intensive to generate.<br />

Single laser range finders are used together with quickly actuated mirrors to scan in multiple<br />

directions. Typically, these directions all lie in one plane (2D LRF), but there are also systems available<br />

which distribute the beam three-dimensionally. Please refer to [Point of Beginning, 2006] for a<br />

survey. These systems are intended for geodetic use and hence provide high resolution and precision.


Table 2.1: SPECIFICATION OF THE 3D SENSORS – JR=Jacobs Robotics, NIR=near-infrared,

TOF=time-of-flight, PS=phase shift, FOV=field of view

                   | ALRF        | SwissRanger  | Stereo Camera     | Eclipse
Manufacturer       | SICK/JR     | MESA Img.    | Videre Design     | Tritech Int'l
Model              | S300        | SR-3000      | STOC              | Eclipse
Medium             | Laser       | NIR light    | light             | ultrasound
Principle          | TOF, direct | TOF, via PS  | Feature triangul. | TOF, direct
Range              | 0 – 20 m    | 0.6 – 7.5 m  | 0.686 – ∞ m       | 0.4 – 120 m
Horiz. FOV         | 270°        | 47°          | 65.5°             | 120°
Vert. FOV          | 180°        | 39°          | 51.5°             | 30°
Resolution         | 541 × 361   | 176 × 144    | 640 × 480         | 241 × 31
Approx. data freq. | 0.1 Hz      | 4.6 – 57 Hz  | 30 Hz             | 0.1 Hz

They are not suited for mobile robotics due to their high weight, power consumption, and slow

scanning speed.<br />

Mobile roboticists have hence taken to generating 3D data with 2D laser scanners. One popular<br />

approach is to rotate the device around an axis in steps, taking one scan at every step. The<br />

single scans can later be combined into a 3D scan since the angle at every step is known. The rotational

axis is typically either perpendicular to the ground (creating yawing motion) or parallel to<br />

the ground and perpendicular to the forward direction of the robot it is mounted on (creating pitching<br />

motion). The actuation may also be continuous, be it yawing [Wulf and Wagner, 2003] or pitching<br />

[Cole and Newman, 2006].<br />

Another approach to generating 3D data with a 2D LRF in mobile robotics is to mount it such that<br />

the scanning plane is not parallel to the ground. In this variant, the device is not actuated, but due to<br />

the motion of the robot, 3D data can be generated [Hähnel et al., 2003]. It is also possible to combine<br />

these two approaches: [Viejo and Cazorla, 2007] actuate the LRF while the robot is moving. Whether the

LRF is static or actuated, if the robot moves between 2D scans, the precision of the 3D data depends on

the precision of the robot's relative poses.

At Jacobs Robotics, we use a stepwise pitching SICK S300 LRF. The robot stops for each scan.

It has a horizontal field of view of 270° covered by 541 beams. The minimum and maximum pitching angles

and the angle step can be configured dynamically; typical values are +90° and −90°.
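The following minimal sketch illustrates how the 2D scans of such a pitching LRF can be merged into a 3D point cloud; the frame convention (scanning plane spanned by the x- and y-axes, pitching about the y-axis) is an assumption made for this sketch and may differ from the actual robot.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Point3D { double x, y, z; };

// Convert one 2D scan (ranges plus their bearing angles) taken at a given
// pitch angle into 3D points in the sensor base frame.
std::vector<Point3D> scanToPoints(const std::vector<double>& ranges,
                                  const std::vector<double>& bearings,
                                  double pitch) {
    std::vector<Point3D> points;
    for (std::size_t i = 0; i < ranges.size(); ++i) {
        if (ranges[i] <= 0.0) continue;                 // skip invalid beams
        // Beam end point in the 2D scanning plane of the LRF.
        double px = ranges[i] * std::cos(bearings[i]);
        double py = ranges[i] * std::sin(bearings[i]);
        // Rotate the scanning plane about the y-axis by the current pitch angle.
        points.push_back({px * std::cos(pitch), py, -px * std::sin(pitch)});
    }
    return points;
}
```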

2.2 Stereo cameras<br />

Stereo cameras are a system of two cameras mounted in a specific relative position. Visible light<br />

cameras are most common, but infrared can also be used [Hajebi and Zelek, 2007]. The position<br />

of a feature in both cameras' images is correlated, resulting in a three-dimensional position for that

feature.

In most systems, the two cameras are mounted next to each other since knowing the relative<br />

position of the cameras accurately is crucial for the calculation of the 3D positions. Early stereo<br />

cameras include the JPL stereo system [Yakimovsky and Cunningham, 1978]. Nowadays, one can<br />

buy stereo systems off the shelf.<br />

Figure 2.1 gives an overview of the functional principle of stereo vision. A feature of an object<br />


Figure 2.1: FUNCTIONAL PRINCIPLE OF STEREO VISION – Blue lines are beams, black lines are<br />

distances. After [Konolige, 1999].<br />

is perceived differently by two cameras with co-planar image planes. Assuming a point feature, the<br />

feature will project to a point on the image plane. The distance to the image center is d_l for the left

image and d_r for the right one. The distance between the two focal points is the baseline b. The

distance of the feature to the baseline, the range r, is calculated by

r = b · f / (d_l − d_r)

(f is the cameras’ focal length). Consequently, stereo vision is dependent on features being in the<br />

image.<br />
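As a small illustration of this triangulation, the following sketch reconstructs a 3D point from a pixel and its disparity for a rectified pair; the principal point (cx, cy) and the assumption that f is given in pixels are illustrative and not part of the Videre API.

```cpp
struct Point3D { double x, y, z; };

// Reconstruct a 3D point from pixel (u, v) and its disparity d = d_l - d_r,
// assuming a rectified pinhole pair. The caller must ensure disparity > 0.
Point3D triangulate(double u, double v, double disparity,
                    double baseline, double f, double cx, double cy) {
    double z = baseline * f / disparity;            // r = b f / (d_l - d_r)
    return { (u - cx) * z / f, (v - cy) * z / f, z };
}
```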

In commercially available systems, the above assumptions hold: The image planes are co-planar<br />

and the cameras are identical; in particular, the focal lengths are the same. Furthermore, the image planes

are precisely mounted next to one another such that the imagers' rows are collinear.

In other systems, freely placed but calibrated cameras are used. There may even be more than

two (e.g. in [Mulligan et al., 2002]). The approach “structure from motion” applies triangulation, on<br />

which stereo vision is built, to consecutive images from a single camera at different positions (see<br />

[Oliensis, 2000] for a survey).<br />

The stereo camera used in this work is the stereo-on-a-chip (STOC) camera from Videre Design<br />

LLC 1 [Videre-Design, 2006, Konolige and Beymer, 2006]. The cameras have a resolution of 640 ×<br />

480 which puts the upper theoretical limit of points per point cloud at 307,200. Since the range for<br />

a pixel can only be determined if it is part of a feature, the point clouds are typically considerably<br />

smaller. A distinctive feature of this camera is that the stereo computations are done in a dedicated<br />

hardware inside the camera, so that the processing load of the CPU is available for other purposes.<br />

1 http://www.videredesign.com/<br />


Figure 2.2: WAVE PROPERTIES AND THEIR MEASUREMENT – Phase ϕ, amplitude A, and intensity<br />

B can be derived from four samples τ i . After [Oggier et al., 2003] and [Lange, 2000].<br />

2.3 Time-of-flight cameras<br />

Time-of-flight (TOF) cameras use an active sensing approach to give a range measurement for each<br />

pixel in the image. They typically work with near-infrared light emitted by a modulated light source.<br />

The phase of the light coming in at each pixel of the imager is compared to the current phase at the<br />

light source. This allows the phase shift to be determined and from that, the time of flight of the<br />

light. As with the LRF, since the speed of light c is constant, the time of flight is proportional to the

distance.<br />

A common approach in TOF cameras is measuring the TOF indirectly by determining the phase<br />

shift ϕ of the emitted modulated light. To this end, the received light in each pixel is sampled four<br />

times at 0°, 90°, 180°, 270°, see figure 2.2. (These numbers refer to the phase of the illumination unit.)<br />

Using these measurements τ_i, the phase ϕ, amplitude A, and intensity B can be determined [Lange, 2000]:

ϕ = atan( (τ_3 − τ_1) / (τ_2 − τ_0) )                                  (2.1)

A = sqrt( (τ_3 − τ_1)² + (τ_2 − τ_0)² ) / 2                            (2.2)

B = (τ_0 + τ_1 + τ_2 + τ_3) / 4                                        (2.3)
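A minimal sketch of these equations, using the four-quadrant arctangent so that the phase directly falls into [0°, 360°[ (here expressed in radians); the function is illustrative and not the camera's API.

```cpp
#include <cmath>

struct Demodulated { double phase, amplitude, intensity; };

Demodulated demodulate(double t0, double t1, double t2, double t3) {
    const double kTwoPi = 6.283185307179586;
    double phase = std::atan2(t3 - t1, t2 - t0);               // eq. (2.1)
    if (phase < 0.0) phase += kTwoPi;                          // map into [0, 2*pi)
    double amplitude = std::sqrt((t3 - t1) * (t3 - t1) +
                                 (t2 - t0) * (t2 - t0)) / 2.0; // eq. (2.2)
    double intensity = (t0 + t1 + t2 + t3) / 4.0;              // eq. (2.3)
    return { phase, amplitude, intensity };
}
```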

Because of the limited range of the arctangent, ϕ will always lie in [0°, 360°[, even if the actual phase

shift was higher. This range is called the non-ambiguity range. Its metric dimension depends on the<br />

modulation frequency of the illumination unit. At 20 MHz, it corresponds to [0 m,7.5 m[.<br />
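The metric length of the non-ambiguity range follows from the modulation frequency: the light travels to the target and back, so one full phase cycle corresponds to half a modulation wavelength. A small sketch of this relation (the function name is illustrative):

```cpp
double nonAmbiguityRange(double modulationFrequencyHz) {
    const double c = 299792458.0;               // speed of light [m/s]
    return c / (2.0 * modulationFrequencyHz);   // 20 MHz -> about 7.5 m, 5 MHz -> about 30 m
}
```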

For further details of the functional principle of time-of-flight cameras, see [B. Buettgen et al., 2005,<br />

Lange, 2000, Oggier et al., 2003].<br />


(a) Distance image (b) Amplitude image (c) Color image (not by SR)<br />

Figure 2.3: A simple scene<br />

2.4 Sonar<br />

While the previous three sensing methods rely on light, sonar relies on sound. Sonars are similar to<br />

time-of-flight range cameras in that they send out a signal and measure the distance to any reflectors<br />

based on the time-of-flight of the reflected signal. Differences arise because of the difference between<br />

light and sound. Light is a transverse wave, i.e. oscillating perpendicular to the direction of propagation.<br />

Sound is a longitudinal wave, i.e. oscillating in the direction of propagation. This means<br />

that sonars need phased arrays to send and receive sound only in one specific direction. Phased arrays<br />

consist of a series of combined sender/receiver units each of which works with a phase that is<br />

incrementally shifted along the array. A short introduction to multi-beam sonar can be found in

[Mayer et al., 2002]. For a general introduction to sonar, refer to [Waite, 2002].

The sonar used at Jacobs Robotics is an Eclipse by Tritech International. It operates with ultrasound

at 240 kHz. The maximum range is variable and inversely proportional to the data rate. With

a maximum range of 100 m, the data rate is 7 Hz, with 5 m, it is 140 Hz. (This data rate does not refer<br />

to a complete 3D scan but rather to a single column.) What sets the Eclipse apart from the light-based<br />

sensors in terms of data structure is that it can receive multiple echoes per beam. This means that it

does not have a fixed maximum number of possible data points. It also means that the data has to be

post-processed to conform to algorithms assuming point clouds as delivered by light-based sensors. To this

end, only the echo with the highest intensity in each beam is retained.
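A minimal sketch of this post-processing step, assuming a hypothetical Echo structure rather than the actual Tritech data format:

```cpp
#include <vector>

struct Echo { double range; double intensity; };

// For each beam, keep only the echo with the highest intensity and reduce it
// to a single range value; -1 marks beams without any echo.
std::vector<double> strongestEchoPerBeam(const std::vector<std::vector<Echo>>& beams) {
    std::vector<double> ranges;
    ranges.reserve(beams.size());
    for (const auto& beam : beams) {
        double bestRange = -1.0, bestIntensity = -1.0;
        for (const auto& echo : beam) {
            if (echo.intensity > bestIntensity) {
                bestIntensity = echo.intensity;
                bestRange = echo.range;
            }
        }
        ranges.push_back(bestRange);
    }
    return ranges;
}
```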

2.5 Case study: The SwissRanger time-of-flight camera<br />

2.5.1 Introduction<br />

We studied two models of the SwissRanger time-of-flight camera: the SwissRanger SR-3000 and<br />

its predecessor SR-3100. They were originally developed by the “Centre Suisse d’Electronique et<br />

de Microtechnique” (CSEM), today they are maintained and sold by the spin-off MESA Imaging 2 .<br />

There are only minor variations between the two. Evaluations of the predecessor SR-2 can be found in<br />

[May et al., 2006] and [Gut, 2004]. The sensors connect to a computer via USB. They are delivered<br />

with an API and example applications for Windows, Linux and Mac OS X. For Windows, three<br />

applications are available: a calibration program, a viewer with various adjustable parameters and a<br />

2 http://www.mesa-imaging.ch/<br />


(a) Normal image<br />

(b) SwissRanger distance image<br />

(c) Disparity image of the stereo camera<br />

(d) Picture of the point cloud of the stereo camera<br />

Figure 2.4: SAMPLE SCENE SHOT WITH DIFFERENT (3D) CAMERAS – (a) standard camera;<br />

(b) SwissRanger SR-3000 (gray-values encode the measured distance); (c) Videre STOC stereo camera<br />

disparity image; (d) corresponding point cloud<br />

sample application. For Linux, only a viewer program written in Java is available 3 .<br />

In figure 2.3, there is an example for a simple scene. From left to right you can see the grayscale<br />

encoded distance image 2.3(a), the amplitude image as delivered by the SwissRanger 2.3(b) and a<br />

color picture taken by a webcam directly next to the SwissRanger 2.3(c). The sensor generates range<br />

images as well as grayscale images. The first correspond to the measured phase shift, the second to<br />

the amplitude of the signal. For a comparison to data delivered by a stereo camera, please refer to<br />

figure 2.4.<br />

In the images presented here, measured ranges are encoded in grayscale where brighter means<br />

nearer. White pixels indicate either that the measurement could not be performed correctly, or that the<br />

pixel’s measurement was discarded, mostly because too little modulated light was received. Furthermore,<br />

the ranges are Cartesian, i.e. the distance to the image plane, not the distances to the focal point

that are calculated as an intermediate step. In table 2.2 we present selected technical data given at the<br />

SwissRanger website and, where applicable, compare it to our own measurements.<br />

3 The choice of Java proved to be a disadvantage when running this program remotely on the robot with ssh and X forwarding.<br />

While our self-developed viewer (written in C++ using Qt) runs at a decent speed, the Java viewer is considerably<br />

slower as compared to local execution.<br />


Table 2.2: Technical Data

Parameter            | Unit  | Value     | Note                           | Verification
Resolution           | Pixel | 176 × 144 | –                              | ✓
Power supply         | V     | 12        | –                              | ✓
Power consumption    | W     | 12        | depends on integration time⁴   | 3 – 11.04
Modulation frequency | MHz   | 5 – 40    | only 8 possible values         | ✓
Update rate          | Hz    | ≤ 50      | depends on integration time    | 4.6 – 57

2.5.2 Parameters

In this section we will discuss the different parameters of the camera which can be set via the<br />

driver API. The information stems from an electronic manual [MESA Imaging AG, 2006], comments in

the header file of the driver, and our own experiments. Generally, the documentation is still work

in progress.

The most important parameter is the Amplitude Threshold (AT). When used with this parameter,

the camera only returns range values for pixels with a certain minimal amplitude and sets the range<br />

for the filtered out pixels to zero. This parameter is useful for eliminating various kinds of errors as<br />

discussed later on. Roughly speaking, a higher AT leads to fewer data points, but the data points are of

higher quality. It is, however, rather coarse in many applications, i.e. it produces either many false<br />

positives or false negatives. An example showing the effect of setting an AT can be found in figure 2.5.<br />

Figure 2.5: A LAB SCENE AT DIFFERENT AMPLITUDE THRESHOLDS – On the left, there is the scene<br />

shot with a normal camera, next to it the effects of the Amplitude Threshold (AT) on range images<br />

with a SR-3000. The AT used are 0, 30, and 75.<br />

While AT can serve as a basic remedy in many situations, its problem is to find a suited amplitude<br />

threshold for the current conditions in a particular environment. This can also be seen in figure 2.5.<br />

There is random noise on some objects. This most prominently occurs on dark objects like the black<br />

computer screens and on the glass bricks left of center on the upper edge of the image. If the AT is<br />

chosen too high, correct data points are discarded (rightmost image). If it is chosen too low, invalid<br />

points are kept (second image from the right).<br />

For the reader to get an impression of what different AT values mean in practice, we have annotated<br />

the example pictures in the following with the value that was used acquiring the images.<br />

Integration Time<br />

Another important parameter is the integration time. It is the time used for acquiring each frame. The<br />

perceivable brightness of the illumination unit's LEDs increases with the integration time. The

4 Please note that the power consumption depends on the last set integration time even if the SwissRanger is not used.<br />


integration time determines the frame rate and also the power consumption, as it influences the brightness

of the illumination LEDs. The quality of the images can increase with the integration time, but so does

the minimum measurable distance.

Auto-illumination<br />

An auto-illumination feature for the device can be used to automatically determine integration time<br />

and illumination intensity. The user can supply a certain range for the integration time and the best<br />

value within the boundaries is chosen. This setting yields good results, so it should be used unless<br />

there are reasons not to do so, e.g., the need to save energy or to achieve a certain frame rate.<br />

Modulation Frequency<br />

There is a trade off between range and accuracy. With higher modulation frequencies measurements<br />

get more exact but the non-ambiguity range will decrease. While the SR-3000 supported setting a wide<br />

range of modulation frequencies (5, 6.6, 10, 19, 20, 21, and 30 MHz), the SR-3100 only supports 19,<br />

20, and 21 MHz. Configuration is only supplied for 20 MHz for both models, making the use of any<br />

other frequency inadvisable. However, especially the low values would be interesting for application

domains that involve large distances like 3D mapping. A modulation frequency of 5 MHz corresponds<br />

to a non-ambiguity range of 30 m.<br />

2.5.3 Accuracy<br />

To measure accuracy, range images were captured of a planar board of 1.2 m × 2 m, which was placed

at 8 different distances from the SwissRanger: at 1 m, 2 m, ..., 7 m, and at 7.4 m, just short of the<br />

SwissRanger’s maximum distance of 7.5 m. At each distance, 100 samples were taken. Figure 2.6<br />

shows how the range measurements on the central 6 × 6 pixels were distributed. These pixels were<br />

chosen since they are least error-prone. Also, the small central section is guaranteed to lie on the

board, even at greater distances when its perceived size decreases. One can see that the uncertainty<br />

increases super-linearly with the distance.<br />
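As an illustration, the following sketch shows one simple way to quantify the spread of the central 6 × 6 pixels over a set of samples (here via mean and standard deviation rather than the quartiles of figure 2.6); the flat, row-major data layout is an assumption for this sketch.

```cpp
#include <cmath>
#include <vector>

struct Stats { double mean, stddev; };

// Gather the central window x window pixels of every sample and compute their
// mean and standard deviation. Each range image is a flat, row-major vector.
Stats centralPixelStats(const std::vector<std::vector<double>>& rangeImages,
                        int width = 176, int height = 144, int window = 6) {
    std::vector<double> values;
    for (const auto& image : rangeImages)
        for (int r = height / 2 - window / 2; r < height / 2 + window / 2; ++r)
            for (int c = width / 2 - window / 2; c < width / 2 + window / 2; ++c)
                values.push_back(image[r * width + c]);
    double sum = 0.0;
    for (double v : values) sum += v;
    const double mean = values.empty() ? 0.0 : sum / values.size();
    double var = 0.0;
    for (double v : values) var += (v - mean) * (v - mean);
    const double stddev = values.empty() ? 0.0 : std::sqrt(var / values.size());
    return { mean, stddev };
}
```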

2.5.4 Types of Errors<br />

When working with the SwissRanger, several kinds of errors occurred. Partially, this can be attributed<br />

to the CMOS technology being relatively recent. Each pixel contains quite some functionality (binning,<br />

phase correlation) which makes for a demanding manufacturing process. Some of the errors,<br />

however, are inherent in TOF measurement.<br />

Following [May et al., 2009], we divide the errors into random errors and systematic errors.<br />

While random errors have a null arithmetic mean when gathering enough samples, systematic errors<br />

do not.<br />

In the following, each type of error will be explained by means of an example. In these example<br />

pictures, measured distances are encoded by grayscale. Brighter means closer, darker means further<br />

away. For a white pixel, there was no useful information measured, mostly because too little light was<br />

received. Some of the range images are accompanied by a photo to better illustrate the scene. These<br />

are acquired not by the SwissRanger itself (it can only capture grayscale images), but by a camera<br />

right to the left of it, looking down a bit.<br />


Figure 2.6: DEVIATION AT DIFFERENT DISTANCES – Boxes are quartiles, whiskers are extrema.<br />

Random Errors<br />

Inhomogeneous Lighting The main portion of the modulated light emitted by the LEDs is centered<br />

in the middle of the sensor. So when using a relatively low exposure time, the corners of the picture<br />

do not get enough light for decent measurements. An example is given in figure 2.7(a). Please note<br />

that the scale for this image is extremely short; the distance between white and black is 40 cm. When<br />

only the pixels with highest amplitude are used, i.e., AT is accordingly adjusted, the circular shape<br />

of the lighting becomes visible (figure 2.7(b)). Additionally, the peripheral lighting inhomogeneity<br />

further worsens the limitation of the SwissRangers’ small field of view (47 ◦ × 39 ◦ ).<br />

Light scattering A further problem is that objects close to the camera can make objects near them on<br />

the picture appear closer to the camera than they are; especially if the far objects are dark and the material<br />

of the near object is bright, i.e., highly reflective for near-IR light. (see figure 2.10). The problem<br />

is multi-path reflection within the camera: Light not absorbed by the imager is reflected to the back of<br />

the lens and from there back to the imager. This is also described in [MESA Imaging AG, 2006] and<br />

in [May et al., 2006].<br />

Bright background light The SwissRanger is very sensitive to ambient light conditions. Especially<br />

sunlight requires some manual tuning of AT. According to the specifications, the SwissRanger can<br />

suppress background light of up to 400 W/m 2 /µm, which is half of the possible sunlight on the surface<br />

of the earth. When this limit is exceeded, random noise will appear (see figure 2.8). Noise from<br />

external light sources (lamps) has also been described by [May et al., 2006]. The common cause is<br />

that these light sources emit a considerable amount of near-infrared light which disturbs the sensing.<br />


(a) When using a short exposure time and AT=0, there is a significant amount of noise in the corners of the range image

(b) When increasing the AT to 500, any information from these regions is completely discarded and an almost perfect circular shape caused by the narrow cone of the modulated illumination can be seen

(c) Variance on each pixel across 100 samples at a distance of 1 m. The variance is between 5.606 × 10⁻⁶ (black) and 3.868 × 10⁻⁴ (white); a non-linear color scale is used (4th root)

Figure 2.7: INHOMOGENEOUS LIGHTING ON WHITE HOMOGENEOUS SURFACE – Though the

SwissRangers already have an extremely limited field of view, artifacts from the focused modulated

light sources are apparent.

(a) Photograph of a scene with some<br />

sunlight.<br />

(b) Random noise due to bright light,<br />

AT=0<br />

(c) Corrected image with AT=300<br />

Figure 2.8: SR ERRORS CAUSED BY AMBIENT LIGHT – The SwissRanger is very sensitive to ambient<br />

light conditions. When confronted with a scene including a bit of sunlight (left), a significant amount<br />

of noise occurs with AT=0 (center). Only when tuning AT, here to 300, the problem is solved (right).<br />


(a) Distance image<br />

(b) Real image of the scene<br />

Figure 2.9: AMPLITUDE INFLUENCES RANGE MEASUREMENT – The sticker on the chair appears<br />

correctly, the main part of the chair does not.<br />

Systematic errors<br />

Range ambiguity: wrap-around There is a maximum distance which can be measured. Obstacles<br />

which are further away will appear somewhere in the range between 0 and this maximum distance.<br />

This is inherent to measuring the time-of-flight indirectly via phase shift. A shift of 10° cannot be<br />

told apart from a phase shift of 370°, and both phase shifts will be associated with the same distance

information. We call this the wrap-around error. See figure 2.5 for an illustration of this problem: The<br />

background wall in the top center of the image is light gray, indicating very low reported distances.<br />

There are a few possible ways around the errors caused by wrap-around of the phase shift. The

simplest, which uses a built-in camera parameter directly, is to apply an amplitude threshold as

mentioned before. This works quite often, as the amount of light reflected back by an obstacle

decreases with the square of its distance to the light source. But when dealing with obstacles with different reflective

properties, this fails.<br />

Another possibility is to use two modulation frequencies in an alternating way. As each modulation<br />

frequency entails a specific non-ambiguity range, the pixels of alternating frames wrap around<br />

at different distances. Thus, the errors in one frame can be evened out by comparing it to another<br />

frame taken at a different modulation frequency. While frames retrieved by this method still contain<br />

some errors and are also not totally dense, this method basically works. The downside is that the<br />

frame rate is reduced, especially as two frames have to be dropped immediately after the change of<br />

the modulation frequency. Also, for modulation frequencies other than 20 MHz, the SwissRangers<br />

are not calibrated. A further disadvantage is that the vulnerability to motion blur is increased.<br />

In the following, we will try to remedy the wrap-around error with the built-in AT. In section 2.5.5,<br />

we will propose a better solution.<br />

Amplitude related range error For pixels with a very low amplitude, the reported range will be<br />

too low, see figure 2.9. This occurs in objects with poor reflectivity for near-infrared light. Table 2.3<br />

show typical reflectivity values for common materials. As a rough orientation based on these values,<br />

wood pallets (25%) offer enough reflectivity for the SwissRanger, asphalt (17%) is critical, and tires<br />

(2%) reflect decidedly too little. A remedy to this problem has been proposed in [May et al., 2009].<br />


Table 2.3: REFLECTIVITY OF DIFFERENT MATERIALS FOR LIGHT WITH 900 NM WAVELENGTH –<br />

Reproduced from [Lange, 2000].<br />

material                  | reflectivity
White paper               | up to 100%
Snow                      | 94%
White masonry             | 85%
Limestone, clay           | up to 75%
Newspaper                 | 69%
Deciduous trees           | typ. 60%
Coniferous trees          | typ. 30%
Carbonate sand (dry)      | 57%
Carbonate sand (wet)      | 41%
Rough clean wood pallet   | 25%
Smooth concrete           | 24%
Dry asphalt with pebbles  | 17%
Black rubber tire         | 2%

(a) Erroneous image due to a bright obstacle (left quarter of the image)

(b) Without the obstacle, no pixel appears too close

Figure 2.10: LIGHT SCATTERING – Stray light from the foreground object makes background obstacles<br />

appear closer than they are (AT 100).<br />


Figure 2.11: A ghost image of the rod can be seen on both edges of the image (circled).<br />

Ghost image Another form of irregular distortion appears when an object is close to the camera.

Then, a faint ghost image of it will often appear on the opposite side of the frame. The actual object

is depicted correctly though (see figure 2.11). Unfortunately, this error can only be convincingly<br />

illustrated with moving images: The ghost images will move synchronously with the actual object.<br />

Fixed pattern noise Even when calibrated correctly, the camera will show some distortions at the<br />

sides of the image when showing a plane.<br />

Movement Yet another source of distortions are moving objects. When sampling

them, there are always distortions at their edges, even with low exposure times (see figure 2.12).

This has already been noted in [MESA Imaging AG, 2006]. The error also occurs in slowly moving<br />

objects, albeit to a lesser degree.<br />

The cause is that the phase is measured via four samples. If some of the samples are taken while<br />

an object is present and others while it is not, nonsense values will be derived for the affected

pixels.<br />

Reflection The SwissRangers behave much like a regular camera and can hence be misled

by reflections. When a reflective surface appears in their view, the SwissRangers will not measure

the distance to where the reflection occurs, but the perceived distance to the objects visible on the<br />

reflective surface. The measured distance is then the distance from the camera to the reflective surface<br />

plus the distance from the surface to the object. An example is shown in figure 2.13.<br />


(a) Range image of a non-moving rod. (b) The rod moving down fast (about 10 m/s); the large blue and green bars are movement errors. (c) The same rod moving up slowly (about 0.25 m/s); the errors become smaller but are still present.

Figure 2.12: Errors caused by a moving object (AT=500).<br />

(a) Photograph of a scene with reflections<br />

due to a glossy floor.<br />

(b) Distance information with<br />

AT 500<br />

Figure 2.13: Reflections, here on a floor, cause the measurement of false distances.<br />


(a) 60 cm (b) 55 cm (c) 45 cm<br />

Figure 2.14: An object at various too short distances to the camera, AT 0<br />

Minimum distance Objects that are too close to the camera cause very large distortions (see figure

2.14). Please also note that the background shade changes too, indicating that the error at the<br />

center also has an impact on the range measurement outside the distorting object. This is due to light<br />

scattering as described above.<br />

2.5.5 A novel solution to wrap-around: Adaptive Amplitude Threshold<br />

Many of the errors presented above either have a null arithmetic mean or they can be tackled with<br />

calibration [Fuchs and Hirzinger, 2008, Fuchs and May, 2007]. Also, some errors can be tackled to a<br />

degree by setting an amplitude threshold: Insufficient received light, inhomogeneous lighting, bright<br />

background light, and light scattering. Depending on the scene, amplitude thresholds between 75 and<br />

500 have successfully been used. The downside is that the number of usable pixels also decreases

considerably.<br />

Furthermore, especially the wrap-around error is not easily handled this way. As we have seen in

section 2.5.4, it arises as the SwissRangers cannot detect whether a measurement is erroneous due to<br />

being beyond the maximum range. This is because it measures distance via phase shift of the light<br />

emitted from its own light source. Since phase shift is assumed to be in [0 ◦ , 360 ◦ [, a phase shift of<br />

370 ◦ is measured as 10 ◦ . Obviously, this also corrupts the distance measurement. An object beyond<br />

the maximum distance of 7.5 m (assuming a modulation frequency of 20 MHz), which is e.g. at 8 m, is<br />

reported as being at 0.5 m. Thus, the wrap-around error is a fundamental problem of any time-of-flight

measurement via phase shift.<br />

The standard solution to this problem is to set an AT, i.e., to drop pixels which are too dark. As<br />

will be shown in the experiment section, this solution is very crude and produces large portions of<br />

both false negatives and false positives.<br />

In the following section we first explain how we identify pixels with a wrapped-around measurement;

then we use this knowledge and the wrong measurement to extend the SwissRanger's range.

This approach has already been published in [Poppinga and Birk, 2009].

Identification of erroneous pixels<br />

The proposed approach is to sanity-check the reported distance by relating it to the amplitude. The<br />

basic idea is that a wrapped-around pixel p will be reported with a low range value r_p (measured from

the image plane), but its amplitude a_p is significantly lower than it would be for a correct short

range. Since the illumination is not uniform, the pixel's position in the image array, (x_p, y_p), is also


taken into account. These two factors are used to calculate a minimum expected amplitude for each<br />

pixel:<br />

a_p > aw_p · AAT / r_p² ,                                              (2.4)

where AAT is the advanced amplitude threshold and aw_p is the amplitude of pixel p when viewing a

white wall at roughly one meter, approximated by<br />

aw_p := Â − ((x_p − δx)² + (y_p − δy)²)                                (2.5)

(Â being the amplitude constant). The approximation is reasonably close, except for the very edges<br />

of the image (see figure 2.15). An alternative would be to use a look-up table storing empirical values<br />

for each pixel. However, we chose the more memory-efficient method of approximation.<br />

The reasoning for relating distance and amplitude inversely quadratically is that the measured amplitude<br />

is based solely on light emitted by the camera’s illumination unit, hence it decreases quadratically<br />

with the distance. The second observation is that the amplitude produced by the imager decreases<br />

towards the edges of the images (see figure 2.15(a)). This is an effect which is common to time-of-flight

cameras due to the usage of multiple near-infrared LEDs as light sources. They are grouped in a

pattern around the camera’s lens. Hence, the center of the image is better illuminated than the edges.<br />

To compensate for this effect, the amplitude is also normalized.<br />

In the SR-3000, according to our measurements, the center of the amplitude pattern is not in the<br />

center of the image. Instead, it is shifted upward. Consequently, the offset in y direction is shifted,<br />

namely δy = 61. In the x direction it is at half the image width, i.e., δx = 88, as one would expect.

In the SR-3100, the image is brightest in the image center. In experiments calibrating the parameters<br />

 (amplitude constant) and the AAT in various scenes, we determined the values 12, 000 and 0.2<br />

respectively as working very well.<br />
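A minimal sketch of this criterion; the default constants correspond to the values reported above for the SR-3000 (Â = 12,000, AAT = 0.2, δx = 88, δy = 61), while the function itself is illustrative and not part of the driver API.

```cpp
bool passesAAT(double amplitude, double range, int x, int y,
               double aHat = 12000.0, double aat = 0.2,
               double deltaX = 88.0, double deltaY = 61.0) {
    if (range <= 0.0) return false;   // no valid measurement at all
    // Expected amplitude of pixel (x, y) when viewing a white wall at ~1 m,
    // approximated as in eq. (2.5).
    double aw = aHat - ((x - deltaX) * (x - deltaX) + (y - deltaY) * (y - deltaY));
    // Eq. (2.4): accept the pixel only if its amplitude is plausible for the
    // reported range; otherwise treat it as wrapped around or invalid.
    return amplitude > aw * aat / (range * range);
}
```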

Additional Error Sources<br />

Material that reflects near-IR light poorly naturally appears very dark to the SwissRanger. Objects<br />

made of such materials can be filtered out alongside with the wrapped-around pixels. As the measurement<br />

of these objects tend to be noisy and sometimes also offset, filtering the resulting pixels out is<br />

desirable.<br />

Furthermore, bright sunlight can cause areas of random noise in the SwissRanger. This also<br />

applies to indirect perception through glass bricks or when reflected by other objects. Most of these<br />

pixels are filtered out by the proposed method, except for those which by coincidence have an acceptable

combination of amplitude and distance.<br />

As a third case, pixels that appear too close due to light scattering are excluded. As mentioned in

[CSEM, 2006], bright light from close obstacles can fail to get completely absorbed by the imager.<br />

The remaining light can get reflected back to the imager e.g. by the lens. This way, rather many pixels<br />

can get tainted, showing too close values. A large portion of these will be excluded by the presented<br />

criterion.<br />

Correction of wrong values<br />

Once the incorrect pixels according to the advanced amplitude threshold are identified, they do not<br />

have to be dropped. Since the cause of the error is known and constant, the error can be undone. We call

this process unwrapping.<br />


(a) Grayscale image of a white wall as captured by<br />

the SwissRanger<br />

(b) Measured intensity and its approximation by eq. 2.5<br />

Figure 2.15: Irregular distribution of near-IR light from the camera’s illumination unit<br />


(a) Rack and door (b) Corner window (c) Gates<br />

Figure 2.16: THREE TEST SCENES COMPARING THE PROPOSED METHOD TO THE STANDARD<br />

METHOD – Normal images of the scenes from a webcam next to the SwissRanger are shown, range<br />

images are in figure 2.17.<br />

To unwrap a pixel, the corresponding beam is considered. The length of the non-ambiguity range<br />

(7.5 m at a modulation frequency of 20 MHz) is added. The new end point of the beam is considered<br />

the actual measurement. If the pixel was correctly identified as wrapped-around once, it gives the<br />

correct range.<br />
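A minimal sketch of this unwrapping step, treating the reported range as the distance along the beam; the types are illustrative.

```cpp
struct RangePixel { double range; bool wrappedAround; };

// A correctly identified single wrap-around is undone by adding exactly one
// non-ambiguity range (7.5 m at 20 MHz); multiple wrap-arounds would need
// further terms.
double unwrap(const RangePixel& p, double nonAmbiguityRange = 7.5) {
    return p.wrappedAround ? p.range + nonAmbiguityRange : p.range;
}
```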

Given a spacious environment with reflective surfaces, pixels can also wrap around more than<br />

once. However, depending on the type of environment, obstacles further away than 15 m are only<br />

rarely encountered. In roomy environments it may be advisable to use a low AT in addition to the<br />

AAT to filter out multiply wrapped-around pixels which typically have a very low amplitude. Another<br />

problem with unwrapping is that pixels filtered out for other reasons (e.g. from obstacles with poor<br />

reflectivity) are misinterpreted as being far away. In practice, rather few pixels are affected. Still, it<br />

has to be considered for each application whether a dense and partially incorrect image with a doubled range

is preferable, or a sparse, short-ranged, but almost perfectly correct one.

2.5.6 Experiments and Results<br />

We applied the approach presented in the previous section to snapshots taken around our lab with the<br />

SwissRanger SR-3100. First, we only excluded wrong pixels. To compare it to the standard abilities of<br />

the device, we also used its default error exclusion technique, the AT (excluding pixels only based on<br />

amplitude, regardless of their distance). Depending on the situation, different AT values work best. These,

however, have to be manually chosen every time. Unfortunately, when using a widely applicable AT<br />

value, this standard technique excludes far too many pixels (60.3 % on average). As an alternative, we<br />

use a lower AT value which preserves more pixels (it only discards an average of 44.8 %). As can be<br />

expected, quite some of these are wrong. In figure 2.17, three example images are presented together<br />

with a complete distance image of the scene and a color image that was taken by a web-cam which is<br />

mounted on the robot right next to the SwissRanger. Refer to figure 2.16 for webcam images of the<br />

scenes.<br />

As mentioned earlier, not only wrapped around pixels can be filtered out. This is illustrated in<br />

figure 2.19. Here, the improvement in performance towards the conventional method is not as big as<br />

with the wrap-around, but still apparent. For a summary of all results, refer to table 2.4.<br />

The exclusion criterion (eq. 2.4) consists of two factors: wrap-around is detected by comparing

the reported amplitude to the inverse of the squared reported range on the one hand, and by making the threshold

dependent on the pixel's position in the image on the other. Figure 2.20 shows the contributions of the

two components to the overall result.<br />

The information which pixels are invalid due to wrap-around can be used to correct their value.<br />


(a) Unfiltered distance image: wrap-around in the corner and behind the door

(b) Unfiltered distance image: wrap-around in the corner, noise on the window

(c) Unfiltered distance image: mostly wrapped around, noise on windows

(d) Proposed method: few wrong pixels remain – 16.7 % d. p.

(e) Proposed method: no wrapped-around pixels remain – 21.7 % d. p.

(f) Proposed method: no wrap-around, very little noise – 56.3 % d. p.

(g) AT 160: some wrong pixels remain, correct ones are discarded – 46.3 % d. p.

(h) AT 160: some wrong pixels remain, correct ones are discarded – 43.3 % d. p.

(i) AT 160: almost all valid pixels removed while some wrap-around is kept – 92.8 % d. p.

(j) AT 240 – 60.3 % d. p. (k) AT 240 – 54.7 % d. p. (l) AT 240 – 2 valid pixels

Figure 2.17: THREE TEST SCENES COMPARING THE PROPOSED METHOD TO THE STANDARD AT

METHOD OF THE SWISSRANGER – Two AT values are presented: 240, high enough to exclude every

wrapped-around pixel, and 160, which is almost high enough but not as over-restrictive. For each

filtered image, the fraction of discarded pixels (d. p.) is given. Webcam images in figure 2.16.


(a) Close boxes (b) Dark material (c) Sun<br />

Figure 2.18: WEBCAM IMAGES FOR SCENES DEMONSTRATING THE CORRECTION OF ERRORS<br />

OTHER THAN WRAP-AROUND – The webcam is mounted directly next to the SwissRanger. The<br />

SwissRanger images can be found in figure 2.19.<br />

Table 2.4: Results on all six scenes – Discarded Pixels [%]

Scene          | Proposed method | AT 160 | AT 240
Rack and door  |            16.7 |   46.3 |   60.3
Corner window  |            21.7 |   43.3 |   54.7
Gates          |            56.3 |   92.8 |    100
Close boxes    |            38.1 |   35.0 |      –
Dark material  |            13.1 |   28.0 |   43.1
Sun            |            44.9 |   73.1 |   84.6
mean           |            31.8 |   53.1 |   68.5
median         |            29.9 |   44.8 |   60.3


(a) Unfiltered distance image: dark obstacles in the back appear too close

(b) No wrap-around, but noisy and too-close trousers due to dark material

(c) Sun causes random noise

(d) Unfiltered distance image with objects removed, for comparison

(e) Proposed method: dark pixels removed – 13.1 % d. p.

(f) Proposed method: very little noise remains – 44.9 % d. p.

(g) Proposed method: correct pixels of both back- and foreground kept – 38.1 % d. p.

(h) AT 160: dark objects discarded, on edges more than in the center – 28.0 % d. p.

(i) AT 160: no noise, but parts of floor and locker removed – 73.1 % d. p.

(j) AT 160: pixels in the front preserved, those in the back gone – 35.0 % d. p.

(k) AT 240 – 43.1 % d. p. (l) AT 240 – 84.6 % d. p.

Figure 2.19: SWISSRANGER IMAGES FOR SCENES DEMONSTRATING THE CORRECTION OF ERRORS

OTHER THAN WRAP-AROUND – From left column to right: light scattering, too low reflectivity,

and sunlight. Here, an AT of 160 is sufficient; most AT 240 images are still given for completeness.

We also give the percentage of discarded pixels (d. p.). Webcam images of the scenes are in figure 2.18.


(a) Full image<br />

(b) Intensity image<br />

(c) Proposed error correction (d) Conventional error correction: AT 160<br />

(e) Only relating distance and amplitude: wrap-around removed, but too many excluded pixels, esp. on the edges of the image

(f) Only using a position-dependent amplitude threshold: not increasing strictness towards the edges (as in (d) and (e)), but keeping wrap-around

Figure 2.20: DIFFERENT ERROR EXCLUSION CRITERIA – Conventional and proposed error correction<br />

and the latter’s components, exemplified on the scene in figure 2.16(b)<br />


(a) Webcam image<br />

(b) Distance image<br />

(c) Invalid pixels removed<br />

(d) Invalid pixels corrected<br />

Figure 2.21: An example for correcting wrapped-around pixels<br />

This is demonstrated in figure 2.21. One problem here is that some of the pixels classified as invalid<br />

are not wrapped around but excluded due to other reasons, e.g. because the surface they represent has<br />

poor reflectivity. These very few pixels hence get a wrong value in the corrected image.<br />



Chapter 3<br />

Datasets<br />

In this chapter, we will present 3D datasets collected with the sensors presented in chapter 2. The<br />

datasets are used in the experiments in the following chapters. They were either collected with Jacobs

Robotics’ mobile land robot, the “Rugbot” (short for “rugged robot”, see figure 3.1) or with standalone<br />

sensors. Apart from the 3D data, the Rugbot also collects data from its other sensors, most<br />

notably from the gyroscope and odometry from its motor encoders. However, this data is not used<br />

here.<br />

[Figure 3.1 labels: pan-tilt-zoom camera, laser range finder (LRF), inclined LRF, webcam, thermo camera, stereo camera, SwissRanger]

Figure 3.1: THE AUTONOMOUS VERSION OF A Rugbot WITH SOME IMPORTANT ON-BOARD SENSORS

POINTED OUT – The SwissRanger SR-3000 and the stereo camera deliver 3D data.

The datasets have been divided into three groups according to the sensor used: range cameras,<br />

actuated laser range-finder (ALRF), or sonar.<br />


(a) Planar: a box<br />

(b) Planar: a ramp<br />

(c) Round: a trashcan<br />

(d) Round: an umbrella<br />

(e) Holes: a box with a hole<br />

(f) Holes: a shelf<br />

Figure 3.2: EXAMPLE RANGE IMAGES FROM THE PLANAR/ROUND/HOLES DATASET – This dataset<br />

is characterized by hand-chosen simple scenes, mostly with one object, sometimes with several.<br />


Figure 3.3: A SWISSRANGER POINT CLOUD FROM THE ARENA DATASET – On the left a perspective<br />

view of the point cloud, on the right, the point cloud has been entered into a grid to give the reader<br />

an idea of its geometry. The viewing points and directions match. The point cloud has been rotated<br />

by 33° so that the recorded wall lines up with the grid. A grid cell is only shown if it has at least 20<br />

points in it, the cell side length is 50 mm.<br />

3.1 Datasets with range cameras<br />

As seen in figure 3.1, the Jacobs Robotics "Rugbot" is equipped with a stereo camera (see section 2.2)

and a SwissRanger time-of-flight camera (see section 2.3). We operated the Rugbot in different surroundings,<br />

recording data with both range cameras simultaneously. As the time to acquire each scan<br />

is very short, there is no need to stop the robot to take scans.<br />

3.1.1 Arena<br />

The Jacobs Robotics Group has a dedicated training arena for Urban Search and Rescue (USAR). In

this arena, we recorded a dataset with the Rugbot. We chose scenes that pose challenges to either the<br />

SwissRanger (bad reflectivity for near-infrared light) or to the stereo camera (few visible features).<br />

The Jacobs Robotics Rescue Arena can be seen in figure 3.4. The photos show mostly the area where

the dataset was recorded. One example of a point cloud can be found in figure 3.3.<br />

3.1.2 Outdoor 1-3<br />

On three occasions, we took the Rugbot out on campus. We recorded datasets consisting of the<br />

features found there, e.g. concrete and grass surfaces, small hills, building exteriors, and shrubbery.<br />

Photos of these features can be found in figure 3.5; the associated range data is depicted in figure 3.6.

Outdoor 1 was recorded in relatively open surroundings, Outdoor 2 and 3 were recorded closer to<br />

buildings. This means that in Outdoor 1, a lower ratio of valid points could be achieved with the<br />

SwissRanger.<br />


Figure 3.4: Photos from the Arena dataset<br />

(a) Grass (b) Hill (c) Bush<br />

Figure 3.5: Photos of the different types of scenes encountered in the Outdoor 1 dataset. The photos<br />

are taken by a webcam which is mounted right next to the 3D sensors.<br />

3.1.3 Planar/Round/Holes<br />

The Planar/Round/Holes dataset was collected with a stand-alone SwissRanger. Scenes have been<br />

chosen or even set up to present one or at most a few objects with certain characteristics. These are<br />

• dominating planar surfaces,<br />

• dominating spherical, conical, or cylindrical surfaces, or<br />

• non-convex planar surfaces or planar surfaces with holes in them.<br />

Consequently, the point clouds’ bounding box is even smaller than necessitated by the SwissRanger’s<br />

small range. Representative range images can be found in figure 3.2.<br />

3.2 Datasets with actuated LRF<br />

In addition to the Rugbot used in the above experiments, we have one whose only 3D sensor is the<br />

ALRF presented in section 2.1. We used it to record datasets in different settings. The pitching angle

goes from −90 ◦ to +90 ◦ on all datasets, but the step size differs. While the potential maximal size<br />

of the sample depends on the step size, the actual size also depends on how many points are invalid<br />

because the beam was reflected poorly or not at all (sky), or were excluded because

they hit the actuating mechanism.<br />


(a) Grass: SwissRanger range image.<br />

(b) Grass: Perspective view of point cloud returned<br />

by the stereo camera.<br />

(c) Hill: SwissRanger range image.<br />

(d) Hill: Perspective view of point cloud returned<br />

by the stereo camera.<br />

(e) Bush: SwissRanger range image.<br />

(f) Bush: Perspective view of point cloud returned<br />

by the stereo camera.<br />

Figure 3.6: The data returned by the SwissRanger (left) and the stereo camera (right) for the scenes<br />

from the Outdoor 1 dataset. Photos in figure 3.5.<br />


3.2.1 Lab<br />

The first dataset was recorded indoors: a circuit around our lab. It consists of 29 point clouds recorded with an actuated LRF. The step size of the pitching servo was 0.5°. This gives a 3D point cloud of potentially 541 × 361 = 195,301 points per sample. Photos of the site can be seen in figure 3.7. A range image from the ALRF can be found in figure 3.8.

Figure 3.7: Photos of the robotics lab with a locomotion test arena in the form of a high-bay rack.

3.2.2 Crashed Car Park<br />

Two datasets were recorded at the “Disaster City” USAR training site in Texas. The first of these was collected outdoors at a partially collapsed car park. The data was collected in front of the building and in the still partially accessible ground floor. It consists of 35 point clouds, also collected with the actuated LRF. The step size of the pitching servo was 0.5°, resulting in a 3D point cloud of potentially 541 × 361 = 195,301 points per sample. Photos can be found in figure 3.9.

3.2.3 Dwelling<br />

The second dataset from Disaster City was recorded partially outdoors and partially indoors. The site was a house with an attached carport used for a boat. It consists of 96 ALRF point clouds. The step size of the pitching servo was 0.5°, resulting in a 3D point cloud of potentially 541 × 91 = 49,231 points per sample.

3.2.4 Hannover ’09 Hall<br />

Two further ALRF datasets were recorded at the RoboCup German Open ’09, which was co-located with the Hannover Fair. RoboCup is a competition for robots divided into different leagues. In 2009, the Jacobs Robotics Group competed in the Rescue League, which simulates an Urban Search and Rescue (USAR) scenario. Both datasets gathered in Hannover are characterized not only by common obstacles close to the ground, but also by the unusually high ceiling (c. 12.5 m). The first Hannover dataset consists of 78 point clouds from a circuit of a fair hall. The step size of the pitching


Figure 3.8: ALRF RANGE IMAGE FROM THE LAB DATASET – Each 2D LRF scan corresponds to one<br />

row in the image. Since the FOV of the LRF is greater than 180°, the left and right edges of the image<br />

show parts of the environment behind the sensor upside down. The top left and right corners have no<br />

information because they correspond to beams reflected from the actuation mechanism.<br />

Figure 3.9: The Crashed Car Park in Disaster City in Texas where one of the datasets used in the<br />

experiments was recorded.<br />


(a) Outside view of a house with attached car-port: 35784 points<br />

(b) Entering the building: 41350 points<br />

Figure 3.10: Perspective view of two point clouds from the Dwelling dataset from Disaster City<br />


Figure 3.11: The Hannover ’09 Hall<br />

servo was 1°, resulting in a 3D point cloud of potentially 541 × 181 = 97,921 points per sample. A photographic overview of the site can be found in figure 3.11.

3.2.5 Hannover ’09 Arena<br />

The second Hannover dataset was recorded in the RoboCup Rescue Arena while the Rugbot moved<br />

through it. The arena consists of 1.20 m×1.20 m elements that are bounded by 1.20 m walls on up to<br />

three sides while being open at the top. At roughly 5 m above ground, there was a frame above the<br />

arena for lights and cameras. It can be seen alongside the very high ceiling. The dataset consists of 79<br />

point clouds. The step size of the pitching servo was 1°, resulting in a 3D point cloud of potentially 541 × 181 = 97,921 points per sample. One point cloud is illustrated in figure 3.12.

3.3 Dataset with sonar: Lesumsperrwerk<br />

The last dataset is an underwater set collected at the “Lesumsperrwerk”, a flood gate with a lock<br />

near our university (53° 9’ 36” N, 8° 38’ 49” E, http://www.lesumsperrwerk.de/). It was<br />

gathered with the Eclipse stand-alone 3D sonar (presented in section 2.4) operated from a wharf. The<br />

point clouds show the flood gates and the entrance to the lock, but not the banks or the ground of the<br />

river, nor any vessels. We operated the Eclipse at a maximum range of 120 m. Photos of the site can<br />

be found in figure 3.13. Two views of a point cloud are in figure 3.14.<br />

3.4 Summary<br />

An overview of all datasets can be found in tables 3.1 and 3.2. Table 3.1 presents statistics on the number of points in the datasets. It can be seen that the SwissRanger returns a high ratio of valid points, except when working outside in the open (Outdoor 1). The ALRF achieves a good ratio of 62–83%. The stereo camera delivers a low ratio of valid points. This is due to the fact that it relies on detecting features, so featureless areas will be completely discarded, except for their edges. In the worst case (Outdoor 1), the STOC only returned 4.3% valid points. The remaining ∼13,000 points are not enough for most applications, especially since many of these are erroneous. We could not give a ratio for the sonar since it returns varying numbers of points per beam, so there is no fixed maximum number of points.

Table 3.2 shows the spatial properties of the gathered data. One can see that the stereo camera delivers extremely spread out point clouds with very low density. This is due to the type of error occurring in feature matching: imprecisely matched features smear very far away from the camera’s focal point. The SwissRanger delivers the densest point clouds, followed by the ALRF and the sonar. For the SR and the ALRF,


(a) Bottom up view showing arena, frame, and ceiling<br />

(b) Top down view of arena, frame visible on top, ceiling out of view<br />

Figure 3.12: TWO VIEWS OF A POINT CLOUD CONTAINING 78528 POINTS FROM THE HANNOVER<br />

’09 ARENA DATASET – The dataset is characterized by the unusually high ceiling (ca. 12.5 m). There<br />

is also a frame for lighting fixtures.<br />


Figure 3.13: An overview of the Lesumsperrwerk as seen from the river’s surface.<br />

(a) Front view<br />

(b) Top view<br />

Figure 3.14: TWO VIEWS OF A SONAR POINT CLOUD FROM THE LESUMSPERRWERK DATASET –<br />

In the left and center, three blocks are visible which lie between the individual gates. On the right, one can see the entrance to the lock. In the front view, the individual scan lines are apparent. In the top view, one can see multiple echoes from the obstacles.


Table 3.1: NUMBER OF POINTS IN THE DATASETS – RMSE=root mean square error, VP=valid points,<br />

∅=mean<br />

Dataset Sensor #PC Σ#points ∅#points RMSE #points ratio VP<br />

Arena SR 453 10657341 23526.139 2347.125 0.928<br />

Arena STOC 341 7380368 21643.308 7991.060 0.070<br />

Outdoor 1 SR 733 11174009 15244.214 6124.177 0.601<br />

Outdoor 1 STOC 104 3343059 32144.798 20600.686 0.105<br />

Outdoor 2 SR 238 5910801 24835.298 1787.522 0.980<br />

Outdoor 2 STOC 237 3123425 13179.008 6795.924 0.043<br />

Outdoor 3 SR 6457 160074360 24790.825 1238.833 0.978<br />

Outdoor 3 STOC 365 16165611 44289.345 30140.943 0.144<br />

PRH SR 75 1805347 24071.293 3994.622 0.950<br />

CCP ALRF 35 4216448 120469.943 24045.993 0.617<br />

Dwelling ALRF 96 3753320 39097.083 2987.606 0.794<br />

H. Arena ALRF 79 5921144 74951.190 6164.248 0.765<br />

H. Hall ALRF 78 4968246 63695.462 6047.772 0.650<br />

Lab ALRF 29 4706218 162283.379 1454.341 0.831<br />

Lesum Sonar 18 3237337 179852.056 26547.651 –<br />

one can see how the density relates to how confined the environment was. For the SwissRanger, it ranges from very confined (Arena, 2307 points/m³) to open (Outdoor 1, 161 points/m³). The ALRF has been used in open (Hannover Arena and Hall), semi-open (Crashed Car Park and Dwelling) and confined (Lab) environments.

The ratio between median and maximum Euclidean distance from a point to the origin is a measure<br />

of how evenly distributed the points are in the area they cover. This value is highest for the SwissRanger and the sonar. For the SwissRanger, this can be attributed to its short range. For the sonar, on the other

hand, it is a property of the dataset: all points lie in one relatively coherent cluster. Also, it does not<br />

see the floor and there are no other short range objects visible. The ratio gives a good illustration of<br />

the difference between the Hannover datasets. While both were recorded at the same venue, the arena<br />

is very confined – the nearest wall is rarely more than one meter away. Yet, the high ceiling is still<br />

visible, making the covered volume almost as high as that of the Hall dataset.<br />
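For illustration, the spread measure ∅r̃/max r used in table 3.2 can be computed per point cloud as sketched below; this is a minimal example with synthetic data, not the evaluation code used to produce the tables.

import numpy as np

def spread_ratio(points):
    """Ratio of median to maximum Euclidean distance of the points to the origin.

    A value close to 1 means the points are distributed evenly over the covered
    range; a small value indicates that a few far-away points dominate the extent.
    """
    r = np.linalg.norm(np.asarray(points, dtype=float), axis=1)
    return float(np.median(r) / np.max(r))

# Synthetic example: a dense cluster near the sensor plus one far outlier
cloud = np.vstack([np.random.rand(1000, 3), [[100.0, 0.0, 0.0]]])
print(spread_ratio(cloud))   # small value, dominated by the single outlier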


Table 3.2: SPATIAL PROPERTIES OF THE DATASETS – All averages (denoted by ∅) are means. Volume and dimensions (depth × width × height) refer to the bounding box of all points. Each dimension was averaged independently. The maximum Euclidean range is given as r, the average ratio between median and maximum Euclidean range as ∅r̃/max r.
Dataset Sensor vol. [m³] dens. [m⁻³] ∅d [m] × ∅w [m] × ∅h [m] ∅ max r [m] ∅r̃/max r

Arena SR 10.199 2.307 2.456×1.644×1.253 3.111 0.597<br />

Arena STOC 21755493.302 0.000995 141.677×99.077×300.423 337.192 0.0914<br />

Outdoor 1 SR 94.548 161 6.672×4.509×2.692 6.932 0.192<br />

Outdoor 1 STOC 45727990.220 0.000703 344.375×140.337×604.775 688.080 0.0457<br />

Outdoor 2 SR 74.545 333 5.396×4.292×2.115 5.802 0.346<br />

Outdoor 2 STOC 102076615.894 0.000129 504.941×250.386×669.230 765.292 0.0103<br />

Outdoor 3 SR 70.536 351 6.347×4.063×2.112 6.706 0.158<br />

Outdoor 3 STOC 92873676.283 0.000477 302.231×307.474×609.771 691.367 0.0369<br />

PRH SR 28.044 858 2.814×2.685×2.000 4.114 0.572<br />

CCP ALRF 9940.652 12.1 28.799×37.422×8.705 22.780 0.0703<br />

Dwelling ALRF 1100.680 35.5 10.822×14.029×4.491 11.962 0.139<br />

H. Arena ALRF 22026.424 3.4 33.635×42.183×14.878 28.698 0.039<br />

H. Hall ALRF 23237.480 2.74 38.320×45.818×13.163 29.114 0.102<br />

Lab ALRF 343.592 472 7.835×7.277×5.450 8.178 0.111<br />

Lesum Sonar 1061865.304 0.169 175.580×103.155×58.643 117.874 0.661<br />



Chapter 4<br />

Near Field 3D Navigation with the Hough<br />

Transform<br />

In this chapter, we present a solution to obstacle avoidance. The term designates reactive behavior<br />

that does not use internal state like a map. Obstacle avoidance is useful both to assist a human robot<br />

operator and as a low-level component of full autonomy. In autonomous operation, the speed of obstacle recognition and avoidance limits the speed at which the robot can drive without risking collisions or getting stuck.

In environments made for humans, a robot can assume a flat floor and obstacles that are mostly detectable with a 2D range sensor. In unstructured environments, it must be prepared not only for obstacles outside the scope of the 2D sensors; the ground it is facing might not even be drivable.

As outlined in chapter 2, with the STOC stereo camera and the SwissRanger time-of-flight camera, two high-frequency 3D sensors are available. In the following, we propose an algorithm that allows a robot to make use of the potential that these sensors’ high speeds offer: high-speed obstacle avoidance based on 3D data.

4.1 Approach and Implementation<br />

4.1.1 The Hough transform<br />

The Hough transform is a feature detection heuristic, originally introduced for lines. It has been<br />

popularized and generalized by [Ballard, 1981] to all parameterized shapes like circles, squares and<br />

the like. Though it is typically used for 2D images, it can be extended to any dimension due to its general nature. In this chapter, it is used to detect infinite planes in 3D point clouds returned by 3D range sensors.

Generally, the Hough transform can be used to detect shapes in point clouds or raster data. Ideally, these shapes are described by as few parameters as possible. Originally, 2D lines were used,

which can be defined by their angle with the x-axis and the distance to the origin. A point in the<br />

parameter space then represents one of the shapes being searched for. Each point in the input data set<br />

then can be used as a vote for all parameter combinations of the shapes that pass through it.<br />

The parameter space is discretized by dividing it into equally sized bins. For each point p in<br />

the input dataset, counts in the bins corresponding to parameters of shapes passing through p are<br />

incremented. If a particular shape is present in the input data, the bin corresponding to its parameters receives a high count of so-called hits, i.e., of input points voting for it.


In the following, the general form of the algorithm for the detection of a geometric shape with m parameters in n-dimensional space is briefly recapitulated. The input data is given as a point cloud PC, i.e., a set of points in n-dimensional Cartesian space. The general Hough transform as illustrated in algorithm 1 iterates over all points p in PC and increments for each the bins that belong to parameter values of shapes passing through p. In doing so, the parameter v_m is quantized, i.e., it is calculated based on the values of all the other m − 1 parameters and a point from the input data.

Algorithm 1 GENERAL FORM OF THE HOUGH TRANSFORM ALGORITHM – Searching for an m-parameter shape in an n-dimensional point cloud PC. The m-dimensional parameter space is discretized as array PS.

for all points p ∈ PC do
    for all values v_1 for parameter P_1 do
        ...
            for all values v_(m−1) for parameter P_(m−1) do
                v_m ← calculateVm(p, v_1, ..., v_(m−1))
                PS[v_1]...[v_m]++
            end for
        ...
    end for
end for

4.1.2 Plane Parameterization<br />

Algorithm 2 HOUGH TRANSFORM APPLIED FOR PLANE DETECTION – Searching for a plane with its three parameters in a 3D point cloud PC. The 3D parameter space is discretized as array PS.

for all points p ∈ PC do
    for all angles ρ_x do
        for all angles ρ_y do
            n ← (−sin(ρ_x)·cos(ρ_y), −cos(ρ_x)·sin(ρ_y), cos(ρ_x)·cos(ρ_y))^T
            d ← n · p
            PS[ρ_x][ρ_y][d]++
        end for
    end for
end for

Here the shapes for the Hough transform are planes. The axes are chosen as shown in figure 4.1. The planes are characterized by the following three parameters:

Two angles: As shown in figure 4.1, the first angle ρ_x is the angle between the x-axis and the intersection of the given plane with the xz-plane. The second angle ρ_y is the angle between the y-axis and the intersection of the given plane with the yz-plane.

Signed distance to the origin: The signed distance d has the same absolute value as the distance, and its sign is the same as that of the plane’s z-axis intercept.

66


4.1 Approach and Implementation<br />

[Sketch of the coordinate frame: the x-axis points to the front, the z-axis points up, O is the origin; the angles ρ_x and ρ_y are indicated on the respective axes.]

Figure 4.1: THE DEFINITION FOR THE ANGLES ρ_x AND ρ_y – Figure taken from [Poppinga et al., 2008a].

So, given a point from the point cloud returned by one of the sensors and two angles, it is possible to compute the signed distance to the origin. In algorithm 2, the distance is used accordingly as the quantized parameter. In other words, the signed distance d corresponds to the quantized parameter v_m in algorithm 1. The parameterization produces a singularity: when one angle is 90°, the other one is undefined. As we will see below, this is not a problem in our application domain, since we do not need to consider angles that large.
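For concreteness, the voting step of algorithm 2 can be written out as in the sketch below. The bin ranges and resolutions are placeholders chosen for illustration; the values actually used in the experiments are given in section 4.1.3.

import numpy as np

def hough_planes(points, angles_deg, d_min=-1.0, d_max=2.0, d_res=0.1):
    """Accumulate votes for planes parameterized by (rho_x, rho_y, d).

    points:     (N, 3) array of 3D points in sensor coordinates.
    angles_deg: candidate angles in degrees, used for both rho_x and rho_y.
    Returns the accumulator PS of shape (len(angles), len(angles), #distance bins).
    """
    angles = np.radians(angles_deg)
    n_d = int(round((d_max - d_min) / d_res))
    PS = np.zeros((len(angles), len(angles), n_d), dtype=int)
    for p in points:
        for i, rx in enumerate(angles):
            for j, ry in enumerate(angles):
                # Normal vector of the candidate plane as in algorithm 2
                n = np.array([-np.sin(rx) * np.cos(ry),
                              -np.cos(rx) * np.sin(ry),
                               np.cos(rx) * np.cos(ry)])
                d = float(n @ p)                           # signed distance to the origin
                k = int(np.floor((d - d_min) / d_res))     # quantize the distance
                if 0 <= k < n_d:
                    PS[i, j, k] += 1
    return PS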

There are other ways to parameterize a plane, for example the normal vector and the distance to the origin, or a unit quaternion. We chose the parameterization with two angles and the signed distance because it has exactly as many parameters as a plane has degrees of freedom. Consequently, all parameters can be chosen independently of one another within their respective ranges. We also found this parameterization to be the most intuitive, especially when one of the angles ρ_D = 0 (with D ∈ {x, y}).

Ground classification<br />

The basic idea for the ground classification is simple: important classes of terrain are characterized by one plane. If the robot is facing no obstacles and even terrain, a plane representing the ground should be detectable. This ground plane has parameters that correspond to the pose of the sensor relative to the floor. The corresponding bin of the parameter space should have significantly more hits than any other bin.

Similarly, if the robot faces a ramp, an elevated plateau, or if it is standing at an edge to lower ground, the corresponding planes should be easily detectable. If no single plane is detected, it can be presumed that the robot is facing a kind of non-drivable terrain. One option is to detect walls and other obstacles perpendicular to the floor by looking at the corresponding bins. An easier alternative used here is to exclude the corresponding areas of the parameter space. So if points of the input data lie on a wall or other obstacles perpendicular to the floor, their votes are not stored, as the corresponding bins do not exist. The situation is then the same as for other non-drivable areas: no plane bin has extraordinarily many hits.

In the following, five classes of terrain are used: floor, ramp, canyon, plateau, and obstacle. For all except the last one, there is one characterizing plane. The “floor” is flat horizontal ground at the level of the robot. The “ramp” is inclined terrain in front of the robot, i.e., when traversing it, only the robot’s pitch changes passively. The “canyon” and the “plateau” are even ground below and above, respectively, the current position of the robot. The classes, especially ramp, canyon and plateau, can be further subdivided to take the physical capabilities of the robot into account. For our Rugbot, plateaus and canyons with a step of more than 0.2 m and ramps with a combined angle of


more than 35° are considered to be not passable. Other ramps, plateaus, and canyons are deemed to be passable. There is of course the option to make a more refined distinction, for example to adapt the robot’s driving speed or to compute a cost function for path planning.

4.1.3 Processing of the Hough Space for Classification<br />

Algorithm 3 THE GROUND CLASSIFICATION ALGORITHM – It uses three simple criteria organized in a decision-tree-like manner. #S is the cardinality of S, bin_max is the bin with the most hits.

if #bin_floor > t_h · #PC then
    return floor
else
    if (#{bin | #bin > t_m · #bin_max} < t_n) and (#bin_max > t_p · #PC) then
        return type(bin_max) ∈ {floor, plateau, canyon, ramp}
    else
        return obstacle
    end if
end if

The computation of the classification is based on three simple components. The main criterion is which bin has the most hits. Furthermore, a threshold is used to detect obstacles, i.e., the case when no bin has significantly more hits than the others. There are two thresholds t_h and t_p for this purpose to accommodate different terrain types. As the number of points in the input data can vary, the thresholds are relative to the cardinality of the processed point cloud. For perfect input data, these two criteria are sufficient. But as the snapshots from the sensors tend to be very noisy, a third heuristic is used.

This heuristic takes the shape of the distribution of the hits into account. It estimates how peaked the distribution is by determining the cardinality of {bin | #bin > t_m · #bin_max}, i.e., the set of bins with more hits than a certain fraction t_m of the number of hits of the top bin. This cardinality is then compared to a threshold t_n. The criteria are arranged in a decision-tree-like manner as shown in algorithm 3.

The four parameters t_h, t_p, t_m, and t_n seem to be quite uncritical, as discussed in more detail in the section presenting experimental results. They have been determined using a few test cases and performed very well on several thousand input snapshots from four large datasets recorded under various environmental conditions. The values used are: t_m = 2/3, t_h = 1/3, t_n = 7, and t_p = 1/5.
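Written out in code, the decision tree of algorithm 3 looks roughly as follows. The accumulator is assumed to be a dictionary mapping bin indices to hit counts, and bin_type() (mapping a bin to floor, ramp, plateau, or canyon) stands for the lookup implied by the bin layout; both are assumptions made for this sketch.

def classify(hits, floor_bin, bin_type, n_points,
             t_h=1/3, t_p=1/5, t_m=2/3, t_n=7):
    """Terrain classification following the decision tree of algorithm 3.

    hits:      dict mapping a bin index to its number of hits
    floor_bin: index of the bin corresponding to the expected floor plane
    bin_type:  function mapping a bin index to 'floor', 'ramp', 'plateau' or 'canyon'
    n_points:  cardinality of the processed point cloud
    """
    if hits.get(floor_bin, 0) > t_h * n_points:
        return "floor"
    top_bin = max(hits, key=hits.get)
    # Peakedness heuristic: count the bins that come close to the top bin
    near_top = sum(1 for h in hits.values() if h > t_m * hits[top_bin])
    if near_top < t_n and hits[top_bin] > t_p * n_points:
        return bin_type(top_bin)          # floor, plateau, canyon, or ramp
    return "obstacle"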

The parameters of the discretization of the Hough space are the minimum, maximum, and resolution on the three axes. The range on the x-axis (forward) is set to [−45°, 18°] (positive values running downhill), since the sensors only detect very few points on ramps that have a stronger downward inclination than roughly 18°. The range on the y-axis is set to [−45°, 45°]. A ramp with, say, 30° roll might not be drivable from the position it was detected from, yet it can be an indication for the robot to reposition itself to tackle the ramp. The range for the distance dimension is [−1 m, 2 m]. The distance resolution is set to 0.1 m. All these parameters were chosen based on the capabilities and properties of our robot.

The angular resolution of the discretization is the most interesting of the Hough space parameters.<br />

This parameter influences the run time as well as the potential accuracy of the algorithm. Different<br />

values were hence used for the detailed performance analysis presented in section 4.2.2.<br />


Table 4.1: Stereo and TOF point clouds in the four datasets used in the experiments.<br />

dataset description point-clouds (PC) aver. points per PC<br />

stereo camera<br />

Arena inside, rescue arena 408 5058<br />

Outdoor 1 outside, university campus 318 71744<br />

Outdoor 3 outside, university campus 414 39762<br />

TOF<br />

Arena inside, rescue arena 449 23515<br />

Outdoor 1 outside, university campus 470 16725<br />

Outdoor 2 outside, university campus 203 25171<br />

Outdoor 3 outside, university campus 5461 24790<br />

4.2 Experiments and results<br />

The approach was intensively tested with the Arena and Outdoor 1–3 real-world datasets presented in chapter 3. They contain more than 7,500 snapshots of range data gathered by the SwissRanger and the STOC stereo camera (chapter 2) in a large variety of real-world situations. The data includes indoor and outdoor scenarios, different lighting conditions, open and cluttered environments, and other challenges. Due to limitations in the de-serialization, the datasets were involuntarily randomly subsampled to c. 85% of the point clouds, hence the different numbers in table 4.1 and in table 3.1. Also, the first and last point cloud in every scene were dropped to exclude potentially erroneous data.

Note the large variation in the mean number of points per point cloud in the different datasets, especially for the stereo camera. The stereo disparity computation relies heavily on the texture and feature information in the picture. Outdoor photographs can be particularly rich in such features; refer for example to figures 3.6 and 4.2. However, the ground in outdoor scenes and walls in indoor scenes are often featureless, and consequently the related data is often missing in stereo snapshots, as illustrated in figure 4.2.

Both the stereo camera and the SwissRanger indicate points in their range images where information is missing. A preprocessing step is hence used, which estimates whether there are sufficiently many meaningful data points in a snapshot. Table 4.2 shows the amount of snapshots ruled out due to insufficient data. The stereo camera data from the Outdoor 2 dataset was excluded completely due to its very low ratio of valid points (0.043) and high number of outliers, which are typically erroneous points (∅r̃/max r = 0.0103), cf. tables 3.1 and 3.2. The total number of snapshots used as actual input for classification is still about 6,800. Please note that despite the high amount of excluded data, the stereo camera is useful as a supplementary sensor to the SwissRanger, which tends to fail in scenarios with direct exposure to very bright sunlight, a condition that does not affect the stereo camera.

The four parameters t_m = 2/3, t_h = 1/3, t_n = 7, and t_p = 1/5 were determined once based on the analysis of a few example snapshots considered to be typical. The parameters were not altered during the experiments. In the following subsection, some examples of Hough spaces are presented to illustrate the working principles of the classification.


(a) The webcam image of the scene.<br />

(b) 3D point cloud returned by the stereo camera. The<br />

ground does not show enough features, so it does not have<br />

depth information.<br />

Figure 4.2: An example scene where the stereo camera delivers few data points; especially ground<br />

information is missing.<br />

Table 4.2: PERCENTAGES OF SNAPSHOTS EXCLUDED FROM HOUGH TRANSFORM VIA PREPRO-<br />

CESSING – Especially the stereo camera data suffers if there are few features in a scene (figure 4.2),<br />

but it can complement the SwissRanger SR-3000, which can fail in different environment situations,<br />

e.g., in scenes with direct exposure to very bright sunlight.<br />

dataset point-clouds excluded data<br />

stereo camera<br />

Arena 408 92%<br />

Outdoor 1 318 75%<br />

Outdoor 3 414 70%<br />

TOF<br />

Arena 449 1%<br />

Outdoor 1 470 2%<br />

Outdoor 2 203 2%<br />

Outdoor 3 5461 0%<br />


4.2.1 Hough space examples<br />

The real world data is obviously subject to noise and additionally, planes in the world can lie in an<br />

area in the parameter space just on the boundary between two bins. It can hence not be expected that<br />

even a perfect plane will produce hits in a single bin only. In this subsection, the working principles<br />

of our approach are illustrated by discussing a few typical examples. This is followed by a global<br />

performance analysis based on the 6,800 snapshots in the next subsection 4.2.2.<br />

For the discussion in this subsection, the parameter space of the Hough transform of a snapshot is depicted as a two-dimensional histogram. In figure 4.3, there are three histograms depicting the bins of the parameter space for more or less typical snapshots. The origin of the histogram is in the top left corner, the down-pointing axis contains the distances, and the right-pointing axis contains both ρ_x and ρ_y. This is accomplished by first fixing ρ_y and iterating ρ_x, then increasing ρ_y and iterating ρ_x again, and so on. This is depicted graphically in sub-figure 4.3(a). The bin which corresponds to the floor is indicated by a yellow frame.

The magnitude of the bins is represented by shade, where white corresponds to all magnitudes above the ratio t_m of the maximum magnitude, t_m = 2/3. All the other shades are scaled uniformly, and thus the different histograms are better comparable than with a scheme where just the top bin is white. The following discussion motivates why a threshold on only the absolute number of hits is not sufficient and why the shape of the distribution also has to be taken into account.

In the floor histograms, a single bin sticks out, even with some slightly obstructed ground. But it<br />

can be noticed that the histogram of the ramp differs from the floor-histograms. First of all, the bins<br />

with many hits are concentrated on the right side of the histogram. This is obviously the case because<br />

planes with a greater inclination get more hits, proportionally to their similarity to the actual ramp.<br />

The second difference is that the strip of significantly filled bins is lower. This is because the ramp is<br />

at a greater distance to the camera than the floor.<br />

In the histogram of the random step field, the bins with a relatively high magnitude are the most evenly distributed among the three histograms. In the ramp histogram, there are also a lot of bins represented in white, but this is because the assignment of shade to magnitude is done in an absolute way. So in the ramp histogram, bins with relatively medium magnitude are painted in white. In the random-step-field histogram, there are 14 white bins, in the ramp histogram there are 35.

Common to all diagrams is that the areas with the higher magnitudes are distributed in a diagonal shape. This is caused by the following. Suppose there are just a lot of points from the floor. The planes with the most points in them are very similar to the floor. For a plane which is not very similar to the floor to encompass many points, the best way is to intersect the floor plane at an angle as small as possible. With growing distance to the floor, the angle has to become greater in order for the plane to intersect the floor plane in the area of the data points. In the region of the parameter space where the angles are just big enough, the values are the highest for every distance.

For the ramp, the parameter space is a warped version of the one for the floor, because the region<br />

of the data points is smaller, so there are fewer possibilities to intersect with it. This is also the reason<br />

why the region of non-zero bins is so small in this parameter space. Furthermore, the area of the highest overall magnitude is higher (on the ρ_x-axis), due to the ramp being tilted.

4.2.2 General Performance Analysis<br />

The results presented in this subsection are based on those approximately 6,800 snapshots from the four datasets recorded in different environment conditions which had meaningful range data. The data was labeled by a human to provide ground truth references to measure the accuracy of the classification.


(a) Layout of the bins in the depictions of parameter spaces below<br />

(b) Plain floor<br />

(c) Slightly obstructed floor<br />

(d) Ramp<br />

(e) Random step field<br />

Figure 4.3: TWO-DIMENSIONAL DEPICTIONS OF THE THREE-DIMENSIONAL PARAMETER (HOUGH) SPACE FOR SEVERAL EXAMPLE SNAPSHOTS – Distances are on the y-axis, angles are on the x-axis, where ρ_y iterates just once and ρ_x iterates repeatedly. Hits in the bins are represented by gray-scale; the darker, the fewer hits.

Table 4.3: Success rates and computation times for drivability detection.<br />

dataset success rate false negative false positive time [ms]<br />

stereo camera<br />

Arena 1.000 0.000 0.000 4<br />

Outdoor 1 0.987 0.00 0.013 53<br />

Outdoor 3 0.977 0.016 0.007 29<br />

TOF<br />

Arena 0.831 0.169 0.000 11<br />

Outdoor 1 1.000 0.000 0.000 8<br />

Outdoor 2 1.000 0.000 0.000 12<br />

Outdoor 3 0.830 0.031 0.139 12<br />


[Scatter plot: mean classification time in ms (y-axis, 0–55 ms) over the mean number of 3D points per point cloud (x-axis, 0–80,000) for the combinations Arena/STOC, Arena/SR, Outdoor 1/SR, Outdoor 2/SR, Outdoor 3/SR, Outdoor 3/STOC, and Outdoor 1/STOC.]

Figure 4.4: MEAN PROCESSING TIME AND CARDINALITY OF POINT CLOUDS – The mean time per<br />

classification directly depends on the mean number of 3D points per snapshot in each dataset.<br />


Table 4.4: Human generated ground truth labels for the stereo camera data of the different scenes<br />

dataset scene description human # of aver. # points<br />

set label PC per PC<br />

Arena 1 lab floor with black plastic floor 32 5058<br />

2 bush1 very near obstacle 30 22151<br />

3 bush2 very near obstacle 1 71646<br />

4 lawn floor 2 11367<br />

Outdoor 1 5 hill with grass ramp 47 107173<br />

6 tree1 very near obstacle 1 15267<br />

7 grass, background sky floor 1 32686<br />

8 tree2 very near obstacle 30 27139<br />

9 tree3 very near obstacle 41 25141<br />

10 railing very near obstacle 2 77342<br />

Outdoor 3 11 concrete slope ramp 27 10770<br />

12 wall very near obstacle 23 113368<br />

As it would be very tedious to label the several thousand snapshots one by one, the four datasets were broken down into “scene sets”. Each scene set consists of some larger sequence of snapshots from one sensor to which the same label can be applied. The properties of the different scenes are discussed later on in the context of fine-grain classification.

The first and foremost result is with respect to coarse classification, namely the binary distinction between drivable and non-drivable terrain. For this purpose an angular resolution of 45° is used, i.e., only two bins per axis. As shown in table 4.3, the approach can robustly detect drivable ground in a very fast manner. The success rates of classifying the input data correctly, i.e., of assigning the same label as in the human-generated ground truth data, range between 83% and 100%. Cases where the algorithm classifies terrain as non-drivable in contradiction to the ground truth label are counted as false negatives; when drivability is detected although the ground truth label points to non-drivability, a false positive is counted.

As mentioned before, the stereo camera has the drawback that it does not deliver data in featureless<br />

environments, but it allows an almost perfect classification ranging between 98% and 100%. The<br />

SwissRanger has a very low percentage of excluded data (see table 4.2), but it behaves poorly in<br />

strongly sunlit situations. In these cases, the snapshot is significantly distorted but not automatically<br />

recognizable as such. Such situations occurred during the recording of the outdoor data of the Arena<br />

and Outdoor 3 datasets causing the lower success rates for the SwissRanger in these two cases. The<br />

two sensors can hence supplement each other very well.<br />

The processing can be done very fast, namely in the order of 5 to 50 ms. Please note that these<br />

run times are for the full approach, i.e., the run times include the preprocessing (which is always<br />

successful for this data) as well as the computation of the Hough transformation and the execution of<br />

the decision tree. As illustrated in figure 4.4, the variations in processing time are due to the variations<br />

in the number of points per snapshot.<br />

Any standard 2D obstacle sensor would have done significantly worse in the test scenarios. It would especially have failed to detect perpendicular obstacles as well as rubble on the floor. The approach can hence serve as a serious alternative to 2D obstacle detection for motion control and mapping in real-world environments.

The next question is of course to what extent the approach is capable of more fine-grain classification of terrain types.


Table 4.5: Human generated ground truth labels for the SwissRanger data of the different scenes<br />

dataset scene description human # of aver. # points<br />

set label PC per PC<br />

13 lab floor, background barrels floor 55 23484<br />

14 lab floor with black plastic floor 33 23256<br />

Arena 15 boxes very near obstacle 43 25344<br />

16 lab floor, dark cave floor 75 25344<br />

17 wooden ramp with plastic cover ramp 113 19691<br />

18 red-cloth hanging down obstacle 124 25344<br />

19 bush1 very near obstacle 32 18121<br />

20 bush2 very near obstacle 49 24604<br />

21 car very near obstacle 44 17396<br />

Outdoor 1 22 concrete ground1 floor 68 11372<br />

23 lawn floor 70 13812<br />

24 hill with grass and earth ramp 59 7235<br />

25 concrete with rubble obstacle 90 20375<br />

26 tree1 very near obstacle 50 23512<br />

27 concrete ground2 floor 28 24726<br />

Outdoor 2 28 grass, background far wall floor 35 25342<br />

29 mix of grass and concrete floor 50 25004<br />

30 grass, background close wall floor 86 25344<br />

31 grass, background sky floor 13 25343<br />

32 grass, background building1 floor 233 25343<br />

33 grass, background building2 floor 892 25250<br />

34 stone ramp ramp 150 25287<br />

35 hill1 ramp 169 19853<br />

Outdoor 3 36 tree2 very near obstacle 728 25344<br />

37 tree3 very near obstacle 904 22625<br />

38 railing very near obstacle 616 25344<br />

39 grass, background building3 floor 538 25343<br />

40 concrete slope ramp 548 25202<br />

41 grass hill ramp 987 25343<br />

42 wall very near obstacle 655 25344<br />


Table 4.6: CLASSIFICATION RATES AND RUN TIMES FOR STEREO CAMERA DATA PROCESSED AT<br />

DIFFERENT ANGULAR RESOLUTIONS OF THE HOUGH SPACE – Though drivability can be robustly<br />

detected with stereo, finer classification performs rather badly with this sensor.<br />

45° 15° 9°<br />

dataset scene human class. time class. time class. time<br />

set label rate [ms] rate [ms] rate [ms]<br />

Arena 1 floor 1.00 4 0.00 20 0.00 51<br />

2 obstacle 1.00 16 1.00 92 1.00 232<br />

3 obstacle 1.00 60 1.00 320 1.00 850<br />

4 floor 1.00 15 1.00 60 1.00 160<br />

Outdoor 1 5 ramp 0.00 78 0.00 431 0.00 1089<br />

6 obstacle 0.00 10 0.00 50 0.00 130<br />

7 floor 0.00 1 0.00 30 0.00 70<br />

8 obstacle 1.00 22 1.00 109 1.00 279<br />

9 obstacle 1.00 19 1.00 100 1.00 255<br />

10 obstacle 0.00 35 0.50 180 0.50 455<br />

Outdoor 3 11 ramp 0.00 8 0.00 42 0.00 109<br />

12 obstacle 1.00 84 1.00 457 1.00 1163<br />

mean 0.58 29 0.54 158 0.54 404<br />

Tables 4.4 and 4.5 show the different scene sets and their general properties. Plateaus and canyons onto which our robot can drive, like a curb between a lawn patch and a concrete ground, had to be included in the label “floor” to make the manual labeling of the data feasible.

It can be expected that the angular resolution of the Hough space discretization has an influence on the finer classification. Hence, experiments with 9°, 15°, and 45° were conducted. Table 4.6 shows the results for the stereo camera data. It can be noticed that the success rates of 54% to 58% are rather poor, especially when compared to drivability detection, where 98% to 100% are achieved. The main reason seems to be the high rate of noise in the stereo data. This is also supported by the fact that the classification rates are hardly influenced by the angular resolution of the discretization of the Hough space.

The situation is a bit different for more fine-grain classification with the SwissRanger, which provides less noisy data. As shown in table 4.7, a higher angular resolution allows at least somewhat finer classification for this sensor, with success rates of up to 70%. But for the SwissRanger, too, the limitations in finer classification seem to lie in the noise level of the data.

It is expected that the high-quality data from a 3D laser scanner would allow a very fine-grain classification with the presented approach. Corresponding experiments are left for future work. This type of classification could for example be used for detailed semantic map annotation. But data acquisition with a 3D laser scanner is very slow, namely in the order of several seconds per single snapshot. This means that the robot has to be stopped and that the update rates are very low.

A stereo camera or a SwissRanger, in contrast, allows very fast data acquisition on a moving robot. The detection of whether the robot can drive over the terrain covered by the sensor is extremely fast and very robust with the presented approach. It is hence an interesting alternative for obstacle detection in non-trivial environments, which can be used for reactive locomotion control as well as mapping.


Table 4.7: CLASSIFICATION RATES AND RUN TIMES FOR SWISSRANGER DATA PROCESSED AT<br />

DIFFERENT ANGULAR RESOLUTIONS OF THE HOUGH SPACE – Unlike with the stereo camera, a<br />

higher angular resolution improves finer classification.<br />

45° 15° 9°<br />

dataset scene human class. time class. time class. time<br />

set label rate [ms] rate [ms] rate [ms]<br />

13 floor 0.98 12 0.96 59 0.96 148<br />

14 floor 1.00 12 0.97 60 0.97 148<br />

Arena 15 obstacle 0.79 10 1.00 64 1.00 160<br />

16 floor 0.00 13 0.00 64 0.00 161<br />

17 ramp 0.00 10 0.00 49 0.30 124<br />

18 obstacle 0.00 12 1.00 64 1.00 160<br />

19 obstacle 0.00 9 0.00 44 0.00 113<br />

20 obstacle 0.00 12 0.00 61 0.24 155<br />

21 obstacle 0.00 8 0.00 44 0.50 109<br />

Outdoor 1 22 floor 1.00 5 1.00 29 1.00 72<br />

23 floor 1.00 8 1.00 34 1.00 88<br />

24 ramp 0.00 3 0.22 18 0.00 45<br />

25 obstacle 1.00 10 1.00 51 1.00 128<br />

26 obstacle 1.00 11 1.00 60 1.00 146<br />

27 floor 1.00 13 1.00 64 1.00 160<br />

Outdoor 2 28 floor 1.00 13 1.00 65 1.00 158<br />

29 floor 1.00 13 1.00 65 0.98 162<br />

30 floor 1.00 11 1.00 63 1.00 160<br />

31 floor 1.00 10 1.00 62 1.00 158<br />

32 floor 1.00 12 0.00 64 0.00 160<br />

33 floor 0.00 12 0.00 64 0.00 160<br />

34 ramp 0.00 14 0.00 64 1.00 160<br />

35 ramp 0.00 10 0.00 51 0.00 128<br />

Outdoor 3 36 obstacle 0.00 12 1.00 63 1.00 160<br />

37 obstacle 0.00 11 1.00 57 1.00 143<br />

38 obstacle 0.00 13 1.00 64 1.00 169<br />

39 floor 1.00 13 0.00 65 0.00 163<br />

40 ramp 0.00 12 0.00 66 0.00 160<br />

41 ramp 0.00 12 0.00 61 0.00 161<br />

42 obstacle 1.00 12 1.00 63 1.00 157<br />

mean 0.55 12 0.64 63 0.70 158<br />



Chapter 5<br />

Patch Map Data-Structure<br />

This chapter describes the 3D mapping data structure conceived for this work. It gives an overview of the different variants implemented for this thesis. They differ in the algorithms used and hence in their capabilities.

5.1 Mapping with planar patches<br />

Planar patches are two-dimensional polygons (possibly with holes) on infinite planes. They result from applying a plane extraction algorithm to a 3D point cloud. In this work, [Poppinga et al., 2008b] is used, but any algorithm returning a set of planes, each with a set of associated points (e.g. [Okada et al., 2001]), will work.

The patches’ polygons are defined by their 2D vertices V = v_0, . . . , v_n, by the 3D position of the plane they lie on, and by the position and orientation of a 2D coordinate system on that plane. We define the latter two by an affine transform T consisting of a rotational and a translational part. It is applied to the two-dimensional polygons, which are assumed to lie in the xy-plane. Using a complete transform looks redundant at first glance, since a plane can be transformed into any other plane in infinitely many ways (it is invariant with respect to rotation around the normal and to translation within itself). One potential alternative would be to use a standard plane definition (e.g. plane normal n and distance to the origin d) and a canonical 2D coordinate system on the planes. This would reduce the amount of data that needs to be stored. However, this brings about a number of problems: Polygons do not necessarily stay in their initial position. Typically they are in sensor coordinates initially. They might be transformed to local robot coordinates with a transform T_s→l and then again to global coordinates with a transform T_l→g. With a canonical 2D coordinate system, each vertex in V would have to be transformed whenever a planar patch is transformed within its plane, and it is quite likely that such an in-plane motion is part of either T_s→l or T_l→g. Also, there are necessarily discontinuities when defining the orientation of the 2D coordinate system. (As an example, one 2D axis might be defined as the cross product of n and a fixed base vector b, e.g. one of the 3D axes. When n is similar to b, the 2D axis is not properly defined.) This would mean in practice that the vertices would have to be modified in each transform. Using a full transform, the polygon can stay the same and only the plane transform needs to be modified.

The two approaches to defining a 2D coordinate system on the plane are equivalent with respect to<br />

outline merging. Outline merging is the combination of two outlines into one, typically as the union<br />

of the two polygons. It is desirable if the same real-world surface appears in two subsequent scans.<br />

After initial processing, it will be represented by two overlapping planar patches in the map. These<br />


[Sketch: a polygon in the xy-plane and the corresponding polygon in 3D after the transform has been applied.]

Figure 5.1: APPLYING A 3D TRANSFORM TO A 2D POLYGON – The transform is represented in<br />

grey; the transform’s translation as a grey arrow, the transform’s rotation axis as a grey cylinder.<br />

two should be merged to further losslessly simplify the data and to represent the environment more truthfully. Two polygons can only be joined if they lie on the same plane. In practice, two polygons that are to be joined will not lie on identical, but only on very similar planes. In order to meaningfully join the polygons, they have to be transformed to lie in the same plane. Both with a full transform and with a canonical coordinate system, this means that all vertices have to be modified. In the case of the full transform, even if the two planes can be transformed to be identical, the 2D coordinate systems will most likely not be. So, at least one polygon’s vertices have to be transformed to the 2D coordinate system of the other¹. In the case of a canonical coordinate system, the vertices have to be modified with high probability during the transform unifying the planes.

Another advantage of using a full transform is that this is the convention used to note the robot’s<br />

global pose and often also for the sensor’s pose relative to the robot’s reference frame. When an<br />

outline is to be transformed to the global reference frame, it is simply a matter of concatenating three<br />

transforms.<br />
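A minimal sketch of this representation is given below, for illustration only: the actual implementation uses CGAL polygon types and its own transform classes, whereas this sketch uses plain 4×4 homogeneous matrices. The point to note is that re-posing a patch only touches the stored transform, never the 2D vertices.

import numpy as np

class PlanarPatch:
    """A 2D polygon in the xy-plane plus an affine transform placing it in 3D."""

    def __init__(self, vertices_2d, T_plane):
        self.vertices = np.asarray(vertices_2d, dtype=float)  # (N, 2), untouched by re-posing
        self.T = np.asarray(T_plane, dtype=float)              # 4x4 homogeneous transform

    def transformed(self, T_other):
        """Re-pose the patch (e.g. sensor->local or local->global).

        Only the plane transform is concatenated; the 2D vertices stay unchanged."""
        return PlanarPatch(self.vertices, T_other @ self.T)

    def vertices_3d(self):
        """3D positions of the outline vertices (needed e.g. for rendering)."""
        n = len(self.vertices)
        hom = np.column_stack([self.vertices, np.zeros(n), np.ones(n)])  # embed at z = 0
        return (self.T @ hom.T).T[:, :3]

# Moving a patch to the global frame is a concatenation of transforms:
# patch_global = patch.transformed(T_local_to_global @ T_sensor_to_local)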

The reason for using any kind of combined 2D/3D definition as opposed to a pure 3D polygon is, first, that basic operations on polygons are useful in this context, e.g. joining, simplification, or growing. These could not be defined as easily on truly three-dimensional polygons. The other reason is cleanliness: In our approach, all polygon vertices lie in the same plane by definition. In a polygon with three-dimensional vertices, this is not guaranteed. And even if one attempted to enforce it, the vertices could never lie exactly in one plane, due to numerical limitations. Our solution avoids having to remedy the resulting errors.

The advantages of using planar patches include simplification, entailing reduction in memory size,<br />

and more economical processing. In a largely man-made environment, many surfaces are planar, especially<br />

the larger ones (e.g. a potted plant is not, but the much larger wall and floor are). Representing<br />

¹ This seeming advantage vanishes once we consider that, in practice, often more than two outlines are merged. In the case of n outlines, the vertices of n − 1 of them have to be modified.


                        Patch storage method
Data set                Plane + outl.    Triangulated outline       Triangulated region
                        size [B]         size [B]    increase       size [B]    increase
Lab                     151              280         1.94406        3913        25.2947
Car Park                207              445         2.225          3062        15.8319
Dwelling                199              444         2.17544        3191        18.0892
Hannover ’09 Arena      191              398         2.11261        4600        23.8958
Hannover ’09 Hall       222              492         2.19919        2902        14.253

Table 5.1: A COMPARISON OF THE SIZES OF DIFFERENT STORAGE METHODS FOR SURFACE<br />

PATCHES – For a fair comparison, all methods describe the same original data points. The increase<br />

given is the increase in size as compared to storing plane and outline. Both the size and the increase<br />

given are the median of all surface patches in the data set. ’Triangulated region’ refers to a Delaunay<br />

triangulation of the sub point cloud to which the plane was fitted.<br />

planar surfaces with point clouds is redundant for most applications (for archaeological applications it may be important to keep track of each of a wall’s tiny dents and bumps; for search and rescue, surveillance, or exploration it is not). Storing the infinite plane and its perceived outline is much more

efficient. Table 5.1 compares the different sizes for different variants on a number of 3D LRF datasets.<br />

These datasets are presented in detail in chapter 3.<br />

In many robotics scenarios, a map is not only to be generated online, but paths are simultaneously<br />

being planned on it. For this requirement, path planning using collision detection is a simple, effective,<br />

and suitable solution. It is simple since relatively user-friendly, highly optimized collision detection<br />

libraries are available for games and other virtual reality applications. It is suitable for dynamic<br />

use, since it allows for inclusion of the polygons as they are, without having to calculate derivative<br />

structures like a Voronoi diagram. It is effective since derivation of surfaces is only done once. When<br />

using point clouds directly, there are two possible approaches: calculating an implicit surface or using<br />

just the points for collision detection. In the first approach, the implicit surface has to be calculated on<br />

each collision check [Klein and Zachmann, 2004, Ho et al., 2001] which adds processing overhead.<br />

In the second, it has to be assumed that point clouds are dense enough not to let the collision object<br />

pass between them. And even if that holds true, the collision detection will produce incorrect results<br />

for most cases (non-cuboidal or non-aligned collision shape). This assumption is not needed when<br />

planes are fitted into the point cloud.<br />

5.2 Surface Patches<br />

Surface Patches describe the surfaces of objects in an environment. They are typically derived from<br />

range sensor data, exclusively so in this work. Two types of patches are used: planar patches and<br />

trimesh patches.<br />

5.2.1 Planar Patches<br />

As discussed in section 5.1, the planes are stored as an affine transform on the xy-plane. CGAL is used<br />

for polygons. CGAL (Computational Geometry Algorithms Library, http://www.cgal.org/)<br />

is a peer-reviewed open-source library offering algorithms and data structures in several areas of geometry.<br />

In this vein, CGAL offers a data structure for polygons with optional holes and also operations<br />


on it (e.g. union and outlining) [Giezeman and Wesselink, 2008]. Other operations are easily defined using standard algorithms (simplification with Douglas–Peucker [Douglas and Peucker, 1973]; the Douglas–Peucker algorithm and its results are presented in algorithm 9) or CGAL library functions and documentation (polygon growing, [Cacciola, 2008]).
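For reference, the standard Douglas–Peucker simplification works as sketched below; this is the generic textbook formulation, not necessarily the exact variant used in algorithm 9.

import numpy as np

def douglas_peucker(points, eps):
    """Generic Douglas-Peucker polyline simplification.

    points: (N, 2) array of 2D vertices of an open polyline.
    eps:    maximum allowed deviation of any dropped vertex from the result.
    """
    pts = np.asarray(points, dtype=float)
    if len(pts) < 3:
        return pts
    a, b = pts[0], pts[-1]
    ab = b - a
    rel = pts[1:-1] - a
    if np.allclose(ab, 0.0):
        dist = np.linalg.norm(rel, axis=1)                 # degenerate chord
    else:
        # Perpendicular distance of intermediate vertices to the chord a-b
        dist = np.abs(ab[0] * rel[:, 1] - ab[1] * rel[:, 0]) / np.linalg.norm(ab)
    i = int(np.argmax(dist)) + 1                           # index of the farthest vertex
    if dist[i - 1] > eps:
        left = douglas_peucker(pts[:i + 1], eps)           # keep the farthest vertex,
        right = douglas_peucker(pts[i:], eps)              # recurse on both halves
        return np.vstack([left[:-1], right])
    return np.vstack([a, b])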

5.2.2 Trimesh Patches<br />

Trimesh patches are triangular meshes, possibly with additional information like color or texture.<br />

Trimesh patches are used in two ways: Firstly, maps consisting completely of trimeshes are used<br />

as a comparison to the proposed method of using planar patches. Alternatively, hybrid maps containing<br />

planar patches for planar elements and trimesh patches for non-planar ones are also used for<br />

comparison.<br />

5.2.3 Other Patch Types<br />

It is possible to extend the presented framework with other types of patches. These could be patches<br />

defined on higher order surfaces instead of planes. By way of example, an umbrella can be roughly<br />

described as a part of a sphere. Possible candidates for higher-order surfaces are quadrics (a survey of methods for detecting these can be found in [Petitjean, 2002]) or primitives of collision detection engines like spheres, cones, and cylinders.

5.3 Generation of patches from point clouds<br />

3D range sensors deliver data in a raw format (e.g. range image for range cameras or sets of 2D laser<br />

scans for actuated laser range finders, ALRF), which is generally converted into a 3D point cloud, i.e.<br />

a set of 3D points. Especially in the case of a range camera, the scanning order is usually retained such<br />

that points in the point cloud correspond to the camera’s pixels e.g. row-wise. From point clouds, the<br />

types of patches presented are derived with different levels of ease.<br />

5.3.1 Trimesh Patches<br />

Trimesh patches are included mainly to allow for a comparison of the patch map with traditional<br />

methods. Therefore, the vast literature on trimesh simplification was not consulted.<br />

Trimeshes are grown on each point cloud individually and in two phases. In the first phase, the optimal plane under the least-mean-square criterion is computed for each pixel. The triangles of the triangulation induced by the scan geometry are classified as growable and un-growable. A triangle is growable iff all its points have similar optimal planes. Then trimeshes are grown from different starting points using only growable triangles. Note that the algorithm is independent of the location of the starting points (algorithm 4). It does rely on two thresholds, θ_ang for angular similarity and θ_d for distance similarity. The second phase grows trimeshes from all remaining points over closely located points (algorithm 5). It uses the threshold θ_s for the maximum side length of a triangle.
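The per-pixel plane fit of the first phase amounts to a small principal component analysis over the 3×3 pixel neighborhood, as sketched below. The sketch assumes a dense range image without invalid pixels; the actual implementation additionally has to skip missing measurements.

import numpy as np

def per_pixel_planes(points_grid):
    """Fit a local plane (n, d) to every pixel of a dense range image.

    points_grid: (H, W, 3) array of 3D points ordered like the scan grid.
    Returns normals (H, W, 3) and signed distances (H, W); the image border is skipped.
    """
    H, W, _ = points_grid.shape
    normals = np.zeros((H, W, 3))
    dists = np.zeros((H, W))
    for r in range(1, H - 1):
        for c in range(1, W - 1):
            nb = points_grid[r - 1:r + 2, c - 1:c + 2].reshape(9, 3)
            centroid = nb.mean(axis=0)
            A = (nb - centroid).T @ (nb - centroid)   # 3x3 scatter matrix
            w, V = np.linalg.eigh(A)                  # eigenvalues in ascending order
            n = V[:, 0]                               # eigenvector of the smallest eigenvalue
            d = float(n @ centroid)
            if d < 0:                                 # enforce a consistent sign
                n, d = -n, -d
            normals[r, c], dists[r, c] = n, d
    return normals, dists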

5.3.2 Planar Patches<br />

Approximately planar regions in the point cloud are detected as described in [Poppinga et al., 2008b].

These regions are originally sub point clouds. Then, an outlining 2D polygon is derived. This 2D<br />

polygon and the infinite plane in which it lies define the planar patch (which is a 3D entity).<br />


Algorithm 4 THE FIRST PHASE OF TRIMESH GROWING – returns the set of trimeshes M; the basic triangulation (esp. the neighborhood function for triangles N()) is defined by the grid structure of the scan; Π ⊂ N × N: set of input pixels, p : Π → R³: pixel to point map, π±1(P): eight-neighborhood of pixel P

1: for P ∈ Π do
2:   p_c ← (1/9) Σ_{P′ ∈ π±1(P)} p(P′)
3:   A ← Σ_{P′ ∈ π±1(P)} (p_c − p(P′)) (p_c − p(P′))^T
4:   n(P) ← eigenvector of A corresponding to the smallest eigenvalue
5:   d(P) ← n(P) · p_c
6:   if d(P) < 0 then
7:     d(P) ← −d(P), n(P) ← −n(P)
8:   end if
9: end for
10: for P = (r, c) ∈ Π do
11:   triangle T_0 = ((r, c), (r + 1, c), (r, c + 1))
12:   trimesh m
13:   queue Q.push(T_0)
14:   while not Q.empty() do
15:     T ← Q.pop()
16:     if not used(T) then
17:       if max{ acos( n(t_0) · n(t_1) / (|n(t_0)| |n(t_1)|) ) | t_0, t_1 ∈ T } < θ_ang and max{ |d(t_0) − d(t_1)| : t_0, t_1 ∈ T } < θ_d then
18:         m.add(T)
19:         used(T) ← true
20:         Q.push(N(T))
21:       end if
22:     end if
23:   end while
24:   M ← M ∪ m
25: end for
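To make the first phase more concrete, the per-pixel plane estimation of lines 1–9 can be sketched as follows. This is a minimal illustrative sketch in Python/numpy, assuming a dense H × W × 3 array of valid points; it is not the implementation used in the thesis.

import numpy as np

def per_pixel_planes(points):
    """Estimate, for every interior pixel, the least-squares plane of its 3x3
    neighborhood (lines 1-9 of algorithm 4). points: H x W x 3 array of 3D
    points; returns per-pixel normals n and plane offsets d."""
    H, W, _ = points.shape
    n = np.zeros((H, W, 3))
    d = np.zeros((H, W))
    for r in range(1, H - 1):
        for c in range(1, W - 1):
            nb = points[r - 1:r + 2, c - 1:c + 2].reshape(9, 3)  # 3x3 block around (r, c)
            p_c = nb.mean(axis=0)                                # centroid
            A = (nb - p_c).T @ (nb - p_c)                        # scatter matrix
            w, v = np.linalg.eigh(A)                             # eigenvalues ascending
            normal = v[:, 0]                                     # eigenvector of smallest eigenvalue
            offset = normal @ p_c
            if offset < 0:                                       # orient so that d(P) >= 0
                normal, offset = -normal, -offset
            n[r, c], d[r, c] = normal, offset
    return n, d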


Algorithm 5 THE SECOND PHASE OF TRIMESH GROWING – returns the set of trimeshes M; the basic triangulation (esp. the neighborhood function for triangles N()) is defined by the grid structure of the scan; Π ⊂ N × N: set of input pixels; p : Π → R³: pixel to point map; π±1(P): eight-neighborhood of pixel P; τ_0(T), τ_1(T): the two sides of a triangle that only differ in one dimension in pixel space

1: for P = (r, c) ∈ Π do
2:   triangle T_0 = ((r, c), (r + 1, c), (r, c + 1))
3:   trimesh m
4:   queue Q.push(T_0)
5:   while not Q.empty() do
6:     T ← Q.pop()
7:     if not used(T) then
8:       if τ_0(T) < θ_s and τ_1(T) < θ_s then
9:         m.add(T)
10:        used(T) ← true
11:        Q.push(N(T))
12:      end if
13:    end if
14:  end while
15:  M ← M ∪ m
16: end for
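Both phases share the same flood-fill structure over the scan-induced triangulation and differ only in the admissibility test (plane similarity in algorithm 4, side length in algorithm 5). A hedged sketch of that shared region growing, with hypothetical helper functions neighbors() and admissible(), could look as follows.

from collections import deque

def grow_trimeshes(triangles, neighbors, admissible):
    """Region growing shared by both phases of trimesh growing: flood-fill over
    the scan-induced triangulation, keeping only triangles that pass the
    phase-specific test (plane similarity in phase one, side length in phase two).
    triangles: iterable of triangle ids, neighbors(t): adjacent triangle ids,
    admissible(t): growability test; both helpers are assumed to exist."""
    used = set()
    meshes = []
    for seed in triangles:
        if seed in used or not admissible(seed):
            continue
        mesh, queue = [], deque([seed])
        while queue:
            t = queue.popleft()
            if t in used or not admissible(t):
                continue
            used.add(t)
            mesh.append(t)
            queue.extend(neighbors(t))        # corresponds to Q.push(N(T))
        meshes.append(mesh)
    return meshes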

Projection for outlining

Plane fitting yields a sub point cloud and the infinite plane on which these points all lie (within an error bound). So while these points are originally 3D, they can be considered points in a 2D coordinate system in a specified pose in 3D. We would like to find a small polygon containing the 2D point set. We call this process outlining. It exploits the information that a relatively dense sub point cloud lies on a plane in order to compress it.

Outlining presents us with a fundamental choice: we can either outline the sub point cloud in the pixel domain of the sensor and then project only the points of the outline onto the optimal plane (late projection), or we can project all points from the sub point cloud onto the plane and then find an outline in the 2D floating point domain (early projection). Obviously, early projection brings the burden of projecting a considerably larger number of points. Also, the 2D grid containing a contiguous region of pixels is computationally much easier to handle. However, late projection leads to other problems, as described below. In addition to being less obvious, they are less straight-forward to handle.

There are two methods for projecting the 3D points onto the plane: orthogonally or along the sensor beam to which they correspond (projection along the beam, see figure 5.2; for range cameras, a beam through the center of the pixel frustum is assumed). Both methods come with drawbacks, most prominently when the plane they project to is at an acute angle with the beams from which the projected points originate (figure 5.3). Orthogonal projection can change the order of points such that neighboring pixels correspond to points that do not neighbor (i.e. they are not visible to each other). This manifests itself in two problems for late projection. Firstly, points that lie inside the outline before projection may end up outside the constructed outline after late orthogonal projection. The polygon would then cease to be a proper outline of the sub point cloud. Secondly, the projection may create self-intersections of the outline.


Figure 5.2: Different methods of projecting the points of a sub point cloud onto the optimal plane fitted to them – (a) orthogonal projection, (b) legend (detected point, projected point, optimal plane, beam), (c) projection along the beam

Figure 5.3: PROBLEMS WITH THE TWO TYPES OF PROJECTION – (a) Inconvenient behavior of orthogonal projection: the order of points changes. (b) Distortion by projection along the beam: the size of regions grows enormously. Gray bars are sensor beams, the black line is the fitted plane, crosses are distance measurements, circles are points projected to the fitted plane.


Figure 5.4: Differences between orthogonal projection and projection along the beam – (a) huge erroneous region due to projection along the beam, (b) the same scene with orthogonal projection

Projection along the beam causes distortions because the plane fitting which forms the sub point cloud uses orthogonal projection. Points which lie close together when projected orthogonally can spread out when projected along the beam (figure 5.4; also compare the projected point positions in figure 5.3(a) to those in figure 5.3(b)). This problem haunts both early and late projection.

There is an additional problem case with late projection when used with an ALRF. The straightforward way to enter ALRF scans into a grid for processing is to enter the first LRF scan as the first row, the second scan as the second row, and so on. But, as opposed to range cameras, the plane on which all points of one row lie typically extends to behind the origin. Imagine a laser scanner with an opening angle of 240°. Let the beam going forward be at 0°. If the 0° beam points upward in the first scan, all beams beyond 90° or -90° will point downward. In the last scan the converse is true. Consequently, while the center of the top row describes the top of the scene, the ends will describe the bottom of the scene (the reader is referred to figure 3.8 on page 57 for an ALRF range image). The scan lines intersect at 90° and -90°. Now if a sub point cloud has points corresponding to pixels both from the center columns of the image and from the left and right edges, after projection the outline will self-intersect (see figure 5.5). This affects both orthogonal projection and projection along the beam.

In conclusion, early orthogonal projection is the only projection method that can be used without having to solve additional problems.
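As an illustration of the chosen method, early orthogonal projection of a sub point cloud onto its fitted plane can be sketched as follows (Python/numpy, a minimal sketch; the plane is assumed to be given as a normal n and offset d with n · x = d).

import numpy as np

def project_to_plane_2d(points, normal, d):
    """Early orthogonal projection: drop each 3D point of a sub point cloud onto
    the fitted plane n.x = d and express it in an arbitrary 2D basis (u, v) of
    that plane. points: N x 3 array, normal: plane normal, d: plane offset."""
    n = np.asarray(normal, float)
    n = n / np.linalg.norm(n)
    # pick any vector not parallel to n to build an orthonormal in-plane basis
    a = np.array([1.0, 0.0, 0.0]) if abs(n[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    u = np.cross(n, a)
    u /= np.linalg.norm(u)
    v = np.cross(n, u)
    pts = np.asarray(points, float)
    foot = pts - ((pts @ n) - d)[:, None] * n      # orthogonal foot point on the plane
    return np.stack([foot @ u, foot @ v], axis=1)  # 2D coordinates used for outlining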

Outlining

Once all points of an approximately planar sub point cloud have been projected to a 2D coordinate system, 2D outlining algorithms can be used to find a descriptive polygon. Outlines are of interest in their own right: they provide a compact representation of planar patches and they are used in subsequent processing steps, most importantly patch maps. For display purposes however, be it via X3D, in an OpenGL context, or similar, planar regions have to be triangulated to be displayed. This can either be done indirectly, by first generating an outline and then triangulating that outline, or by directly generating a trimesh out of the point set. For this reason, we benchmarked several direct and indirect triangulation algorithms on a variety of datasets.


Figure 5.5: LATE PROJECTION CAN CAUSE INTERSECTIONS WHEN APPLIED TO ALRF DATA – (a) An ALRF scan consisting of 3 scan lines (color-coded), opening angle 240°, horizontal step 30°; the ALRF is depicted as a gray box, coinciding scan points at 90° and -90° have been shifted for visibility. (b) The ALRF scan inserted into a grid. Suppose the three left-most columns are identified as one region; its best outline solely based on the grid is depicted. (c) The vertices of the outline are projected back into 3D (an orthogonal view of the plane they lie in is depicted). Due to how the scan was entered into the grid, an intersection occurs.

The compared algorithms are:
• convex hull with Graham's line scan, a standard algorithm (yielding the smallest convex polygon containing all points of a point cloud), implementation by CGAL [Hert and Schirra, 2008],
• RI, line-fitting based on the plane-fitting presented in [Poppinga et al., 2008b], implemented by Narunas Vaskevicius (this algorithm is benchmarked because it was part of the Jacobs Robotics library, but it is not discussed in detail here),
• naive triangulation based on the range image nature of the point clouds,
• scanLineV1,
• scanLineV2,
• α-shapes, and
• chain code.

Only chain code, α-shapes, and convex hull use indirect triangulation. For the triangulation step, CGAL [Yvinec, 2008] was used. Since the triangulation relies on the polygons being simple, but the chain code algorithm uses late projection, chain code could only be used on true range images and not on scans from an actuated LRF. When run on ALRF scans, it often produces polygons with many involved self-intersections, more than the heuristic for removing them can handle.

The α-shapes algorithm is a standard algorithm from geometry for finding a polygon for a set of points [Edelsbrunner and Mücke, 1994]. The α-shape of a 2D point cloud P is the set of all α-exposed edges of the Delaunay triangulation. An edge is α-exposed if there is no circle of radius α containing the edge and another point from P. The α-shape of P is a generalization of the convex hull, parameterized on α: the ∞-shape of P is the convex hull. For smaller values, the α-shape may be non-convex, contain holes, or even consist of multiple components. See figure 5.6 for an illustration. For pseudo-code of the algorithm, please refer to algorithm 6. We used the implementation available in CGAL [Da, 2008].

One natural limitation of the α-shapes algorithm is its dependency on the parameter α. Too great values will close gaps and holes, rendering the representation of the environment less truthful. On the other hand, small values might produce more than one polygon. Thankfully, CGAL offers the possibility to compute the optimal α-value to achieve a given number of polygons. In our case, we obviously would like to have only one component.


Figure 5.6: AN α-SHAPE (GREY) OF A 2D POINT CLOUD (ORANGE) – The radius of the circles is α. Taken from [Da, 2008].

Algorithm 6 GENERATING THE α-SHAPE A FOR POINT CLOUD P

D ← Delaunay-triangulation(P)
for all edges e ∈ D do
  c_e ← circumcircle(e)
  if radius r_{c_e} < α and ∀p ∈ P : p ∈ c_e ⇒ p ∈ e then
    C ← C ∪ {e}
  end if
end for
A ← {e ∈ C | e outer edge}
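A literal transcription of algorithm 6 is sketched below in Python, assuming, as the pseudo-code suggests, that the circumcircle of an edge is its diametral circle; extracting only the outer boundary edges (the last line of the algorithm) is omitted. In practice the CGAL implementation is used.

import numpy as np
from scipy.spatial import Delaunay

def alpha_exposed_edges(P, alpha):
    """Literal transcription of algorithm 6: keep the Delaunay edges whose
    circle is smaller than alpha and contains no further point of P.
    P: N x 2 array of 2D points; returns the kept edges as index pairs."""
    tri = Delaunay(P)
    edges = set()
    for simplex in tri.simplices:
        for i in range(3):
            edges.add(tuple(sorted((simplex[i], simplex[(i + 1) % 3]))))
    C = []
    for a, b in edges:
        center = (P[a] + P[b]) / 2.0                  # diametral circle of the edge
        radius = np.linalg.norm(P[a] - P[b]) / 2.0
        if radius >= alpha:
            continue
        others = np.delete(np.arange(len(P)), [a, b])
        if np.all(np.linalg.norm(P[others] - center, axis=1) > radius):
            C.append((a, b))
    return C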


Figure 5.7: SOME EXAMPLES FOR CONVEX POLYGONS ON A GRID

Figure 5.8: AN EXAMPLE ITERATION OF THE ALGORITHM – The polygons in P are depicted on top of the pixels which are part of the region of the range image currently being processed. (a) Before processing the last line: three polygons in P. (b) After processing the last line: two polygons transferred to R, four in P, and one link polygon (dashed) in R.


The scanLine algorithms use late projection. Their approach is to find a set of convex polygons covering the same area as the set of triangles that is produced by connecting all triplets of pixels neighboring in the distance image. This set is supposed to have a triangular decomposition with as few triangles as possible.

All convex polygons on a grid are (possibly degenerate) octagons (see figure 5.7 for some examples). It can furthermore be observed that whether a polygon is convex can be decided while scanning it row by row. So an algorithm can be conceived which reads the input regions row by row and gradually builds convex polygons, starting a new polygon whenever the current one would become non-convex (algorithm 7). This obviously has the advantage of linear runtime. The algorithm uses the notion of a link polygon; its construction is given in algorithm 8. An example iteration of the algorithm is shown in figure 5.8. It exemplifies most of the cases from the algorithm. The stretches in 5.8(b) correspond, from left to right, to lines 11 (sub-case line 15), 9, 11 (sub-case line 13), and 3.

Triangulation of convex polygons on a grid is mostly trivial. The basic mode of triangulation can be seen in the leftmost polygon in figure 5.9. Depending on how degenerate the octagon is, some of the triangles can be left out [ibid.]. The only caveat is keeping watertightness. Two vertices of a link polygon typically lie on edges of the polygons adjacent to the link polygon. This is not allowed in trimeshes, so a vertex has to be inserted into the adjacent polygon, which affects its triangulation. These vertices are marked 'L' in sub-figure 5.8(b).


Algorithm 7 THE SCANLINE ALGORITHM FOR RANGE IMAGE TRIANGULATION – In the current line i, build a list of polygons P_i from the previous line's list P_{i−1}, sorted from left to right. The result is the set of convex polygons R. Versions 1 and 2 differ in how they define "convex" in lines 9 and 13.

1: for all lines i do
2:   for all continuous stretches of pixels s do
3:     if |P_{i−1}| = 0 then
4:       P_i ← P_i ∪ s
5:     else
6:       p_j ← left-most element of P_{i−1}
7:       s′ ← last stretch added to p_j
8:       if p_j and s share one or more columns then
9:         if p_j + s convex then
10:          P_i ← P_i ∪ (p_j + s)
11:        else
12:          R ← R ∪ {p_j}
13:          if s′ + s convex then
14:            P_i ← P_i ∪ (s′ + s)
15:          else
16:            R ← R ∪ L(s′, s)
17:            P_i ← P_i ∪ s
18:          end if
19:        end if
20:        P_{i−1} ← P_{i−1} \ p_j
21:      else
22:        if p_j < s then
23:          R ← R ∪ p_j
24:          P_{i−1} ← P_{i−1} \ p_j
25:          goto line 3
26:        else
27:          P_i ← P_i ∪ s
28:        end if
29:      end if
30:    end if
31:  end for
32: end for
33: for p ∈ P_i do
34:   R ← R ∪ {p}
35: end for


Algorithm 8 CONSTRUCTION OF A LINK POLYGON – Given two stretches of pixels s_i, s_{i+1} on consecutive lines i, i+1 with s_j = p_{j,1}, ..., p_{j,n_j}, the link polygon L(s_i, s_{i+1}) = (TL, BL, BR, TR) is derived as follows.

if p_{i,1} < p_{i+1,1} then
  TL ← (i, p_{i+1,1} − 1)
  BL ← (i + 1, p_{i+1,1})
else if p_{i,1} = p_{i+1,1} then
  TL ← (i, p_{i,1})
  BL ← (i + 1, p_{i+1,1})
else
  TL ← (i, p_{i,1})
  BL ← (i + 1, p_{i,1} − 1)
end if
if p_{i,n_i} < p_{i+1,n_{i+1}} then
  TR ← (i, p_{i,n_i})
  BR ← (i + 1, p_{i,n_i} + 1)
else if p_{i,n_i} = p_{i+1,n_{i+1}} then
  TR ← (i, p_{i,n_i})
  BR ← (i + 1, p_{i,n_{i+1}})
else
  TR ← (i, p_{i,n_{i+1}} + 1)
  BR ← (i + 1, p_{i,n_{i+1}})
end if

Figure 5.9: TRIANGULATION OF CONVEX GRID POLYGONS – The numbers on the leftmost polygon indicate all possible occurring triangles. On the other polygons, some of these have been left out based on a simple decision tree using at most two tests for equality per triangle. Note that all vertices are points on the grid.


Figure 5.10: COMPARISON OF TRIANGULATION WITH/WITHOUT THE RESTRICTION TO THE AREA COVERED BY THE NAIVE TRIANGULATION – (a) Triangulation with scanLineV1: 5 triangles, accurate coverage. (b) Triangulation with scanLineV2: 1 triangle, less accurate coverage. Please keep in mind that this is an extreme corner case, since the data is usually much too noisy to allow outlines like the depicted one to continue over more than two rows.

Figure 5.11: TRIANGLES WITH ANNOTATED SPIKYNESS – example triangles annotated with spikyness values ranging from 0.0 to 5.0


The major advantage of the scan line algorithm is its high speed. In fact, 64 % of the processing time is spent projecting the points onto the ideal plane of their region and on related pre-processing. A further 30 % is needed for the scan line algorithm and a mere 6 % for the actual polygonization. One disadvantage of the algorithm is the very sharp polygons it outputs. Also, the restriction to the area covered by the naive triangulation can produce an excess of triangles (figure 5.10).

In the first working version of the algorithm, scanLineV1, the edges are restricted to vertical and horizontal lines and to diagonals of squares of four pixels. This is relaxed in scanLineV2, which allows for more types of diagonal lines. This reduces the number of triangles. On the other hand, the accuracy is reduced (see figure 5.10).

For benchmarking, we define the spikyness of a triangle. The spikyness of a triangle T with inner angles α_0, α_1, α_2 is defined as Σ_{i=0}^{2} (π/3 − α_i)². As an orientation: the spikyness of a right isosceles triangle is ∼0.411, and for an equilateral triangle it is 0. See figure 5.11 for examples.
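For reference, the spikyness measure can be computed directly from the vertex coordinates; the following small Python function is a sketch that reproduces the values quoted above.

import math

def spikyness(a, b, c):
    """Spikyness of a triangle with vertices a, b, c: the sum over the inner
    angles alpha_i of (pi/3 - alpha_i)^2; 0 for an equilateral triangle."""
    def angle(p, q, r):                         # inner angle at vertex p
        v1 = (q[0] - p[0], q[1] - p[1])
        v2 = (r[0] - p[0], r[1] - p[1])
        dot = v1[0] * v2[0] + v1[1] * v2[1]
        return math.acos(max(-1.0, min(1.0, dot / (math.hypot(*v1) * math.hypot(*v2)))))
    angles = [angle(a, b, c), angle(b, a, c), angle(c, a, b)]
    return sum((math.pi / 3 - al) ** 2 for al in angles)

# spikyness((0, 0), (1, 0), (0, 1)) is roughly 0.411, as quoted above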


Figure 5.12: Time taken, number of polygons, and spikyness for the different triangulation algorithms on the German Open '09 Arena dataset.


Figure 5.13: Time taken, number of polygons, and spikyness for the different triangulation algorithms on the German Open '09 Hall dataset.


The experiments were conducted on three of the datasets presented in chapter 3; the reader is referred there for an in-depth analysis. Two of the datasets were recorded at the RoboCup German Open '09 at the Hannover Fair. One is a run through the RoboCup Rescue arena consisting of 79 point clouds (results in figure 5.12). The other is a circuit of the fair hall in which the arena was located (figure 5.13); it consists of 78 point clouds. These datasets were recorded with an actuated laser range finder. The remaining dataset was recorded with a range camera. It consists of hand-chosen scenes in our laboratory, picked according to one of three criteria: being dominantly planar; being dominantly spherical, conical, or cylindrical; or containing planar regions with holes in them. The dataset is hence called 'Planar/Round/Holes'. It contains 75 point clouds. Its results are in figure 5.14.

The two scanLine algorithms and the convex hull are the fastest contenders. Naive triangulation is in the mid-field, while α-shapes and RI are slowest, with a slight advantage for RI. Where chain code could be used, it performed between naive triangulation and RI.

In terms of the number of polygons, the naive triangulation was by far the worst (as could be expected). Then follow, from worse to better, scanLineV1, scanLineV2, and α-shapes. Convex hull and RI perform best.

The spikyness depends on the dataset. A common feature is that the scanLine algorithms produce quite spiky triangles (with a slight advantage for scanLineV2), while α-shapes and RI produce less spiky triangles (with a slight advantage for RI). On the point clouds collected with an ALRF, the naive triangulation performs as well as RI, and the convex hull is worst. On the point clouds collected with a range camera, naive triangulation was slightly better than RI, chain code was slightly worse than α-shapes, and convex hull was slightly better than scanLineV2.

These differences can be attributed to the properties of the different ranging sensors. Laser beams are still reflected at steep angles, whereas the near-IR light from the range camera's light source is not reflected with sufficient brightness beyond a certain angle. This means that on areas at steep angles to the sensor, the ALRF will produce points which are stretched over a large area, while the range camera will produce no points at all. Additionally, even without this distortion, the range camera's pixels are more evenly distributed than the ALRF's beams. With the naive triangulation, this results in triangles which are close to right triangles (spikyness 0.5) for the range camera. On the ALRF's more irregular pattern, triangles get spikier. Something similar holds for the convex hull: a more stretched polygon, as produced by the ALRF's more stretched-out point sets, will be triangulated into spikier triangles.

In conclusion, RI performed best, closely followed by α-shapes. Since, of these, only α-shapes produces the outlines which are needed henceforth, the α-shape algorithm will be used. In a context where only trimeshes are needed, RI is preferred. The convex hull was actually fastest, but it is too coarse an approximation for mapping purposes. It might be interesting, though, for display in a system with different levels of detail, as a coarse approximation to be used at great distances.

Outline simplification

Outlines constructed by outlining algorithms typically contain many more vertices than necessary for subsequent processing. We call the process of removing excess vertices while maintaining resemblance to the original shape outline simplification. There is a rich literature on algorithms for this task. A survey can be found in [Heckbert and Garland, 1997]. Also, Mustafa gives an introduction to the field in his Ph.D. thesis [Mustafa, 2004]. There are three groups of algorithms: global ones using constrained Delaunay triangulation, local general ones, and local ones working on monotone subsections. A further distinguishing mark is whether the vertices of the simplification are supposed to be a subset of the input polygon (strong simplification) or whether they can be placed freely (weak simplification). He mentions that the creation of an optimal weak simplification without self-intersection and within a bound ε is NP-hard, and states that no approximation algorithm is known (p. 57).


Figure 5.14: Time taken, number of polygons, and spikyness for the different triangulation algorithms on the Planar/Round/Holes dataset.


Figure 5.15: An example of outlining – (a) the outline of a region as found by the α-shapes algorithm: 141 vertices; (b) the same outline simplified by the Douglas-Peucker algorithm: 9 vertices

However, [Mustafa, 2004] only covers the min-#-problem: finding an approximation within a bound ε in some error measure with as few nodes as possible. There is also the min-ε-problem: finding an approximation with a fixed number of nodes with minimal error ε (cf. e.g. [Imai and Iri, 1988]).

A very early and still very popular strong min-# simplification algorithm is the one by Douglas and Peucker [Douglas and Peucker, 1973]. It is known as Ramer's algorithm in vision and as the sandwich algorithm in computational geometry [Hershberger and Snoeyink, 1997].

While this classic algorithm works locally on general polylines, the most widely researched class is that of monotone polylines (i.e. one of the coordinates grows monotonically across all vertices) [Varadarajan, 1996, Goodrich, 1995, Aronov et al., 2004]. This is because they can be approximated by a piecewise linear function. It is, however, not a real limitation, since each polyline can be broken into a series of monotone polylines.

A weak simplification algorithm on polylines was proposed by Imai and Iri [Imai and Iri, 1988]. In later work [Guibas et al., 1991], an algorithm with a lower runtime complexity was proposed. The runtime complexity that actually applies depends on certain definitions and the type of object used. In [Imai and Iri, 1988], piecewise linear functions can be approximated in linear time, while the run time for general polygonal curves varies depending on the definition of the error criterion. For the latter case, the run time can be O(n² log n) or O(n log n). In [Guibas et al., 1991], it depends on the definition of visiting the points "in order".

Simplification algorithms can either work online, as the nodes arrive, or offline, on the entire polyline. In some domains, such as sampling time series of data in medicine or the natural sciences, it is desirable to process data as it comes in. This is equivalent to exclusively using a sliding window in an offline algorithm. This is, however, strongly discouraged by the pathological examples these types of algorithms can yield, as given in [Shatkay, 1995]. An improved variant is proposed in [Keogh et al., 2003]. Since we use an algorithm with separate steps (plane extraction, outlining, outline simplification, triangulation), we can use offline algorithms.

Since the Douglas-Peucker algorithm has proven itself so extensively, there was no reason not to use it. Its only small drawback is that it assumes an open polyline as input and not a closed polyline, i.e. a polygon. However, using the polyline between the first and last vertex as arbitrarily defined by CGAL, the results look very convincing. This way, the first and last vertex will always stay a part of the outline.
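For completeness, the classic recursive formulation of the Douglas-Peucker simplification can be sketched as follows; this is a generic textbook sketch and not necessarily identical to the variant referenced as algorithm 9. eps is the distance tolerance, and the first and last vertex are always retained.

import math

def douglas_peucker(polyline, eps):
    """Recursive Douglas-Peucker simplification of an open polyline (list of
    (x, y) vertices); the first and last vertex are always kept."""
    def point_line_dist(p, a, b):
        (px, py), (ax, ay), (bx, by) = p, a, b
        num = abs((bx - ax) * (ay - py) - (ax - px) * (by - ay))
        den = math.hypot(bx - ax, by - ay)
        return num / den if den > 0 else math.hypot(px - ax, py - ay)
    if len(polyline) < 3:
        return list(polyline)
    dists = [point_line_dist(p, polyline[0], polyline[-1]) for p in polyline[1:-1]]
    i_max = max(range(len(dists)), key=dists.__getitem__) + 1
    if dists[i_max - 1] <= eps:
        return [polyline[0], polyline[-1]]       # all intermediate vertices within tolerance
    left = douglas_peucker(polyline[:i_max + 1], eps)
    right = douglas_peucker(polyline[i_max:], eps)
    return left[:-1] + right                     # splice at the kept vertex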



5.4 Patch Maps

5.4.1 Patch Map Capabilities

Potential capabilities of a patch map framework are defined by simple interfaces. This allows for flexibility: not all implementations have to offer all kinds of queries, e.g. for collision or proximity. Algorithms specify which exact capabilities they need to run and thus, indirectly, which patch map implementations they can run on.

PatchMapInterface The PatchMapInterface represents the basic capabilities a patch map framework must offer. These are the adding of patches and read-only sequential access to all patches. A patch map framework must be able to store both planar and trimesh patches. Whether it gives each special treatment or, e.g., simply triangulates the planar patches is up to the individual implementation.

ProximityQueryPatchMap The ProximityQueryPatchMap interface represents the ability to return the patch which is closest to a specified point.

Un/SpecificPatchStepIntersector The UnspecificPatchStepIntersector interface represents the ability to test for collision freedom when traversing from one point to another. In this context, "unspecific" means that in the case of a collision the colliding patch is not returned. This interface does not precisely define behavior; this is done by its children UnspecificPatchLineIntersector and UnspecificPatchCapsuleIntersector. Still, it can be used for constructing a simple RRT. In case of a collision, the SpecificPatchStepIntersector also returns the patch with which the step collides and the collision point on the patch.

Un/SpecificPatchLineIntersector The UnspecificPatchLineIntersector interface represents the ability to test for collisions on a line between two points. Since this means that it does not consider object size, it can only be used as an approximation. In case of a collision, the SpecificPatchLineIntersector also returns the patch with which the line collides and the collision point on the patch.

Un/SpecificPatchCapsuleIntersector A capsule is the convex hull of two spheres of identical radius. In other words, it is the volume a sphere sweeps when cast from one point to another. The UnspecificPatchCapsuleIntersector interface represents the capability to test for collisions of a sphere on a path between two points. Considering the sphere to be the bounding sphere of a vehicle, this test is useful for motion planning. In case of a collision, the SpecificPatchCapsuleIntersector also returns the patch with which the capsule collides and the collision point on the patch.
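To illustrate how these capabilities can be expressed as interfaces, a minimal sketch in Python follows. The method names are hypothetical stand-ins; the actual framework is implemented in C++.

from abc import ABC, abstractmethod

class PatchMapInterface(ABC):
    """Basic capabilities every patch map framework must offer."""
    @abstractmethod
    def add_patch(self, patch): ...
    @abstractmethod
    def patches(self):
        """Read-only sequential access to all stored patches."""

class ProximityQueryPatchMap(PatchMapInterface):
    @abstractmethod
    def closest_patch(self, point):
        """Return the patch closest to the given point."""

class UnspecificPatchCapsuleIntersector(PatchMapInterface):
    @abstractmethod
    def capsule_collides(self, p1, p2, radius):
        """True iff a sphere of the given radius swept from p1 to p2 hits a patch."""

class SpecificPatchCapsuleIntersector(UnspecificPatchCapsuleIntersector):
    @abstractmethod
    def capsule_collision(self, p1, p2, radius):
        """Return (patch, collision point) of the hit, or None if collision-free."""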

5.4.2 Patch Map Frameworks

In the following, we present different patch map frameworks. Here, this term refers to a data-structure for storing patches together with the associated algorithms. The central problem patch maps try to solve is collision detection.

Collision detection means efficiently testing whether two or more geometric primitives or complexes intersect. It typically consists of two phases: a broad phase performs a cheap check on all existing objects and returns a list of objects potentially in collision; on these, a more expensive and more accurate check is performed in the narrow phase.

The main motivation and the most prominent case for using a patch map framework are planar patches. Planar patches are relatively simple, which tempts one to implement from scratch the subset of collision detection needed. Additionally, collision detection libraries typically do not handle non-convex polygons as primitives, making it necessary to increase the complexity of the data (from polygons to trimeshes). A first attempt yielded the SimplePatchMap. Adding a simple broad phase to the collision detection led to the BoundingBox6DPatchMap.

During implementation, it became clear that it would be very hard to fulfil all requirements on the patch map with naive approaches. And even in case of success, they would probably still be lacking in terms of speed and stability. So we decided to use an external collision detection library. We considered several popular libraries. In general, it can be said that we chose the library which left the least amount of programming work to be done; that is, we reviewed the supplied collision primitives and algorithms. Specifically, we discarded RAPID [Gottschalk, 1997] because it only offers a two-body collision check [Gottschalk, 1998] (i.e. whether two given collision objects collide) and not the n-body check we need (i.e. whether a given collision object collides with a collection of collision objects [the map]). In other words, it does not implement a broad phase collision detection. It also lacks primitives other than triangles. We could not make SOLID [van den Bergen, 2004] work (we tried FreeSolid version 2.1.1 on Ubuntu Linux 9.10; installation worked flawlessly, but some of the provided header files included other header files that were supposed to be among the installed files but were not). Since there were many alternatives available, we avoided spending too much time on it. Our first working library was OPCODE, chosen for its small code size and high speed [Terdiman, 2003]. It sufficed for early experiments with ray casting. For a proper roadmap, we need to cast bounding volumes, so we turned to more elaborate libraries. We chose Bullet [Coumans, 2009] because
• it is the popular free physics engine among game developers [ibid.],
• the author of OPCODE, whose work there we liked, contributed to it,
• it has a lively community to help with potential programming issues, and
• one of our colleagues has had bad experiences with ODE [Smith, 2006], a potential alternative.

Using a full physics engine might seem like overkill for the problem, but firstly this additional functionality does not inhibit its performance as a collision detection library, and secondly a physics engine might prove useful in the future, e.g. for drivability tests for ground-based robots.

SimplePatchMap

The SimplePatchMap is the reference implementation for most interfaces. It does not have a broad phase; it simply iterates over an array of all patches. In the narrow phase, it can natively collide planar patches: it first checks for collision with the infinite plane of the patch. If there is a collision, the intersection point is transformed to the 2D coordinate system of the polygon and checked for inclusion in it with the algorithm from CGAL [Giezeman and Wesselink, 2008].
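The narrow-phase test for a planar patch can be sketched as follows (Python/numpy); to_2d() and polygon_contains() are hypothetical stand-ins for the transformation into the patch's 2D frame and the point-in-polygon test (done with CGAL in the thesis).

import numpy as np

def segment_hits_planar_patch(a, b, normal, d, to_2d, polygon_contains):
    """Narrow-phase test in the spirit of the SimplePatchMap: intersect the
    segment a-b with the patch's infinite plane n.x = d, then check the hit
    point against the patch's 2D outline."""
    a, b, n = np.asarray(a, float), np.asarray(b, float), np.asarray(normal, float)
    da, db = n @ a - d, n @ b - d          # signed distances of the endpoints
    if da * db > 0:                        # both endpoints on the same side: no hit
        return None
    if da == db:                           # segment parallel to the plane
        return None
    t = da / (da - db)
    hit = a + t * (b - a)
    return hit if polygon_contains(to_2d(hit)) else None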

BoundingBox6DPatchMap

The BoundingBox6DPatchMap uses a variant of the sweep and prune algorithm [Baraff, 1992] to add a broad phase to the SimplePatchMap's narrow phase. Patches' bounding boxes are entered into a six-dimensional kD-tree (the first three dimensions are the minima, the last three dimensions are the maxima). To get the patches potentially in collision with a line from point a to b, a query on the kD-tree for all points in the partially open hyper-cuboid Q is used. Here, Q is defined by its minimum and maximum points min_Q and max_Q. On min_Q, the lower three dimensions are unbounded, while the upper three are equal to the minimum of the bounding box of a and b: min_Q^[1-3] = −∞, min_Q^[4-6] = min_BB{a,b}. Conversely, max_Q^[1-3] = max_BB{a,b} and max_Q^[4-6] = ∞. We illustrate this in one dimension in figure 5.17.
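The containment test that this query evaluates can be written down directly; the following brute-force Python sketch makes the relationship between the 6D points and the hyper-cuboid Q explicit (the actual map of course answers the query through the kD-tree instead of scanning all patches).

import numpy as np

def broad_phase_candidates(box_minmax, a, b):
    """Brute-force equivalent of the hyper-cuboid query: a patch with bounding
    box [pmin, pmax], stored as the 6D point (pmin, pmax), is a candidate iff
    pmin <= max(BB{a,b}) and pmax >= min(BB{a,b}) componentwise.
    box_minmax: N x 6 array of (pmin, pmax) rows; returns candidate indices."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    q_min, q_max = np.minimum(a, b), np.maximum(a, b)   # bounding box of the segment
    pmin, pmax = box_minmax[:, :3], box_minmax[:, 3:]
    inside = np.all(pmin <= q_max, axis=1) & np.all(pmax >= q_min, axis=1)
    return np.nonzero(inside)[0]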

While at first glance this interpretation may look like a mere re-ordering of inequality checks, it allows for optimization. Framing the interference check as a containment check on a kD-tree allows us to use its efficient look-up. As kD-trees are popular data-structures, there are heavily optimized implementations available. The effort for the BoundingBox6DPatchMap is reduced to one insertion per patch and one look-up per broad-phase collision check. This also simplifies the implementation.

OpcodePatchMap

The OpcodePatchMap relies on the OPCODE library for collision detection [Terdiman, 2003]. Since OPCODE does not store polygonal information, only bounding boxes, the patches are additionally stored in an array. Planar patches are triangulated for use with OPCODE. It offers lines and spheres as collision objects, but neither swept objects nor capsules. Hence, it only served for prototyping using line collisions with the UnspecificPatchLineIntersector. To achieve more useful results, Bullet was adopted for collision detection (for the choice of the collision detection library, also see above).

BulletPatchMap

The BulletPatchMap uses the Bullet Physics Engine for collision detection [Coumans, 2009]. Bullet uses trimeshes, so planar patches have to be triangulated to be used in Bullet. The trimeshes are generated directly in a format that Bullet can use. In the future, it might be worth trying to decompose the patches' polygons into convex polygons, because these are also supported by Bullet. Since such a decomposition yields far fewer components than a triangulation, there is potential for higher speed.

PointCloudMap

The PointCloudMap is a collection of point clouds. Although no patches are present, some patch map interfaces are implemented to allow for a comparison of the patch maps with a traditional map type. Proper patches (i.e. planar or trimesh patches) cannot be entered into this map.

The first approach was to render points as spheres with radius 0 or a very small radius as Bullet collision objects. However, point clouds can get quite large, especially if they originate from an ALRF (typically hundreds of thousands of points). Entering each point as its own collision object is quite challenging for the collision frameworks. Alternatively, one could enter the entire point cloud as one compound object. If the sensor covers a wide spectrum of ranges or has a very large opening angle, the bounding volume of the point cloud would be overly large and hence not very useful. A typical example is a robot standing in the middle of a room: floor, ceiling, walls, and objects are all covered by its ALRF. Should it now enter the complete point cloud as a compound collision object, the resulting bounding volume would encompass the whole room. Depending on how the collision detection framework handles compound objects, this could render the broad phase of collision detection useless. In that case, all collisions between the robot and the features of the room would have to be determined in the typically expensive narrow phase.

We used a simple middle-ground algorithm. The point cloud was entered into a kD-tree, which was then balanced. For each node N at a certain depth d, all points under N were used to generate a sub point cloud, and these sub point clouds were then entered as compound objects into Bullet.


Figure 5.17: ONE DIMENSIONAL CASE OF BOUNDING BOX BASED BROAD PHASE INTERSECTION TEST – It uses higher-dimensional point bounding boxes and open hyper-cuboid queries. (a) One-dimensional bounding boxes P_0 to P_3 and two points a, b from the collision query: "Does the line from a to b intersect any bounding boxes?" (b) Corresponding higher-dimension point bounding boxes and the open hyper-cuboid for the collision query (unbounded towards −∞ on the x_min axis, towards ∞ on the x_max axis). Since the points for P_1 and P_3 lie inside the hyper-cuboid, the broad phase would now pass the corresponding patches to the narrow-phase collision detection.


Due to the properties of the kD-tree, the point cloud is repeatedly broken up at the median coordinate in one dimension, cycling through all three. This is a very simple way to break up the big bounding volume of the point cloud into relatively well-chosen sub bounding volumes. We accepted some over-segmentation, but limited under-segmentation. A value used for d in many experiments with ALRF point clouds of up to 200,000 points is 9. This means that one point cloud was split up into 512 sub point clouds. Please note that setting d to 0 or to a very high value allowed us to also test the two naive approaches ruled out above.
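A sketch of this sub-point-cloud generation by repeated median splits (Python/numpy, an illustrative sketch rather than the kD-tree based implementation); with depth d = 9 it produces the 512 sub point clouds mentioned above.

import numpy as np

def median_split(points, depth, axis=0):
    """Recursive median split of a point cloud, cycling through the axes, until
    the given depth is reached; depth 9 yields up to 2^9 = 512 sub point clouds."""
    if depth == 0 or len(points) <= 1:
        return [points]
    med = np.median(points[:, axis])
    left = points[points[:, axis] <= med]
    right = points[points[:, axis] > med]
    if len(left) == 0 or len(right) == 0:    # degenerate split: stop early
        return [points]
    nxt = (axis + 1) % 3
    return median_split(left, depth - 1, nxt) + median_split(right, depth - 1, nxt)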

However, since Bullet requires its objects to be of a certain minimum size [Coumans, 2010], this approach did not deliver correct results. Since points are not commonly used as collision objects, none of the surveyed collision detection libraries supports them. As the failed attempt with Bullet shows, simulating points with objects of size zero offers dubious chances of success. Hence, we implemented a simple collision detection algorithm ourselves.

Our collision algorithm is given in pseudo-code as algorithm 10. It assumes that point clouds densely sample surfaces. Specifically, it is assumed that the radius will always be big enough that a sphere cannot pass between points which lie on the same surface. For an optimized nearest-neighbor search, the points are stored in a CGAL kD-tree [Tangelder and Fabri, 2008], which offers such a facility. An example of how the algorithm works can be found in figure 5.18. For it to always terminate, l² > 0 has to hold at all times. This is guaranteed by the termination criterion in line 7. A theoretical limitation is the algorithm's dependence on the point cloud structure: if a successful test is performed in a very dense area with many points close to the capsule, it will be very slow. However, this is a rare case, since in denser areas tests are generally less likely to be successful.

Algorithm 10 COLLISION DETECTION FOR A CAPSULE IN A POINT CLOUD – Given: point cloud P, capsule end points p_1, p_2, capsule radius r.

1: if min_{p∈P} ||p − p_1|| ≤ r or min_{p∈P} ||p − p_2|| ≤ r then
2:   return true
3: end if
4: p_i ← p_1
5: s ← (p_2 − p_1)/||p_2 − p_1||
6: while p_i between p_1, p_2 do
7:   if min_{p∈P} ||p − p_i|| ≤ r then
8:     return true
9:   end if
10:  l ← sqrt( min_{p∈P} ||p − p_i||² − r² )
11:  p_i ← p_i + s·l
12: end while
13: return false
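A sketch of algorithm 10 in Python, using a kD-tree for the nearest-neighbor queries (scipy here, CGAL in the thesis) and assuming the capsule spine has non-zero length:

import numpy as np
from scipy.spatial import cKDTree

def capsule_collides(points, p1, p2, r):
    """Sketch of algorithm 10: test a capsule (spine p1-p2, radius r) against a
    point cloud using nearest-neighbor queries on a kD-tree."""
    tree = cKDTree(points)
    p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
    if tree.query(p1)[0] <= r or tree.query(p2)[0] <= r:
        return True
    length = np.linalg.norm(p2 - p1)
    s = (p2 - p1) / length
    p, travelled = p1.copy(), 0.0
    while travelled <= length:                  # p still between p1 and p2
        dist = tree.query(p)[0]
        if dist <= r:
            return True
        step = np.sqrt(dist * dist - r * r)     # largest safe advance along the spine
        p += s * step
        travelled += step
    return False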

The PointCloudMap only supports the unspecific collision queries (UnspecificPatchCapsuleIntersector). This is because the two most commonly used algorithms do not use the "specific" interfaces, so the effort would not have justified the reward. Also, for the other algorithms, it does not make sense to consider a complete point cloud as a patch; consequently, the point clouds would have to be subdivided in a way that makes sense.

Furthermore, the PointCloudMap does not support proximity queries. This is because they rely on the notion of the "distance to a patch". Apart from the difficulty of identifying an equivalent to patches, as outlined above, it is also challenging to define the distance to a patch in a way that carries across the implications from planar patches (and, to a lesser extent, trimeshes). This is because there, the distance is defined along the plane normal for planar patches and along the local normal for trimeshes. Producing comparable behavior in the PointCloudMap is not inconceivable, but out of the scope of this work.


Figure 5.18: A RUN OF THE KD-TREE BASED COLLISION DETECTION ALGORITHM DETECTING NO COLLISION – The blue area is to be checked for any of the black points. The green arc and the red marking of a point represent the result of a nearest-neighbor search. (a) Nearest-neighbor searches are conducted at each end of the capsule. Since neither of the nearest neighbors is within the capsule, the algorithm continues. (b) The algorithm starts at one end of the capsule and progresses to the other one. When the nearest neighbor to the query point is further away than the capsule radius, it continues to query at a point closer to the capsule's other end. (c) The new query point is the intersection of the capsule spine and a sphere with radius equal to the distance to the nearest neighbor and center in the old query point. (e) The current query point is dark green, the query for the next iteration is light green. (g) Not pictured: if a nearest-neighbor search finds a point closer than the capsule radius, the algorithm terminates, reporting a collision. (i) When the new query would have to start behind the second end of the capsule, the algorithm returns, reporting that there is no collision.


Algorithm      Proximity   Specific Step   Unspecific Step
RRT            –           –               √
Evasion RRT    –           √               –
MA RRT         √           –               √
PRM            –           –               √
MA PRM         √           –               √

Table 5.2: REQUIRED INTERFACES FOR PATCH MAP BASED ALGORITHMS – Proximity is short for ProximityQueryPatchMap, Specific Step for SpecificPatchStepIntersector, and Unspecific Step for UnspecificPatchStepIntersector


5.5 Algorithms on patch maps

We implemented a number of algorithms on patch map frameworks. They do not use the patch map frameworks directly, but via the different capabilities, i.e. interfaces. Which capabilities an algorithm needs determines which patch map frameworks it can run on. These algorithms include

• the Rapidly Exploring Random Tree (RRT) [LaValle, 1998],
• the Evasion RRT, a variant of the RRT,
• the Probabilistic Roadmap Method (PRM) [Kavraki et al., 1996],
• the Medial Axis PRM (MA PRM) presented in [Wilmarth et al., 1999], adapted to 3D patch maps, and
• the Medial Axis RRT (MA RRT), inspired by [Wilmarth et al., 1999].

Table 5.2 shows which algorithms require which interfaces. The RRT algorithm successively adds collision-free edges towards random points (shortened to a maximum length) to a tree starting at a specified root. The PRM algorithm works similarly, only that it grows in potentially disconnected components on the entire map and does not limit the length of edges. The RRT and PRM algorithms were chosen for further, in-depth experiments on real data, which are presented in chapter 6. To make that chapter more coherent, we present these two algorithms in detail in section 6.1.
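To make the use of the interfaces concrete, a minimal RRT loop that only relies on an unspecific step test could look as follows; sample(), patch_map.collides(), and the tree representation are hypothetical stand-ins, and the full algorithms are presented in section 6.1.

import math

def dist(a, b):
    return math.sqrt(sum((a[i] - b[i]) ** 2 for i in range(3)))

def build_rrt(x_init, patch_map, K, l_step, sample):
    """Minimal RRT on a patch map that only offers an unspecific step test
    (patch_map.collides(a, b) -> bool). sample() draws a random 3D point;
    the tree is returned as a dict mapping each vertex to its parent."""
    tree = {tuple(x_init): None}
    for _ in range(K):
        x_rand = sample()
        x_near = min(tree, key=lambda v: dist(v, x_rand))   # nearest vertex in the tree
        d = dist(x_near, x_rand)
        if d == 0:
            continue
        step = min(l_step, d) / d
        x_new = tuple(x_near[i] + step * (x_rand[i] - x_near[i]) for i in range(3))
        if not patch_map.collides(x_near, x_new):           # unspecific step intersector
            tree[x_new] = x_near
    return tree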

5.5.1 Evasion RRT

The Evasion RRT algorithm can be found in algorithm 11. The difference to the standard RRT is that upon collision, it tries to place a non-colliding vertex in the vicinity of the first tried vertex. The position will be next to the planar patch, in the planar patch's plane. The rationale is that it is desirable to have many vertices near collision objects. A limitation is that it is currently only defined for planar patches.


Algorithm 11 THE EVASION RRT ALGORITHM – Building roadmap T on map M. The COLLISION*(M, x_1, x_2) function family checks the map for patches between x_1 and x_2. If p is a planar patch, n_p is its normal.

1: proc BUILD EVASION RRT(x_init, M)
2:   T.init(x_init)
3:   for k := 1 to K do
4:     x_rand ← RANDOM POINT()
5:     EXTEND(T, M, x_rand)
6:   end for
7:   return T
8:
9: proc EXTEND(T, M, x)
10:  x_near ← NEAREST NEIGHBOR(x, T)
11:  u_new ← x − x_near
12:  u_new ← (l_step / |u_new|) · u_new
13:  x_new ← x_near + u_new
14:  while COLLISION(M, x_near, x_new) and p = p_old do
15:    p ← COLLISION PATCH(M, x_near, x_new)
16:    x_c ← COLLISION POINT(M, x_near, x_new)
17:    d_ev ← u_new × n_p
18:    d_ev ← (l_ev / |d_ev|) · d_ev
19:    x_new ← x_new + d_ev
20:  end while
21:  T.add vertex(x_new)
22:  T.add edge(x_near, x_new, u_new)


5.5.2 Medial Axis algorithms

Given a point set P and a point q ∉ P, the distance d_P(q) of q to P is the distance of the closest point in P to q:

d_P(q) := min_{p∈P} ||p − q||

The medial axis P_MA of two point sets P_1, P_2 is the set of all points with equal distance to P_1 and P_2:

P_MA := {p | d_{P_1}(p) = d_{P_2}(p)}

The Medial Axis algorithms attempt to create smooth paths that keep maximum distance to the obstacles by placing vertices only on the medial axes. They do so by relocating random points to the nearest medial axis and thus avoid calculating the medial axes explicitly. In [Wilmarth et al., 1999], a volumetric method like an occupancy mesh is assumed. This means that a big part of their algorithm is dedicated to finding the border between free space and occupied space. In a patch map, there is no occupied space. Still, the patches correlate to this border: they cannot be crossed. Thus, adapting the algorithm to patch maps actually simplifies it (algorithm 12).

Algorithm 12 MEDIAL AXIS PRM ADAPTED TO PATCH MAPS – Building roadmap T on map M. The COLLISION(M, x_1, x_2) function checks the map for patches between x_1 and x_2.

1: T.init(x_init)
2: for k ← 1 to K do
3:   x_rand ← RANDOM POINT()
4:   p ← CLOSEST PATCH(M, x_rand)
5:   c ← CLOSEST POINT ON PATCH(M, x_rand)
6:   determine s such that x′_rand = x_rand + s(x_rand − c) has two closest patches
7:   T.add vertex(x′_rand)
8: end for
9: for v vertex of T do
10:  V_near ← N NEAREST NEIGHBORS(T, v, N)
11:  for v_near ∈ V_near, in order of increasing ||v − v_near|| do
12:    if not SAME COMPONENT(T, v, v_near) and not COLLISION(M, v, v_near) then
13:      T.add edge(v, v_near)
14:    end if
15:  end for
16: end for
17: return T
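Line 6 is the only non-trivial step. Its retraction of a sampled point onto the (approximate) medial axis can be sketched as follows, assuming closest_patch() and closest_point() map queries and using a simple stepping search instead of an exact solution.

import numpy as np

def retract_to_medial_axis(x, closest_patch, closest_point, step=0.05, max_dist=10.0):
    """Approximate realization of line 6 of algorithm 12: move x away from its
    closest patch along the direction (x - c) until the identity of the closest
    patch changes; the returned point then lies (approximately) on a medial axis.
    closest_patch(x) and closest_point(x) are assumed map queries."""
    x = np.asarray(x, float)
    p0 = closest_patch(x)
    c = np.asarray(closest_point(x), float)
    d = x - c
    norm = np.linalg.norm(d)
    if norm == 0:
        return x                                 # x lies on the patch; leave it unchanged
    d /= norm
    s = 0.0
    while s < max_dist:
        s += step
        if closest_patch(x + s * d) is not p0:   # closest patch changed: medial axis crossed
            return x + s * d
    return x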

5.5.3 Experiments

Thorough experiments with the roadmap algorithms will be discussed in chapter 6. Using all mapping frameworks and all algorithms there would have been overkill. To decide which mapping framework and which algorithms to use, we conducted preliminary benchmarking experiments on all combinations. Another goal was to direct the implementation effort: certain functionality was needed for the thorough experiments (most importantly capsule collisions). To save time, full functionality was only to be implemented on the mapping framework that came out best in the preliminary experiments (which use line collision checks).


Figure 5.19: MAP USED IN PRELIMINARY EXPERIMENTS – A 3D map was generated from this floor plan. Walls are black, doors are light grey, regions of interest (ROIs) are red, the starting point is blue.

A 3D patch map was generated from the floor plan in figure 5.19. Each wall was transformed into a planar patch of fixed height, with a door if specified. A floor and a ceiling were added.

For each patch map framework (simple, 6D bounding box, OPCODE, and Bullet), one map was generated. The five algorithms presented in the previous section were run on each map. The only exception is OPCODE: since it only checks whether there was a collision, but offers no straight-forward way to detect with which part of the map, three of the roadmap algorithms could not be run on the OPCODE map. Two other limitations should be noted: only Bullet offers collision checks for capsules, so all tests were run using ray collision checks. Also, the simple implementation and the implementation based on 6D bounding boxes only offer support for planar patches.

All roadmap algorithms use different methods. For example, the Medial Axis PRM can be expected to generate fewer vertices than the RRT in a given amount of time, but, on the other hand, these vertices can be expected to be more useful. A common task was defined to even out these differences: the algorithms had to reach all four regions of interest (ROIs) in the map in figure 5.19 from a common starting point in the center, after which the run was terminated. The experiments were executed 50 times with different random seeds. The tables below show the medians and both quartiles of all successful runs.

The Medial Axis PRM was given special treatment. It was implemented as closely as possible to [Wilmarth et al., 1999], which uses two phases: a preliminary phase to place the vertices on the medial axes and a final phase to connect the vertices with the PRM. For the evaluation, reaching all ROIs had to be guaranteed. However, placing vertices is relatively expensive. Since using enough iterations to reach all ROIs every time would be prohibitive, the optimal (within 5%) number of vertices reaching all ROIs was determined with a variant of binary search. The results presented below are from the runs with this number. The premise was that, should the MA PRM prove itself, a one-phase anytime implementation could be written (placed vertices would be connected right away, but edges could be removed later on). However, it turned out that the MA PRM could not always reach all ROIs within the maximum number of iterations (required by limits in system resources). Out of 50 runs, it reached all ROIs 28 times for the OPCODE, simple, and 6D bounding box based implementations, and 30 times for the Bullet map. The numbers on the performance only refer to successful runs, further skewing the outcome in favor of the MA PRM. As will turn out, it still does not perform well.


time [s]    Simple     BB6D       OPCODE     Bullet
RRT         0.010057   0.006137   0.002435   0.004721
MA RRT      0.124055   0.060409   –          0.055298
Ev. RRT     0.18627    0.156209   –          0.099351
PRM         0.013578   0.010544   0.001309   0.008801
MA PRM      1.96683    1.20087    –          0.678831

Table 5.3: MEDIAN WALL TIME TAKEN TO REACH THE ROIS FROM THE STARTING POINTS

time [factor]   Simple       BB6D         OPCODE       Bullet
RRT             .547/1.874   .570/1.837   .716/1.544   .570/1.730
MA RRT          .513/1.484   .550/1.643   –            .520/1.545
Ev. RRT         .339/2.141   .305/1.909   –            .337/2.124
PRM             .525/2.621   .520/2.830   .770/1.388   .452/3.123
MA PRM          .253/1.307   .239/1.243   –            .354/1.558

Table 5.4: LOWER AND UPPER QUARTILE OF WALL TIME TAKEN TO REACH THE ROIS FROM THE STARTING POINTS – The given values are the ratios of the quartiles to the medians.

5.5.4 Results

All values except the time should be the same on all map implementations. After all, the random number generator was seeded with the same seed, so the same collisions should be checked. However, collision detection uses many floating point operations; numerical stability is a goal, but it is not always achieved.

Table 5.3 shows the median wall time taken. We can see that the RRT is the fastest algorithm, with the PRM a respectable runner-up. All other algorithms are one or even two (MA PRM) orders of magnitude slower. In terms of maps, Bullet and OPCODE are roughly the same for the RRT, but OPCODE shows some surprising weakness with the PRM. The naive CGAL based implementations come relatively close to Bullet – factor 2.35 in the worst case (Simple/MA PRM), factor 1.09 in the best case (BB6D/MA RRT). Examination of the quartiles given in table 5.4 shows that the PRM has the largest variance. The MA PRM, while slowest, shows the smallest upward variance. It also has the largest downward variance, but this is also a good thing. Among the mapping frameworks, Bullet and the naive implementations show the most variance, while OPCODE is the most stable.

Reaching the ROIs quickly is commendable; however, producing paths with favorable properties is also important. A basic, yet important property is the length (medians in table 5.5). MA RRT produces the shortest paths; the ones produced by the RRT are only marginally longer. The PRM based algorithms in particular produce long paths. For the basic PRM, this can be blamed on the unlimited edge length, which produces longer detours.

path length [m]   Simple    BB6D      OPCODE    Bullet
RRT               23.1608   23.1608   21.5803   23.1608
MA RRT            22.166    22.3942   –         22.1411
Ev. RRT           25.7439   26.621    –         26.621
PRM               41.3277   41.3277   40.2935   41.3277
MA PRM            66.0655   64.6227   –         68.2961

Table 5.5: MEDIAN ADDED LENGTH OF PATHS FROM STARTING POINT TO ALL ROIS


path length [factor]   Simple       BB6D         OPCODE       Bullet
RRT                    .904/1.060   .904/1.060   .926/1.158   .904/1.060
MA RRT                 .947/1.042   .935/1.034   –            .943/1.047
Ev. RRT                .923/1.104   .932/1.066   –            .924/1.066
PRM                    .838/1.212   .838/1.212   .843/1.291   .838/1.212
MA PRM                 .797/1.094   .815/1.167   –            .771/1.158

Table 5.6: LOWER AND UPPER QUARTILE OF ADDED LENGTH OF PATHS FROM STARTING POINT TO ALL ROIS – The given values are the ratios of the quartiles to the medians.

steep/total turns/ratio   Simple          BB6D            OPCODE         Bullet
RRT                       3/20/0.150      3/20/0.150      2/18/0.111     3/20/0.150
MA RRT                    3/18/0.167      3/18/0.167      –              2/18/0.111
Ev. RRT                   3/23/0.130      3/24/0.125      –              3/24/0.125
PRM                       7/14/0.500      7/14/0.500      4/8/0.500      7/14/0.500
MA PRM                    36/160/0.225    36/167/0.216    –              38/189/0.201

Table 5.7: MEDIAN STEEP TURNS ON PATHS FROM STARTING POINT TO ROIS – Steep is defined to mean greater than 90°. The numbers given are the median number of steep turns in paths / the median total number of turns in paths / their ratio.

edge length, which produces longer detours. The MA PRM, on the other hand, is designed to use detours along the medial axes of the map to maximize the distance to obstacles. It achieves this better than the MA RRT, which immediately connects nodes moved to the medial axes. In table 5.6, the lower and upper quartiles of the path lengths are given as a factor of the respective median. We can see that the RRT-based algorithms are rather stable, while the PRM-based ones show slightly more variance.

Paths in 3D are to be used by UAVs (Unmanned Aerial Vehicles) or AUVs (Autonomous Underwater Vehicles). Both types of vehicles suffer from kinematic and kinodynamic constraints, so steep turns are to be avoided. This is considered in the following tables. Table 5.7 shows the median number of steep turns on the paths from the starting point to the ROIs and the total number of turns. A steep turn is a turn of over 90°. PRM produces paths with few total turns; however, many of these (50%) are steep. As with the length, this can be blamed on the unlimited edge length. MA PRM, on the other hand, produces proportionally less than half as many steep turns. This is a property of the algorithm. (Unfortunately, it also produces many more vertices than any of the other algorithms.) However, the RRT-based algorithms did significantly better still. Considering the variances, the lower quartiles are in table 5.8, the upper ones in table 5.9. Evasion RRT further shows its potential to create the smoothest paths overall. PRM

steep/total turns/ratio   Simple         BB6D           OPCODE        Bullet
RRT                       2/17/0.118     2/17/0.118     1/16/0.062    2/17/0.118
MA RRT                    2/17/0.118     1/17/0.059     –             2/17/0.118
Ev. RRT                   2/21/0.095     2/22/0.091     –             2/22/0.091
PRM                       4/9/0.444      4/9/0.444      3/5/0.600     4/9/0.444
MA PRM                    22/91/0.242    22/95/0.232    –             24/95/0.253

Table 5.8: LOWER QUARTILE OF STEEP TURNS ON PATHS FROM STARTING POINT TO ROIS – Steep is defined to mean greater than 90°. The numbers given are the lower quartile of the number of steep turns in paths / the lower quartile of the total number of turns in paths / their ratio.


steep/total turns/ratio   Simple          BB6D            OPCODE         Bullet
RRT                       4/22/0.182      4/22/0.182      3/21/0.143     4/22/0.182
MA RRT                    4/20/0.200      4/20/0.200      –              4/20/0.200
Ev. RRT                   5/25/0.200      5/25/0.200      –              5/25/0.200
PRM                       9/18/0.500      9/18/0.500      7/11/0.636     9/18/0.500
MA PRM                    46/242/0.190    48/242/0.198    –              52/238/0.218

Table 5.9: UPPER QUARTILE OF STEEP TURNS ON PATHS FROM STARTING POINT TO ROIS – Steep is defined to mean greater than 90°. The numbers given are the upper quartile of the number of steep turns in paths / the upper quartile of the total number of turns in paths / their ratio.

curviness [deg]   Simple     BB6D       OPCODE     Bullet
RRT               78.0185    78.0185    71.383     78.0185
MA RRT            78.5776    77.6607    –          78.8892
Ev. RRT           80.0567    78.5778    –          77.8766
PRM               119.903    119.903    135.332    119.903
MA PRM            86.8547    86.8317    –          86.5707

Table 5.10: MEDIAN CURVINESS OF PATHS FROM STARTING POINT TO ROIS – Curviness is the upper quartile of the turn angles on the paths from the starting point to the ROIs.

has almost no variance. RRT, MA RRT, and MA PRM show little enough variance to be considered reliable.

Table 5.10 confirms these findings. It shows the curviness, the upper quartile of the angles of all turns on the paths to the ROIs. The paths generated by the RRT-based planners are similarly smooth, while MA PRM trails behind and PRM is the worst by far. The variance of the curviness is so small that all algorithms can be considered stable in this regard (table 5.11). The lower quartile is only once lower than 83% of the median: 79.1% for MA RRT/Simple. Only for the two algorithms on the OPCODE implementation does the upper quartile exceed 110% of the median: 115.3% and 115.8% for RRT and PRM respectively.
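For reference, the path-quality measures used in tables 5.7 to 5.11 can be computed directly from a piecewise-linear path. The C++ sketch below derives the turn angles between consecutive segments, counts steep turns (greater than 90°), and takes the upper quartile of the turn angles as the curviness; the waypoints and the quantile convention are illustrative assumptions, not values from the experiments.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstdio>
#include <vector>

using Vec3 = std::array<double, 3>;

static Vec3 diff(const Vec3& a, const Vec3& b) { return {b[0]-a[0], b[1]-a[1], b[2]-a[2]}; }
static double dot(const Vec3& a, const Vec3& b) { return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]; }
static double norm(const Vec3& a) { return std::sqrt(dot(a, a)); }

int main() {
    const double PI = std::acos(-1.0);
    // A piecewise-linear path (waypoints); values are made up for illustration.
    std::vector<Vec3> path = {{0,0,0}, {1,0,0}, {1,1,0}, {0,1,1}, {0,0,1}};

    std::vector<double> turnsDeg;                       // angle between consecutive segments
    for (std::size_t i = 1; i + 1 < path.size(); ++i) {
        Vec3 u = diff(path[i-1], path[i]);
        Vec3 v = diff(path[i], path[i+1]);
        double c = dot(u, v) / (norm(u) * norm(v));
        c = std::max(-1.0, std::min(1.0, c));           // guard against rounding
        turnsDeg.push_back(std::acos(c) * 180.0 / PI);
    }

    // Steep turns are turns of more than 90 degrees (as in tables 5.7-5.9).
    const std::size_t steep = std::count_if(turnsDeg.begin(), turnsDeg.end(),
                                            [](double a) { return a > 90.0; });

    // Curviness as used in tables 5.10/5.11: the upper quartile of all turn angles.
    std::sort(turnsDeg.begin(), turnsDeg.end());
    const double curviness = turnsDeg.empty() ? 0.0
        : turnsDeg[static_cast<std::size_t>(0.75 * (turnsDeg.size() - 1))];

    std::printf("%zu/%zu steep/total turns, curviness %.1f deg\n",
                steep, turnsDeg.size(), curviness);
    return 0;
}
```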

In conclusion, RRT stands out as the winner, especially concerning efficiency. However, PRM is a widely recognized standard algorithm, so it is worth further consideration. In terms of mapping frameworks, we found that Bullet, in addition to being functionally the most complete, is also the fastest implementation.

curviness [factor]   Simple        BB6D          OPCODE        Bullet
RRT                  .850/1.093    .850/1.093    .858/1.153    .850/1.093
MA RRT               .791/1.048    .838/1.060    –             .832/1.065
Ev. RRT              .913/1.042    .937/1.060    –             .946/1.063
PRM                  .876/1.080    .876/1.080    .905/1.158    .876/1.080
MA PRM               .973/1.028    .968/1.029    –             .960/1.042

Table 5.11: LOWER AND UPPER QUARTILE OF CURVINESS OF PATHS FROM STARTING POINT TO ROIS – Curviness is the upper quartile of the turn angles on the paths from the starting point to the ROIs. The given values are the ratios of the quartiles to the medians.



Chapter 6

3D Roadmaps for Unmanned Aerial Vehicles on Planar Patch Map

In the previous chapter we introduced the patch map framework, which represents the real world as a collection of patches. In this chapter, we set out to thoroughly test its usability for path planning. Based on the experiments conducted in section 5.5.3, we choose the Rapidly-exploring Random Tree (RRT) and the Probabilistic Roadmap Method (PRM).

While RRT and PRM were originally intended for motion problems in high-dimensional spaces with non-holonomic constraints [LaValle and Kufner, 2001], they are also suitable for problems of low dimensionality [Koyuncu and Inalhan, 2008, Andert and Adolf, 2009, Pettersson and Doherty, 2006]. In our case, we do not use the configuration space [Lozano-Perez, 1983] and a point robot as originally proposed, but operate directly on the workspace. We check for collisions with a bounding sphere. This means that we do not need to handle robot orientation. The three remaining degrees of freedom correspond to the three spatial dimensions. These restrictions mean that generated paths will most likely be unusable for fixed-wing Unmanned Aerial Vehicles (UAVs). They are, however, usable by AUVs and Vertical Take-off and Landing (VTOL) UAVs. On the positive side, this approach allows us to use available, highly optimized, professional-grade collision detection frameworks.

6.1 Experiments

We implemented the RRT and PRM algorithms for the patch map. For now, we plan piecewise-linear paths to evaluate patch maps in path planning. That is sufficient to evaluate performance, correctness, and path lengths. As we saw in the literature review (section 1.3), the piecewise-linear path can be smoothed according to the vehicles' motion constraints in a separate later phase (e.g. [Andert and Adolf, 2009, Pettersson and Doherty, 2006, Hrabar, 2008]).

The RRT algorithm from [LaValle, 1998] is reproduced as algorithm 13. When applied to the patch map, RANDOM_STATE() (line 4) is implemented as a uniformly distributed random point from the bounding box of all patches. The NEAREST_NEIGHBOR() method (line 10) uses an octree. NEW_STATE() in line 11 is implemented as an interference check.

Algorithm 14 reproduces the original PRM algorithm. The method N_NEAREST_NEIGHBORS() (line 4) is taken from CGAL [Tangelder and Fabri, 2008]. SAME_COMPONENT() (line 7) is implemented with an associative array which associates each node with its component.


Algorithm 13 The original RRT algorithm from [LaValle, 1998]
 1: proc BUILD_RRT(x_init)
 2:   T.init(x_init)
 3:   for k ← 1 to K do
 4:     x_rand ← RANDOM_STATE()
 5:     EXTEND(T, x_rand)
 6:   end for
 7:   return T
 8:
 9: proc EXTEND(T, x)
10:   x_near ← NEAREST_NEIGHBOR(x, T)
11:   if NEW_STATE(x, x_near, x_new, u_new) then
12:     T.add_vertex(x_new)
13:     T.add_edge(x_near, x_new, u_new)
14:     if x_new = x then
15:       return Reached
16:     else
17:       return Advanced
18:     end if
19:   end if
20:   return Trapped

Algorithm 14 THE ORIGINAL PRM ALGORITHM FROM [KAVRAKI ET AL., 1996] – Notation adapted to that used in the RRT algorithm
 1: T.init()
 2: for k ← 1 to K do
 3:   x_rand ← RANDOM_STATE()
 4:   X_near ← N_NEAREST_NEIGHBORS(T, x_rand, N)
 5:   T.add_vertex(x_rand)
 6:   for x_near ∈ X_near, in order of increasing ‖x_rand − x_near‖ do
 7:     if not SAME_COMPONENT(T, x_rand, x_near) and not COLLISION(x_rand, x_near) then
 8:       T.add_edge(x_rand, x_near)
 9:     end if
10:   end for
11: end for
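The SAME_COMPONENT() bookkeeping described above can be realized exactly as stated: an associative array maps each node to a component id, and adding an edge between two components relabels one of them. The C++ sketch below is a minimal stand-in with plain integer node ids; it is not the thesis implementation, which stores the roadmap in a Boost adjacency_list.

```cpp
#include <cstdio>
#include <unordered_map>
#include <utility>
#include <vector>

class Components {
public:
    void addVertex(int v) { comp_[v] = v; members_[v] = {v}; }

    bool sameComponent(int a, int b) const { return comp_.at(a) == comp_.at(b); }

    void addEdge(int a, int b) {
        int ca = comp_[a], cb = comp_[b];
        if (ca == cb) return;
        if (members_[ca].size() < members_[cb].size()) std::swap(ca, cb);
        for (int v : members_[cb]) comp_[v] = ca;        // relabel the smaller component
        members_[ca].insert(members_[ca].end(),
                            members_[cb].begin(), members_[cb].end());
        members_.erase(cb);
    }

private:
    std::unordered_map<int, int> comp_;                  // node -> component id
    std::unordered_map<int, std::vector<int>> members_;  // component id -> nodes
};

int main() {
    Components c;
    for (int v = 0; v < 4; ++v) c.addVertex(v);
    c.addEdge(0, 1);
    c.addEdge(2, 3);
    std::printf("0,2 same? %d\n", c.sameComponent(0, 2)); // 0: still separate components
    c.addEdge(1, 2);
    std::printf("0,3 same? %d\n", c.sameComponent(0, 3)); // 1: components were merged
    return 0;
}
```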


Data structure and nearest neighbor search

The basic data structure is an "adjacency list" type graph from the Boost Graph Library [Siek et al., 2001] with 3D Cartesian points as vertices. For use with the RRT, it is supplemented by an octree à la [Poppinga et al., 2007], into which the vertices are additionally entered. This achieves a very quick nearest neighbor search. Since this octree implementation cannot store data on the nodes, and since more than one vertex might fall into one cell, an additional data structure is used: a hashtable stores a mapping from an octree cell to a list of vertices. Because of the way the roadmap algorithms used here are constructed, nodes are quite sparse. Only very rarely do two or more points fall within one cell of the octree, so the list of vertices typically contains only one element.

For the PRM, a CGAL kD-tree [Tangelder and Fabri, 2008] is used instead of an octree. This provides an n-nearest-neighbors search.
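A minimal sketch of the cell-to-vertex-list hashtable described above is given below. The octree of [Poppinga et al., 2007] is not reproduced; cells are derived from an assumed fixed cell size, and candidates are gathered from the 3×3×3 neighborhood, which only approximates the octree-based nearest-neighbor query of the actual implementation.

```cpp
#include <array>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <unordered_map>
#include <vector>

using Vec3 = std::array<double, 3>;

struct CellIndex {
    std::unordered_map<std::uint64_t, std::vector<int>> cells;  // cell key -> vertex ids
    double cellSize = 0.5;                                      // [m], assumed value

    std::uint64_t key(int ix, int iy, int iz) const {
        // pack three offset 21-bit indices into one 64-bit key
        auto u = [](int i) { return static_cast<std::uint64_t>(i + (1 << 20)) & 0x1FFFFF; };
        return (u(ix) << 42) | (u(iy) << 21) | u(iz);
    }
    int cellOf(double x) const { return static_cast<int>(std::floor(x / cellSize)); }

    void insert(int id, const Vec3& p) {
        cells[key(cellOf(p[0]), cellOf(p[1]), cellOf(p[2]))].push_back(id);
    }
    // Vertices in the cell of p and its neighbors; usually only a handful,
    // since roadmap nodes are sparse (rarely more than one per cell).
    std::vector<int> candidates(const Vec3& p) const {
        std::vector<int> out;
        const int cx = cellOf(p[0]), cy = cellOf(p[1]), cz = cellOf(p[2]);
        for (int dx = -1; dx <= 1; ++dx)
            for (int dy = -1; dy <= 1; ++dy)
                for (int dz = -1; dz <= 1; ++dz) {
                    auto it = cells.find(key(cx + dx, cy + dy, cz + dz));
                    if (it != cells.end())
                        out.insert(out.end(), it->second.begin(), it->second.end());
                }
        return out;
    }
};

int main() {
    CellIndex index;
    index.insert(0, {0.1, 0.2, 0.3});
    index.insert(1, {5.0, 5.0, 5.0});
    std::printf("candidates near origin: %zu\n", index.candidates({0.0, 0.0, 0.0}).size());
    return 0;
}
```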

NEW_STATE() and COLLISION()

The NEW_STATE() and COLLISION() functions are where the collision detection takes place. They are implemented by each mapping framework used. In our experiments, we used the Bullet physics library [Coumans, 2009] and our point cloud map (see section 5.4.2). Since Bullet's collision primitives do not include non-convex polygons with holes, the planar patches were triangulated using the Delaunay triangulation from CGAL [Yvinec, 2008].

The function NEW_STATE(x, x_near, x_new, u_new) explores one possible extension of the roadmap. First, it defines u_new, a new edge between x_near and x. Then it shortens u_new to at most a fixed length, such that it ends in x_new. Then it checks for collisions: the return value depends on whether a sphere cast from x_near to x_new collides with any of the trimeshes (or point clouds in the case of the PC map).

The simpler COLLISION(x, y) simply returns whether there is a collision when a sphere is cast from x to y.
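The contract of NEW_STATE() can be summarized in a short sketch. The code below grows from x_near towards x by at most a fixed step and delegates the swept-sphere test to a callback; the callback is a placeholder for the Bullet-based or point-cloud-based check, not an actual Bullet call, and the step length is an assumed parameter.

```cpp
#include <array>
#include <cmath>
#include <cstdio>
#include <functional>

using Vec3 = std::array<double, 3>;
enum class ExtendResult { Reached, Advanced, Trapped };

// Sketch: shorten the edge from x_near towards x to at most maxStep, then ask
// the map backend whether the sphere swept along that edge is collision-free.
ExtendResult newState(const Vec3& x, const Vec3& xNear, double maxStep,
                      const std::function<bool(const Vec3&, const Vec3&)>& sphereCastFree,
                      Vec3& xNew) {
    Vec3 d = {x[0] - xNear[0], x[1] - xNear[1], x[2] - xNear[2]};
    const double len = std::sqrt(d[0]*d[0] + d[1]*d[1] + d[2]*d[2]);
    if (len < 1e-12) { xNew = xNear; return ExtendResult::Trapped; }   // degenerate sample
    const double s = (len > maxStep) ? maxStep / len : 1.0;            // limit the step length
    xNew = {xNear[0] + s*d[0], xNear[1] + s*d[1], xNear[2] + s*d[2]};
    if (!sphereCastFree(xNear, xNew)) return ExtendResult::Trapped;
    return (s == 1.0) ? ExtendResult::Reached : ExtendResult::Advanced;
}

int main() {
    auto alwaysFree = [](const Vec3&, const Vec3&) { return true; };   // dummy map backend
    Vec3 xNew;
    ExtendResult r = newState({10, 0, 0}, {0, 0, 0}, 2.0, alwaysFree, xNew);
    std::printf("result %d, x_new = (%.1f, %.1f, %.1f)\n",
                static_cast<int>(r), xNew[0], xNew[1], xNew[2]);
    return 0;
}
```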

6.1.1 Explored Volume

Apart from the time taken, we also consider the explored volume of the algorithms for each map. Each roadmap was entered into an octree specific to the map and algorithm. Each node was entered as a sphere, each edge as a cylinder, with the radius also used in building the roadmap. All octrees had the same dimensions: the smallest octree covering all maps. It had a grid size of 1024³, resulting in a cell side length of 38.684 mm. A real-world example can be found in figure 6.1, an illustrative example in figure 6.2.
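The explored-volume bookkeeping can be sketched as follows: every edge is rasterized as a sphere-swept segment into a set of voxels, and the table entries below are obtained by counting voxels present in one set but not in the other. The grid handling and the example coordinates are assumptions of this sketch; the thesis uses one common 1024³ octree with 38.684 mm cells for all maps.

```cpp
#include <array>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <unordered_set>

using Vec3 = std::array<double, 3>;
using VoxelSet = std::unordered_set<std::uint64_t>;

static double distPointSegment(const Vec3& p, const Vec3& a, const Vec3& b) {
    Vec3 ab = {b[0]-a[0], b[1]-a[1], b[2]-a[2]}, ap = {p[0]-a[0], p[1]-a[1], p[2]-a[2]};
    const double ab2 = ab[0]*ab[0] + ab[1]*ab[1] + ab[2]*ab[2];
    double t = ab2 > 0.0 ? (ap[0]*ab[0] + ap[1]*ab[1] + ap[2]*ab[2]) / ab2 : 0.0;
    t = std::fmax(0.0, std::fmin(1.0, t));
    Vec3 c = {a[0]+t*ab[0]-p[0], a[1]+t*ab[1]-p[1], a[2]+t*ab[2]-p[2]};
    return std::sqrt(c[0]*c[0] + c[1]*c[1] + c[2]*c[2]);
}

static std::uint64_t key(int x, int y, int z) {
    auto u = [](int i) { return static_cast<std::uint64_t>(i + (1 << 19)) & 0xFFFFF; };
    return (u(x) << 40) | (u(y) << 20) | u(z);
}

// Mark all cells in the capsule's bounding box whose center lies within the radius.
// A lone node is the degenerate case a == b.
void rasterizeEdge(VoxelSet& grid, const Vec3& a, const Vec3& b, double r, double cell) {
    int lo[3], hi[3];
    for (int k = 0; k < 3; ++k) {
        lo[k] = static_cast<int>(std::floor((std::fmin(a[k], b[k]) - r) / cell));
        hi[k] = static_cast<int>(std::floor((std::fmax(a[k], b[k]) + r) / cell));
    }
    for (int x = lo[0]; x <= hi[0]; ++x)
        for (int y = lo[1]; y <= hi[1]; ++y)
            for (int z = lo[2]; z <= hi[2]; ++z) {
                Vec3 c = {(x + 0.5) * cell, (y + 0.5) * cell, (z + 0.5) * cell};
                if (distPointSegment(c, a, b) <= r) grid.insert(key(x, y, z));
            }
}

// Percentage of volume explored in 'a' but not in 'b' (one entry of tables 6.2-6.5).
double notInOther(const VoxelSet& a, const VoxelSet& b) {
    std::size_t missing = 0;
    for (auto v : a) missing += (b.count(v) == 0);
    return a.empty() ? 0.0 : 100.0 * static_cast<double>(missing) / static_cast<double>(a.size());
}

int main() {
    VoxelSet planes, hybrid;
    rasterizeEdge(planes, {0,0,0}, {1.0,0,0}, 0.2, 0.038684);
    rasterizeEdge(hybrid, {0,0,0}, {0.9,0,0}, 0.2, 0.038684);
    std::printf("explored on planes but not hybrid: %.2f %%\n", notInOther(planes, hybrid));
    return 0;
}
```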

In the case of the PRM algorithm, the collision detection for the point cloud map did not work reliably. This was probably caused by the very short edges that occur. We implemented our own collision detection based on CGAL's kD-tree [Tangelder and Fabri, 2008], see section 5.4.2. While the times given for the purpose of speed comparison are for the Bullet-based maps, the explored volume was determined using our method of collision detection. Incidentally, our algorithm was roughly twice as fast as Bullet in this special case.
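One way to realize the sphere cast against a point cloud is a point-to-segment distance test: an edge is blocked if any map point lies closer to the swept segment than the robot radius. The sketch below shows only this geometric core; in the actual implementation the candidate points would come from the CGAL kD-tree mentioned above rather than from a scan over the whole cloud, and the points and radius here are made up.

```cpp
#include <array>
#include <cmath>
#include <cstdio>
#include <vector>

using Vec3 = std::array<double, 3>;

static double distPointSegment(const Vec3& p, const Vec3& a, const Vec3& b) {
    Vec3 ab = {b[0]-a[0], b[1]-a[1], b[2]-a[2]}, ap = {p[0]-a[0], p[1]-a[1], p[2]-a[2]};
    const double ab2 = ab[0]*ab[0] + ab[1]*ab[1] + ab[2]*ab[2];
    double t = ab2 > 0.0 ? (ap[0]*ab[0] + ap[1]*ab[1] + ap[2]*ab[2]) / ab2 : 0.0;
    t = std::fmax(0.0, std::fmin(1.0, t));
    Vec3 c = {a[0]+t*ab[0]-p[0], a[1]+t*ab[1]-p[1], a[2]+t*ab[2]-p[2]};
    return std::sqrt(c[0]*c[0] + c[1]*c[1] + c[2]*c[2]);
}

// True if a sphere of the given radius cast from x to y hits any map point.
bool collision(const std::vector<Vec3>& cloud, const Vec3& x, const Vec3& y, double radius) {
    for (const Vec3& p : cloud)
        if (distPointSegment(p, x, y) < radius) return true;
    return false;
}

int main() {
    std::vector<Vec3> cloud = {{0.5, 0.1, 0.0}, {3.0, 3.0, 3.0}};   // made-up map points
    std::printf("edge blocked: %d\n", collision(cloud, {0,0,0}, {1,0,0}, 0.2));
    return 0;
}
```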

6.1.2 Implementation

The algorithms were implemented under GNU/Linux in C++ based on the Jacobs Intelligent Robotics Library and the libraries mentioned above. They ran on a PC with an Intel Core i7 920 CPU with 2.67 GHz, 8 MB cache, and 6 GB of memory (not all of which could be used, but it was not needed)


Figure 6.1: REAL-WORLD EXAMPLE FOR AN EXPLORED VOLUME – A cross-section parallel to the XY-plane, close to the floor. Black pixels correspond to voxels that have been explored by one or more instances of the RRT on the Lab map.


Figure 6.2: USING EXPLORED VOLUME TO COMPARE DIFFERENT MAP TYPES – Synthetic examples. (a) Volume explored on the patch map (points added for illustration); (b) volume explored on the point cloud map (patches added for illustration); (c) comparison: blue – explored on the plane map but not on the point cloud map; green – explored on the point cloud map but not on the plane map; black – explored on both maps.

under Ubuntu Linux 9.04 32-bit in a single thread. The program was started four times, once for each map. First the map was read, then the path-planning algorithm was executed from scratch 96 times. Time was measured with the POSIX method gettimeofday(). Times exclude loading the maps and entering roadmaps into the octree. While the latter is linearly dependent on the number of nodes and edges in the roadmap, the former varies depending on the type of map. For the lab maps, loading the plane map took 6.0 s, the hybrid map 6.8 s, the trimesh map 21.3 s, and the point cloud map 22.8 s.
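Wall-clock timing with gettimeofday() as used above amounts to a few lines; the loop in the sketch below is only a placeholder for one planner run.

```cpp
#include <cstdio>
#include <sys/time.h>

// Wall-clock time in seconds via the POSIX gettimeofday() call.
static double nowSeconds() {
    timeval tv;
    gettimeofday(&tv, nullptr);
    return tv.tv_sec + tv.tv_usec * 1e-6;
}

int main() {
    const double t0 = nowSeconds();
    volatile double sink = 0.0;
    for (int i = 0; i < 1000000; ++i) sink += i * 1e-9;   // stand-in for one planner run
    std::printf("one run took %.6f s\n", nowSeconds() - t0);
    return 0;
}
```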

6.1.3 Datasets

We used the Lab, Crashed Car Park, and Lesumsperrwerk datasets. For an extensive presentation, please refer to chapter 3. These datasets were processed with the plane extraction algorithm from [Poppinga et al., 2008b], which returned for each point cloud a list of planar patches and a remaining point cloud of points which were found not to belong to planar areas. To get global poses, the lists of planar patches were registered with the technique developed in [Pathak et al., 2010a, Pathak et al., 2010b]. With these, maps in the different patch map frameworks were generated. The "planes" maps use just the planar patches; for the "hybrid" maps, the remainder point clouds were transformed to trimeshes and used alongside the planar patches; for the "trimesh" maps, the complete point clouds were triangulated; and for the "pc" maps, just the complete point clouds were used. For preliminary comparison, the actual file sizes of the maps and other properties can be found in table 6.1. Aside from the geometrical features described above, the files contain uncertainty information (a Hessian matrix) for each planar patch as well as a color, string ID, transparency, and group number for each patch. As an example of the maps used, perspective views of the planes map of the Lab dataset can be found in figure 6.3.

6.2 Results

6.2.1 Results of RRT

The path planner was run from four different starting locations (in the four corners of the room for the lab dataset, spread across the site for the crashed car park dataset) with 24 different seeds for the


Map type   File size   Patches: planar   Patches: trimesh   Patches: pc   Triangles
hybrid     8.7 MB      5906              4142               –             345533
pc         54.0 MB     –                 –                  29            –
planes     3.0 MB      5906              0                  –             86174
trimesh    106.4 MB    0                 17273              –             5600612

Table 6.1: THE ACTUAL FILE SIZES OF THE DIFFERENT MAP TYPES OF THE LAB DATASET – The median number of points in a polygon of a planar patch is 11. The median number of points in a point cloud is 163,439.

Figure 6.3: THE LAB MODEL THE ROADMAP EXPERIMENTS ARE RUN ON – (a) Perspective wire-frame view; (b) perspective surface view; (c) top surface view; (d) inside view with many details, including multiple levels. The robot location is shown by a red circle at the height of the sensor. An X3D model and video files for this dataset can be found at http://robotics.jacobs-university.de/projects/3Dmap.


Map       Volume [m³]   not in planes [%]   not in hybrid [%]   not in trimesh [%]   not in pc [%]
planes    9714.015      –                   0.557               1.739                0.986
hybrid    9693.171      0.343               –                   1.551                0.803
trimesh   9564.215      0.200               0.223               –                    0.347
pc        9651.058      0.340               0.370               1.244                –

Table 6.2: EXPLORED VOLUME ON DIFFERENT MAPS FOR THE RRT ON THE LAB DATASET – The four columns on the right give the volume that was explored on the map in the row but not on the map in the column, as a percentage of the total volume explored on the map in the row (which is given in the second column).

random number generator. Figure 6.4 shows how long the RRT algorithm took on the four different map types in the lab dataset. The performance on the Crashed Car Park dataset is shown in figure 6.5.

The first thing to notice is that the trimesh and point cloud maps are more than an order of magnitude slower than the plane and hybrid maps. For lower numbers of iterations (as they would be used in a real-world planning scenario), exploration on the trimesh map is faster than on the point cloud map. For higher numbers, we can see that the point cloud map offers higher speed and also slower growth than the trimesh map. Also, the trimesh map shows a much higher variance in speed than any other map. This is most likely due to it being over-restrictive (see below). But even its lower quartile is still in the area of the PC map's median. The fact that the increase in time needed is less than linear can be explained by the structure of the maps. For both datasets, a large majority of collision objects is concentrated in the center of the map. Outside of that, there are only a few collision objects. These typically represent objects that are outside the mapped area but visible under certain circumstances, e.g. through a window or through a door left ajar. The planning starts inside the main structure. Initially, there are many collision objects to check in each expansion. But when the RRT has nodes outside the main structure on more and more sides, more and more expansions of the RRT can be done with few or no collision objects to check.

For 500 iterations, a useful number in a real-world scenario, growing an RRT on both the trimesh and the PC map needs well over 4 seconds. This allows for the generation of one roadmap on start-up. If, however, frequent re-planning is to be used, this run-time is too high. In contrast, on the hybrid and plane maps 500 iterations take less than 0.4 seconds. This allows for frequent re-planning on a powerful computer or infrequent planning on a slower one.

Our proposed method of using planes and polygons as an abstraction for a number of points obviously decreases the number of collision objects. A speed advantage should hence surprise no one. But we also have to address the question of whether our method is a valid simplification or a distortion of the original sensor data. To this end we compare the explored volume of each method. Since it was stored in an octree of identical dimensions for each map, we can subtract the explored volumes from one another to see how much volume was covered on one map but not on another (congruence, Table 6.2).

We can see that the overall congruence is high. The biggest differences arise with the trimesh. In the second column we see that the explored volume is lowest on the trimesh map. The fifth column affirms the interpretation that the trimesh is over-restrictive: all maps allow for exploration of quite some volume that is not explored on the trimesh map. This shows that the naive trimesh-growing algorithm presented in section 6.1 in some instances closes gaps between points that should have


Map       Volume [m³]   not in planes [%]   not in hybrid [%]   not in trimesh [%]   not in pc [%]
planes    106379.8199   –                   2.760               4.145                3.458
hybrid    106062.0166   2.468               –                   3.739                3.156
trimesh   105153.1485   3.027               2.907               –                    3.104
pc        106073.6549   3.179               3.167               3.945                –

Table 6.3: EXPLORED VOLUME ON DIFFERENT MAPS FOR THE RRT ON THE CRASHED CAR PARK DATASET – The four columns on the right give the volume that was explored on the map in the row but not on the map in the column, as a percentage of the total volume explored on the map in the row (which is given in the second column).

been left open¹. It is not surprising that the naive trimesh growing performs sub-optimally, since a very naive algorithm for generation and no simplification were used. The inclusion of the trimesh map in the experiments is still justified to allow for a comparison of its speed as a gross approximation. The small incorrectness in exploration can be assumed not to have a decisive impact on the speed.

Concerning the other maps, the incongruence is low enough to warrant calling the plane map, and even more so the hybrid map, a valid simplification of the original point cloud data for all practical purposes. The small incorrectness can be compensated for by low-level obstacle avoidance.

When interpreting the percentages in Table 6.2, it has to be taken into account that they result from a limited number of runs of a probabilistic algorithm. That means that even on identical maps, a comparison of the explored volumes calculated from two different sets of random seeds would yield small chance incongruences. This is evidenced by the percentage of volume explored on the hybrid map which was not explored on the plane map (0.343%). Since the plane map is a subset of the hybrid map, this percentage should ideally be 0. To minimize this effect, in the calculation of the explored volume, 120 random seeds were used instead of 24, totalling 480 RRTs grown on each map. The number of iterations K was 10,000 to ensure high coverage.

The same standard has to be applied when evaluating the numbers for the Crashed Car Park dataset, which can be found in Table 6.3. Although the plane map is a subset of the hybrid map and the incongruence should be 0, it is 2.468%. By this measure, none of the other numbers is high enough to call the proposed method of patch-based mapping inaccurate. Inspection of the difference octree revealed that the high numbers encountered in this map are explained by areas which are hard to reach from the starting points yet highly complex. Specifically, this refers to the underside of a large pile of rubble.

6.2.2 Results of PRM

The PRM was run with 24 different random seeds on the maps. The time taken can be found in figure 6.7 for the lab dataset and in figure 6.8 for the Crashed Car Park dataset. Again it can be noticed that the plane and hybrid maps are considerably faster than the trimesh and PC maps. However, this effect becomes less pronounced as more iterations are performed. This is caused by the decreasing mean distance covered in collision checks as the PRM progresses. As vertices become denser, the nearest neighbors are found at ever shorter distances. Empirical data on the distances covered in collision detection is depicted in figure 6.9. In this case they were collected on the hybrid map of the lab

¹ The possibility that the other maps leave gaps open that should have been closed can be excluded, since we are using a map with a big loop: all parts of walls, ceiling, and floor that were seen from afar (with big gaps between the points) have also been seen from up close, and the point clouds resulting from all sightings overlap.


Figure 6.4: TIME TAKEN BY THE RRT ALGORITHM ON THE FOUR DIFFERENT MAP TYPES ON THE LAB DATASET – Two panels with iteration ranges up to 1,000 and up to 10,000; curves are shown for the planes, hybrid, pc, and trimesh maps. The number of iterations (K in algorithm 13) is on the x-axis, the time taken for one RRT on the y-axis. The lines represent the median time for one run; the error bars extend to the lower and upper quartile. Please note the logarithmic scale on the y-axis. The RRT was grown from four different starting locations with 24 different random seeds at each.


Figure 6.5: TIME TAKEN BY THE RRT ALGORITHM ON THE FOUR DIFFERENT MAP TYPES ON THE CRASHED CAR PARK DATASET – Two panels with iteration ranges up to 1,000 and up to 10,000; curves are shown for the pc, trimesh, hybrid, and planes maps. The number of iterations (K in algorithm 13) is on the x-axis, the time taken for one RRT on the y-axis. The lines represent the median time for one run; the error bars extend to the lower and upper quartile. Please note the logarithmic scale on the y-axis. The RRT was grown from four different starting locations with 24 different random seeds at each.


Figure 6.6: A visualization of an RRT generated in 100 iterations in the Jacobs Robotics lab hybrid map.

dataset with random seed 1. They are typical of all datasets, maps, and random seeds. The shorter the distance covered in the collision check, the less likely it is that a collision object is encountered. The fewer collision objects encountered, the less pronounced the differences between the map types, which differ in the type of collision object used.

The congruences are in tables 6.4 and 6.5. As with the RRT, the incongruences are higher on the Crashed Car Park map. However, the overall congruence is high enough to call the simplification true to the original data.

Map       Volume [m³]   not in planes [%]   not in hybrid [%]   not in trimesh [%]   not in pc [%]
planes    11859.6544    –                   0.160               0.737                0.387
hybrid    11841.7095    0.008               –                   0.593                0.251
trimesh   11776.3023    0.035               0.041               –                    0.056
pc        11818.7659    0.042               0.057               0.416                –

Table 6.4: EXPLORED VOLUME ON DIFFERENT MAPS FOR THE PRM ON THE LAB DATASET – The four columns on the right give the volume that was explored on the map in the row but not on the map in the column, as a percentage of the total volume explored on the map in the row (which is given in the second column).


Figure 6.7: TIME TAKEN BY THE PRM ALGORITHM ON THE FOUR DIFFERENT MAP TYPES ON THE LAB DATASET – Two panels with iteration ranges up to 1,000 and up to 10,000; curves are shown for the pc, trimesh, hybrid, and planes maps. The number of iterations (K in algorithm 14) is on the x-axis, the time taken for one PRM on the y-axis. The lines represent the median time for one run; the error bars extend to the lower and upper quartile. Please note the logarithmic scale on the y-axis. The PRM was grown with 24 different random seeds.


Figure 6.8: TIME TAKEN BY THE PRM ALGORITHM ON THE FOUR DIFFERENT MAP TYPES ON THE CRASHED CAR PARK DATASET – Two panels with iteration ranges up to 1,000 and up to 10,000; curves are shown for the pc, trimesh, hybrid, and planes maps. The number of iterations (K in algorithm 14) is on the x-axis, the time taken for one PRM on the y-axis. The lines represent the median time for one run; the error bars extend to the lower and upper quartile. Please note the logarithmic scale on the y-axis. The PRM was grown with 24 different random seeds.


Figure 6.9: COLLISION DETECTION DISTANCE FOR PRM – The graph shows the distance checked in collision detection (collision check distance [mm] over collision check number) over the course of 10,000 iterations with random seed 1 on the hybrid map of the lab dataset. There are typically 6 collision checks per iteration. Please note the logarithmic scale on the y-axis.

Map       Volume [m³]   not in planes [%]   not in hybrid [%]   not in trimesh [%]   not in pc [%]
planes    115177.2955   –                   0.134               0.332                0.261
hybrid    115091.7210   0.060               –                   0.216                0.179
trimesh   114948.2374   0.134               0.091               –                    0.173
pc        115064.0317   0.162               0.155               0.274                –

Table 6.5: EXPLORED VOLUME ON DIFFERENT MAPS FOR THE PRM ON THE CRASHED CAR PARK DATASET – The four columns on the right give the volume that was explored on the map in the row but not on the map in the column, as a percentage of the total volume explored on the map in the row (which is given in the second column).


Figure 6.10: WITHOUT USING A BOUNDING BOX, ROADMAPS MAY BE PLANNED OUTSIDE THE VALID AREA – EVEN ABOVE WATER, AS IN THIS CASE – Yellow sphere (background): starting point; green sphere: goal point. Edges are shown as two cylinders each, a more opaque one to highlight the graph structure and a more transparent one with the actual radius.

6.2.3 Lesumsperrwerk dataset

To demonstrate the versatility of the roadmap algorithms on patch maps, we now present results on data from another kind of sensor: a 3D sonar. This sensor covers a much smaller fraction of the surrounding surfaces. This is due to the sonar's small field of view (FOV), the low reflectivity of the river bed and water surface, and the fact that no complete panorama can be collected – for practical purposes, a river is unbounded in two directions. Due to this limitation, instead of building a complete roadmap, a path from a given starting point to a goal was searched. The algorithms were adapted such that after each addition of a node, it was checked whether the goal was reachable from there directly. If so, the algorithm terminated successfully.

Paths were searched for all combinations of three starting points and three end points, with 24 different random seeds each. The starting and goal points were chosen such that a path had to be planned from one of three positions upstream of the flood gate to one of three positions downstream of it. Because the data is incomplete, a bounding box around the actually collected data had to be imposed to prevent the algorithms from finding invalid paths through areas where no data was collected (figure 6.10 illustrates what can go wrong without a bounding box).

When planning a path from a starting point to a goal, the general direction in which the path must be planned is clear. We therefore tried to speed up the process by biasing the distribution of the random points towards the goal. Instead of the usual uniform distribution, we used a normal distribution around the goal. Each dimension was handled separately; the variance was the distance from the goal to the farther border of the bounding box, multiplied by a bias factor. Success rates of experiments with this variant are shown in figure 6.11. It can be seen that smaller bias factors contribute negatively to the success of the algorithm. Only when the factors get so large that they are irrelevant does the success resemble that of the unbiased algorithm.
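The goal-biased sampling can be sketched in a few lines with the standard library's normal distribution. Per the description above, the variance in each dimension is the distance from the goal to the farther bounding-box border times the bias factor; taking the square root to obtain the standard deviation and clamping samples back into the bounding box are assumptions of this sketch, as are the example coordinates.

```cpp
#include <array>
#include <cmath>
#include <cstdio>
#include <random>

using Vec3 = std::array<double, 3>;

Vec3 sampleBiased(std::mt19937& rng, const Vec3& goal,
                  const Vec3& boxMin, const Vec3& boxMax, double biasFactor) {
    Vec3 s;
    for (int k = 0; k < 3; ++k) {
        // distance from the goal to the farther bounding-box border in this dimension
        const double farther = std::fmax(goal[k] - boxMin[k], boxMax[k] - goal[k]);
        const double sigma = std::sqrt(farther * biasFactor);   // variance -> std. deviation
        std::normal_distribution<double> dist(goal[k], sigma);
        s[k] = std::fmin(boxMax[k], std::fmax(boxMin[k], dist(rng)));
    }
    return s;
}

int main() {
    std::mt19937 rng(1);                                         // e.g. random seed 1
    Vec3 goal = {40000.0, 5000.0, -2000.0};                      // [mm], made-up values
    Vec3 lo = {0.0, -10000.0, -8000.0}, hi = {80000.0, 20000.0, 0.0};
    Vec3 s = sampleBiased(rng, goal, lo, hi, 4000.0);
    std::printf("biased sample: %.0f %.0f %.0f\n", s[0], s[1], s[2]);
    return 0;
}
```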

When the RRT is run on the incomplete data with a bounding box, from a number of starting


Figure 6.11: FRACTION OF RUNS WHERE THE GOAL WAS REACHED FOR RRT ON THE FIRST TWO SCANS OF THE LESUMSPERRWERK DATASET – Six panels plot the fraction of successful runs over the number of iterations for the pc and planes maps: (a) bias factor 1,000; (b) bias factor 4,000; (c) bias factor 20,000; (d) bias factor 50,000; (e) bias factor 200,000; (f) no bias. The radius was 5733 mm, the maximum step size 28666 mm.


Figure 6.12: TIME TAKEN FOR DIFFERENT PARAMETERS ON DIFFERENT SCANS OF THE LESUMSPERRWERK DATASET – With object-oriented bounding boxes. Four panels plot the time per node [s] over the number of iterations for the pc and planes maps: (a) first scan, radius 2 m, maximum step 4 m; (b) first scan, radius 4 m, maximum step 8 m; (c) first two scans, radius 5733 mm, maximum step 28666 mm; (d) scan 8, radius 4 m, maximum step 16 m.

points to a number of goal points, but without the goal bias, the plane map still allows for faster planning. This can be seen in figure 6.12, which contains results for maps generated from different scans of the dataset and with different radii. Regardless of these variations, the plane map is always faster.



Chapter 7

Conclusion

Autonomous navigation is an important challenge in many application fields of mobile robotics. It relieves the human operator of an essentially dull task and allows a gradual transition to the degree of robot autonomy most suited to a particular task. This thesis made contributions to mobile robot autonomous navigation at different levels, from sensing to path planning.

In chapter 2, different methods for 3D sensing were discussed. These were 3D laser range-finders, stereo vision, time-of-flight range cameras, and 3D sonars. In a case study, we evaluated a time-of-flight range camera in various situations. We developed a method to detect and, if desirable, heuristically correct the most inhibitive error source. Range is detected via phase shift. Because of the way phase shift is measured, values beyond 360° will be mapped back into the [0°, 360°[ interval. The derived range is consequently also affected. The error is inherent not only to this model, but to all range cameras measuring the time-of-flight in the same way. Our correction makes it possible to use the range camera on a mobile robot for high-frequency 3D sensing.

In chapter 3, we presented several 3D point cloud datasets gathered with the time-of-flight camera presented in the preceding chapter and other 3D sensors: an actuated laser range-finder, a stereo camera, and a 3D sonar. The datasets represent a wide variety of real-world and set-up scenes, indoors and outdoors, underwater and land, and small scale and large scale. These datasets were used in the experiments in the following chapters.

In chapter 4, we presented a robust short-range obstacle detection algorithm that runs fast enough to actually make use of the range camera's high update frequency. It is based on the Hough transform for planes in 3D point clouds. The bins of the discretized Hough space were relatively coarse, which turned out to be enough for testing for drivability. The method allows a mobile robot to reliably detect the drivability of the terrain it faces. With the same method and finer bins it is possible to classify the terrain, albeit not as reliably as checking drivability. Experiments with two types of sensors on data from indoors and outdoors demonstrated the algorithm's performance. The processing time typically lies between 5 and 50 ms. This is enough for real-time processing on a robot moving at reasonable speed.

In chapter 5, we developed the Patch Map data-structure for memory-efficient 3D mapping based on planar surfaces extracted from 3D point cloud data. It is flexible enough to allow for different kinds of surface representations, different methods of collision detection, and different kinds of roadmap algorithms. Surface representations are planar polygons and trimeshes. The method is extendable to other surface representations like quadrics or collision detection library primitives. We surveyed and benchmarked different methods of generating planar patches from a point cloud segmented into planar regions. The α-shape algorithm provided the best results. We also implemented a point cloud based


variant of the map data-structure that allows for comparison with this standard data-structure. Collision detection was implemented in a way that exploits the fact that planar patches are polygons, and also based on two external collision detection libraries, Bullet and OPCODE. The implemented roadmap algorithms were the Rapidly-exploring Random Tree (RRT), the Probabilistic Roadmap Method (PRM), and variants of these, most notably variants of RRT and PRM that place vertices on the medial axes of the map without explicitly computing them. We benchmarked all collision detection methods with all roadmap algorithms on synthetic data to find out the most efficient ones. The test scenario was finding paths from a starting point to four regions of interest. RRT was the best roadmap algorithm. The tested criteria were time, path length, and path smoothness. Bullet was the fastest collision detection method. Thus, the Patch Map data-structure showed its flexibility and usability in practice.

In chapter 6, we thoroughly tested the Patch Map data-structure developed in the preceding chapter on real-world data to investigate size reduction, performance, and fidelity. In terms of size, the Patch Map consisting only of planar patches was 18 times smaller than the point clouds it was based on. We performed both roadmap generation with PRM and RRT and wayfinding from start to goal based on RRT. We compared our approach to the established methods of trimeshes and point clouds and found that it performs an order of magnitude faster on 3D LRF data and also considerably better on sonar data. We also showed that this speed advantage does not come at the cost of a loss of precision. To this end, we compared the total explorable space on the different map representations and found that they differ only marginally. In summary, the Patch Map has proven to be a viable alternative to standard methods that is much more economical with memory.

Appendix A

Addenda & Errata

These addenda have not been reviewed by the referees.

• Section 2.5.5, Identification of erroneous pixels: It is interesting to note that in active sensing, three quadratic relations impact the number of photons per pixel (the amplitude). Let r be the distance to an obstacle. The number of photons from the illumination unit per fixed area on an obstacle is inversely proportional to r². We assume the pin-hole camera model. Since the obstacle typically reflects diffusely, the number of photons received from a fixed area on an obstacle within a fixed image area (i.e. a pixel) is also inversely proportional to r². But then again, the area mapped to a pixel is directly proportional to r². The last two effects cancel each other out, so a simple inverse proportionality to r² remains (see the summary after this list).

• Section 3.4: In a point cloud with equally distributed points filling a sphere, the value of the median distance over the maximum distance, r̃/max r, would be 2^(−1/3), or roughly 0.8 (see the summary after this list).

• Section 5.1: An advantage of using a canonical coordinate system is smaller memory size: it only needs four floating point numbers (three for n, one for d) as opposed to six or seven for a full transform (three for the translational part and three or four for the rotational part, depending on whether the full traditional form is stored (e.g. unit quaternion or axis and angle) or a compressed one (a unit quaternion has only three DOF; the axis can be stored as a unit vector, which has only two)). However, this advantage vanishes once we consider that the vertices need to be stored as well, each requiring two further floating point numbers. Assuming only ten vertices, memory requirements compare as 10·2 + 4 = 24 for a canonical coordinate system versus 10·2 + 7 = 27 for a full transform. The latter is only 12.5% more. However, most planar patches have a lot more vertices, so the difference becomes negligible.

• Section 5.3.2, Generation of Planar Patches: One step towards fewer intersections when using late projection with ALRF scans is this: in the range image, the column containing the ±90° points should be invalidated (with the possible exception of the central pixel).

• Section 5.3.2, Generation of Planar Patches: Another important source of intersections generated by late projection on ALRF scans is that the LRF center is not identical with the rotation origin. An example of an unsimplifiable outline can be found in figure A.1.

• Section 5.4.2, BulletPatchMap: A possible alternative to triangulating planar patches is decomposing them into convex polygons. CGAL provides functions for that, and convex polygons are


Figure A.1: UNSIMPLIFIABLE OUTLINE – (a) Outline on the range image; (b) outline projected to the optimal region, cropped to the bounding box of the projected points.

a primitive of Bullet. For a given planar patch, the number of convex polygons would obviously be much lower than the number of triangles. Theoretically, this would imply a speed-up. However, it is not clear whether this speed-up would really materialize, since triangle meshes are much more commonly used than sets of convex polygons, so there will be quite some optimizations aimed at the former case.
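The first two addenda can be condensed into two short derivations. The following is a sketch based only on the statements above; r̃ denotes the median distance of the points from the sphere's center, a notation introduced here for clarity.

```latex
% Photons per pixel in active sensing: three quadratic effects, two of which cancel.
N_{\mathrm{pixel}}(r) \;\propto\;
  \underbrace{\frac{1}{r^2}}_{\text{illumination}}
  \cdot \underbrace{\frac{1}{r^2}}_{\text{diffuse reflection}}
  \cdot \underbrace{r^2}_{\text{area seen by a pixel}}
  \;=\; \frac{1}{r^2}

% Median distance of uniformly distributed points filling a sphere of radius r_max:
\left(\frac{\tilde{r}}{r_{\max}}\right)^{3} = \frac{1}{2}
  \quad\Longrightarrow\quad
  \frac{\tilde{r}}{r_{\max}} = 2^{-1/3} \approx 0.794
```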

References<br />

For the readers’ (and the author’s) convenience, hyperlinks to electronic versions of most references<br />

are supplied with the electronic version of this thesis. The anchor is =⇒.<br />

[Amato et al., 1998] Amato, N., Bayazit, O. B., Dale, L., Jones, H., and Vallejo, D. (1998). OBPRM:<br />

An Obstacle-Based PRM for 3D Workspaces. In Agarwal, P., Kavraki, L., and Mason, M., editors,<br />

Robotics : The Algorithmic Perspective. The Third Workshop on the Algorithmic Foundations of<br />

Robotics, pages 156–168. A.K. Peters.<br />

[Andert and Adolf, 2009] Andert, F. and Adolf, F. (2009). Online world modeling and path planning<br />

for an unmanned helicopter. Autonomous Robots. =⇒.<br />

[Aronov et al., 2004] Aronov, B., Asano, T., Katoh, N., Mehlhorn, K., and Tokuyama, T. (2004).<br />

Polyline fitting of planar points under min-sum criteria. In Fleischer, R. and Trippen, G., editors,<br />

Algorithms and Computation: 15th International Symposium ISAAC 2004, volume 3341 of LNCS,<br />

pages 77–88. Springer.<br />

[B. Buettgen et al., 2005] B. Buettgen et al. (2005). CCD/CMOS lock-in pixel for range imaging:<br />

Challenges, limitations and state-of-the-art. ETH 1st RIM Days.<br />

[Ballard, 1981] Ballard, D. H. (1981). Generalizing the hough transform to detect arbitrary shapes.<br />

Pattern Recognition, 13(3):111–122.<br />

[Baraff, 1992] Baraff, D. (1992). Dynamic Simulation of Non-Penetrating Rigid Bodies. PhD thesis, Computer Science Department, Cornell University.

[Barate and Manzanera, 2008] Barate, R. and Manzanera, A. (2008). Automatic Design of Vision-<br />

Based Obstacle Avoidance Controllers Using Genetic Programming, volume 4926/2008, pages<br />

25–36. Springer Berlin / Heidelberg. =⇒.<br />

[Besl and McKay, 1992] Besl, P. J. and McKay, N. D. (1992). A method for registration of 3-d<br />

shapes. IEEE Trans. on Pattern Analysis and Machine Intelligence, 14(2):239–256.<br />

[Birk et al., 2007a] Birk, A., Pathak, K., Poppinga, J., Schwertfeger, S., and Chonnaparamutt, W. (2007a). Intelligent Behaviors in Outdoor Environments. In 13th International Conference on Robotics and Applications, Special Session on Outdoor Robotics - Taking Robots off road. IASTED.

[Birk et al., 2007b] Birk, A., Pathak, K., Poppinga, J., Schwertfeger, S., Pfingsthorn, M., and Blow, H. (2007b). The Jacobs Test Arena for Safety, Security, and Rescue Robotics (SSRR). In WS on Performance Evaluation and Benchmarking for Intelligent Robots and Systems, Intern. Conf. on Intelligent Robots and Systems (IROS). IEEE Press.

[Birk et al., 2010] Birk, A., Pathak, K., Vaskevicius, N., Pfingsthorn, M., Poppinga, J., and Schwertfeger, S. (2010). Surface Representations for 3D Mapping: A Case for a Paradigm Shift. KI - German Journal on Artificial Intelligence.

[Birk et al., 2009a] Birk, A., Poppinga, J., and Pfingsthorn, M. (2009a). Using different Humanoid Robots for Science Edutainment of Secondary School Pupils. In Iocchi, L., Matsubara, H., Weitzenfeld, A., and Zhou, C., editors, RoboCup 2008: Robot WorldCup XII, Lecture Notes in Artificial Intelligence (LNAI). Springer.

[Birk et al., 2009b] Birk, A., Poppinga, J., Stoyanov, T., and Nevatia, Y. (2009b). Planetary Exploration in USARsim: A Case Study including Real World Data from Mars. In Iocchi, L., Matsubara, H., Weitzenfeld, A., and Zhou, C., editors, RoboCup 2008: Robot WorldCup XII, Lecture Notes in Artificial Intelligence (LNAI). Springer.

[Birk et al., 2008] Birk, A., Stoyanov, T., Nevatia, Y., Ambrus, R., Poppinga, J., and Pathak, K. (2008). Terrain Classification for Autonomous Robot Mobility: from Safety, Security Rescue Robotics to Planetary Exploration. In Planetary Rovers Workshop, International Conference on Robotics and Automation (ICRA). IEEE.

[Birk et al., 2009c] Birk, A., Vaskevicius, N., Pathak, K., Schwertfeger, S., Poppinga, J., and Buelow, H. (2009c). 3-D Perception and Modeling: Motion-Level Teleoperation and Intelligent Autonomous Functions. IEEE Robotics and Automation Magazine (RAM), December.

[Blanco et al., 2007] Blanco, J.-L., González, J., and Fernndez-Madrigal, J.-A. (2007). Extending obstacle<br />

avoidance methods through multiple parameter-space transformations. Autonomous Robots,<br />

24(1):29–48. =⇒.<br />

[Borenstein and Koren, 1989] Borenstein, J. and Koren, Y. (1989). Real-time obstacle avoidance for<br />

fact mobile robots. Systems, Man and Cybernetics, IEEE Transactions on, 19(5):1179–1187. =⇒.<br />

[Bouabdallah et al., 2007] Bouabdallah, S., Becker, M., and Perrot, V. D. (2007). Computer obstacle<br />

avoidance on quadrotors. In Proceedings of the XII International Symposium on Dynamic Problems<br />

of Mechanics. =⇒.<br />

[Branicky et al., 2001] Branicky, M. S., Lavalle, S. M., Olson, K., and Yang, L. (2001). Quasirandomized<br />

path planning. In Robotics and Automation, 2001. Proceedings 2001 ICRA. IEEE<br />

International Conference on, volume 2, pages 1481–1487. =⇒.<br />

[Cacciola, 2008] Cacciola, F. (2008). 2d straight skeleton and polygon offsetting. In Board, C. E.,<br />

editor, CGAL User and Reference Manual. 3.4 edition.<br />

[Chao et al., 2009] Chao, C.-H., Hsueh, B.-Y., Hsiao, M.-Y., Tsai, S.-H., and Li, T.-H. S. (2009).<br />

Fuzzy target tracking and obstacle avoidance of mobile robots with a stereo vision system. International<br />

Journal of Fuzzy Systems, 11(3). =⇒.<br />

[Charnley and Blissett, 1989] Charnley, D. and Blissett, R. (1989). Surface reconstruction from outdoor<br />

image sequences. Image Vision Comput., 7(1):10–16. =⇒.<br />

[Chen and Tsai, 2000] Chen, K.-H. and Tsai, W.-H. (2000). Vision-based obstacle detection and<br />

avoidance for autonomous land vehicle navigation in outdoor roads. Automation in Construction,<br />

10:1–25. =⇒.<br />

[Chen et al., 2006] Chen, Y., Wu, K.-Q., Wang, D., and Chen, G. (2006). Elastic Algorithm: A New<br />

Path Planning Algorithm About Auto-navigation in 3D Virtual Scene, pages 1156–1165. Springer<br />

Berlin / Heidelberg. =⇒.<br />

[Cole and Newman, 2006] Cole, D. and Newman, P. (2006). Using laser range data for 3D SLAM<br />

in outdoor environments. In Robotics and Automation, 2006. ICRA 2006. Proceedings 2006 IEEE<br />

International Conference on, pages 1556–1563. =⇒.<br />

[Coumans, 2009] Coumans, E. (2009). Physics Simulation Forum. http://bulletphysics.<br />

com/. retrieved on September 1st, 2009.<br />

[Coumans, 2010] Coumans, E. (2010). Physics simulation forum – do these limitations still apply?<br />

http://bulletphysics.org/Bullet/phpBB3/viewtopic.php?f=9&t=3413, last accessed May 6th, 2010.<br />

[Courbon et al., 2009] Courbon, J., Mezouar, Y., Guenard, N., and Martinet, P. (2009). Visual navigation<br />

of a quadrotor aerial vehicle. In Intelligent Robots and Systems, 2009. IROS 2009. IEEE/RSJ<br />

International Conference on, pages 5315–5320. =⇒.<br />

[CSEM, 2006] CSEM (2006). The SwissRanger, Manual V1.02. 8048 Zurich, Switzerland.<br />

[Da, 2008] Da, T. K. F. (2008). 2d alpha shapes. In Board, C. E., editor, CGAL User and Reference<br />

Manual. 3.4 edition.<br />

[Douglas and Peucker, 1973] Douglas, D. H. and Peucker, T. K. (1973). Algorithms for the reduction<br />

of number of points required to represent a line or its caricature. The Canadian Carthographer,<br />

10(2):112–122.<br />

[Edelsbrunner and Mücke, 1994] Edelsbrunner, H. and Mücke, E. (1994). Three-dimensional alpha<br />

shapes. ACM Transactions on Graphics, 3(1):43–72. =⇒.<br />

[Fainekos et al., 2005] Fainekos, G., Kress-Gazit, H., and Pappas, G. (2005). Temporal logic motion<br />

planning for mobile robots. In Robotics and Automation, 2005. ICRA 2005. Proceedings of the<br />

2005 IEEE International Conference on, pages 2020–2025. =⇒.<br />

[Fuchs and Hirzinger, 2008] Fuchs, S. and Hirzinger, G. (2008). Extrinsic and depth calibration of<br />

tof-cameras. In Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference<br />

on, pages 1–6. =⇒.<br />

[Fuchs and May, 2007] Fuchs, S. and May, S. (2007). Calibration and registration for precise surface<br />

reconstruction with ToF cameras. In Proceedings of the Dynamic 3D Imaging Workshop in<br />

Conjunction with DAGM (Dyn3D), volume I, Heidelberg, Germany. =⇒.<br />

[Gao et al., 2006] Gao, J., Chen, X., Zheng, D., Yilmaz, O., and Gindy, N. (2006). Adaptive restoration<br />

of complex geometry parts through reverse engineering application. Advances in Engineering<br />

Software, 37:592–600. =⇒.<br />

[Giezeman and Wesselink, 2008] Giezeman, G.-J. and Wesselink, W. (2008). 2d polygons. In Board,<br />

C. E., editor, CGAL User and Reference Manual. 3.4 edition.<br />

[Goodrich, 1995] Goodrich, M. T. (1995). Efficient piecewise-linear function approximation using<br />

the uniform matrix. Discrete Comput Geom, (14):445–462.<br />

[Gottschalk, 1997] Gottschalk, S. (1997). RAPID – Robust and Accurate Polygon Interference Detection. http://www.cs.unc.edu/~geom/OBB/OBBT.html. Retrieved on July 6th, 2009.

[Gottschalk, 1998] Gottschalk, S. (1998). RAPID 2.01. http://www.cs.unc.edu/~geom/OBB/request.html. Retrieved on July 6th, 2009.

[Guibas et al., 1991] Guibas, L. J., Hershberger, J. E., Mitchell, J. S. B., and Snoeyink, J. S. (1991). Approximating polygons and subdivisions with minimum link paths. In ISAAC: 2nd International Symposium on Algorithms and Computation (formerly SIGAL International Symposium on Algorithms), organized by the Special Interest Group on Algorithms (SIGAL) of the Information Processing Society of Japan (IPSJ) and the Technical Group on Theoretical Foundation of Computing of the Institute of Electronics, Information and Communication Engineers (IEICE).

[Gut, 2004] Gut, O. (2004). Untersuchungen des 3D-Sensors SwissRanger [Investigations of the SwissRanger 3D sensor]. Master's thesis.

[Haddad et al., 1998] Haddad, H., Khatib, M., Lacroix, S., and Chatila, R. (1998). Reactive navigation in outdoor environments using potential fields. In Robotics and Automation, 1998. Proceedings. 1998 IEEE International Conference on, volume 2, pages 1232–1237.

[Hadsell et al., 2009] Hadsell, R., Sermanet, P., Scoffier, M., Erkan, A., Kavukcuoglu, K., Muller, U., and LeCun, Y. (2009). Learning long-range vision for autonomous off-road driving. Journal of Field Robotics, 26(2):120–144.

[Hähnel et al., 2003] Hähnel, D., Burgard, W., and Thrun, S. (2003). Learning Compact 3D Models of Indoor and Outdoor Environments with a Mobile Robot. Robotics and Autonomous Systems, 44(1):15–27.

[Hajebi and Zelek, 2007] Hajebi, K. and Zelek, J. (2007). Dense surface from infrared stereo. In Applications of Computer Vision, 2007. WACV '07. IEEE Workshop on, pages 21–26.

[Harris and Pike, 1988] Harris, C. G. and Pike, J. M. (1988). 3D positional integration from image sequences. Image Vision Comput., 6(2):87–90.

[Hart et al., 1968] Hart, P., Nilsson, N., and Raphael, B. (1968). A formal basis for the heuristic determination of minimum cost paths. Systems Science and Cybernetics, IEEE Transactions on, 4(2):100–107.

[Hasircioglu et al., 2008] Hasircioglu, I., Topcuoglu, H. R., and Ermis, M. (2008). 3-D path planning for the navigation of unmanned aerial vehicles by using evolutionary algorithms. In GECCO '08: Proceedings of the 10th annual conference on Genetic and evolutionary computation, pages 1499–1506, New York, NY, USA. ACM.

[Heckbert and Garland, 1997] Heckbert, P. S. and Garland, M. (1997). Survey of polygonal surface simplification algorithms. Technical report, Pittsburgh.

[Hershberger and Snoeyink, 1997] Hershberger, J. and Snoeyink, J. (1997). Cartographic Line Simplification and Polygon CSG Formulae in O(n log* n) Time. Springer Berlin/Heidelberg.

[Hert and Schirra, 2008] Hert, S. and Schirra, S. (2008). 2D convex hulls and extreme points. In Board, C. E., editor, CGAL User and Reference Manual. 3.4 edition.



[Ho et al., 2001] Ho, S., Sarma, S., and Adachi, Y. (2001). Real-time interference analysis between a tool and an environment. Computer-Aided Design, 33(13):935–947.

[Howard et al., 2004] Howard, A., Wolf, D. F., and Sukhatme, G. S. (2004). Towards 3D Mapping in Large Urban Environments. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Sendai, Japan.

[Hrabar, 2008] Hrabar, S. (2008). 3D path planning and stereo-based obstacle avoidance for rotorcraft UAVs. In Intelligent Robots and Systems, 2008. IROS 2008. IEEE/RSJ International Conference on, pages 807–814.

[Hwang and Chang, 2007] Hwang, C.-L. and Chang, L.-J. (2007). Trajectory tracking and obstacle avoidance of car-like mobile robots in an intelligent space using mixed H2/H∞ decentralized control. Mechatronics, IEEE/ASME Transactions on, 12(3):345–352.

[Hwangbo et al., 2007] Hwangbo, M., Kuffner, J., and Kanade, T. (2007). Efficient two-phase 3D motion planning for small fixed-wing UAVs. In Robotics and Automation, 2007 IEEE International Conference on, pages 1035–1041.

[Imai and Iri, 1988] Imai, H. and Iri, M. (1988). Polygonal approximations of a curve – formulations and algorithms. In Toussaint, G. T., editor, Computational Morphology, pages 71–86. Elsevier Science Publishers.

[Jun and D'Andrea, 2003] Jun, M. and D'Andrea, R. (2003). Path planning for unmanned aerial vehicles in uncertain and adversarial environments. In Butenko, S., Murphey, R., and Pardalos, P., editors, Cooperative Control: Models, Applications and Algorithms. Kluwer Academic Publishers.

[Kagami et al., 2003] Kagami, S., Kuffner, J. J., Nishiwaki, K., Okada, K., Inoue, H., and Inaba, M. (2003). Humanoid arm motion planning using stereo vision and RRT search. In Intelligent Robots and Systems, 2003. (IROS 2003). Proceedings. 2003 IEEE/RSJ International Conference on, volume 3, pages 2167–2172, Washington, DC, USA. IEEE Computer Society.

[Kavraki et al., 1996] Kavraki, L. E., Svestka, P., Latombe, J.-C., and Overmars, M. (1996). Probabilistic roadmaps for path planning in high dimensional configuration spaces. IEEE Trans. on Robotics and Automation, 12:566–580.

[Kelly et al., 2006] Kelly, A., Amidi, O., Bode, M., Happold, M., Herman, H., Pilarski, T., Rander, P., Stentz, A., Vallidis, N., and Warner, R. (2006). Toward Reliable Off Road Autonomous Vehicles Operating in Challenging Environments, volume 21/2006 of Springer Tracts in Advanced Robotics, pages 599–608. Springer Berlin / Heidelberg.

[Keogh et al., 2003] Keogh, E., Chu, S., Hart, D., and Pazzani, M. (2003). Segmenting time series: A survey and novel approach. In Data Mining In Time Series Databases, chapter 1, pages 1–22. World Scientific Publishing Company.

[Kim and Khosla, 1991] Kim, J.-O. and Khosla, P. (1991). Real-time obstacle avoidance using harmonic potential functions. In Robotics and Automation, 1991. Proceedings., 1991 IEEE International Conference on, volume 1, pages 790–796.

[Klein and Zachmann, 2004] Klein, J. and Zachmann, G. (2004). Point cloud collision detection. Computer Graphics Forum, 23(3).



[Konolige, 1999] Konolige, K. (1999). Stereo geometry. Last visited July 22nd, 2010.

[Konolige and Beymer, 2006] Konolige, K. and Beymer, D. (2006). SRI Small Vision System, user's manual, software version 4.2.

[Koyuncu and Inalhan, 2008] Koyuncu, E. and Inalhan, G. (2008). A probabilistic B-spline motion planning algorithm for unmanned helicopters flying in dense 3D environments. In Intelligent Robots and Systems, 2008. IROS 2008. IEEE/RSJ International Conference on, pages 815–821.

[Kuffner and LaValle, 2000] Kuffner, J. J. and LaValle, S. (2000). RRT-Connect: An efficient approach to single-query path planning. In Robotics and Automation, 2000. Proceedings. ICRA '00. IEEE International Conference on, volume 2, pages 995–1001.

[Lange, 2000] Lange, R. (2000). 3D time-of-flight distance measurement with custom solid-state image sensors in CMOS/CCD-technology. PhD thesis, Department of Electrical Engineering and Computer Science.

[Lapierre et al., 2007] Lapierre, L., Zapata, R., and Lepinay, P. (2007). Combined Path-following and Obstacle Avoidance Control of a Wheeled Robot. The International Journal of Robotics Research, 26(4):361–375.

[Larson et al., 2006] Larson, J., Bruch, M., and Ebken, J. (2006). Autonomous navigation and obstacle avoidance for unmanned surface vehicles. In SPIE Proc. 6230: Unmanned Systems Technology VIII, pages 17–20.

[LaValle, 1998] LaValle, S. (1998). Rapidly-Exploring Random Trees. Computer Science Department, Iowa State University, October.

[LaValle and Kufner, 2001] LaValle, S. and Kuffner, J. (2001). Rapidly-Exploring Random Trees: Progress and Prospects. In Donald, B., Lynch, K., and Rus, D., editors, Algorithmic and Computational Robotics: New Directions, pages 45–59. A.K. Peters.

[Lavalle and Kuffner, 2000] Lavalle, S. M. and Kuffner, J. J. (2000). Rapidly-exploring random trees: Progress and prospects. In Algorithmic and Computational Robotics: New Directions, pages 293–308.

[LeCun et al., 2005] LeCun, Y., Muller, U., Ben, J., Cosatto, E., and Flepp, B. (2005). Off-road obstacle avoidance through end-to-end learning. In Advances in Neural Information Processing Systems (NIPS 2005). MIT Press.

[Lewis, 2002] Lewis, M. (2002). Detecting surface features during locomotion using optic flow. In Robotics and Automation, 2002. Proceedings. ICRA '02. IEEE International Conference on, volume 1, pages 305–310.

[Lozano-Perez, 1983] Lozano-Perez, T. (1983). Spatial planning: A configuration space approach. Computers, IEEE Transactions on, C-32(2):108–120.

[Matthies et al., 1995] Matthies, L., Kelly, A., Litwin, T., and Tharp, G. (1995). Obstacle detection for unmanned ground vehicles: a progress report. In Proceedings of IEEE Intelligent Vehicles '95 Conference, pages 66–71.



[Matthies et al., 2002] Matthies, L., Xiong, Y., Hogg, R., Zhu, D., Rankin, A., Kennedy, B., Hebert, M., Maclachlan, R., Won, C., Frost, T., Sukhatme, G., McHenry, M., and Goldberg, S. (2002). A portable, autonomous, urban reconnaissance robot. Robotics and Autonomous Systems, 40(2-3):163–172.

[May et al., 2009] May, S., Droeschel, D., Holz, D., Fuchs, S., Malis, E., Nüchter, A., and Hertzberg, J. (2009). Three-dimensional mapping with time-of-flight cameras. Journal of Field Robotics, 26(11-12):934–965.

[May et al., 2006] May, S., Werner, B., Surmann, H., and Pervolz, K. (2006). 3D time-of-flight cameras for mobile robotics. In Intelligent Robots and Systems, 2006 IEEE/RSJ International Conference on, pages 790–795.

[Mayer et al., 2002] Mayer, L., Li, Y., and Melvin, G. (2002). 3D visualization for pelagic fisheries research and assessment. ICES J. Mar. Sci., 59(1):216–225.

[Meister et al., 2009] Meister, O., Frietsch, N., Ascher, C., and Trommer, G. (2009). Adaptive path planning for VTOL-UAVs. Aerospace and Electronic Systems Magazine, IEEE, 24(7):36–41.

[MESA Imaging AG, 2006] MESA Imaging AG (2006). SwissRanger SR-3000, Manual V1.02. MESA Imaging AG, Zürich, Switzerland.

[Michel et al., 2008] Michel, P., Chestnutt, J., Kagami, S., Nishiwaki, K., Kuffner, J., and Kanade, T. (2008). GPU-accelerated real-time 3D tracking for humanoid autonomy. In Proceedings of the JSME Robotics and Mechatronics Conference (ROBOMEC'08).

[Michels et al., 2005] Michels, J., Saxena, A., and Ng, A. Y. (2005). High speed obstacle avoidance using monocular vision and reinforcement learning. In ICML '05: Proceedings of the 22nd international conference on Machine learning, pages 593–600, New York, NY, USA. ACM.

[Mihailidis et al., 2007] Mihailidis, A., Elinas, P., Boger, J., and Hoey, J. (2007). An intelligent powered wheelchair to enable mobility of cognitively impaired older adults: An anticollision system. Neural Systems and Rehabilitation Engineering, IEEE Transactions on, 15(1):136–143.

[Milroy et al., 1996] Milroy, M. J., Weir, D. J., Bradley, C., and Vickers, G. W. (1996). Reverse engineering employing a 3D laser scanner: A case study. International Journal of Advanced Manufacturing Technology, 12(2):111–121.

[Moravec, 1980] Moravec, H. (1980). Obstacle avoidance and navigation in the real world by a seeing robot rover. Technical report, Stanford University. Available as Stanford AIM-340, CS-80-813 and republished as a Carnegie Mellon University Robotics Institute Technical Report to increase availability.

[Mulligan et al., 2002] Mulligan, J., Isler, V., and Daniilidis, K. (2002). Trinocular stereo: A real-time algorithm and its evaluation. International Journal of Computer Vision, 47:51–61.

[Murphy et al., 2001] Murphy, R. R., Casper, J., and Micire, M. (2001). Potential Tasks and Research Issues for Mobile Robots in RoboCup Rescue. In Stone, P., Balch, T., and Kraetzschmar, G., editors, RoboCup-2000: Robot Soccer World Cup IV, volume 2019 of Lecture Notes in Artificial Intelligence (LNAI), pages 339–344. Springer Verlag.



[Mustafa, 2004] Mustafa, N. H. (2004). Simplification, Estimation and Classification of Geometric Objects. PhD thesis, Department of Computer Science, Duke University.

[Newman et al., 2006] Newman, P., Cole, D., and Ho, K. (2006). Outdoor SLAM using visual appearance and laser ranging. In Robotics and Automation, 2006. ICRA 2006. Proceedings 2006 IEEE International Conference on, pages 1180–1187.

[Nüchter et al., 2004] Nüchter, A., Surmann, H., Lingemann, K., Hertzberg, J., and Thrun, S. (2004). 6D SLAM with an application in autonomous mine mapping. In Robotics and Automation, 2004. Proceedings. ICRA '04. 2004 IEEE International Conference on, volume 2, pages 1998–2003.

[Oggier et al., 2003] Oggier, T., Lehmann, M., Kaufmann, R., Schweizer, M., Richter, M., Metzler, P., Lang, G., Lustenberger, F., and Blanc, N. (2003). An all-solid-state optical range camera for 3D real-time imaging with sub-centimeter depth resolution (SwissRanger). In Proceedings of SPIE, volume SPIE-5249, pages 534–545.

[Ogren and Leonard, 2005] Ogren, P. and Leonard, N. (2005). A convergent dynamic window approach to obstacle avoidance. Robotics, IEEE Transactions on, 21(2):188–195.

[Ohya et al., 1998] Ohya, A., Kosaka, A., and Kak, A. (1998). Vision-based navigation by a mobile robot with obstacle avoidance using single-camera vision and ultrasonic sensing. IEEE Transactions on Robotics and Automation.

[Okada et al., 2001] Okada, K., Kagami, S., Inaba, M., and Inoue, H. (2001). Plane segment finder: Algorithm, implementation and applications. In Robotics and Automation, 2001. Proceedings 2001 ICRA. IEEE International Conference on, volume 2, pages 2120–2125, Washington, DC, USA. IEEE Computer Society.

[Oliensis, 2000] Oliensis, J. (2000). A critique of structure-from-motion algorithms. Computer Vision and Image Understanding, 80(2):172–214.

[Overmars and Geraerts, 2007] Overmars, M. H. and Geraerts, R. (2007). The corridor map method: a general framework for real-time high-quality path planning. Computer Animation and Virtual Worlds, 18:107–119.

[Parasuraman et al., 2000] Parasuraman, R., Sheridan, T. B., and Wickens, C. D. (2000). A model for types and levels of human interaction with automation. IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans, 30(3):286–297.

[Pathak et al., 2008] Pathak, K., Birk, A., and Poppinga, J. (2008). Subpixel Depth Accuracy with a Time of Flight Sensor using Multimodal Gaussian Analysis. In International Conference on Intelligent Robots and Systems (IROS), Nice, France. IEEE Press.

[Pathak et al., 2007a] Pathak, K., Birk, A., Poppinga, J., and Schwertfeger, S. (2007a). 3D forward sensor modeling and application to occupancy grid based sensor fusion. In IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, San Diego.

[Pathak et al., 2007b] Pathak, K., Birk, A., Schwertfeger, S., and Poppinga, J. (2007b). 3D Forward Sensor Modeling and Application to Occupancy Grid Based Sensor Fusion. In International Conference on Intelligent Robots and Systems (IROS), pages 2059–2064, San Diego, USA. IEEE Press.



[Pathak et al., 2010a] Pathak, K., Birk, A., Vaskevicius, N., Pfingsthorn, M., Schwertfeger, S., and Poppinga, J. (2010a). Online 3D SLAM by Registration of Large Planar Surface Segments and Closed Form Pose-Graph Relaxation. Journal of Field Robotics, Special Issue on 3D Mapping, 27(1):52–84.

[Pathak et al., 2010b] Pathak, K., Birk, A., Vaškevičius, N., and Poppinga, J. (2010b). Fast registration based on noisy planes with unknown correspondences for 3-D mapping. Robotics, IEEE Transactions on, 26(3):424–441.

[Pathak et al., 2009] Pathak, K., Vaskevicius, N., Poppinga, J., Pfingsthorn, M., Schwertfeger, S., and Birk, A. (2009). Fast 3D Mapping by Matching Planes Extracted from Range Sensor Point-Clouds. In International Conference on Intelligent Robots and Systems (IROS). IEEE Press.

[Pellenz, 2007] Pellenz, J. (2007). Rescue robot sensor design: An active sensing approach. In Proceedings of the Fourth International Workshop on Synthetic Simulation and Robotics to Mitigate Earthquake Disaster (SRMED 2007).

[Petitjean, 2002] Petitjean, S. (2002). A survey of methods for recovering quadrics in triangle meshes. ACM Comput. Surv., 34(2):211–262.

[Petres et al., 2007] Petres, C., Pailhas, Y., Patron, P., Petillot, Y., Evans, J., and Lane, D. (2007). Path planning for autonomous underwater vehicles. Robotics, IEEE Transactions on, 23(2):331–341.

[Pettersson and Doherty, 2006] Pettersson, P. O. and Doherty, P. (2006). Probabilistic roadmap based path planning for an autonomous unmanned helicopter. Journal of Intelligent and Fuzzy Systems, 17:395–405.

[Pflimlin et al., 2006] Pflimlin, J., Soueres, P., and Hamel, T. (2006). Waypoint navigation control of a VTOL UAV amidst obstacles. In Intelligent Robots and Systems, 2006 IEEE/RSJ International Conference on, pages 3544–3549.

[Point of Beginning, 2006] Point of Beginning (2006). 2006 3D Laser Scanner Hardware Survey.

[Poppinga and Birk, 2009] Poppinga, J. and Birk, A. (2009). A Novel Approach to Wrap Around Error Correction for a Time-Of-Flight 3D Camera. In Iocchi, L., Matsubara, H., Weitzenfeld, A., and Zhou, C., editors, RoboCup 2008: Robot WorldCup XII, Lecture Notes in Artificial Intelligence (LNAI). Springer.

[Poppinga et al., 2008a] Poppinga, J., Birk, A., and Pathak, K. (2008a). Hough based Terrain Classification for Realtime Detection of Drivable Ground. Journal of Field Robotics, 25(1-2):67–88.

[Poppinga et al., 2007] Poppinga, J., Pfingsthorn, M., Schwertfeger, S., Pathak, K., and Birk, A. (2007). Optimized Octtree Datastructure and Access Methods for 3D Mapping. In IEEE Safety, Security, and Rescue Robotics (SSRR). IEEE Press.

[Poppinga et al., 2008b] Poppinga, J., Vaskevicius, N., Birk, A., and Pathak, K. (2008b). Fast Plane Detection and Polygonalization in noisy 3D Range Images. In International Conference on Intelligent Robots and Systems (IROS), pages 3378–3383, Nice, France. IEEE Press.



[Rusu et al., 2008a] Rusu, R. B., Marton, Z. C., Blodow, N., Dolha, M., and Beetz, M. (2008a). Towards 3D point cloud based object maps for household environments. Robotics and Autonomous Systems, 56:927–941.

[Rusu et al., 2008b] Rusu, R. B., Sundaresan, A., Morisset, B., Agrawal, M., and Beetz, M. (2008b). Leaving flatland: Realtime 3D stereo semantic reconstruction. In ICIRA '08: Proceedings of the First International Conference on Intelligent Robotics and Applications, pages 921–932, Berlin, Heidelberg. Springer-Verlag.

[Sabe et al., 2004] Sabe, K., Fukuchi, M., Gutmann, J.-S., Ohashi, T., Kawamoto, K., and Yoshigahara, T. (2004). Obstacle avoidance and path planning for humanoid robots using stereo vision. In Robotics and Automation, 2004. Proceedings. ICRA '04. 2004 IEEE International Conference on, volume 1, pages 592–597, Washington, DC, USA. IEEE Computer Society.

[Saez and Escolano, 2004] Saez, J. and Escolano, F. (2004). A global 3D map-building approach using stereo vision. In Robotics and Automation, 2004. Proceedings. ICRA '04. 2004 IEEE International Conference on, volume 2, pages 1197–1202.

[Sakenas et al., 2007] Sakenas, V., Kosuchinas, O., Pfingsthorn, M., and Birk, A. (2007). Extraction of Semantic Floor Plans from 3D Point Cloud Maps. In International Workshop on Safety, Security, and Rescue Robotics (SSRR). IEEE Press.

[Saunders et al., 2005] Saunders, J. B., Call, B., Curtis, A., and Beard, R. W. (2005). Static and dynamic obstacle avoidance in miniature air vehicles. AIAA Infotech at Aerospace Conference.

[Schäfer et al., 2005a] Schäfer, B. H., Proetzsch, M., and Berns, K. (2005a). Stereo-vision-based obstacle avoidance in rough outdoor terrain. In International Symposium on Motor Control and Robotics.

[Schafer et al., 2008] Schafer, H., Hach, A., Proetzsch, M., and Berns, K. (2008). 3D obstacle detection and avoidance in vegetated off-road terrain. In Robotics and Automation, 2008. ICRA 2008. IEEE International Conference on, pages 923–928.

[Schäfer et al., 2005b] Schäfer, H. B., Luksch, T., and Berns, K. (2005b). Obstacle detection and avoidance for mobile outdoor robotics. In EOS Conference on Industrial Imaging and Machine Vision.

[Schwertfeger et al., 2008] Schwertfeger, S., Poppinga, J., and Birk, A. (2008). Towards Object Classification using 3D Sensor Data. In ECSIS Symposium on Learning and Adaptive Behaviors for Robotic Systems (LAB-RS). IEEE.

[Shatkay, 1995] Shatkay, H. (1995). Approximate queries and representations for large data sequences. Technical report, Providence, Rhode Island, USA. (The technical report, not the 1996 paper of the same title.)

[Siek et al., 2001] Siek, J., Lee, L.-Q., and Lumsdaine, A. (2001). Boost Graph Library. http://www.boost.org/doc/libs/1_39_0/libs/graph/doc/index.html. Retrieved on September 10th, 2009.



[Silveira et al., 2006] Silveira, G., Malis, E., and Rives, P. (2006). Real-time robust detection of planar regions in a pair of images. In Intelligent Robots and Systems, 2006 IEEE/RSJ International Conference on, pages 49–54, Washington, DC, USA. IEEE Computer Society.

[Simmons, 1996] Simmons, R. (1996). The curvature-velocity method for local obstacle avoidance. In Robotics and Automation, 1996. Proceedings., 1996 IEEE International Conference on, volume 4, pages 3375–3382.

[Simmons et al., 1996] Simmons, R., Henriksen, L., Chrisman, L., and Whelan, G. (1996). Obstacle avoidance and safeguarding for a lunar rover. In Proc. AIAA Forum on Advanced Developments in Space Robotics, Madison, WI.

[Smith, 2006] Smith, R. (2006). Open Dynamics Engine. http://www.ode.org/. Retrieved on July 6th, 2009.

[Sun et al., 2001] Sun, W., Bradley, C., Zhang, Y. F., and Loh, H. T. (2001). Cloud data modelling employing a unified, non-redundant triangular mesh. Computer-Aided Design, 33(2):183–193.

[Surmann et al., 2003] Surmann, H., Nuechter, A., and Hertzberg, J. (2003). An autonomous mobile robot with a 3D laser range finder for 3D exploration and digitalization of indoor environments. Robotics and Autonomous Systems, 45(3-4):181–198.

[Tangelder and Fabri, 2008] Tangelder, H. and Fabri, A. (2008). dD spatial searching. In Board, C. E., editor, CGAL User and Reference Manual. 3.4 edition.

[Terdiman, 2003] Terdiman, P. (2003). OPCODE – Optimized Collision Detection. http://www.codercorner.com/Opcode.htm. Retrieved on July 6th, 2009.

[Thrun, 2002] Thrun, S. (2002). Robotic Mapping: A Survey. In Lakemeyer, G. and Nebel, B., editors, Exploring Artificial Intelligence in the New Millennium. Morgan Kaufmann.

[Thrun et al., 2000] Thrun, S., Burgard, W., and Fox, D. (2000). A Real-Time Algorithm for Mobile Robot Mapping With Applications to Multi-Robot and 3D Mapping. In ICRA, pages 321–328.

[Thrun et al., 2003] Thrun, S., Hahnel, D., Ferguson, D., Montemerlo, M., Triebel, R., Burgard, W., Baker, C., Omohundro, Z., Thayer, S., and Whittaker, W. (2003). A system for volumetric robotic mapping of abandoned mines. In Robotics and Automation, 2003. Proceedings. ICRA '03. IEEE International Conference on, volume 3, pages 4270–4275.

[Thrun et al., 2006] Thrun, S., Montemerlo, M., Dahlkamp, H., Stavens, D., Aron, A., Diebel, J., Fong, P., Gale, J., Halpenny, M., Hoffmann, G., Lau, K., Oakley, C., Palatucci, M., Pratt, V., Stang, P., Strohband, S., Dupont, C., Jendrossek, L.-E., Koelen, C., Markey, C., Rummel, C., Niekerk, J. v., Jensen, E., Alessandrini, P., Bradski, G., Davies, B., Ettinger, S., Kaehler, A., Nefian, A., and Mahoney, P. (2006). Stanley: The robot that won the DARPA Grand Challenge. Journal of Field Robotics, 23(9):661–692.

[Ulrich and Nourbakhsh, 2000] Ulrich, I. and Nourbakhsh, I. (2000). Appearance-based obstacle detection with monocular color vision. In Proceedings of the Seventeenth National Conference on Artificial Intelligence, pages 866–871, Menlo Park, California. The AAAI Press.



[Unnikrishnan and Hebert, 2003] Unnikrishnan, R. and Hebert, M. (2003). Robust extraction of multiple structures from non-uniformly sampled data. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), volume 2, pages 1322–1329. IEEE Press.

[van den Bergen, 2004] van den Bergen, G. (2004). SOLID – Software Library for Interference Detection. http://www.win.tue.nl/~gino/solid/. Retrieved on July 27th, 2010.

[Varadarajan, 1996] Varadarajan, K. R. (1996). Approximating monotone polygonal curves using the uniform metric. In Proceedings of the twelfth annual symposium on Computational geometry, pages 311–318, Philadelphia, Pennsylvania, USA.

[Vaskevicius et al., 2007] Vaskevicius, N., Birk, A., Pathak, K., and Poppinga, J. (2007). Fast Detection of Polygons in 3D Point Clouds from Noise-Prone Range Sensors. In International Workshop on Safety, Security, and Rescue Robotics (SSRR). IEEE Press.

[Videre-Design, 2006] Videre-Design (2006). Stereo-on-a-chip (STOC) stereo head, user manual 1.1.

[Viejo and Cazorla, 2007] Viejo, D. and Cazorla, M. (2007). 3D plane-based egomotion for SLAM on semi-structured environment. In IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), pages 2761–2766.

[Waite, 2002] Waite, A. D. (2002). Sonar for Practising Engineers, Third Edition. John Wiley & Sons, Ltd.

[Watanabe et al., 2005] Watanabe, Y., Johnson, E. N., and Calise, A. J. (2005). Vision-based approach to obstacle avoidance. In 2005 AIAA Guidance, Navigation, and Control Conference and Exhibit, pages 1–10, San Francisco, CA, USA.

[Weingarten and Siegwart, 2006] Weingarten, J. and Siegwart, R. (2006). 3D SLAM using planar segments. In Intelligent Robots and Systems, 2006 IEEE/RSJ International Conference on, pages 3062–3067.

[Wilmarth et al., 1999] Wilmarth, S. A., Amato, N. M., and Stiller, P. F. (1999). Motion planning for a rigid body using random networks on the medial axis of the free space. In SCG '99: Proceedings of the fifteenth annual symposium on Computational geometry, pages 173–180, New York, NY, USA. ACM.

[Wulf and Wagner, 2003] Wulf, O. and Wagner, B. (2003). Fast 3D-Scanning Methods for Laser Measurement Systems. In International Conference on Control Systems and Computer Science (CSCS14).

[Wzorek and Doherty, 2006] Wzorek, M. and Doherty, P. (2006). Reconfigurable path planning for an autonomous unmanned aerial vehicle. In Hybrid Information Technology, 2006. ICHIT '06. International Conference on, volume 2, pages 242–249.

[Yakimovsky and Cunningham, 1978] Yakimovsky, Y. and Cunningham, R. (1978). A system for extracting three-dimensional measurements from a stereo pair of TV cameras. Computer Graphics and Image Processing, 7:195–210.

[Yvinec, 2008] Yvinec, M. (2008). 2D triangulations. In Board, C. E., editor, CGAL User and Reference Manual. 3.4 edition.



[Zufferey and Floreano, 2005] Zufferey, J.-C. and Floreano, D. (2005). Toward 30-gram autonomous indoor aircraft: Vision-based obstacle avoidance and altitude control. In Robotics and Automation, 2005. ICRA 2005. Proceedings of the 2005 IEEE International Conference on, pages 2594–2599.
