
Moment-based Facial Feature Tracking using Java

Diploma Thesis in the Degree Program
iTec - Information and Communication Engineering

Manuela Hutter
011 0109 038

Supervisor: Dipl.-Ing. (FH) Walter Ritter

Dornbirn, August 2005

Fachhochschule Vorarlberg GmbH


Eidesstattliche Erklärung (Statutory Declaration)

I hereby declare on my word of honor that I have written this thesis independently. All thoughts taken directly or indirectly from external sources are marked as such. This thesis has not been submitted to any other examination authority and has not been published before.

Manuela Hutter (Dornbirn, August 2005)

Acknowledgments

Many people helped me with this work in one way or another. I especially thank Walter Ritter, my supervisor, for his patience and help in all concerns. I am grateful to my introductory supervisor Miglena Dontschewa, who provided the initial idea for this thesis and enthused me for it; to Guido Kempter, who supported me in a difficult phase of the project and assisted my statistical analyses; and to Avinash Manian, who gave me a helping hand with the data analysis in SPSS (thanks for your patience).

I thank Colin Gregory-Moores and Lisa Newman for helping me with the basic structure of my English writing; Regine Bolter, the head of the study program, for giving me important hints to get on the right track; and Justin Zobel for writing the most helpful book about "writing for computer science" [Zobel, 2004]. Thanks to Wolfgang Mähr for proofreading and making helpful suggestions, and to my brother Matthias Hutter for collecting statistical data. Last but not least, thanks to my parents, Christine and Josef Hutter, for their personal and financial commitment.

For scientific work with ethical awareness and without animal abuse.

The use of registered names, trademarks etc. in this material does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.



Zusammenfassung

Diese Diplomarbeit stellt ein plattform-unabhängiges, in Java entwickeltes Programm für die Gesichtsbewegungserkennung vor. Trackingalgorithmen, die markante Punkte im menschlichen Gesicht lokalisieren und verfolgen, sind eine wichtige Grundlage für viele unterschiedliche, darauf aufbauende Anwendungen: in der 3D-Modellanimation werden Punkte für die Gesichtsanimation eines Charakters benötigt; Analysen von menschlichen Emotionen verwenden die Punkte für automatische Klassifikation der Gesichtsmimik; und alternative Benutzerschnittstellen können Gesichtsbewegungen als Basis ihrer Funktionsweise benutzen. Zahlreiche Forschungsarbeiten beschreiben Bemühungen im Bereich der Erkennung von Gesichtsbewegungen. Trotzdem sind praktikable Lösungen selten. Es wurde nur eine Anwendung auf dem Markt gefunden, welche das Trackingproblem in Echtzeit und ohne physische Markierungen auf dem untersuchten Gesicht löst; sie funktioniert allerdings nur auf Windows-Plattformen. Die entwickelte Java-Anwendung kann auf allen Plattformen verwendet werden, auf denen eine 'Java Virtual Machine' installiert ist. Sie benutzt eine Trackingmethode, die auf Bildmomenten und einer 'Binary Space Partitioning'-Datenstruktur basiert, und einen Canny-Kantendetektor für die Datenaufbereitung verwendet. Die Software arbeitet mit Video-Eingangsdaten, ohne Markierungen auf dem betrachteten Gesicht. Sie hat eine modulare Programmstruktur, welche die Verwendung und den Austausch von externen Bibliotheken zulässt. Derzeit werden das 'Java Media Framework' für die Extrahierung der Video-Frames, und entweder 'Java2D' oder 'Java Advanced Imaging' für die Bildaufbereitung verwendet. Das Programm kann relevante Merkmale in vorausgewählten Bildregionen finden. Obwohl die extrahierten Punkte nicht mit standardisierten Gesichtsparametern wie den MPEG-4 'Facial Animation Parameters' übereinstimmen, zeigen zwei untersuchte Beispielpunkte bemerkenswerte Korrelationen von bis zu 98% im Vergleich zu manuell ermittelten Punkten; das Erkennen von Gesichtsmerkmalen auf einem vorverarbeiteten Bild dauert zirka 5 ms. Nachdem der Tracking-Prozess abgeschlossen ist, können die gefundenen Punkte in einer Ausgabedatei gespeichert werden, um sie für nachfolgende übergeordnete Aufgaben verfügbar zu machen.



Abstract

In this thesis, we present a platform-independent program for facial feature tracking, implemented in Java. Facial feature tracking algorithms, which locate and pursue distinctive points in a human face, are an important basis for many different high-level tasks: 3D model animation needs feature points for moving the model's facial features; programs that analyze human emotions use the points for automatic emotion recognition; and facial movements may provide a basis for alternative user interfaces. Numerous papers describe research efforts in the field of facial feature tracking. Nevertheless, practicable solutions are rare. We found only one application on the market that solves the tracking task in realtime and without physical markers on the tracked face; however, it only works on Windows platforms. The implemented Java tracking program can be used on all platforms that have a 'Java Virtual Machine' installed. It uses a tracking method based on image moments and a 'Binary Space Partitioning' data structure; the input data is prepared by a Canny edge detection mechanism. The software works on video input, without markers on the processed face. It has a modular program structure that allows for the use and interchange of external libraries. Currently, it uses the 'Java Media Framework' for video frame extraction, and either 'Java2D' or 'Java Advanced Imaging' for preprocessing. The program is able to find relevant feature points in preselected image regions. While the extracted points are not in accordance with point definition standards like the MPEG-4 'Facial Animation Parameters', two tested sample points show remarkable correlations of up to 98% in comparison to manually ascertained points; the computation time of feature points on a preprocessed image region lies around 5 ms. After the tracking process, the extracted points can be saved to an output file in order to make them available for subsequent higher-level tasks.



Contents

Introduction

1. State of the Art
   1.1. Overview
   1.2. Basic Tracking Process
   1.3. Algorithms
        1.3.1. Optical Flow Techniques
        1.3.2. Active Contours (Snakes)
        1.3.3. Image Moments
   1.4. Commercial Implementations
   1.5. Summary

2. Algorithms in Consideration
   2.1. Overview
   2.2. Testing Method
        2.2.1. Testing Tool
        2.2.2. Input Data
   2.3. Testing Snake Algorithms
   2.4. Testing Image Moments
   2.5. Summary

3. Input Data and Its Preparation
   3.1. Overview
   3.2. Prerequisites
        3.2.1. Data Format Prerequisites
        3.2.2. Video Quality
        3.2.3. Video Samples
   3.3. Preparation
        3.3.1. Overview
        3.3.2. Edge Detection Algorithms
        3.3.3. Edge Detector Realization
        3.3.4. Further Improvements
   3.4. Summary

4. Programming
   4.1. Overview
   4.2. Architecture
        4.2.1. Structure
        4.2.2. Basic Application Flow
        4.2.3. Tracking Algorithm
   4.3. Implementation Process
        4.3.1. Working Environment
        4.3.2. Difficulties
   4.4. Summary

5. Evaluation
   5.1. Overview
   5.2. Program Abilities
   5.3. Tracking Quality
        5.3.1. Test Data
        5.3.2. Technique
        5.3.3. Statistical Evaluation
   5.4. Time Usage
        5.4.1. Technique
        5.4.2. Statistical Evaluation
   5.5. Summary

Conclusions

Bibliography

Glossary

A. Appendix
   A.1. Evaluation Data: Coordinates of Corners of the Mouth
   A.2. Evaluation Data: Tracking Mouth Area Selection (60x20)
   A.3. Evaluation Data: Tracking of Whole Area Selection (384x288)
   A.4. Evaluation Data: Canny Preprocessing
   A.5. Evaluation Data: Sobel Preprocessing


Introduction

In this thesis, we develop a modular Java application that is able to detect and track facial features in video input data. The program finds distinctive points in the first frame of the input video and tracks them in subsequent video frames. Such facial feature trackers are needed in different fields of computer vision. One area of application includes facial expression recognition, classification and detection of emotional states, where feature points can be used for an automated recognition process. Another area of application is information-based encoding and compression; extracted feature information could, for example, be used for low-bandwidth video chats where only the movement information has to be transmitted. In terms of model acquisition and animation, feature points are needed for moving the character's facial features. Facial movements may also provide a basis for alternative user interfaces that, for example, allow handicapped people to operate computers with facial expressions.

This project started because there is no freely available, open, and platform-independent program that is able to track feature points in an input video. Various papers deal with the tracking of facial movements, using different techniques to localize and follow facial features. These works describe their approach in a more or less transparent way; they mostly mention that they implemented an example prototype, but they do not provide implementation details or even source code. We assume that they mainly use C or C++ as programming languages; at least no paper was found that explicitly uses Java for feature tracking. Despite the number of available approaches, only one solution, the commercial product VeeAnimator, which was developed for 3D model animation, currently solves the facial movement tracking task without the need for physical markers on the tracked face. In contrast to this application, we wanted to implement a free feature tracker that is based on a comprehensible tracking method, functions on different platforms, and comes with disclosed sources and documentation. It should help to understand tracking algorithms and to evaluate the feasibility of implementing a feature tracker in Java.


In this paper, we first investigate and evaluate various movement tracking algorithms according to cost factors, their comprehensibility, and their practicability. Existing tracking algorithms work with a range of different approaches. Techniques based on active contours can track deformable objects. However, they require manual initialization, and, due to their complexity, implementation effort as well as computation times may be too high. Local approaches, such as optical flow, surpass active contours concerning their complexity and computation times. Still, they have some limitations, as they may for instance compute more pixels than strictly necessary. Other algorithms designed for basic shape tracking, like the moment-based approach by Rocha et al. [2002], may not be applicable for facial movement tracking because they require clear, unfrayed objects in the input images.

Besides the comprehensibility and practicability of the algorithms, we also evaluate the quality of the obtained output. The extracted feature points may either follow defined standards, or they may not be consistent and well-positioned enough to fit these norms. Figure 0.1 shows a possible tracking output, where the feature points are valid in the sense that they hold essential positions on the contour of the mouth. However, one feature point cannot permanently be defined as a certain facial feature point (the right corner of the mouth, for example), since its position may not be close enough to a predefined feature location, or the point's position may unexpectedly change in subsequent video frames.

Figure 0.1.: Non-standardized output of facial feature localization. The tracked points (a) may not correlate with standardized points like the MPEG-4 Facial Animation Parameters (FAP) (b) [Antunes Abrantes and Pereira, 1999].


For evaluation and decision making, we used modified versions of existing Java code that implement the algorithms in consideration. We then selected one algorithm that scored well in the evaluation process to be translated into Java for use in the final program. The chosen approach is based on a shape tracking algorithm by Rocha et al. [2002]. It uses image moment calculations and a 'Binary Space Partitioning' data structure (described in Section 1.3.3) in order to find object positions and orientations. The algorithm is straightforward and shows small computation times. However, the output data may not conform to standards.

The quality and the condition of the input data are very important factors for the algorithm to function and produce good results. To enhance these factors, we define certain prerequisites, select an appropriate Java library for the technical realization, and describe the preparation of the data. We use the Java Media Framework (JMF) for video frame acquisition, and a Java2D-based Canny edge detection mechanism for the data preprocessing.

The resulting Java program is able to execute simple tracking tasks on preselected facial regions. For convenience, the users can perform the preselection and control the video flow using a Graphical User Interface (GUI). Because this work only deals with the basic tracking process and leaves out important procedures (like the determination of face position and orientation), the problem area was reduced and special prerequisites were added (as described in Section 3.2). The architecture of the Java tracker has a modular design with 3 exchangeable components: the part responsible for the video frame acquisition, the preprocessing mechanism, and the tracking implementation (see Section 4.2).

The evaluation of the implemented application shows that the bottleneck in terms of computation times is the currently used Canny preprocessing technique. Investigations on two exemplary feature points, the corners of the mouth, demonstrate that the moment-based tracking algorithm produces results similar to manually ascertained points. Variations are mainly caused by ragged edges in the preprocessed images.



This paper is split up into five chapters:


Chapter 1 informs about the basic tracking process and the current state of research on face localization and movement tracking. It gives an overview of commercial solutions and established algorithms in that field and compares their ranges of application as well as their strengths and weaknesses.

Chapter 2 goes into detail on the preselected algorithms and explains the choice of the moment-based approach for implementation in the final program. It describes the sample implementations that we used for decision making, and reviews the test runs that we made in order to come to a decision.

Chapter 3 defines the required input data format for the program, the prerequisites, and the preparation of videos for the tracking process. We compare techniques to read videos and split them up into single images, and we select the most appropriate option for our aims. For that purpose, we define video information constraints, like a convenient and constant position and orientation of the face in all video frames. After having the right input data, with the face in the right place, we need to transform the images into edge images to make them applicable to the tracking algorithm.

Chapter 4 describes the code development of the feature tracker. It illustrates the program architecture with its general structure and the basic application workflow. In that context, we also describe the implementation of the tracking algorithm in detail. In the second part of this chapter, we briefly outline the working environment and state difficulties that arose during the implementation process.

Chapter 5 shows the achieved results of the Java feature tracker and evaluates them according to the correctness of two calculated feature points, the corners of the mouth, and the required preprocessing and calculation time.



1. State of the Art

1.1. Overview

In order to select the most appropriate facial feature tracking method for the Java implementation, we inspect the basic tracking process and look at the way it is implemented by different algorithms. We group these algorithms into 3 categories: optical flow techniques, active contour models (snakes), and moment-based shape tracking. For every group, we first state general definitions and properties in order to explain the mode of operation. We then give examples of approaches that use an algorithm of this group for facial feature tracking, and evaluate their feasibility for the Java feature tracker. In this context, we also introduce two commercial solutions, which unfortunately do not provide technical background information.

Facial feature tracking is not to be mistaken for face tracking, which aims to locate the complete face inside a video sequence, characterizing its position and orientation, but often not evaluating further details inside the tracked face. It may, however, be a preparatory step for, or mixed with, facial feature tracking. Several authors have dealt with the face tracking problem [Krüger et al., 2000; Sahbi and Boujemaa, 2002; Wu et al., 1999].

Within the field of computer vision, "recognition of the facial expressions is a very complex and interesting subject where there have been numerous research efforts" [Goto et al., 1999]. Most of these works are based on a similar process workflow, with videos and/or video streams as input data (see Section 1.2). However, they differ in their complexity, their initialization strategy and output quality, and in the way they deduce facial feature points and movements. As the goal is to find a straightforward tracking algorithm, we examine existing algorithms according to cost factors, their comprehensibility, and their practicability. One group of techniques, based on snakes, can track deformable objects, but requires manual initialization and is computationally expensive. It may also require extensive training and implementation periods. Other approaches, such as optical flow, surpass more complex methods concerning computation speed, but "provide a low-level problem characterization and suffer from some drawbacks" [Rocha et al., 2002]: they might, for example, "integrate more pixel than strictly necessary" [Dellaert and Collins, 1999]. Basic shape tracking methods may not be applicable for facial movement tracking because of their need for clear, singular objects in the input images. All examined works provide a theoretical and mathematical description of the developed algorithm, but they lack an illustration of the programmatic implementation, flow diagrams, or code snippets. The commercial solutions provide very little technical background information; the tracking methodology is not disclosed. We found two products that are able to track facial features, and only one of them solves this task without the need for physical markers on the tracked face.

1.2. Basic Tracking Process

Most of the work in facial feature tracking is based on a similar basic process workflow. As illustrated in Figure 1.1, the process works on input data from standard video or web cameras. This data may underlie some constraints, such as the distance between the tracked face and the camera, or lighting conditions (described in Section 3.2 for the Java feature tracker). The core tracking procedure consists of a number of steps. Most feature trackers have a preprocessing step, where the video data is prepared to be applicable to the tracking algorithm. The image sequences may be smoothed, converted into a different color model, or object details may be accentuated (see Section 3.3). As facial feature tracking methods mostly assume a certain size or orientation of the tracked objects, the face has to be detected, located, and possibly transformed. Due to time constraints, this project does not deal with face localization and bridges this gap with input data prerequisites. Having the preprocessed facial data in the right position in the image sequence, the feature tracking algorithm can set to work. Some algorithms, like the moment-based Java tracker or snake-based methods, require pre-selection of feature points or feature regions. In the case of the implemented Java program, this step could be automated in further development steps. After the tracking procedure, all methods provide more or less standardized facial parameters, 2D or 3D, depending on the tracker application. These parameters may then be used for facial model animation, or for high-level face processing tasks such as facial expression recognition, face classification, or face identification.
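To make this workflow more tangible, the following sketch shows how the stages could be decomposed into Java interfaces. The type and method names (FrameSource, Preprocessor, FeatureTracker, TrackingPipeline) are illustrative assumptions for this chapter, not the classes of the implemented tracker, which are described in Chapter 4.

```java
import java.awt.Point;
import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.util.List;

// Hypothetical decomposition of the workflow illustrated in Figure 1.1.
interface FrameSource {
    boolean hasMoreFrames();
    BufferedImage nextFrame();                 // video / webcam frame acquisition
}

interface Preprocessor {
    BufferedImage prepare(BufferedImage frame); // e.g. smoothing, edge detection
}

interface FeatureTracker {
    // locate feature points inside preselected regions of a prepared frame
    List<Point> track(BufferedImage preparedFrame, List<Rectangle> featureRegions);
}

// A minimal driver loop tying the stages together.
class TrackingPipeline {
    private final FrameSource source;
    private final Preprocessor preprocessor;
    private final FeatureTracker tracker;

    TrackingPipeline(FrameSource s, Preprocessor p, FeatureTracker t) {
        this.source = s; this.preprocessor = p; this.tracker = t;
    }

    void run(List<Rectangle> featureRegions) {
        while (source.hasMoreFrames()) {
            BufferedImage prepared = preprocessor.prepare(source.nextFrame());
            List<Point> featurePoints = tracker.track(prepared, featureRegions);
            // hand the extracted points to an output file or a higher-level task
            System.out.println(featurePoints);
        }
    }
}
```

Such a decomposition also mirrors the modular, exchangeable components mentioned in the Introduction: each stage can be swapped without touching the others.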

Figure 1.1.: Basic facial feature tracking workflow. All non-grey elements are part of the implemented Java feature tracker.

Gorodnichy [2003] has illustrated a similar "hierarchy of face processing tasks". He does not state input and output data, but goes into detail on face localization as a preliminary step, and he lists a range of higher-level tasks. Facial feature tracking is not mentioned in his illustration; he seems to include this procedure in a step called "Face Localization (precise)".

In the following section, we describe a set of well-established tracking methodologies and their use in the illustrated facial feature tracking workflow.



1.3. Algorithms

1.3.1. Optical Flow Techniques

Definitions and Properties

Optical flow is a concept for considering the motion of objects within a visual representation, where the motion is typically represented as vectors originating or terminating at pixels in a digital image sequence. Every pixel in an optical flow image is represented by a motion vector that indicates the direction and the intensity of motion at this point. The work of Beauchemin and Barron [1995] extensively describes optical flow techniques. Figure 1.2, taken from their work, illustrates the computation of optical flow.

Figure 1.2.: One frame of an image sequence (a) and its optical flow (b) [Beauchemin and Barron, 1995].

An optical flow algorithm "estimates the 2D flow field from image intensities" [Cutler and Turky, 1998]. In the survey of Cédras and Shah [1995], the methods are divided into four classes: differential methods, region-based matching, energy-based, and phase-based techniques:

"Differential methods compute the velocity from spatiotemporal derivates of image intensity. Methods for the computation of first order and second order derivates were devised, although estimates from second order approaches are usually poor and sparse. In region-based matching, the velocity is defined as the shift yielding the best fit between image regions, according to some similarity or distance measure. Energy-based (or frequency-based) methods compute optical flow using the output from the energy of velocity-tuned filters in the Fourier domain, while phase-based methods define velocity in terms of the phase behavior of band-pass filter output, for example the zero crossing techniques."
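As a rough illustration of the region-based matching idea, velocity as the shift that yields the best fit between image regions, the following Java sketch searches a small neighborhood for the displacement with the smallest sum of absolute differences. It is a simplified assumption of how such a matcher could look, not the method used by any of the works cited here.

```java
import java.awt.Point;

// Minimal region-based matching: find the shift of a small block between two
// grayscale frames (given as 2D intensity arrays) that minimizes the sum of
// absolute differences (SAD). The caller must ensure that the block and the
// search window stay inside both images. Purely illustrative.
class BlockMatcher {
    static Point flowAt(int[][] prev, int[][] next, int cx, int cy,
                        int blockRadius, int searchRadius) {
        int bestDx = 0, bestDy = 0;
        long bestSad = Long.MAX_VALUE;
        for (int dy = -searchRadius; dy <= searchRadius; dy++) {
            for (int dx = -searchRadius; dx <= searchRadius; dx++) {
                long sad = 0;
                for (int y = -blockRadius; y <= blockRadius; y++) {
                    for (int x = -blockRadius; x <= blockRadius; x++) {
                        int a = prev[cy + y][cx + x];
                        int b = next[cy + y + dy][cx + x + dx];
                        sad += Math.abs(a - b);
                    }
                }
                if (sad < bestSad) { bestSad = sad; bestDx = dx; bestDy = dy; }
            }
        }
        return new Point(bestDx, bestDy);   // motion vector at (cx, cy)
    }
}
```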

Optical Flow for Feature Tracking

Cohn et al. An example of how optical flow techniques are used in facial feature tracking is described by Cohn et al. [1998]. In their work, they manually select feature points in the first frame. Each of these points is then defined as the center of a 13x13 pixel flow window. The position of all feature points is normalized by automatically mapping them to a standard face model based on three facial feature points: the medial canthi of both eyes and the uppermost point of the philtrum (see Figure 1.3).

Figure 1.3.: Standard face model according to Cohn et al. [1998]. Medial canthus: inner corner of the eye; philtrum: vertical groove in the upper lip.

A hierarchical optical flow method is used to automatically track feature points in the image sequence. The displacement of each feature point is calculated by subtracting its normalized position in the first frame from its current normalized position. The resulting flow vectors are concatenated to produce a 12-dimensional displacement vector in the brow region, a 16-dimensional displacement vector in the eye region, a 12-dimensional displacement vector in the nose region, and a 20-dimensional vector in the mouth region (see Figure 1.4). The technique is based on the Facial Action Coding System (FACS), a widespread method for measuring and describing facial behaviors developed by Ekman and Friesen [1978] in the 1970s. Facial activities are described in terms of a set of small, basic actions, each called an Action Unit (AU). The AUs are based on the anatomy of the face and occur as the result of one or more muscle actions.

Figure 1.4.: Feature point displacements. Change from neutral expression (AU 0) to brow raise, eye widening, and mouth stretched wide open (AU 1+2+5+27). Lines trailing from the feature points represent displacement vectors due to the expression [Cohn et al., 1998].

Essa and Pentland The work of Essa and Pentland [1997] describes another facial feature tracking method based on optical flow. They base their work on a self-developed encoding system that extends FACS. They analyzed image sequences of facial expressions and probabilistically characterized the facial muscle activation associated with each expression. This is achieved using a detailed physics-based dynamic model of the skin and muscles coupled with optical flow in a feedback-controlled framework. They call this analysis a control-theoretic approach; it produces muscle-based representations of facial motion (Figure 1.5 shows an example).

Figure 1.5.: A motion field for the expression of a smile from optical flow computation (a), mapped to a face model using the control-theoretic approach (b) [Essa and Pentland, 1997].

Evaluation

The approach by Cohn et al. uses the FACS for feature tracking. This system has been widely used for controlling computer animation, but was not originally developed for this purpose. The intended goal was to "create a reliable means for skilled human scorers to determine the category or categories in which to fit each facial behavior" (http://face-and-emotion.com/dataface/facs/description.jsp). Essa and Pentland [1997] state in their work that

"it is widely recognized that the lack of temporal and detailed spatial information (both local and global) is a significant limitation to the FACS model. [...] Additionally, the heuristic 'dictionary' of facial actions originally developed for FACS-based coding of emotion has proven to be difficult to adapt to machine recognition of facial expression".

The results of the method show that the accuracy is between 83% and 92% compared to previous tests and results of human testers, depending on the region. The authors find that one reason for the lack of 100% agreement is "the inherent subjectivity of human FACS coding, which attenuates the reliability of human FACS codes" [Cohn et al., 1998]. Two other possible reasons were the "restricted number of optical flow feature windows and the reliance on a single computer vision method". In contrast to this approach by Cohn et al., the solution by Essa and Pentland specifically deals with facial expression recognition. Their work describes a complete tracking framework, which includes a physics-based dynamic model for skin and muscle description, something that is not intended for the Java tracker. Neither work mentions complexity or computation times for the tracking process.

1.3.2. Active Contours (Snakes)

Definitions and Properties

Active contour models, commonly called snakes, are energy-minimizing curves that deform to fit image features. Snakes, first introduced by Kass et al. [1988], "lock on to nearby minima in the potential energy generated by processing an image. (This energy is minimized by iterative gradient descent [...]) In addition, internal (smoothing) forces produce tension and stiffness that constrain the behavior of the models; external forces may be specified by a supervising process or a human user" [Ivins and Porrill, 1993]. Figure 1.6 shows the basic functionality of a closed snake.

Figure 1.6.: A closed snake. The snake's ends are joined so that it forms a closed loop. Over a series of time steps the snake moves into alignment with the nearest salient feature [Ivins and Porrill, 1993].

Snakes are applied to a range of different image processing problems. They support the detection of lines and edges, but can also be used for stereo matching or for segmenting image sequences. Snakes have often been used in medical research applications, and motion tracking systems use them to model moving objects. The main limitations of the models are that they "usually only incorporate edge information (ignoring other image characteristics) possibly combined with some prior expectation of shape; and that they must be initialized close to the feature of interest if they are to avoid being trapped by other local minima" [Ivins and Porrill, 1993].¹

¹ An overview of John Ivins' publications about snakes is available at http://www.computing.edu.au/~jim/snakes.html

A snake (V) is an ordered collection of n points in the image plane:

V = \{v_1, \ldots, v_n\}, \qquad v_i = (x_i, y_i), \quad i = 1, \ldots, n \tag{1.1}

The points in the contour iteratively approach the boundary of an object through the solution of an energy minimizing problem. For each point in the neighborhood of vi, an energy term is computed

E_i = \alpha E_{int}(v_i) + \beta E_{ext}(v_i) \tag{1.2}

where Eint(vi) is an energy function dependent on the shape of the contour, and Eext(vi) is an energy function dependent on the image properties, such as the gradient, near point vi. α and β are constants providing the relative weighting of the energy terms. Ei, Eint, and Eext are calculated using matrices. The value at the center of each matrix corresponds to the contour energy at point vi. Other values in the matrices correspond (spatially) to the energy at each point in the neighborhood of vi. Each point vi is moved to the point v'i corresponding to the location of the minimum value in Ei. This process is illustrated in Figure 1.7. If the energy functions are chosen correctly, the contour V should approach the object boundary and stop when it reaches it.
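A minimal greedy variant of this update step could look as follows in Java. The energy terms, the 3x3 search neighborhood, and the weighting constants are placeholder assumptions; the caller is expected to keep the candidate points inside the image.

```java
import java.awt.Point;

// One greedy snake iteration: every contour point moves to the position in its
// 3x3 neighborhood with the lowest combined energy E = alpha*Eint + beta*Eext.
// The energy terms below are simple placeholders, not the ones from Kass et al.
class GreedySnake {
    static double internalEnergy(Point prev, Point cur, Point next) {
        // continuity term: squared distance to the midpoint of the two neighbors
        double mx = (prev.x + next.x) / 2.0, my = (prev.y + next.y) / 2.0;
        return (cur.x - mx) * (cur.x - mx) + (cur.y - my) * (cur.y - my);
    }

    static double externalEnergy(double[][] gradientMagnitude, Point p) {
        // strong gradients (edges) are attractive, hence negative energy
        return -gradientMagnitude[p.y][p.x];
    }

    static void iterate(Point[] contour, double[][] gradientMagnitude,
                        double alpha, double beta) {
        int n = contour.length;
        for (int i = 0; i < n; i++) {
            Point prev = contour[(i - 1 + n) % n];   // closed snake
            Point next = contour[(i + 1) % n];
            Point best = contour[i];
            double bestE = Double.MAX_VALUE;
            for (int dy = -1; dy <= 1; dy++) {
                for (int dx = -1; dx <= 1; dx++) {
                    Point cand = new Point(contour[i].x + dx, contour[i].y + dy);
                    double e = alpha * internalEnergy(prev, cand, next)
                             + beta * externalEnergy(gradientMagnitude, cand);
                    if (e < bestE) { bestE = e; best = cand; }
                }
            }
            contour[i] = best;   // move v_i to the neighborhood minimum v'_i
        }
    }
}
```

Repeating iterate() until no point moves any more corresponds to the convergence behavior sketched in Figure 1.6.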

Figure 1.7.: An example of the movement of a point v_i in a snake. The point v'_i is the location of minimum energy due to a large gradient at that point [Mackiewich, 1995].

Snakes for Feature Tracking

The work of Terzopoulos and Waters [1993] describes a hybrid method, where shape models and snakes take part in the tracking process. Face models are set up, which are then tracked by snakes. The approach incorporates many complex procedures, described by the authors as follows:

"An approach to the analysis of dynamic facial images for the purposes of estimating and resynthesizing dynamic facial expressions is presented. The approach exploits a sophisticated generative model of the human face originally developed for realistic facial animation. The face model, which may be simulated and rendered at interactive rates on a graphics workstation, incorporates a physics-based synthetic facial tissue and a set of anatomically motivated facial muscle actuators. The estimation of dynamical facial muscle contractions from video sequences of expressive human faces is considered. An estimation technique that uses deformable contour models (snakes) to track the nonrigid motions of facial features in video images is developed. The technique estimates muscle actuator controls with sufficient accuracy to permit the face model to resynthesize transient expressions."

Figure 1.8 illustrates how snakes are used in this work.

Figure 1.8.: Snakes and fiducial points used for muscle contraction estimation: neutral expression (a) and surprise expression (b).

Evaluation

Snakes are mostly used in combination with other methods, as they require pre-initialization close to the feature of interest. A big disadvantage of the snake algorithm is that it is easily misled if the edge is discontinuous. Xie and Mirmehdi [2003] call this characteristic a weak edge:

"Despite their significant advantages, geometric snakes only use local information and suffer from sensitivity to local minima. Hence, they are attracted to noisy pixels and also fail to recognize weaker edges for lack of a better global view of the image. The constant flow term can speed up convergence and push the snake into concavities easily when gradient values at object boundaries are large. But when the object boundary is indistinct or has gaps, it can also force the snake to pass through the boundary."

They developed an improved snake algorithm, called RAGS, that is able to overcome this problem. It works with an "extra diffused region force which delivers useful global information about the object boundary and helps prevent the snake from stepping through" [Xie and Mirmehdi, 2003]. Figure 1.9 shows the improvement achieved with RAGS.

Figure 1.9.: Weak-edge leakage. A regular snake leaks out of a weak edge (a); the RAGS snake converges properly using its extra region force (b).

Snakes have great potential to work well in a tracking environment. However, the weak-edge leakage problem and the complexity of the algorithm argue against the use of snakes.

1.3.3. Image Moments

Definitions and Properties

In order to define its basic position, size, and orientation, a binary or greyscale image object can be approximated by a best-fitting ellipse. This ellipse is defined by the centroid, the major and minor axes, and the angle of the major axis with the x-axis. These values are calculated using image moment functions. Figure 1.10 shows example moment calculations for a binary image object (the black pixels in the illustration); a, b and θ, and the resulting ellipse are illustrated in the image. The following paragraphs derive and explain the functions necessary for the calculation of the best-fitting ellipse.

m00 = 5
m10 = 15, m01 = 15
m20 = 49, m02 = 47, m11 = 43
c = (3, 3)
θ = 31.7°, a = 1.84, b = 0.70

Figure 1.10.: Example for moment calculations and shape representation. The image ellipse is represented by the semi-major axis a, the semi-minor axis b and the orientation angle θ.

General Moment Definition

A grayscale image can be seen as a two-dimensional density distribution function, written in the form of f(x, y), where the function value represents the intensity of a pixel at the position (x, y). A general definition of two-dimensional (p + q) order moments is then given by the following equation:

\Phi_{pq} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \Psi_{pq}(x, y)\, f(x, y)\, dx\, dy, \qquad p, q = 0, 1, 2, 3, \ldots \tag{1.3}

where Ψpq is a continuous function of (x, y), known as the moment weighting kernel or the basis set. The indices p, q usually denote the degrees of the coordinates (x, y), as defined inside the function Ψ. For example, a zeroth order moment is given by p = 0 and q = 0. Applied to an image, the intensity function f(x, y) is bounded, and therefore the integrals in equation 1.3 are finite. In consequence, the general two-dimensional moment function can also be written in the form

\Phi_{pq} = \iint_{\zeta} \Psi_{pq}(x, y)\, f(x, y)\, dx\, dy, \qquad p, q = 0, 1, 2, 3, \ldots \tag{1.4}

where ζ represents the image region, that is, the set of foreground pixels in the image. Detailed moment function descriptions can be found in the book "Moment Functions in Image Analysis" [Mukundan and Ramakrishnan, 1998].

Geometric Moments

"Geometric moments are the simplest among moment functions, with the kernel function defined as a product of the pixel coordinates." [Mukundan and Ramakrishnan, 1998, p. 9]. Compared with more complex weighting kernels, geometric moments are easy to perform and implement. They are also called Cartesian moments, or regular moments. Equation 1.5 shows the two-dimensional geometric moment function, referred to as mpq.

m_{pq} = \iint_{\zeta} x^p y^q f(x, y)\, dx\, dy, \qquad p, q = 0, 1, 2, 3, \ldots \tag{1.5}

In this equation, the basis set is defined as x^p y^q (compare to equation 1.3). As the number of values in the image region is discrete and finite, the integral can be replaced by a summation to make it easier to compute. The equation can then be written as

m_{pq} = \sum_{A} x^p y^q f(x, y), \qquad p, q = 0, 1, 2, 3, \ldots \tag{1.6}

where A is the set of pixels in the image region.

Moments that are calculated from a binary (or silhouette) image are called silhouette moments. The pixels of a binary image can only adopt the values 0 and 1. If a pixel is part of an image region, it is set to 1. If it belongs to the background, its value is 0. For silhouette moments, the image region ζ only contains the pixels with value 1, and the equation can be written in the form

m_{pq} = \iint_{\zeta} x^p y^q\, dx\, dy, \qquad p, q = 0, 1, 2, 3, \ldots \tag{1.7}
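On a discrete binary image, the silhouette moments up to second order therefore reduce to plain sums over the foreground pixels. The following Java sketch, which assumes the image is given as a boolean foreground mask, illustrates this; it is not the implementation used in the final program.

```java
// Raw geometric (silhouette) moments up to second order for a binary image,
// given as a boolean mask where true marks a foreground pixel (f(x, y) = 1).
class SilhouetteMoments {
    double m00, m10, m01, m20, m11, m02;

    static SilhouetteMoments compute(boolean[][] mask) {
        SilhouetteMoments m = new SilhouetteMoments();
        for (int y = 0; y < mask.length; y++) {
            for (int x = 0; x < mask[y].length; x++) {
                if (!mask[y][x]) continue;      // background pixels contribute nothing
                m.m00 += 1;                     // zeroth order: area
                m.m10 += x;   m.m01 += y;       // first order
                m.m20 += (double) x * x;        // second order
                m.m11 += (double) x * y;
                m.m02 += (double) y * y;
            }
        }
        return m;
    }
}
```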

Shape Representation Using Moments

A set of low order moments can be used to describe the shape of image regions. Geometrical properties like the image area, the center of mass and the orientation can be defined by using moments of zeroth, first and second order. The moment of zeroth order (m00) represents the total intensity of an image. If the image is binary, m00 represents the image area, that is the number of foreground pixels. The intensity centroid can be calculated by combining the first order moments m10, m01 with the moment of order zero. The first order moments "provide the intensity moment about the y-axis and x-axis of the image" [Mukundan and Ramakrishnan, 1998, p. 12]. For example, m10 on a silhouette image sums up all the x-coordinates of the image region. The centroid c = (xc, yc) is given by

x_c = \frac{m_{10}}{m_{00}}, \qquad y_c = \frac{m_{01}}{m_{00}} \tag{1.8}

For a silhouette image, c represents the geometrical center of the image region, also called the center of mass.

Central moments shift the reference system to the centroid to make the moment calculations independent of the image area position. They are defined as

\mu_{pq} = \iint_{\zeta} (x - x_c)^p (y - y_c)^q f(x, y)\, dx\, dy, \qquad p, q = 0, 1, 2, 3, \ldots \tag{1.9}

As the image region remains unchanged during the transformation and the pixel coordinates are distributed in equal shares on both sides of the new reference system, we have

\mu_{00} = m_{00}; \qquad \mu_{10} = \mu_{01} = 0. \tag{1.10}

According to equation 1.9, the image area is traversed twice for central moment calculations, as the centroid has to be determined before µpq can be calculated. The work of Rocha et al. [2002] avoids this double traversal. It uses the following equations for the calculation of the second order central moments:

\mu_{20} = \frac{m_{20}}{m_{00}} - x_c^2 \tag{1.11}

\mu_{11} = \frac{m_{11}}{m_{00}} - x_c y_c \tag{1.12}

\mu_{02} = \frac{m_{02}}{m_{00}} - y_c^2 \tag{1.13}

The second order moments are "a measure of variance of the image intensity distribution about the origin. The central moments µ20, µ02 give the variances about the mean (centroid). The covariance is given by µ11." [Mukundan and Ramakrishnan, 1998, p. 12]. The second order central moments can also be seen as moments of inertia with the coordinate axes moved to have the intensity centroid as their origin. If these so-called principal axes of inertia are used as the reference system, they make the product of inertia component (µ11) vanish. The moments of inertia (µ20, µ02) of the image about this reference system are then called the principal moments of inertia. We can use these moments to provide useful descriptors of shape. The work of Morse [2004] gives a good description of these techniques:

"Suppose that for a binary shape we let the pixels outside the shape have value 0 and the pixels inside the shape value 1. The moments µ20 and µ02 are thus the variances of x and y respectively. The moment µ11 is the covariance between x and y [...]. You can use the covariance to determine the orientation of the shape."

The covariance matrix C is

C = \begin{pmatrix} \mu_{20} & \mu_{11} \\ \mu_{11} & \mu_{02} \end{pmatrix} \tag{1.14}

By finding the eigenvalues and eigenvectors of C and looking at the ratio of the eigenvalues, we can determine the eccentricity, or elongation, of the shape. The direction of elongation can then be derived from the direction of the eigenvector whose corresponding eigenvalue has the largest absolute value.

The eigenvalues of C are defined as

I_1 = \frac{(\mu_{20} + \mu_{02}) + \sqrt{(\mu_{20} - \mu_{02})^2 + 4\mu_{11}^2}}{2}, \qquad I_2 = \frac{(\mu_{20} + \mu_{02}) - \sqrt{(\mu_{20} - \mu_{02})^2 + 4\mu_{11}^2}}{2} \tag{1.15}

The semi-major axis a and the semi-minor axis b can then be calculated as

a = \sqrt{3 I_1}; \qquad b = \sqrt{3 I_2}. \tag{1.16}

These axis calculations are derived from the paper by Rocha et al. [2002]. Other authors described a and b differently ([Mukundan and Ramakrishnan, 1998, p. 14], [Sonka et al., 1999, p. 258]). During the implementation phase, testing results were most appropriate with the usage of the stated formulas.
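Combining equations 1.8 and 1.11 to 1.16, the best-fitting ellipse can be derived from the raw moments as in the following sketch. It reuses the hypothetical SilhouetteMoments holder from the sketch above (both are assumed to live in the same package); the orientation angle computed at the end corresponds to equation 1.17 below, using atan2 so that the case µ20 = µ02 does not divide by zero.

```java
// Best-fitting ellipse of a binary region, derived from its silhouette moments
// following equations 1.8 and 1.11-1.17. Class and field names are illustrative.
class MomentEllipse {
    double xc, yc;      // centroid
    double a, b;        // semi-major / semi-minor axis
    double theta;       // orientation of the major axis with the x-axis, in radians

    static MomentEllipse from(SilhouetteMoments m) {
        MomentEllipse e = new MomentEllipse();
        e.xc = m.m10 / m.m00;                               // eq. 1.8
        e.yc = m.m01 / m.m00;
        double mu20 = m.m20 / m.m00 - e.xc * e.xc;          // eq. 1.11
        double mu11 = m.m11 / m.m00 - e.xc * e.yc;          // eq. 1.12
        double mu02 = m.m02 / m.m00 - e.yc * e.yc;          // eq. 1.13
        double root = Math.sqrt((mu20 - mu02) * (mu20 - mu02) + 4 * mu11 * mu11);
        double i1 = ((mu20 + mu02) + root) / 2;             // eigenvalues, eq. 1.15
        double i2 = ((mu20 + mu02) - root) / 2;
        e.a = Math.sqrt(3 * i1);                            // eq. 1.16
        e.b = Math.sqrt(3 * i2);
        e.theta = 0.5 * Math.atan2(2 * mu11, mu20 - mu02);  // cf. eq. 1.17
        return e;
    }
}
```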

The orientation angle θ of one of the principal axes of inertia with the x-axis is given by

\theta = \frac{1}{2} \tan^{-1}\left( \frac{2\mu_{11}}{\mu_{20} - \mu_{02}} \right) \tag{1.17}

Image Moments for Feature Tracking

The work of Rocha et al. [2002] introduces a moment-based object tracking method where the object in the binary image is approximated by best-fitting ellipses. Binary Space Partitioning (BSP), a method for recursively subdividing a space into convex sets by hyperplanes, is used for the approximation. Each node of the BSP tree represents a part of the image object, described by its best-fitting ellipse.

Figure 1.11.: Object fitting by 2^k ellipses at each level. Construction of the BSP tree at level 0 (a), level 1 (b), and the result of level 3 (c) [Rocha et al., 2002].

As illustrated in Figure 1.11, the algorithm starts by calculating the ellipse of the root node (level 0). Then, the image region is divided along the minor axis, and the child nodes are created, each incorporating the pixels on one side of the splitting axis. This subdivision is repeated until a certain predefined tree depth is reached where the ellipses sufficiently approximate the image shape (see (c) in Figure 1.11).
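The following Java sketch outlines this recursive subdivision under the simplifying assumption that the object is given as a list of foreground pixel coordinates and that the split assigns pixels according to the sign of their projection onto the major-axis direction, which separates the two sides of the minor axis. It reuses the hypothetical classes from the sketches above and is not the implementation of Rocha et al. [2002].

```java
import java.awt.Point;
import java.util.ArrayList;
import java.util.List;

// Sketch of the BSP-style subdivision: fit an ellipse to a pixel set, split the
// set along the ellipse's minor axis through the centroid, and recurse until
// the requested tree depth is reached.
class BspEllipseNode {
    MomentEllipse ellipse;
    BspEllipseNode left, right;

    static BspEllipseNode build(List<Point> pixels, int depth) {
        if (pixels.isEmpty()) return null;
        BspEllipseNode node = new BspEllipseNode();
        node.ellipse = fitEllipse(pixels);
        if (depth == 0) return node;

        // The splitting line is the minor axis: it passes through the centroid
        // and is perpendicular to the major-axis direction theta.
        double dirX = Math.cos(node.ellipse.theta);
        double dirY = Math.sin(node.ellipse.theta);
        List<Point> lower = new ArrayList<>(), upper = new ArrayList<>();
        for (Point p : pixels) {
            double proj = (p.x - node.ellipse.xc) * dirX + (p.y - node.ellipse.yc) * dirY;
            (proj < 0 ? lower : upper).add(p);
        }
        node.left = build(lower, depth - 1);
        node.right = build(upper, depth - 1);
        return node;
    }

    private static MomentEllipse fitEllipse(List<Point> pixels) {
        // Accumulate silhouette moments directly from the point list.
        SilhouetteMoments m = new SilhouetteMoments();
        for (Point p : pixels) {
            m.m00 += 1;
            m.m10 += p.x;              m.m01 += p.y;
            m.m20 += (double) p.x * p.x;
            m.m11 += (double) p.x * p.y;
            m.m02 += (double) p.y * p.y;
        }
        return MomentEllipse.from(m);
    }
}
```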

The approach by Rocha et al. [2002] was designed for basic shape tracking purposes, with only one simple object in the image region. It has not yet been used and evaluated for more complex tasks, such as facial feature tracking. As stated by the authors, "problems that we did not address in this paper are occlusion, tracking of multiple objects and motion discontinuities. Future work will go in these directions" [Rocha et al., 2002]. We did not find any further papers that base their work on this moment tracking algorithm.

Evaluation

Despite its simple approach, the proposed ellipse approximation method of Rocha et al. [2002] surprises with the quality of the achieved results. The paper is written in a very legible way, and the results are illustrated graphically. The work therefore presages a straightforward implementation. The algorithm has not yet been tested on multiple objects, but with an appropriate region selection on the preprocessed face images, we can simplify the object structures in order to make them applicable to the tracking procedure. The paper does not state processing times, but the design of the algorithm suggests short operating times.

1.4. Commercial Implementations

Overview

The number of facial movement tracking software products on the market is still very limited. During our inquiry, we found two products: X-IST FaceTracker by the German company noDNA (http://www.nodna.com/FaceTracker.26.0.html) and VeeAnimator by Vidiator Technology (USA, http://www.vidiator.com/facestation.php). Both of them keep the technical specification short and do not provide information on which tracking methods and algorithms have been used.

The basic operating sequence is the same for the two systems, even though they differ in some key factors. Both of them take video streams as input data, are able to process and transfer in realtime, and provide data for proprietary 3D animation software. However, only VeeAnimator can operate without physical markers on the tracked person. They also differ in the scope of supply, hardware requirements and integration with proprietary 3D animation software, where the German product is ahead. Still surprising is the fact that VeeAnimator, which gets by without any physical markers, costs about a fourth of the price of the X-IST FaceTracker.

X-IST FaceTracker

The X-IST FaceTracker is characterized by a head-mounted video camera, required facial markers and lighting conditions, and the support for a range of different 3D animation formats. In contrast to the VeeAnimator, the X-IST FaceTracker uses its own proprietary headset for video recording (see Figure 1.12). The camera on this headset is near-infrared sensitive, with PAL or NTSC video output and adjustable camera focus. It has a near-infrared dimmable light source built in. Currently, X-IST works on Microsoft Windows 2000; it will be available for Windows XP Professional in the future.

It works with infrared-reflective markers on the face of the tracked person, which are then recognized by the tracking software. To detect these markers correctly, the studio environment has to be kept in fluorescent (cold) light, without daylight or other warm light sources such as halogen or light bulbs. It provides drivers for 3D animation programs (Alias Mocap, Famous3D, 3ds Max, FBX), a Portable Control Unit (PCU) and a Software Development Kit (SDK) for 3rd party integration. The package with the headset system and the provided software costs € 6,999, without required additional hardware and drivers.

Figure 1.12.: The X-IST FaceTracker. With the provided headset (on the left) it is possible to create facial animations.

VeeAnimator (formerly FaceStation)

VeeAnimator stands out with the ability to track in realtime, without the use of physical markers and with standard hardware components, which makes the tracking process simple in execution.

It is "a suite of software applications that allow you to animate heads and faces in Discreet's 3ds max or Alias|Wavefront Maya" [vidiator, 2004] that uses a normal video camera. The camera does not have to be head-mounted and, in contrast to the X-IST FaceTracker, whole head movements are recorded. The software places 22 virtual markers at key positions on the face. The movement of these markers is then 'tracked' from each video frame to the next to generate facial animation data. This data is used to animate a model in the 3D animation package.

At any given video frame, the face is analyzed into a mixture of 16 different facial expression elements (including smile, frown, lip pucker, vowel sounds, raised eyebrows, closed eyelids). These facial expression elements can then be used for animation, for example to drive a set of morph targets with the defined expressions. The software additionally provides audio (speech) analysis tools that can be used to refine lip movements. The big advantage of VeeAnimator is that it does not need any additional hardware or special lighting. Soft diffused illumination on the actor's face, from whatever light source, is sufficient for the program to work satisfactorily. Figure 1.13 shows a tracking example with this software, taken from the VeeAnimator demonstration video².

Figure 1.13.: VeeAnimator in action. The tracked feature points (a); the real-life person (right) and its virtual-reality equivalent during realtime tracking (b).

VeeAnimator contains 4 parts: FaceLifter tracks prerecorded computer video files, FaceTracker does realtime tracking on video streams, FaceDriver is the 3ds Max or Maya plug-in component, and the Avatar Editor creates fully textured head models.

Comparison<br />

Table 1.1 on page 25 gives a summarizing overview over the mentioned two programs.<br />

They differ in a lot of points, especially in prerequisites and the required hardware.<br />

Especially the comparison of the number of supported feature points of these two ap-<br />

plications is interesting.<br />

2 http://www.vidiator.com/demos/facestation/FSDemoFinal_small.wmv<br />

| | X-IST FaceTracker V 4.5 | VeeAnimator |
|---|---|---|
| Components – Included | Software package, headset system, cables and marker tape | Software package |
| Components – Required | PCI framegrabber card | Ordinary digital video camera, ‘Alias Maya 3D’ or ‘Autodesk 3ds Max’ |
| Components – Optional | Drivers/converters for 3rd party animation software; PCU; SDK | – |
| Requirements – Software | Windows 2000 (XP in progress) | Windows 2000/XP; Maya 4.5 or 5.0 / 3ds Max 4.26, 4.3, 5.0, 6.0 |
| Requirements – Clock rate | ≥800 MHz | prerecorded: ≥700 MHz, realtime: ≥2.0 GHz |
| Requirements – Hardware | 20 GB HD, 128 MB RAM, 2D graphics card XGA, 1 PCI slot | 200 MB HD, Maya 3D / 3ds Max requirements |
| Feature points | up to 36 (typically 15) | 22 |
| Physical markers | Yes | No |
| Environment | no daylight/warm light; fluorescent (cold) light only | soft diffused illumination |
| Tracking rates | 25/50 fps (PAL), 30/60 fps (NTSC) | 30/60 fps (NTSC) |
| Price | €6,999.00 | $1,995.00 |

Table 1.1.: Comparison of commercial products


1.5. Summary<br />


Many researchers have already developed facial feature tracking algorithms, describing their work with different levels of detail. The examined approaches based on optical flow use the FACS, which suffers from major drawbacks because of its lack of spatial information. Other methods that work with snakes combine various tracking and location techniques and are hence more complex. Moreover, snakes can suffer from the leaking edge problem, which worsens the result dramatically. The investigated image moment technique is straightforward, but may not be applicable to complex tracking tasks. None of the papers states which programming language was used, and sources are not freely available on the Internet. We therefore assume that all works are implemented in a platform-dependent language like C++, which may have advantages in processing time, but imposes constraints on portability and ease of use.



2. Algorithms in Consideration<br />

2.1. Overview<br />

In Chapter 1 we summarized different approaches for facial feature tracking. We showed that FACS-based methods have difficulties because they were not originally developed for machine recognition. The moment-based approach turned out to be surprisingly straightforward and comprehensible. Other algorithms are computationally expensive or do not seem straightforward to implement. Based on their description in papers and their predicted practicability, we selected two tracking procedures for a closer examination: active contour models (snakes) and moment-based tracking. Both approaches need manual or automated initialization in the first video frame. Snakes need an initial contour and therefore exact feature points for processing; the moment-based solution works on the complete picture, but needs initialization of feature regions because it can only recognize single objects. The required preprocessing steps for the two methods are also similar. Both work on binary edge images, but will presumably produce better results on grayscale edge images, where the edge intensity varies and therefore also weaker edges can be handled. The two algorithms mainly differ in their implementation complexity and processing time. These factors are investigated in this chapter.

2.2. Testing Method<br />

2.2.1. Testing Tool<br />

For testing the practicability and performance of the algorithms in consideration, we have used and extended Java code examples which already implement the required functionality. These examples are programmed as plugins for the Java-based image processing tool ImageJ. It is a public domain program, available at http://rsb.info.nih.gov/ij/.


On the homepage, the program is described as follows:

“ImageJ is [...] inspired by NIH Image for the Macintosh. It runs, either as an online applet or as a downloadable application, on any computer with a Java 1.1 or later virtual machine. Downloadable distributions are available for Windows, Mac OS, Mac OS X and Linux. [...]

ImageJ was designed with an open architecture that provides extensibility via Java plugins. Custom acquisition, analysis and processing plugins can be developed using ImageJ’s built in editor and Java compiler. User-written plugins make it possible to solve almost any image processing or analysis problem.”

At the time of inquiry, ImageJ was available in version 1.33, which had errors when working with Java 1.5 on Linux¹ and was therefore used with Java 1.4.2. The recent ImageJ version 1.34 works fine with Java 1.5.

2.2.2. Input Data<br />

For the following tests we used a binary edge image of a human face. We extracted a video frame that shows a face in neutral position in the middle of the image. This gives us clearly identifiable facial features, represented by edges, which eases the selection of feature regions and therefore the correct comparison of the output. Moreover, we approximate the test situation to the conditions of the final Java tracking program. In order to transform the video frame into the correct format, we converted the color image into a grayscale image and processed it with a Canny edge detector.

In the following sections we describe the results of the ImageJ feature tracking plugins.

¹ A java.lang.NullPointerException is thrown during image window initialization.



2.3. Testing Snake Algorithms<br />


We have found two ImageJ plugins that implement snake algorithms, both of which work on grayscale images: Jacob’s SplineSnake implementation, and the snake plugin by Boudier.

SplineSnake The SplineSnake implementation of Jacob et al. [2004] allows the user to select any required image region by drawing a path onto the source image. Points on this path, which have a preset distance between each other, are called knots and are the initialization for the snake algorithm. Additionally, the user can specify constraint knots that have to be passed by the final snake. All adjustable parameters are described at http://ip.beckman.uiuc.edu/Software/SplineSnake/usage.html; the values in Table 2.1 are directly used by the snake algorithm:

| Parameter | Default |
|---|---|
| Image energy: a proper linear combination of gradient and region energies can result in better convergence. The right combination depends on the image. | “100% Region” |
| Maximum number of iterations. | 2000 |
| Size of one step during optimization. | 2.0 |
| Accuracy to which the snake is optimized. | 0.01 |
| Smoothing radius of the image smoothing procedure that is computed before running the snake algorithm. | 1.5 |
| Spring weight: specifies how the constraint knots are weighted. | 0.75 |

Table 2.1.: SplineSnake parameters.

For testing, we have drawn a nearly rectangular path around the mouth, with a knot distance of 5 pixels. During the testing process, we varied the step size and the number of iterations. Satisfying results were possible with a step size of 10 and 200 iterations. With 50 iterations (as in Figure 2.1 (b) and (c)), we were not able to approximate the mouth contour closely enough. A step size of more than 10 did not enhance the process. The results are illustrated in Figure 2.1.

Figure 2.1.: Overview of SplineSnake results. The initial selection (a), SplineSnake with step size 1.0 and 50 iterations (b), step size 10.0 and 50 iterations (c), and SplineSnake with step size 10.0 and 200 iterations (d).

SplineSnake cannot omit small sources of interference in its processing. A tracking example with distracting pixels is illustrated in Figure 2.2.

Figure 2.2.: SplineSnake interference. The final snake (the inner red line) is not able to ignore single interfering pixels on the right side of the upper lip contour.

The plugin delivers information about the processing time and the resulting snake knots. Table 2.2 shows the results of the SplineSnake test cycles. For these results, we tested with different manually drawn mouth selections, 200 snake iterations and a step size of 10.0. Other values were left at their defaults. The average processing time over 20 cycles was 2.42 seconds, with 26.3 resulting knots and 4.35 curve-describing samples per knot.


For a tracking program that has to work close to realtime, these tracking times are not acceptable. However, we have to note that the tracking times of subsequent video frames could be reduced by initializing the snake with the parameters of the preceding frame. The initial snake would then be close to the final snake, and therefore fewer cycles (presumably < 10) would have to be processed.

| no. | knots | samples/knot | time [s] |
|---|---|---|---|
| 1 | 29 | 4 | 2.280 |
| 2 | 27 | 4 | 2.587 |
| 3 | 29 | 4 | 3.846 |
| 4 | 25 | 4 | 2.674 |
| 5 | 32 | 3 | 4.133 |
| 6 | 27 | 5 | 2.844 |
| 7 | 24 | 5 | 3.684 |
| 8 | 17 | 6 | 0.616 |
| 9 | 23 | 5 | 1.353 |
| 10 | 27 | 4 | 2.207 |
| 11 | 25 | 4 | 1.579 |
| 12 | 25 | 5 | 0.849 |
| 13 | 25 | 3 | 2.027 |
| 14 | 27 | 5 | 2.051 |
| 15 | 25 | 4 | 1.543 |
| 16 | 25 | 4 | 2.662 |
| 17 | 22 | 4 | 1.233 |
| 18 | 35 | 5 | 4.131 |
| 19 | 29 | 4 | 3.263 |
| 20 | 28 | 5 | 2.833 |
| avg | 26.3 | 4.35 | 2.420 |

Table 2.2.: SplineSnake: Results


Boudier Snake Plugin The second ImageJ plugin for snakes is written by Thomas Boudier. It is available at http://www.snv.jussieu.fr/~wboudier/softs/snake.html. For testing, we used the default parameters listed in Table 2.3.

| Parameter | Value |
|---|---|
| Gradient threshold | 20 |
| Regularization | 0.10 |
| Number of iterations | 200 |
| Step result show | 5 |
| Alpha-Canny-Deriche | 1.00 |

Table 2.3.: Boudier snake parameters

For comparison with the SplineSnake plugin, we chose to use a rectangular initial selection. As illustrated in Figure 2.3, the success of the snake procedure greatly depends on this initial selection. During testing, a change of the selection by one pixel resulted in extreme outgrowths of the resulting snake.

Figure 2.3.: Overview of snake results. 1: a selection of (231, 208, 59, 16) (a) delivers good results (b); 2: an enlargement of the region width by 1 pixel (a) has significant negative effects (b).

Table 2.4 shows testing results obtained with this snake plugin. The values specified represent the rectangular selection on the edge image around the mouth region

(position on the x/y-axes, width and height). A checkmark in the last column indicates whether the result is satisfying (that is, the snake bounds the mouth region) or leaked out over a big part of the displayed face.

As the results show, the plugin delivers a successful result in only about one third of the test cases. In the last row we took the average of all selections as snake initialization values, which also led to a negative outcome. If the region selection is closer to the mouth contour, for example with an elliptical selection, the algorithm works more reliably.

| no. | x | y | w | h | result |
|---|---|---|---|---|---|
| 1 | 236 | 215 | 54 | 9 | ✗ |
| 2 | 236 | 210 | 54 | 16 | ✗ |
| 3 | 235 | 210 | 55 | 15 | ✗ |
| 4 | 234 | 207 | 56 | 20 | ✗ |
| 5 | 233 | 211 | 57 | 17 | ✗ |
| 6 | 233 | 211 | 54 | 18 | ✗ |
| 7 | 233 | 208 | 60 | 18 | ✗ |
| 8 | 233 | 208 | 58 | 17 | ✗ |
| 9 | 232 | 209 | 61 | 21 | ✗ |
| 10 | 232 | 209 | 61 | 19 | ✗ |
| 11 | 232 | 209 | 60 | 17 | ✓ |
| 12 | 231 | 208 | 64 | 21 | ✗ |
| 13 | 231 | 208 | 62 | 19 | ✓ |
| 14 | 231 | 208 | 60 | 16 | ✗ |
| 15 | 231 | 208 | 59 | 16 | ✓ |
| avg | 233 | 209 | 58 | 17 | ✗ |

Table 2.4.: Snake: Results



2.4. Testing Image Moments<br />


An ImageJ moment calculation implementation was found at http://rsb.info.nih.gov/ij/plugins/moments.html, which was apparently integrated into ImageJ version 1.34². The plugin calculates image moments from rectangular image selections up to the 4th order, and calculates the elongation and orientation of objects. The implementation allows a mapping of image intensity values before the moments are calculated. For that purpose, it uses the equation

p_{i,j} = f · (p_{i,j} − c)   (2.1)

where p_{i,j} is the intensity value of the pixel. The factor f and the cutoff c can be specified manually in the user interface. This mapping allows the user to specify a background color other than black (by setting the cutoff accordingly), and to process images with a different color range (by changing the factor). The plugin provides tabular output of the moment calculations; the results are not illustrated in the image. The calculations and the provided source code still give a good overview of how the moment calculations work. The implementation is straightforward and very comprehensible. It executes the following steps:
It executes the following steps:<br />

Step 1: Compute moments of order 0 and 1.

Step 2: Compute coordinates of the centroid.

Step 3: Compute moments of orders 2, 3, and 4.

Step 4: Normalize 2nd moments and compute the variance around the centroid.

Step 5: Normalize 3rd and 4th order moments and compute the skewness (symmetry) and kurtosis (peakedness) around the centroid.

Step 6: Compute orientation and eccentricity.

Source: Awcock [Awcock, 1995, pp. 162–165]
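To make these steps more concrete, the following minimal sketch (our own illustration, not the plugin's source code) computes the raw moments up to the 2nd order for a pixel region and derives the centroid and the orientation of the best-fitting ellipse. Because the central moments are derived algebraically from the raw moments, a single pass over the pixels suffices here; class and variable names are chosen freely.

```java
// Minimal sketch (not the ImageJ plugin source): moments up to 2nd order,
// centroid and orientation for a binary or grayscale pixel region.
public class MomentSketch {

    /** pixels[y][x] holds the (possibly mapped) intensity value, 0 = background. */
    public static double[] analyze(int[][] pixels) {
        double m00 = 0, m10 = 0, m01 = 0, m11 = 0, m20 = 0, m02 = 0;
        for (int y = 0; y < pixels.length; y++) {
            for (int x = 0; x < pixels[y].length; x++) {
                double v = pixels[y][x];
                if (v == 0) continue;                 // skip background pixels
                m00 += v;
                m10 += x * v;        m01 += y * v;
                m11 += x * y * v;
                m20 += x * x * v;    m02 += y * y * v;
            }
        }
        if (m00 == 0) return null;                    // empty region, nothing to analyze
        double xc = m10 / m00, yc = m01 / m00;        // centroid (step 2)
        double mu20 = m20 / m00 - xc * xc;            // central moments (step 4)
        double mu02 = m02 / m00 - yc * yc;
        double mu11 = m11 / m00 - xc * yc;
        double theta = 0.5 * Math.atan2(2 * mu11, mu20 - mu02); // orientation (step 6)
        return new double[] { xc, yc, theta };
    }
}
```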

In the case of a moment-based facial feature tracker, moment calculations above the 2nd order are not necessary. Steps 5 and 6 can therefore be left out. Note that the image pixels have to be traversed twice (in step 1 and step 3), which increases the complexity of g(n) = O(n) for a region with n foreground pixels by a factor of 2.

² Measurements in ImageJ (‘Analyze→Set Measurements...’ and ‘Analyze→Measure’)


For testing purposes, we changed the plugin code so that it displays processing time information. This information shows calculation times between 10 ms and < 1 ms for a 60x20 pixel selection. The more often the plugin is executed, the less calculation time is needed; JVM caching may be responsible for that behavior. This time information cannot be compared one-to-one to the data produced by the snake code. The plugin calculates moments of higher orders, which are not necessary for a feature tracker. Still, this does not influence the complexity of the algorithm, as no additional traversal of the image pixels is necessary. The complexity will change with the implementation of the BSP tree structure, as the moment information has to be calculated for every tree node. It is then g(n) = O(log(d) · n) for a tree depth of d. Still, the processing times are far shorter than those of the snake plugin, and we assume that this will also be the case if the Java tracker is based on moments.

2.5. Summary<br />

We have shown that the performance and reliability of snake algorithms strongly depend on the adjustment of their parameters and the initialization of the snake knots. The accurate selection of the image area and snake parameters proved problematic and challenging during the test phase. Small changes of region selections caused incomprehensibly large differences in the processing results. Calculation times of about 2.5 seconds per execution seem to be too high for an application that aims to work close to realtime. The time could be reduced in subsequent frames by initializing the snake knots with the knots of the previous frame; then the major execution time would only accrue in the first video frame. In contrast, moment calculations proved to be straightforward, fast and comprehensible. In addition to the results of our tests, the paper about tracking with moment calculations [Rocha et al., 2002] describes an exact course of action, which gives a clear path on how to proceed and therefore eases future work. For that reason we decided to implement a moment-based Java feature tracker.

The next chapter describes the necessary prerequisites and preparations that have to be made so that a moment-based tracking algorithm can work satisfactorily.



3. Input Data and Its Preparation<br />

3.1. Overview<br />

In order to be applicable to the selected tracking algorithm, the input video has to be read and transformed into a proper format and quality. In this chapter, we divide this procedure into two steps. First, we read the input data and extract the video frames. For this, we specify prerequisites according to the reading technique used and the presumed video quality, and select sample input videos to be used in the development process. In the second step, we process the data in order to enhance the image features. We discuss the requirements of the tracking algorithm, and state how we meet these requirements with edge detection algorithms. Well-established algorithms are explained, as well as the Java libraries that can facilitate this step.

3.2. Prerequisites<br />

3.2.1. Data Format Prerequisites<br />

Media API Selection<br />

In order to specify the video format for the program, we have to know which Application Programming Interface (API) is used for the media handling. The API should be able to incorporate time-based media into the implemented Java application. It should be a platform-independent, pure Java library, to avoid using the Java Native Interface (JNI) to call into native media APIs. We investigated the following possibilities:

- JMF, developed by Sun Microsystems and IBM
- QuickTime for Java, developed by Apple
- MPEG-4 Toolkit, developed by IBM

Implementing a media import from scratch is not considered an option, as the complexity of the task and the necessary implementation time do not meet the time constraints of the project.


Java Media Framework The JMF 2.0 API is Sun Microsystems’ freely available API that enables the presentation of time-based media. It provides support for capturing and storing media data, controlling the type of processing, and performing custom processing on media data streams. In addition, JMF 2.0 defines a plug-in API that enables the programmer to customize and extend JMF functionality. The current JMF 2.1.1e Reference Implementation supports the media types and formats listed in Table 3.1¹. The list of formats is limited and, due to the latest JMF release date in March 2001, does not contain currently well-established formats like MPEG-4; MPEG-1 is only supported in the platform-specific performance packs. Therefore the pure Java version of JMF is not able to decode MPEG-1 videos. Moreover, different authors, for example Davison [2005], describe the framework as buggy. Searching for JMF on Sourceforge² only returns a handful of sparsely active projects that are dealing with JMF’s video functionality. However, sticking with JMF despite its limited collection of supported media formats and codecs is, according to Adamson (http://www.oreillynet.com/pub/wlg/2933), still the most practical all-Java option.

The JMF API Guide [JMF] describes the basic working model as follows:

“A data source encapsulates the media stream much like a video tape and a player provides processing and control mechanisms similar to a VCR. Playing and capturing audio and video with JMF requires the appropriate input and output devices such as microphones, cameras, speakers, and monitors.”

A lot of implementation examples are freely available for JMF, for example in Sun’s JMF Forum (http://forum.java.sun.com/forum.jspa?forumID=28). An ImageJ plugin for JMF is available at http://rsb.info.nih.gov/ij/plugins/jmf-player.html.

¹ found at http://java.sun.com/products/java-media/jmf/2.1.1/formats.html
² One of the largest collections of Open Source software: http://www.sourceforge.net

| Media Type | Cross Platform Version | Solaris/Linux Performance Pack | Windows Performance Pack |
|---|---|---|---|
| AVI (.avi) | read/write | read/write | read/write |
| Cinepak | D | D,E | D |
| MJPEG (422) | D | D,E | D,E |
| RGB | D,E | D,E | D,E |
| YUV | D,E | D,E | D,E |
| VCM³ | - | - | D,E |
| HotMedia (.mvr) | read only | read only | read only |
| IBM HotMedia | D | D | D |
| MPEG-1 Video (.mpg) | - | read only | read only |
| Multiplexed System stream | - | D | D |
| Video-only stream | - | D | D |
| QuickTime (.mov) | read only | read only | read only |
| Cinepak | D | D,E | D |
| H.261 | - | D | D |
| H.263 | D | D,E | D,E |
| JPEG (420, 422, 444) | D | D,E | D,E |
| RGB | D,E | D,E | D,E |

D: format can be decoded and presented
E: media stream can be encoded in the format
read: media type can be used as input (read from a file)
write: media type can be generated as output (written to a file)

Table 3.1.: JMF 2.1.1 - Supported Video Formats

³ VCM - Windows’ Video Compression Manager support. Tested for these formats: IV41, IV51, VGPX, WINX, YV12, I263, CRAM, MPG4.



QuickTime for Java QuickTime for Java (QTJava) brings together the QuickTime movie player and the Java programming language. As a result, it is possible for Java applications to play QuickTime movies, edit and create them, capture audio and video, and perform 2D and 3D animations. QTJava provides a basic set of functionality across all platforms that support Java and QuickTime. It is currently in version 6.4, which works with Java 1.4.1. “The previous version of QTJava supported J2SE 1.4.1, but only on Windows”⁴. QTJava wins in the supported media types, as it can play all types supported by the current QuickTime version. These formats include MPEG-4, Flash 5, H.261, H.263, H.264, DV and DVC Pro NTSC, DV PAL and DVC Pro PAL. Iverson describes the media playback handling with QTJava in his book “Mac OS X for Java Geeks” [Iverson, 2003, p. 154]. He praises the “rich range of supported media types”, but claims the API to be “still relatively C-like”.

QTJava consists of two layers⁵:

- A core layer which provides the ability to access the complete QuickTime API
- An application framework for easy integration into Java applications. It includes:
  1. Integration of QuickTime with the Java Runtime. This includes sharing display space between Java and QuickTime and sharing events from Java with QuickTime.
  2. A set of classes that simplifies the effort required to perform common tasks while providing an extensible framework that application developers can customize to meet their specific requirements.

The Java method calls are claimed to add very little overhead to the native call; they do parameter marshalling and check the result of the native call for any error conditions. The major limitation of QTJava is that QuickTime is only supported on Windows and Mac platforms. As this project aims to develop platform-independent software that runs on all platforms providing a Java Virtual Machine (JVM) 1.4.1 or higher, this library cannot be used. Moreover, we want to avoid requiring additional programs to be installed in order to make the Java tracker work correctly.

⁴ see http://developer.apple.com/quicktime/qtjava/
⁵ as described in an Apple developer article at http://developer.apple.com/quicktime/qtjava/overview.html


IBM Toolkit for MPEG-4 The IBM Toolkit for MPEG-4 is currently in version 1.2.4, which is usable with Java 1.1 up to 1.5. It consists of a set of Java classes and APIs with five sample applications: three cross-platform playback applications, and two tools for generating MPEG-4 content for use with MPEG-4-compliant devices. These applications are the following:

- AVgen: a simple, easy-to-use GUI tool for creating audio/video-only content for ISMA- or 3GPP-compliant devices
- XMTBatch: a tool for creating rich MPEG-4 content beyond simple audio and video
- M4Play: an MPEG-4 client playback application
- M4Applet for ISMA: a Java player applet for ISMA-compliant content
- M4Applet for HTTP: a Java applet for MPEG-4 content played back over HTTP

Since the toolkit is Java based, the client applications and the content creation applications are cross-platform and will run on any Java-supporting platform. Its minimum requirement is a Java SDK with Swing; for higher performance and more capabilities, SDK version 1.4 or above is recommended. More details can be found at the project homepage⁶. The major disadvantage of the IBM Toolkit is that it is not freely available. It is possible to download a 90-day trial license; commercial licenses cost from $500 upwards. Furthermore, it is limited to the playback of MPEG-4 movies, which decreases the range of possible input data.

MPEG-4 Video for JMF The MPEG-4 Video for JMF is a freely available plug-in that enables decoding of MPEG-4 videos in Java, independent of the IBM Toolkit for MPEG-4. This plug-in allows for the decoding of MPEG-4 video streams created with any encoder that supports the MPEG-4 Simple Profile. The decoder of MPEG-4 Video for JMF can be used on any JMF-enabled platform. In order to function, it needs JMF 2.1.1 and all the JMF requirements. “The implementation is 100% pure Java and has undergone special optimizations to ensure adequate performance” (http://www.alphaworks.ibm.com/tech/mpeg-4).

⁶ see http://www.alphaworks.ibm.com/tech/tk4mpeg4. Implementation demos are available at http://www.research.ibm.com/mpeg4/Demos/index.htm


Selection The JMF was selected for the Java motion tracker, as it is the only freely available solution that works on a Java-enabled machine without further requirements. We are aware that data format and implementation problems could arise due to the development status of the library. If MPEG-4 support is needed at a later point, the IBM plug-in can be used to enhance the JMF functionality.
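As an illustration of how frames could be obtained with JMF, the following sketch grabs a single frame from a video file via the FrameGrabbingControl and converts it to an AWT image. The file name is a placeholder, and whether a player exposes this control depends on the media type and platform; the FrameExtractor described later may well be built differently (for example with a pass-through codec, as in Sun's FrameAccess example).

```java
import java.awt.Image;
import javax.media.Buffer;
import javax.media.Manager;
import javax.media.MediaLocator;
import javax.media.Player;
import javax.media.control.FrameGrabbingControl;
import javax.media.format.VideoFormat;
import javax.media.util.BufferToImage;

// Sketch: grab a single frame from a video with JMF.
public class JmfFrameGrabSketch {
    public static void main(String[] args) throws Exception {
        Player player = Manager.createRealizedPlayer(
                new MediaLocator("file:sample.mpg"));    // hypothetical input file
        player.start();
        Thread.sleep(1000);                               // let playback reach a frame

        FrameGrabbingControl fgc = (FrameGrabbingControl)
                player.getControl("javax.media.control.FrameGrabbingControl");
        if (fgc != null) {                                // control may be unavailable for some media
            Buffer buf = fgc.grabFrame();                 // current video frame
            BufferToImage converter =
                    new BufferToImage((VideoFormat) buf.getFormat());
            Image frame = converter.createImage(buf);     // AWT image for preprocessing
            System.out.println("Grabbed frame: " + frame);
        }
        player.close();
    }
}
```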

3.2.2. Video Quality<br />

Due to the selected tracking algorithm and the missing preliminary tracking stages, the Java tracker places certain requirements on the input videos. For the desired quality of the tracking algorithm, it is necessary that the frame sequence is continuous, which should be given if the video is directly recorded with about 25 frames per second. The lighting should be diffused soft light, frontal on the tracked face. As discussed in Section 1.2, the Java tracker leaves out the face tracking step. Therefore the face has to be in the middle of the image, and its size and orientation have to stay almost constant.

3.2.3. Video Samples<br />

For testing purposes, we used video material from the University of Tübingen (http://vdb.kyb.tuebingen.mpg.de). On their homepage, they describe the setup for video recording:

“The video cameras were arranged in a semi-circle around the subject at a distance of roughly 1.3m [as shown in Figure 3.1]. Each camera was centered on the subject and leveled. The cameras recorded 25 frames/sec in full PAL video resolution (786*576, non-interlaced). In order to facilitate the recovery of rigid head motion, the subject wore a headplate with 6 green foam markers attached to it.

Figure 3.1.: Top view of camera layout used for recordings (taken from http://vdb.kyb.tuebingen.mpg.de).


Each recording contains one isolated action unit, repeated three times, with a (close to) neutral expression in between. For each action unit, there are six video files (one for each camera). Each video file has identical length and starts at exactly the same time. The videos were converted from raw single chip CCD data to RGB using a Bayer filter, then encoded as MPEG1 using mpeg2enc.”

The chosen MPEG-1 format has a well-defined specification with little or no unsupported variations and few incompatibilities between encoders and decoders. It is freely distributable, and players are available on all platforms. According to our constraints, we used camera positions C and D, as they provide faces in an almost frontal view.

3.3. Preparation<br />

3.3.1. Overview<br />

After having decided how to read videos and extract frames, we now have to prepare the data in such a way that the picture information is applicable to the tracking algorithm and robust against small perturbations in the input data. The feature extraction process should be stable against small changes in illumination, viewing direction, and deformations of the objects in the environment. Otherwise, if small changes in any of these quantities lead to large changes in the position of facial feature points, the interpretation of these points would be difficult.

As described in Section 1.3.3, the moment-based tracking approach works by determining the figure’s position and orientation. Therefore the algorithm needs

- a binary or grayscale image and
- well-defined and at most sparsely ragged image areas that are silhouetted against a defined background.

It has to be assured that the input data satisfies these needs. We identified two types of applicable preprocessed images: filled image regions and unfilled edge images. Figure 3.2 shows an example of these two types on a mouth region.


Figure 3.2.: Two types of binary image regions applicable for the tracking algorithm. A binarized mouth region, displayed as a filled figure (a) and with detected edges (b).

Tests on these examples show that the moment-based calculation of the best fitting ellipse yields similar results for both image types: the orientation angle θ differs by 0.1% between the filled region and the edge image, and the ellipse axes a and b vary by 3.75 and 0.77 pixels in a total area of 60x20 pixels.

A disadvantage of filled regions is that the number of foreground pixels that have to be processed by a tracking program is considerably larger than in edge images. Moreover, the tracked centroids will be located in the middle of the lips, which makes it impossible to track the facial feature contours. In order to create the region images, thresholding could be used as a preparation technique. However, it has to be done differently for each image region, as, for example, the mouth region has a different color and hue distribution than the eye regions. In contrast, an edge detection mechanism significantly reduces the number of foreground pixels, and the subsequent tracking algorithm can presumably locate points directly on the contour. The amount of data present in the edge map is reduced compared to the original image, which leads to better performance of the overall system. Edge detection is the most common method for feature extraction in machine vision, and the number of edge detection algorithms is enormous. We therefore decided to use an edge detection mechanism for the preprocessing of the input frames.



3.3.2. Edge Detection Algorithms<br />


Edge detectors are “used to locate changes in the intensity function; edges are pixels where this function (brightness) changes abruptly” [Sonka et al., 1999, p. 77]. The purpose is to convert the large array of brightness values that comprise an image into a compact, symbolic code. The goal is to determine the location of brightness discontinuities in the image. In order to detect such brightness changes in the intensity function, edge detector algorithms mostly approximate the first or second derivative of the image function (see Figure 3.3).

Figure 3.3.: Function f(x) with an intensity change, its first derivative f′(x), and second derivative f′′(x).

We selectively inspected four edge detection algorithms which are commonly used and promise satisfying results: the Prewitt and Sobel operators, the Laplacian of Gaussian, and the Canny edge detector. In the next sections, we briefly describe the different algorithms.

Prewitt and Sobel Edge Detectors The Prewitt and Sobel operators use filters for the estimation of local gradients that approximate the first derivative. The gradient is estimated in 8 possible directions (for a 3x3 convolution mask).


The first three masks for the Prewitt operator are

$$h_1 = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 0 & 0 \\ -1 & -1 & -1 \end{bmatrix}, \quad h_2 = \begin{bmatrix} 0 & 1 & 1 \\ -1 & 0 & 1 \\ -1 & -1 & 0 \end{bmatrix}, \quad h_3 = \begin{bmatrix} -1 & 0 & 1 \\ -1 & 0 & 1 \\ -1 & 0 & 1 \end{bmatrix}. \qquad (3.1)$$

Accordingly, the Sobel operators are defined as

$$h_1 = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix}, \quad h_2 = \begin{bmatrix} 0 & 1 & 2 \\ -1 & 0 & 1 \\ -2 & -1 & 0 \end{bmatrix}, \quad h_3 = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}. \qquad (3.2)$$

The other masks can be determined by rotating the matrices of equations 3.1 and 3.2. The Sobel and Prewitt filters are very similar; Sobel puts more weight on the central row and column. Its simplicity and good results make the Sobel operator a popular edge detection mechanism. The main disadvantage of the first derivative operators is “their dependence on the size of the object and sensitivity to noise” [Sonka et al., 1999, p. 83].

Laplacian of Gaussian The Laplacian of Gaussian (LoG) combines the Laplace convolution operator with Gaussian smoothing. The Laplace operator approximates the second derivative, which only returns the gradient magnitude and not the direction. For 4-neighborhoods and 8-neighborhoods, the 3x3 masks are defined as

$$h_1 = \begin{bmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{bmatrix}, \quad h_2 = \begin{bmatrix} 1 & 1 & 1 \\ 1 & -8 & 1 \\ 1 & 1 & 1 \end{bmatrix}. \qquad (3.3)$$

If the Laplace operator is used separately, it responds doubly to some edges in the image. Together with the Gaussian smoothing, it is able to retrieve good results. The advantage of this approach compared to classical first derivative edge operators is that a larger area surrounding the current pixel is taken into account.

Canny Edge Detector Canny’s aim was to discover the optimal edge detection algorithm. His definition of an optimal algorithm consists of three criteria:


- good detection
  A low error rate. Occurring image edges are not dismissed by the algorithm.
- good localization
  Well localized edges, at the same position as the occurring edges.
- minimal response
  A given edge is marked only once, and image noise does not create false edges.

The Canny operator works in a multi-stage process. First, the image is smoothed by Gaussian convolution. Then a simple 2D first derivative operator is applied to the smoothed image to create edges in regions of the image with high first spatial derivatives. In this step, the gradient magnitude is calculated in both the x and y direction and is thereafter combined into one edge image. The algorithm then tracks along the edges, a process known as non-maximal suppression. The tracking process is controlled by two thresholds, T1 and T2, with T1 > T2. Tracking only begins if the starting point has a value higher than T1. Tracking then continues in both directions until the intensity value falls below T2. This method helps to ensure that noisy edges are not broken up into multiple edge fragments. The final step uses heuristic thresholding to keep only edge information and eliminate data that was wrongly identified. Figure 3.4 shows the multi-stage edge detection process.

Figure 3.4.: Multi-stage Canny edge detection process.

According to this process, the effect of the Canny operator is influenced by three parameters: the width of the Gaussian convolution mask and the thresholds T1 and T2. Increasing the width of the Gaussian mask “reduces the detector’s sensitivity to noise, at the expense of losing some of the finer detail in the image. The localization error in the detected edges also increases slightly as the Gaussian width is increased” [Fisher et al., 1994]. Example illustrations of Canny edge detection results with different convolution masks, in comparison to other edge detection mechanisms, can be found in the work of Burger and Burge [2005, pp. 111–125].
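To illustrate the two-threshold idea, the following sketch implements only the hysteresis step on an already computed (and non-maximum-suppressed) gradient magnitude map: edges may start only at pixels above T1 and are then grown along 8-connected neighbours above T2. It is a simplified illustration, not the implementation used later in the tracker.

```java
import java.util.Stack;

// Sketch of Canny's hysteresis thresholding: strong pixels (> t1) start an edge,
// connected weak pixels (> t2) are added while following the edge.
public class HysteresisSketch {
    public static boolean[][] threshold(double[][] mag, double t1, double t2) {
        int h = mag.length, w = mag[0].length;
        boolean[][] edge = new boolean[h][w];
        Stack<int[]> stack = new Stack<int[]>();
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                if (mag[y][x] <= t1 || edge[y][x]) continue;
                edge[y][x] = true;                       // strong starting point
                stack.push(new int[] { x, y });
                while (!stack.isEmpty()) {
                    int[] p = stack.pop();
                    for (int dy = -1; dy <= 1; dy++) {
                        for (int dx = -1; dx <= 1; dx++) {
                            int nx = p[0] + dx, ny = p[1] + dy;
                            if (nx >= 0 && ny >= 0 && nx < w && ny < h
                                    && !edge[ny][nx] && mag[ny][nx] > t2) {
                                edge[ny][nx] = true;     // weak pixel connected to an edge
                                stack.push(new int[] { nx, ny });
                            }
                        }
                    }
                }
            }
        }
        return edge;
    }
}
```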



3.3.3. Edge Detector Realization<br />


In order to implement the edge detection mechanisms in Java, we need backing frameworks that provide image information and ease the image processing. We therefore used Java2D, which is part of the Java 2 Platform Standard Edition, and the Java Advanced Imaging (JAI) library, which is freely available from the Java Sun homepage. In the following paragraphs, we briefly describe how these libraries can be used for edge detection.

Java2D Edge Detectors Java2D does not provide a predefined edge detection mechanism. In order to implement an edge detector, the actual pixel values of an AWT image have to be processed. There are two ways to access these individual pixel values. The image manipulation features in AWT are primarily aimed at modifying individual pixels as they pass through a ‘filter’. A stream of pixel data is sent out by an ImageProducer, passes through the ImageFilter, and on to an ImageConsumer. The pre-defined ImageFilter subclass for processing individual pixels is the RGBImageFilter. As the data is pushed out by the producer, this model is known as the push model. An alternative approach is to use the PixelGrabber class to collect all the pixel data from an image into an array, where it can then be conveniently processed. In this case, use must also be made of MemoryImageSource to funnel the changed array’s data as a stream to a specified ImageConsumer. Figure 3.5 shows an overview of the two pixel acquisition methods.

Figure 3.5.: Workflow of fetching individual pixels with Java2D.
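The ‘pull’ variant with PixelGrabber could look like the following sketch: all pixels are collected into an int array, processed there, and handed back through a MemoryImageSource. The per-pixel operation is only a placeholder for an actual filter such as a gradient operator.

```java
import java.awt.Image;
import java.awt.Toolkit;
import java.awt.image.MemoryImageSource;
import java.awt.image.PixelGrabber;

// Sketch of pixel access with PixelGrabber and MemoryImageSource.
public class PixelGrabberSketch {
    public static Image process(Image src, int width, int height) throws InterruptedException {
        int[] pixels = new int[width * height];
        PixelGrabber grabber = new PixelGrabber(src, 0, 0, width, height, pixels, 0, width);
        grabber.grabPixels();                       // blocks until all pixels are delivered

        for (int i = 0; i < pixels.length; i++) {
            // placeholder per-pixel operation, e.g. thresholding or gradient filtering
            pixels[i] = pixels[i] | 0xff000000;     // force full alpha
        }
        return Toolkit.getDefaultToolkit().createImage(
                new MemoryImageSource(width, height, pixels, 0, width));
    }
}
```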


JAI Edge Detectors JAI provides the GradientMagnitude operation, an edge detector that computes the magnitude of the image gradient vector in two orthogonal directions. It performs two convolution operations on the source image, detecting edges in the horizontal and vertical direction. The algorithm then calculates the gradient norm of the two intermediate images.

The result of the GradientMagnitude operation may be defined as

$$dst[x][y][b] = \sqrt{(S_H(x, y, b))^2 + (S_V(x, y, b))^2} \qquad (3.4)$$

where S_H(x, y, b) and S_V(x, y, b) are the horizontal and vertical gradient images generated from band b of the source image by correlating it with the supplied orthogonal gradient masks. The default masks for GradientMagnitude perform a Sobel edge enhancement.
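A minimal sketch of this operation (assuming a hypothetical input file name) could look as follows; the two KernelJAI constants are the default Sobel masks mentioned above.

```java
import java.awt.image.BufferedImage;
import java.awt.image.renderable.ParameterBlock;
import javax.media.jai.JAI;
import javax.media.jai.KernelJAI;
import javax.media.jai.RenderedOp;

// Sketch: JAI "GradientMagnitude" with the default Sobel masks.
public class JaiGradientSketch {
    public static void main(String[] args) {
        RenderedOp source = JAI.create("fileload", "frame.png"); // hypothetical input frame

        ParameterBlock pb = new ParameterBlock();
        pb.addSource(source);
        pb.add(KernelJAI.GRADIENT_MASK_SOBEL_HORIZONTAL);        // S_H mask
        pb.add(KernelJAI.GRADIENT_MASK_SOBEL_VERTICAL);          // S_V mask
        RenderedOp edges = JAI.create("gradientmagnitude", pb);

        BufferedImage result = edges.getAsBufferedImage();       // hand over to further processing
        System.out.println("Edge image: " + result.getWidth() + "x" + result.getHeight());
    }
}
```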

3.3.4. Further Improvements<br />

The Canny edge detection mechanism gives good results for the preprocessing. Still, this method could be improved by face-specific preprocessing. Different image features have different edge intensities. The eye region, for example, provides a wide range of hard gradient transitions. The mouth region does not have these clear boundaries, and especially the lower edge of the lower lip is often lost. An improved preprocessing algorithm could assume the image regions of facial features and treat these regions with a different intensity of the Gaussian smoothing. With this technique, the mouth region edges could be improved, and interferences in less important regions, like the cheeks, could be suppressed. Another possibility would be to add feature-specific checks after the edge detection process. These checks could then determine whether an important facial edge is missing and could close the gap by reanalyzing the input data. A straightforward way to improve the results for the mouth region could be to weight the red channel higher during grayscale image production, as the most recognizable difference between the mouth and the adjacent skin is in the intensity of the red channel.
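A sketch of such a red-weighted conversion is given below; the weights are illustrative and would have to be tuned, they are not taken from the thesis.

```java
// Sketch of a red-weighted grayscale conversion for packed ARGB pixels.
public class RedWeightedGray {
    public static int[] toGray(int[] rgbPixels) {
        int[] gray = new int[rgbPixels.length];
        for (int i = 0; i < rgbPixels.length; i++) {
            int r = (rgbPixels[i] >> 16) & 0xff;
            int g = (rgbPixels[i] >> 8) & 0xff;
            int b = rgbPixels[i] & 0xff;
            // emphasize the red channel so mouth/skin differences survive the conversion
            gray[i] = Math.min(255, (int) (0.6 * r + 0.3 * g + 0.1 * b));
        }
        return gray;
    }
}
```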



3.4. Summary<br />


In order to fulfill the requirements of the tracking algorithm, the input data has to be read and prepared in a proper way. We therefore examined different APIs for grabbing frames from video input data, and decided to work with JMF, as it is available on different platforms, is free, and promises to be practicable. We also defined basic preconditions for the video data, and selected MPEG-1 sample data for testing purposes. To make the video frames applicable to the subsequent tracking algorithm, they are processed with an edge detection mechanism; here we decided to look at both Java2D- and JAI-based functionality, preferably using a Canny edge detection method, as it shows the best results. In the next chapter we go into the details of the code development of the Java tracker.



4. Programming<br />

4.1. Overview<br />

After selecting the necessary libraries and methodologies, we can now describe the development of the Java tracker. We present the architecture and state implementation details. We chose a modular program structure to ease the exchange of components and to separate the performed tasks. The graphical representation of tracking information is strictly separated from the data representation. All these design decisions are described in Section 4.2. Additionally, we explain the basic application flow and the implementation of the tracking algorithm. Section 4.3 then gives an insight into the implementation process. It describes the working environment and states problems that arose during the coding phase.

4.2. Architecture<br />

4.2.1. Structure<br />

The Java feature tracker is split up into five packages, grouped by the tasks of the contained classes. These packages are the GUI, the graphical data representation layer, the domain layer, the data storage layer, and the controlling and connecting classes. Figure 4.1 illustrates the implemented classes and their packages; the following paragraphs describe the functionality of each group.

Controlling and connecting classes<br />

The controlling and connecting classes are responsible for establishing and managing the communication between the other packages, and therefore also manage the instantiation of important facade objects. For that purpose, the package has to provide constant values for the exchange of states between packages.


The Main class, the startup object for the program, is part of this package. It creates the main GUI object and the TrackerController, and connects the two instances by exchanging a StateChangedListener. The TrackerController is an interface that receives commands and propagates them to the underlying domain classes. The TrackerFacade is the implementing class that accomplishes this task. The StateChangedListener is an interface that manages the communication from lower layers to the user interface and logfiles.
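The following sketch illustrates this wiring. The method names on the interfaces are assumptions for illustration (the thesis only names the openVideo, playVideo and process commands and the listener mechanism), and the stub facade only stands in for the real TrackerFacade and TrackerWindow.

```java
// Sketch of the controller/listener wiring; names and signatures are illustrative.
interface StateChangedListener {
    void stateChanged(String message);               // assumed callback signature
}

interface TrackerController {
    void openVideo(String fileName);                 // assumed signatures for the named commands
    void process();
    void setStateChangedListener(StateChangedListener listener);
}

class TrackerFacadeStub implements TrackerController {
    private StateChangedListener listener;
    public void openVideo(String fileName) { notifyState("opened " + fileName); }
    public void process() { notifyState("tracking current frame"); }
    public void setStateChangedListener(StateChangedListener l) { listener = l; }
    private void notifyState(String msg) {
        if (listener != null) listener.stateChanged(msg);   // propagate state to GUI/log
    }
}

public class MainSketch {
    public static void main(String[] args) {
        TrackerController controller = new TrackerFacadeStub();
        controller.setStateChangedListener(new StateChangedListener() {
            public void stateChanged(String message) {       // the real TrackerWindow logs or displays this
                System.out.println("state: " + message);
            }
        });
        controller.openVideo("sample.mpg");                   // hypothetical video file
        controller.process();
    }
}
```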

Graphical User Interface<br />

The GUI holds the Swing interface that is presented to the user. It is implemented in the TrackerWindow class, which is created and linked to the remaining application. The TrackerWindow additionally encapsulates an implementation of the StateChangedListener interface. This subclass is responsible for logging application messages or printing them to the user interface.

Graphical Data Representation<br />

The graphical data representation provides functionality to generate a drawable representation of the BSP tracking tree, and presents all video and tracking information to the user. When the main GUI class requests the video visualization and tracking component from the TrackerController, a TrackerComponent interface is returned. It is implemented by the TrackerPanel, which manages the current image selections, the correct display of the current video frame, and the according tracking data. For each selection, the TrackerComponent holds a BSPTree2D, the visual equivalent of a BSPTree, and the Extrema2D, that is, the leaf nodes with minimum or maximum x- or y-values. Both classes implement the interface Drawable, which eases the presentation of the shapes onto a Graphics context. They are compositions of Figure2D objects, an interface which is implemented by Points2D in the case of Extrema2D, and by BSPFigure2D in the case of BSPTree2D. The latter holds drawing information for the ellipses, centroids, and major and minor axes, represented by the subclasses of BSPFigure2D, separately for leaves and inner nodes of the BSPTree.



Domain Layer<br />


The domain layer encapsulates the core functionality of the Java feature tracker. It consists of three main parts: the classes responsible for video frame extraction, for preprocessing of frames, and for the tracking process itself. In each of these parts, the responsible class is created by a factory which returns an instance of a specific interface or abstract class. The TrackerFacade then communicates with these interfaces. For the extraction of the video frames, the FrameExtractorFactory creates an instance of the FrameExtractor interface. For preprocessing, the PreprocessorFactory creates the Preprocessor. The core element of the program, the feature tracking implementation, is created by the RegionTrackerFactory, which returns a subclass of the RegionTracker. This class builds the tree of BSPNodes and returns the root node.
Storage Layer<br />

The storage layer is responsible for storing important tracked data for further usage, such as evaluation or higher-level tasks. The data is saved to a TrackedRegion and returned to the TrackerFacade. The facade then forwards this information to the implementation of the TrackedDataController, which manages the received data and files it to a specified location.

Figure 4.1.: Class diagram of the Java feature tracker (packages gui, domain, bsp and data; classes include Main, TrackerController, TrackerFacade, StateChangedListener, TrackerWindow, TrackerComponent, TrackerPanel, BSPTree2D, BSPFigure2D, Figure2D, Extrema2D, Points2D, Drawable, FrameExtractorFactory, FrameExtractor, JMFSnapper, PreprocessorFactory, Preprocessor, CannyEdgeDetector, JAIEdgeDetector, RegionTrackerFactory, RegionTracker, RGBRegionTracker, BinaryInvRegionTracker, BSPNode, TrackedDataController, TrackedDataHandler and TrackedRegion; the frame extraction and preprocessing classes use JMF and JAI).


4.2.2. Basic Application Flow<br />


After describing the classes of the Java feature tracker, we now look at how the important classes communicate during runtime. The basic workflow is mainly managed by the TrackerController. As illustrated in Figure 4.2, this class receives all calls from the GUI, such as the openVideo, playVideo or process commands. The TrackerController then propagates the command to the responsible class.

During the openVideo command, the FrameExtractor is called, which extracts frames from the video input data. If a new image frame is available, the StateChangedListener is notified, and this object then updates the TrackerComponent, the visual component that displays video frames in the user interface. The tracker component is also responsible for calling the Preprocessor and requesting the preprocessed image. Other video playback functions trigger a similar process.

process is called to execute the actual feature tracking. For that purpose, the call is redirected to the RegionTracker, which then creates the BSP tree for the current video frame. The TrackedDataController is responsible for saving the tracked information to a file.

Figure 4.2.: Overview of the basic application workflow



4.2.3. Tracking Algorithm<br />


The basic application workflow, as described in the previous section, has one core element, the RegionTracker. It is responsible for producing feature points out of a preprocessed image. It therefore creates a hierarchical BSP tree, where each node holds shape information for a certain part of the image. This process is based on the work of Rocha et al. [2002], as described in Section 1.3.3. Most of the tracking procedure is implemented in the class BSPNode, which is responsible for the creation of a BSP tree. The basic procedure of the BSPNode works in three steps:

1. Add image foreground pixels to the BSPNode (see procedure 4.2).
2. Calculate orientation values of the added points (see procedure 4.3). After this step, the flag isCalculated is set to true, so that subsequent procedures can verify that the calculation step has not been left out.
3. Create a BSP tree by subdividing the current node (see procedure 4.4). Return the current node as root.

In order to save image information, the BSPNode holds three arrays: X and Y for the position of a point P(x, y), and an array V for the image intensity value of the point. If the algorithm only works with binary images, the intensity values in V are set to 1. The number of pixels that have already been added to the node is held in ipix. The image moments are named m_pq with p and q set to 0, 1, 2 (see the moment calculations in procedure 4.2). Additionally, every BSPNode holds a reference to the StateChangedListener (called listener) to propagate information or errors to the user, and to a TrackedRegion (called trackedRegion) to permanently save tracking information. The tracking procedure is started by the class RegionTracker, which currently traverses all pixels of a rectangular image raster and adds all foreground pixels (that is, pixels with a non-zero intensity value) to the root node (see procedure 4.1).

55


Functions of Class RegionTracker<br />


The abstract class RegionTracker is responsible for triggering the feature tracking process. In its method createBSPTree, the class creates the root node of the BSP tree and carries out all steps that are necessary for this node. The class is subclassed by the BinaryInvRegionTracker, which works with binary images. It takes the intensity value of the image pixel at band 0 of the raster and inverts it. (In the preprocessed image, all pixels that belong to the image area are black (= 0); after inverting, the value of an image area pixel is 1.)

Procedure 4.1: createBSPTree(levels)

1. Create a new BSPNode N with the following parameters:
   (a) The image raster r. It contains the pixels of a certain region of the image.
   (b) The maximum number of non-zero pixels n_max, in this case the number of pixels in the raster (width_r · height_r).
   (c) A listener and a trackedRegion for feedback and data storage purposes.
2. For each position (i, j) in the raster, do the following:
   (a) Let (x_min, y_min) be the position of the upper left raster point in the image. Fetch r_{x_min+i, y_min+j}, the pixel value at position (x_min + i, y_min + j) in the raster.
   (b) If this value is not 0, add point (i, j) to the node N.
3. If at least one point was added to the root node, do:
   (a) Call function calculateValues() of node N.
   (b) Call function subdivide(levels−1) of node N.
4. Return N
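A sketch of this traversal against a java.awt.image.Raster is shown below; a plain list of points stands in for the BSPNode here, and the real implementation would instead call addPoint on the root node and then calculateValues() and subdivide() as in steps 3 and 4.

```java
import java.awt.image.Raster;
import java.util.ArrayList;
import java.util.List;

// Sketch of the pixel traversal of procedure 4.1 (BinaryInvRegionTracker variant):
// band 0 of the preprocessed binary image is inverted, non-zero pixels are collected.
public class RasterTraversalSketch {
    public static List<int[]> collectForegroundPixels(Raster raster) {
        List<int[]> points = new ArrayList<int[]>();
        int xmin = raster.getMinX(), ymin = raster.getMinY();
        for (int j = 0; j < raster.getHeight(); j++) {
            for (int i = 0; i < raster.getWidth(); i++) {
                int val = 1 - raster.getSample(xmin + i, ymin + j, 0); // invert band 0
                if (val != 0) {
                    points.add(new int[] { i, j, val });               // would be root.addPoint(i, j, val)
                }
            }
        }
        return points;
    }
}
```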

56


Functions of Class BSPNode<br />


After having described the initial function calls to the BSPNode, we now look at the inside of these functions. The three substantial functions of the BSPNode are addPoint, calculateValues and subdivide. By splitting the tracking process up into these methods, it is possible to add region points independently to the current node. During the initialization of a new child node, the points that will be added to this node are not yet known. Moreover, the independent calculation of localization values allows the method to be executed only if it is really needed, and allows for additional checks between calculation and subdivision.

Procedure 4.2: addPoint(x, y, val)

1. Add the point to the arrays: X_ipix ← x, Y_ipix ← y and V_ipix ← val
2. Increase the number of pixels: ipix ← ipix + 1
3. Add to the moments:
   (a) m00 ← m00 + val
   (b) m10 ← m10 + x · val
   (c) m01 ← m01 + y · val
   (d) m11 ← m11 + x · y · val
   (e) m20 ← m20 + x² · val
   (f) m02 ← m02 + y² · val


Procedure 4.3: calculateValues()

1. Check if m00 is 0. If so, no pixels are set in this node and all further calculations are skipped. Return false in this case.
2. Calculate the image centroid c(x_c, y_c):
   (a) x_c ← m10 / m00
   (b) y_c ← m01 / m00
3. Calculate the second order central moments µ20, µ11 and µ02:
   (a) µ20 ← m20/m00 − x_c²
   (b) µ11 ← m11/m00 − x_c·y_c
   (c) µ02 ← m02/m00 − y_c²
4. Calculate θ. Two special cases have to be treated separately: µ11 = 0 and m20 = m02. The treatment of these cases was determined by program test runs.
   (a) If µ11 is 0, set θ as follows:
       – If µ02 < µ20: θ ← 0
       – Else: θ ← π/2
   (b) If m20 = m02, do:
       – If µ11 < 0: θ ← π/4
       – Else: θ ← 3π/4
   (c) Else, do the default calculation:
       θ ← arctan(2µ11 / (µ20 − µ02)) / 2
       Note that Math.atan2(y, x) should be used instead of Math.atan(y/x) in Java. Otherwise, a sign error could lead to an angle rotated by 90°.
5. Set the flag isCalculated to true.
6. Return true.
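The centroid, central moment and angle calculations of this procedure can be sketched in Java as shown below; the class and method names are illustrative and not the project's actual BSPNode code.

public class OrientationSketch {

    // Returns the orientation angle of the best-fitting ellipse, or Double.NaN
    // if the region is empty (m00 == 0).
    static double orientation(double m00, double m10, double m01,
                              double m11, double m20, double m02) {
        if (m00 == 0) {
            return Double.NaN;                       // no pixels in this node
        }
        double xc = m10 / m00;                        // centroid
        double yc = m01 / m00;
        double mu20 = m20 / m00 - xc * xc;            // central moments
        double mu11 = m11 / m00 - xc * yc;
        double mu02 = m02 / m00 - yc * yc;

        if (mu11 == 0) {                              // special case 1
            return (mu02 < mu20) ? 0.0 : Math.PI / 2;
        }
        if (m20 == m02) {                             // special case 2
            return (mu11 < 0) ? Math.PI / 4 : 3 * Math.PI / 4;
        }
        // default case: atan2 keeps the correct quadrant
        return Math.atan2(2 * mu11, mu20 - mu02) / 2;
    }

    public static void main(String[] args) {
        // two pixels at (2,3) and (4,3): the major axis is horizontal, theta = 0
        System.out.println(orientation(2, 6, 6, 18, 20, 18));
    }
}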


Procedure 4.4: subdivide(levels)

1. Check if flag isCalculated is true. Otherwise, stop the subdivision.
2. If the current node is a leaf node, that is if no levels to compute are left (levels = 0), add the point to the trackedRegion and exit.
3. Create the child nodes C1 and C2. The constructor parameter n_max (the maximum number of non-zero pixels) is set to the number of non-zero pixels of the current node (ipix).
4. Calculate the orthogonal angle to θ (the angle of the x-axis with the minor axis): θ⊥ = (θ + π/2) mod π
5. Iterate over all points. For every point P = (X_i, Y_i) with the intensity value V_i, proceed as described (a minimal sketch of this assignment step follows the procedure):
   (a) If the point is the current centroid (X_i = x_c and Y_i = y_c), add P to both child nodes.
   (b) Otherwise, divide the image area along the minor axis of the best fitting ellipse. For this, the reference system is shifted to have the centroid as the origin. Then the angle θ_P′, the angle of the shifted point P′ with the x-axis, is computed. The difference between θ_P′ and θ⊥ is then taken to decide if the point is added to C1 or C2 (see Figure 4.3):
       1. Calculate θ_P′. The y-value of P′ is mirrored along the x-axis to correspond to the standard Cartesian coordinate system: θ_P′ ← π/2 − atan2(x_p − x_c, −(y_p − y_c))
       2. Calculate the difference angle β: β ← θ_P′ − θ⊥
       3. Verify that β is between −π and +π: if β ≤ −π, then β ← 2π + β
       4. If β ≤ 0 or β = π, then add point P to C1.
       5. If β ≥ 0, then add point P to C2.
   (c) Check that the current point was added to at least one child node.
6. For both child nodes C1 and C2, call the function calculateValues(). If it returns true, call the function subdivide(levels-1). Otherwise, set the child node to null. (As it is an empty node, it is not used any more.)
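A minimal Java sketch of the point-assignment step (5b) is given below. It returns a single child per point and therefore simplifies the "add to both children" case for β = 0; all names are illustrative only.

public class SplitSketch {

    // theta: orientation of the major axis; (xc, yc): centroid; (xp, yp): point.
    // Returns -1 for child C1 and +1 for child C2 (0 if the point is the centroid,
    // which the procedure adds to both children).
    static int assignChild(double theta, double xc, double yc, double xp, double yp) {
        if (xp == xc && yp == yc) {
            return 0;                                        // centroid: both children
        }
        double thetaPerp = (theta + Math.PI / 2) % Math.PI;  // minor-axis angle
        // mirror y so the angle refers to a standard Cartesian system
        double thetaP = Math.PI / 2 - Math.atan2(xp - xc, -(yp - yc));
        double beta = thetaP - thetaPerp;
        if (beta <= -Math.PI) {
            beta += 2 * Math.PI;                             // keep beta in (-pi, pi]
        }
        return (beta <= 0 || beta == Math.PI) ? -1 : 1;
    }

    public static void main(String[] args) {
        // horizontal major axis (theta = 0), centroid at (3, 3): the splitting
        // (minor) axis is vertical, so these two points fall into different children
        System.out.println(assignChild(0, 3, 3, 5, 3));   // -1 (child C1)
        System.out.println(assignChild(0, 3, 3, 1, 3));   // +1 (child C2)
    }
}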


Figure 4.3.: Angle calculation for raster subdivision. The best-fitting ellipse and the corresponding splitting axis are defined for an image area (a). To decide which side of the splitting axis point P belongs to, the coordinate system is shifted to have the centroid C as its origin (b).

4.3. Implementation Process

After showing the architecture, the basic workflow, and details of the tracking algorithm, we now describe the process of implementation. We state the working environment for the development as well as the problems that arose during the implementation of the previously described architecture.

4.3.1. Working Environment

The programming was done on a SuSE Linux platform, with Eclipse SDK in version 3.1.0 for the Java development. NetBeans 4.1 was used for building the GUI. Code was written to be compatible with Java 1.4, but it was tested with both Java 1.5.0_01 and 1.4.2_08 on Linux, and 1.4.2 on Windows. The JMF was used in the 2.1.1e Linux performance pack version (and the Windows performance pack for Windows testing), as the reading of the MPEG-1 files does not work with the pure-Java cross-platform version of JMF. Poseidon for UML Community Edition 3.0.1 was used for the initial class design and code generation.

4.3.2. Difficulties

During the development process of the Java movement tracker, we had to deal with<br />

some difficulties. The major drawbacks and delays originated in three main parts of the<br />

architecture: the implementation of the tracker algorithm, the video frame extraction,<br />

and the preprocessing methods.<br />

Algorithm

The main problem during implementation of the tracking algorithm was the correct calculation of the orientation angle, and finding a straightforward way to subdivide the current image area into two child areas along the minor axis. The calculation of the angle θ required special treatment because the method sometimes delivers the sought angle rotated by 90°. We found a hint that this problem can arise if the inverse tangent is calculated with Math.atan instead of Math.atan2. Even though we implemented this change, the problem is still not solved in all cases. Furthermore, there are two special cases where the standard angle calculation formula does not work: if µ11 = 0, or if µ20 = µ02. We solved this problem by manually testing various cases with different angles for these exceptions. Hence, we came up with values for θ that deliver satisfactory results, although we did not prove these values formally. For the area splitting, we first worked with linear equations in the form y = kx + d, and in the point-vector form. Both methods required additional conversions using the (inverse) tangent. After some test runs we came up with the solution described in Procedure 4.4. It is based on a shift of the coordinate system to have the centroid as its origin. Then, the difference between θ⊥ and the angle of the shifted point P′ with the x-axis is used for decision making (see Figure 4.3 for an illustration).
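The following snippet illustrates the described effect with arbitrary example values for the central moments (they are not taken from the tracker): Math.atan only sees the ratio of numerator and denominator, whereas Math.atan2 also takes their signs into account, so the two results differ by 90 degrees here.

public class AtanVsAtan2 {
    public static void main(String[] args) {
        double mu11 = 0.5;
        double mu20 = 1.0;
        double mu02 = 3.0;          // mu20 - mu02 is negative in this example

        double thetaAtan  = Math.atan(2 * mu11 / (mu20 - mu02)) / 2;
        double thetaAtan2 = Math.atan2(2 * mu11, mu20 - mu02) / 2;

        // atan only sees the ratio -0.5 and loses the sign of the denominator
        System.out.println(Math.toDegrees(thetaAtan));    // about -13.28 degrees
        System.out.println(Math.toDegrees(thetaAtan2));   // about  76.72 degrees
    }
}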

Frame Extraction

For the video frame extraction, we currently use a modified version of the sample class FrameAccess.java that is provided in the JMF guide [JMF, p. 54]. However, this code completely traverses the video, and start/stop functionality is only possible with rough workarounds. Other possibilities, like buffering all images in the cache, are not feasible due to limited memory.


Caching images as files is too slow and generates too much file IO. We then tried a solution based on the class Seek.java¹. It uses the FramePositioningControl helper class to access single video frames. However, this code did not work with our input videos (with both the Linux and Windows JMF performance pack versions), as it returns 0 as the total number of video frames. A ray of hope is the JMFSnapper implementation provided by Davison (http://fivedots.coe.psu.ac.th/~ad/jg/ch283/index.html). It is described in a draft chapter of the book “Killer Game Programming in Java” [Davison, 2005] and presents a solution that does not use the FramePositioningControl. It works fine and is fast, but it is not yet completely integrated into the framework of the Java feature tracker. This would be a possibility for further enhancements.
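For reference, a minimal sketch of the FramePositioningControl-based approach used by Seek.java is given below. It assumes that the media actually reports its frame count and supports frame grabbing, which, as described above, was not the case for our MPEG-1 input videos; the file name is a placeholder.

import javax.media.Buffer;
import javax.media.Manager;
import javax.media.MediaLocator;
import javax.media.Player;
import javax.media.control.FrameGrabbingControl;
import javax.media.control.FramePositioningControl;

public class FrameSeekSketch {
    public static void main(String[] args) throws Exception {
        Player player = Manager.createRealizedPlayer(
                new MediaLocator("file:input.mpg"));      // placeholder file name

        FramePositioningControl fpc = (FramePositioningControl)
                player.getControl("javax.media.control.FramePositioningControl");
        FrameGrabbingControl fgc = (FrameGrabbingControl)
                player.getControl("javax.media.control.FrameGrabbingControl");

        if (fpc == null || fgc == null) {
            System.err.println("Required controls not supported for this media.");
            return;
        }
        fpc.seek(10);                    // jump to frame 10
        Buffer frame = fgc.grabFrame();  // raw frame data, to be converted to an image
        System.out.println("grabbed buffer format: " + frame.getFormat());
        player.close();
    }
}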

Edge Detection

The first aim was to implement the Canny edge detection algorithm using JAI. Descriptions on how to proceed were vague and did not give enough help for the coding. We found a project, called Beeegle, that uses a JAI Canny implementation, but the downloadable source code is incomplete and defective (http://beeegle.nl/modules/sections/index.php?op=listarticles&secid=2). Consultation with the authors showed that the code is not used any more and will therefore not be updated or corrected. Hence, we reverted to an implementation that is provided in a Java forum (http://forums.java.sun.com/thread.jspa?threadID=546211&start=45&tstart=1) and adapted it to fit into the program architecture. Later on we tried to implement a second edge detection mechanism, a simple Sobel operator provided by the JAI framework. If the method processes an image that is fetched with JAI's fileload method, the operator is very fast and delivers good results. Integrated into the Java feature tracker, however, the operator did not work correctly. The process was very slow and delivered a binary image, even though a grayscale image was expected. After inquiry, we found out that the image type of the BufferedImage differs in the two cases. In the latter case, the image is fetched from an AWT Image, which returns the type TYPE_3BYTE_BGR. Then the method getAsBufferedImage of the PlanarImage requires 1 to 3 seconds for processing. This problem could not be solved for the time being.

¹ An official JMF solution provided on the Java Sun homepage (http://java.sun.com/products/java-media/jmf/2.1.1/solutions/Seek.java).
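For reference, a minimal sketch of the JAI-based Sobel variant is given below; the operator and kernel names follow JAI 1.1, and the file name is a placeholder.

import java.awt.image.BufferedImage;
import javax.media.jai.JAI;
import javax.media.jai.KernelJAI;
import javax.media.jai.PlanarImage;

public class SobelSketch {
    public static void main(String[] args) {
        // fast case described in the text: image loaded via JAI's fileload operator
        PlanarImage src = JAI.create("fileload", "frame.png");

        PlanarImage edges = JAI.create("gradientmagnitude", src,
                KernelJAI.GRADIENT_MASK_SOBEL_HORIZONTAL,
                KernelJAI.GRADIENT_MASK_SOBEL_VERTICAL);

        // this conversion is the step that became very slow when the source
        // was a TYPE_3BYTE_BGR image fetched from the video instead of a file
        BufferedImage result = edges.getAsBufferedImage();
        System.out.println("edge image: " + result.getWidth() + "x" + result.getHeight());
    }
}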

4.4. Summary

The architecture of the Java feature tracker is split up into 5 packages, showing a modular structure with 3 exchangeable parts responsible for frame extraction, preprocessing and feature tracking. Because of the factory-based construction of these parts, each of them can be replaced, which allows for a comparison of different approaches. This is not only important for future enhancements, but also for the development process, where problematic code could easily be exchanged. Difficulties mainly arose in 3 major areas: angle calculation and subdivision needed special treatment and increased testing; JMF was not as convenient as it claims to be; and JAI is not yet used for preprocessing, even though the basic JAI edge detection mechanism would be faster.

5. Evaluation<br />

5.1. Overview<br />

To evaluate the quality of the developed Java feature tracker, we first look at the<br />

basic program functionalities and then focus on two main aspects: the quality of the<br />

extracted feature points, and the time consumption. In a first step, we evaluate the<br />

abilities that users have via the user interface. Then we rate extracted feature points of<br />

a mouth region, and compare the elapsed time for the preprocessing and the tracking<br />

methods in different circumstances.<br />

5.2. Program Abilities

The developed Java feature tracker is able to find facial feature points of manually preselected feature areas in a sequence of video frames. In a series of steps, the users first open a video and set selections for the areas to process on either the original or the preprocessed video frame. By selecting the process button, the program starts the creation of the BSP tree, memorizes the result and presents it frame by frame in the GUI. After running through the video frames, the users can save the feature points to a Comma Separated Values (CSV) file.

The users can select or deselect various tracking information for visualization. They can look at algorithm-specific data like the 16 tracked centroid points and the corresponding ellipses and ellipse axes. Four (or more) of these 16 points, those with the largest or smallest x/y-values, are then called features. These feature points can be viewed separately in the GUI. The user interface allows for basic flow control of the video playback, but has some limitations. Play, stop and the display of the next frame work fine. However, pausing a video is faulty, and the display of the previous frame is not implemented. Moreover, the program architecture provides customizing options, like the color and stroke size of the feature display, or the selection of the preprocessor and the frame extractor. A GUI for these features is not implemented yet, but can be added with little additional effort.

5.3. Tracking Quality

First observations of the tracked feature points show that preselections on single areas<br />

deliver acceptable results. The most accurate output was achieved for an eyebrow<br />

selection. Mouth selections have good results except for the lowest point of the lower<br />

lip, where the preprocessing is not able to find a continuous edge. Since there is no link<br />

between feature calculations of two subsequent frames, the location of this point may<br />

flip horizontally from the left side of the mouth in frame n to the right side in frame<br />

(n + 1). Eye and nose selections require exact area preselection, as nearby edges may<br />

disturb the tracking process. However, in contrast to the snake algorithm, a selection<br />

of a larger area without disturbing pixels does not change the result, as background<br />

pixels (0-value pixels in binary images) do not have an influence on the calculation.<br />

Calculated points of the implemented Java feature tracker do not necessarily match<br />

with standardized feature points, as the algorithm has no knowledge about the under-<br />

lying image section. It therefore processes every image region selection in an equal way,<br />

without knowing if the produced feature point is, for example, the corner of the mouth.<br />

Figures 5.1 and 5.2 show feature points that were produced by the Java feature tracker. All test runs illustrated in Figure 5.1 produced satisfactory outcomes. Images (a) and (b) have only minor deviations; the left corner of the mouth in image (a), for example, is slightly too low (that is, its y-value is too high). Images (c), (d), (e), and (f) show the problem with the lowest point of the lower lip. The point is either on the left or on the right side of the desired centered position. Results on other face regions are illustrated in (g) and (h). Figure 5.2 shows a whole-face feature tracking process, which works with 6 image selections. As illustrated in (b), the 16 tracked points per selected region correctly approximate the contour of the underlying facial feature; (d) shows the Canny preprocessed image, with the discontinuity of the lower line of the lower lip.


Figure 5.1.: Tracking results for selective regions.


Figure 5.2.: Tracking result for 6 area selections: 2x eyebrow, 2x eye, nose, mouth. The produced features are in (a) and all centroids of the leaf-nodes in the BSP tree in (b). The preselected image areas are visible in (c); (d) shows the outcome in the preprocessed Canny edge image.

5.3.1. Test Data

For the statistical evaluation, we have selected a video that shows a mouth movement.<br />

From our test data (described in Section 3.2.3) we have chosen the recording of AU<br />

23, described as Lip Tightener, performed by the facial muscle Orbicularis oris 1 .<br />

5.3.2. Technique

For the mouth region testing, we perform a mouth region selection of 60x20 pixels, starting at point (233, 219). We examine the corners of the mouth, as these two points are best comparable and straightforward to examine. The left point (from the viewer's perspective) is called point1, its coordinates are (x1, y1); the right point is called point2, with the coordinates (x2, y2). Three different parties have collected data: two human testers manually examined the two features. The first tester is the developer of the Java tracker, female, 21 years old (called man1 from now on). Tester 2 is an unbiased male, 15 years old (called man2). The third input comes from the data extracted by the algorithm (called algo). Figure 5.3 shows two resulting frames: frame number 106 with almost congruent results (0 or 1 pixel difference), and frame number 88 with the most dissimilar tracking points (up to 5 pixels difference) in the described test case.

Figure 5.3.: Good and bad results: a frame with almost identical tracking points (a), and the frame with the biggest differences (b). A white pixel is the selection by man1, blue by man2, and green is calculated by algo.

For the statistical calculations and diagram extraction we used Gnumeric in version 1.2.13, OpenOffice.org 2.0 beta, and SPSS 11.

¹ An overview of the action unit descriptions can be found at http://www.cs.cmu.edu/afs/cs/project/face/www/facs.htm; the recent AU manual is available at http://face-and-emotion.com/dataface/facs/new_version.jsp

5.3.3. Statistical Evaluation

According to the test case description in Section 5.3.2, we performed a test run with the implemented Java feature tracker and collected the manual captures of the two human testers. The output is three different sources for both the x- and y-value of each corner of the mouth. Figure 5.4 illustrates the result of the data collection (see Appendix A.1 for all values).

Figure 5.4.: Positions of the corners of the mouth (point1 and point2).

In order to examine the correctness of the tracked feature points, we focus on four aspects: First, we look at the absolute values and compare coordinate positions as well as curve progressions. Then we examine the quality of the program output relative to the manual tracking data, where we look at the curve progression and the relationship between the curves.


Coordinate Position Looking at the means over all x/y values, we see that especially the algorithm-calculated coordinates of point2 differ from the manual selections (see Table 5.1). The x-value is too high (too far right in the image region), and the y-value too low (too high in the image region).

        x1       y1       x2       y2
algo    243.03   220.03   284.07   217.59
man1    243.2    219.43   282.75   219.51
man2    243.2    219.43   282.75   219.51

Table 5.1.: Mean position of x/y coordinates.

The source of this inaccuracy is most likely to be found in the preprocessing. As illustrated in Figure 5.5, the mouth contours of the preprocessed image are ragged and discontinuous. This image also shows why the value of y2 is too high: a false edge outside the corner of the mouth is visible in the preprocessed image.

Figure 5.5.: Preprocessing of mouth region. The original image selection (a), and the preprocessed version that produced ragged edges (b).

Looking at the minima and maxima over all tracked frames, we see that the algo-<br />

rithm data has more outliers than the manually determined data. For example, the<br />

y-coordinate of point2 has a minimum of 215 where the manual testers reach values<br />

of 218 and 219. The maximum of this point is not higher, as the algorithm generally<br />

produces too low y-values (see Table 5.2).<br />

70


5. Evaluation<br />

              Minima                   Maxima
        x1    y1    x2    y2     x1    y1    x2    y2
algo    210   218   277   215    251   224   289   219
man1    236   218   278   218    249   221   288   221
man2    237   219   278   219    251   222   288   222

Table 5.2.: Minima and maxima of x/y coordinates.

We assume that both the average and the min/max values would improve with a clearer and more continuous edge detection.

Curve Progression As a next step, we want to examine if the curve progression is continuous, or if the coordinate values change strongly from one frame to the next. We therefore determine the squared difference between two subsequent video frames (see Figure 5.6) and calculate the sum over all 136 frames, as well as the maximum and mean value (see Table 5.3). The curve progression of algo tends to be more erratic and oscillating than that of man1 and man2. The maximum oscillation is in every case produced by algo. Taking the sum over all values, algo has significantly higher values. For the x-coordinates, man2 delivers significantly better results; man1 is closer to the values of the algorithm. In terms of the y-coordinates, algo is significantly worse than both manual testers (see Table 5.3 for the numbers).

               x1                   y1                   x2                   y2
       man1  man2  algo     man1  man2  algo     man1  man2  algo     man1  man2  algo
sum    197   116   220      29    16    124      108   78    211      27    16    54
max    25    9     25       1     1     16       9     16    16       1     1     9
avg    1.46  0.86  1.63     0.21  0.12  0.92     0.8   0.58  1.56     0.2   0.12  0.4

Table 5.3.: Sum, maximum and average of d² on subsequent frames.

Figure 5.6.: d² of subsequent video frames for both x and y coordinates of point1 and point2.
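The per-frame squared differences summarized above can be computed with a few lines of Java; the following sketch uses made-up coordinate values for illustration.

public class FrameDiffSketch {
    public static void main(String[] args) {
        int[] x1 = {238, 239, 238, 241, 246};    // hypothetical x1 values per frame

        int sum = 0, max = 0;
        for (int f = 1; f < x1.length; f++) {
            int d = x1[f] - x1[f - 1];
            int d2 = d * d;                      // squared difference to previous frame
            sum += d2;
            max = Math.max(max, d2);
        }
        double avg = (double) sum / (x1.length - 1);
        System.out.println("sum=" + sum + " max=" + max + " avg=" + avg);
    }
}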

Curve Relationship In order to determine how the data of algo, man1 and man2 are related to each other, we first calculate the correlation of the curves in Figure 5.4.

Correlations          x1        y1        x2        y2
man1 – man2           0.97148   0.84148   0.97423   0.84805
man1 – algo           0.97919   0.7235    0.96173   -0.073
algo – man2           0.98083   0.67658   0.95887   -0.17251
standard deviation    0.00499   0.08496   0.00816   0.56269

Table 5.4.: Correlation results.
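The values in Table 5.4 are plain Pearson correlation coefficients between two coordinate series; a minimal sketch with made-up series could look as follows.

public class CorrelationSketch {

    static double pearson(double[] a, double[] b) {
        int n = a.length;
        double meanA = 0, meanB = 0;
        for (int i = 0; i < n; i++) { meanA += a[i]; meanB += b[i]; }
        meanA /= n;
        meanB /= n;
        double cov = 0, varA = 0, varB = 0;
        for (int i = 0; i < n; i++) {
            cov  += (a[i] - meanA) * (b[i] - meanB);
            varA += (a[i] - meanA) * (a[i] - meanA);
            varB += (b[i] - meanB) * (b[i] - meanB);
        }
        return cov / Math.sqrt(varA * varB);
    }

    public static void main(String[] args) {
        // made-up per-frame values of one coordinate for two testers
        double[] man1 = {238, 239, 238, 241, 246, 247};
        double[] algo = {238, 237, 237, 244, 245, 246};
        System.out.println(pearson(man1, algo));
    }
}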


These correlations show good results for the algorithm with respect to the manual testers. The x-value of point1 shows correlations around 97-98%, and a standard deviation lower than 0.5%. Only the y-value of point2 does not have satisfactory results: the manual testing values do not correlate with the output of the algorithm, and the standard deviation is over 56%. Table 5.4 shows all results. In addition to the correlation, we look at the similarity by calculating the square of the frame-by-frame difference between the testers and performing a t-test on this data. Table 5.5 shows the result of this approach.

Table 5.5.: Two-tailed paired samples t-test on d².

The output of the t-test has to be treated with caution, as the precondition of having normally distributed values is not fulfilled [Smith, 2002] (both the Kolmogorov-Smirnov and the Shapiro-Wilk test return significances below 0.000). The reason why we still calculated the t-test is that the curves are predominantly bell-shaped, and that we expect a normal distribution with a greater sample size. Still, the t-test inspects the x/y-coordinates individually, but does not deal with the overall quality of the two feature points. The Analysis of Variance (ANOVA) calculation in the following paragraph tries to fill this gap.

Relative Point Position In order to compare the position of the automatically detected feature points with the points determined by the manual testers, we take a closer look at the (squared) distance between the curves of different testers. Table 5.6 shows the sum of all d² values, as well as the minima and maxima over all tracked frames. The distances between the curve of one manual tester and the algorithm are significantly higher than between the manual testers. man1 is still closer to algo than man2.

man1 – man2    x1     y1     x2     y2
sum            189    251    114    224
maximum        9      9      4      4
average        1.39   1.85   0.84   1.65

man1 – algo    x1     y1     x2     y2
sum            223    166    453    779
maximum        9      9      16     25
average        1.64   1.22   3.33   5.73

algo – man2    x1     y1     x2     y2
sum            192    183    323    1593
maximum        16     9      9      36
average        1.41   1.35   2.38   11.71

Table 5.6.: Sum, maximum and average of d² of different testers' values.


For examining the overall performance of the two algorithm-tracked feature points with respect to the manual testers, we calculate ANOVA for the d² values described above. In order to clean up the data and make it more likely to be normally distributed, we calculated the extrema and removed them from the data set. Table 5.7 shows the output of this extrema calculation. Note that point2 has fewer extrema (except for the x-coordinate of d²(man1 − algo)), but with higher values (up to ≥ 36). We therefore assume that point1 has more outliers than point2.

                    x1         y1         x2         y2
d²(man1 − algo)     –          16 ≥ 4     24 ≥ 9     2 ≥ 25
d²(man2 − algo)     22 ≥ 4     22 ≥ 4     13 ≥ 9     5 ≥ 36

Table 5.7.: Extrema of d² between the algorithm and each of the manual testers.

After removing the extrema from the data list, we recalculate ANOVA on the cleaned-up data set. According to a test of homogeneity of variances, the new data set is not significantly homogeneous (with a significance of 0.000). In order to evaluate the overall performance of the feature points, we calculate a contrast test where we divide the d² results into 2 groups and compare their mean values. We perform three groupings: all values of man1 (d² of x1, y1, x2, and y2) compared to all values of man2; all d² of x-coordinates (of both testers) compared to the d² of y-coordinates; and the distances of point1 compared to the distances of point2. The results are illustrated in Table 5.8. It shows that man2 has a bigger spread than man1; its difference to the algorithm is larger. The difference between the means of all y-values and those of all x-values is 11.417, so it is likely that there is an overestimation in the vertical direction. The spread between all point1 values and all point2 values is 16.529, so point2 is more likely to be overestimated.

Contrast                                  Spread    Std. Error    t
avg(val_man2) – avg(val_man1)             3.997     0.9399        4.253
avg(val_ys) – avg(val_xs)                 11.417    0.9399        12.147
avg(val_point2s) – avg(val_point1s)       16.529    0.9399        17.586

Table 5.8.: Contrast tests of d² between the algorithm and each of the manual testers.


After looking at the overall success of the feature points, we finally create the overestimation table 5.9, where we can compare the d² of one coordinate to the value of each other coordinate. The table shows that d²(y2) of man2-algo has the biggest difference to all other values; it has the largest spread and therefore varies most during the feature tracking process. x1 and y1 have the same spread, so their error rates are likely to be similar.

man1-algo man2-algo<br />

d 2 (x1) d 2 (y1) d 2 (x2) d 2 (y2) d 2 (x1) d 2 (y1) d 2 (x2) d 2 (y2)<br />

man1- d 2 (x1) – 1.081 -3.801 1.062 1.062 -9.147<br />

algo d 2 (y1) -1.081 – -1.495 -4.882 -1.116 -10.228<br />

d 2 (x2) 1.495 – -3.387 1.440 1.440 -8.733<br />

d 2 (y2) 3.801 4.882 3.387 – 4.826 4.826 3.766 -5.346<br />

man2- d 2 (x1) -1.026 -1.440 -4.826 – -1.061 -10.172<br />

algo d 2 (y1) -1.026 -1.440 -4.826 – -1.061 -10.172<br />

d 2 (x2) 1.116 -3.766 1.061 1.061 – -9.111<br />

d 2 (y2) 9.147 10.228 8.733 5.346 10.172 10.172 9.111 –<br />

Table 5.9.: Tamhane post-hoc test on d². Mean difference (row − column) where it is significant at the 0.05 level.

5.4. Time Usage

5.4.1. Technique

For testing, we use the same input data as in Section 5.3.2. We investigate the time consumption of the two methods that are mainly responsible for the tracking process and may consume the most time. The first procedure is used for returning the feature points (see Procedure 4.1 on page 56), the second is responsible for the edge detection. For testing the tracking method, we use the standard program, whereas in the case of the preprocessing we use a test class, called PreprocessingTimeTest, as it minimizes additional overhead. The time information is then extracted from the log file.
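The timing itself amounts to simple wall-clock measurements around the calls in question, written to the log; a minimal sketch is given below, with a dummy workload standing in for the actual tracking call.

public class TimingSketch {
    public static void main(String[] args) {
        long start = System.currentTimeMillis();

        doWork();   // placeholder for: tracker.createBSPTree(raster, levels)

        long elapsed = System.currentTimeMillis() - start;
        System.out.println("tracking took " + elapsed + " ms");
    }

    static void doWork() {
        double x = 0;
        for (int i = 0; i < 1000000; i++) {
            x += Math.sqrt(i);      // dummy workload
        }
    }
}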

5.4.2. Statistical Evaluation

Feature Tracking

For the comparison of the implemented feature tracking algorithm, we examine two test cases: a complete image region selection and a 60x20 pixel mouth region selection. During the first test run, we found that the results are disturbed by garbage collection latencies. Hence, we tune the JVM parameters to allow for parallel processing during garbage collection. The JVM parameters used are: -verbose:gc -Xms64m -Xmx256m -XX:NewRatio=2 -XX:+UseConcMarkSweepGC. Figure 5.7 shows the tracking time output for both region selections (all tracking time information can be found in Appendix A.2 and A.3).
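For reference, such a tuned run corresponds to a command line of the following form; the main class and video file names are placeholders only.

java -verbose:gc -Xms64m -Xmx256m -XX:NewRatio=2 -XX:+UseConcMarkSweepGC \
     featuretracker.Main input.mpg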

Figure 5.7.: Tracking time consumption for the complete region selection (a) and a 60x20 mouth selection (b).


Remarkably, the JVM tuning considerably improves the tracking of the complete region, but has the opposite effect on the smaller selection. The tracking times lie between 3.94 ms (5.2 ms JVM tuned) for the mouth region, and 51.76 ms (21.4 ms JVM tuned) for the whole-image selection.

Preprocessing

To test the preparation of images, we compare two preprocessing implementations: the currently used Canny edge detector, and a standard JAI edge detection mechanism, which is based on the Sobel operator. As in the testing of the tracking method, we perform the tests with both standard and tuned JVM options. The results are illustrated in Figure 5.8 (all tracking values can be found in Appendix A.4 and A.5).

Figure 5.8.: Preprocessing time consumption for the Canny (a) and the JAI Sobel (b) edge detection.

The figures show that the currently used edge detection mechanism is considerably slower than the JAI operator. The average processing time for Canny is 212.91 ms (156.42 ms with JVM tuning), in contrast to 1.27 ms (0.86 ms JVM tuned) for Sobel. Unfortunately, the excellent values of the JAI Sobel operator cannot be transferred to the Java feature tracker. In both the test class and the final program the algorithm uses a BufferedImage. In the case of the test class, the type of this image is TYPE_INT_RGB; in the final program it is TYPE_3BYTE_BGR, as the image is directly fetched from the video and not read from a saved .png image. In case of the latter type, the necessary image conversion requires processing times of more than 1 second.

5.5. Summary

The evaluation showed that the developed Java feature tracker is able to deliver reasonable results. Important feature points can be located; differences in mean coordinate positions are below 2 pixels, and correlations of the produced feature points reach values of up to 98%. The results differ per inspected feature point. For example, the left corner of the mouth showed a more accurate position than the right corner. The most significant improvements can be reached by improving the preprocessing. This can be done by producing continuous, non-ragged feature contours that do not respond to lighting changes, and by omitting intensity changes in the image that do not belong to facial features. This step would also improve the oscillating behavior that can be noticed in the current version of the Java tracker. Another step would be to add geometrical transformations to link subsequent feature points, or to omit outliers by calculating more accurate values from the neighboring frames. Performance-wise, the bottleneck of the tracking process is the preprocessing. It currently needs about 4 times as much time as the actual feature tracking. A JAI-based edge detector would be faster for the image preparation, but further inquiry is necessary to find a workaround for the extremely time-consuming image type conversions.

Conclusions<br />

Overview<br />

The feature tracking solution described in this work is completely based on Java and finds facial feature points in preselected image regions of input videos. In this work, we explained the selection of algorithms and libraries as well as implementation and evaluation details. The main advantages of this solution are a (theoretical) platform independence, low hardware requirements and little tracking effort.

Java turns out to be, with some constraints, very practicable as a programming language for feature tracking problems. The time consumption is reasonably small and the implementation is convenient, as Java libraries are available for video processing and imaging tasks. Still, JMF, which was used for video frame extraction, does not keep all promises, as jumping between frames appears to be challenging or does not even work in the proposed way. JAI for image processing seems to be fast, but conversions between image types are incomprehensibly time-consuming and could not be bypassed for the time being. For this reason, we alternatively used a Java2D-based implementation.

The applied algorithm, proposed by Rocha et al. [2002], shows the possibility of solving a complex task by splitting it up into smaller and therefore easier problems. The creation of a BSP tree, with nodes that hold object position, size and orientation, is straightforward and understandable. Nevertheless, we had to study different sources to verify equation definitions. The implementation is largely trouble-free, as the approach is clearly specified in the underlying paper. However, problems arose during the calculation of the orientation angle, where the method sometimes returns an angle rotated by 90°. We have not yet found a solution for this difficulty. Still, the algorithm delivers good results for the feature tracking task. The accuracy of feature points often lies around 90%. Errors in the feature determination mostly have their origin in preprocessing discrepancies.

Future Work

According to the listed problems, the major improvements of the current solution would be revised JMF code, an enhanced preprocessing method, and resolved tracking algorithm difficulties. Frame-by-frame traversal of the video could be enhanced by adding the previousFrame() functionality. Improved edge detection should make all important edges visible, like the lower line of the lower lip. Enhanced processing times could be reached by exclusively preprocessing the selected image regions. Moreover, individual tracking cases with wrong object orientations have to be revised.

In comparison to the commercial VeeAnimator, the Java feature tracker is inferior in a number of aspects. Image regions need manual initialization; the tracker currently does not work with streaming media and is not able to track in real time. The feature points still flutter when observed over a series of frames, they are not standardized, and they do not deliver 3D information. The program could be changed to that effect by using geometric transformations between the shape information of subsequent frames (as described in the paper of Rocha et al. [2002]). JMF could be used to open video streams or could be replaced by alternative libraries. Automatic preselection of image areas could be introduced, for example by using facial shape models. These models could then be used to map the tracked points to a standardized image feature model by selecting feature points according to their proximity to model points. For simple 3D information, the z-axis could be set to standard values.

Summary

The current program shows a straightforward and comprehensible feature tracking solution that provides basic tracking procedures. In case of further developments, for example by the Open Source community, the project could become a free and independent alternative in the field of feature tracking, facilitating high-level face processing tasks. Having Java as a basis, it would be suitable for professional 3D animations or future user interfaces on different platforms. Moreover, it could be used for teaching and for further investigations in the field of computer vision. With this work we have outlined the possibilities for future developments.

Bibliography<br />

G. Antunes Abrantes and F. Pereira. MPEG-4 Facial Animation Technology: Survey,<br />

Implementation and Results. IEEE Transactions on Circuits and Systems for Video<br />

Technology, 9(2):290–305, March 1999.<br />

T. Awcock. Applied Image Processing. McGraw-Hill Companies, August 1995.<br />

S. S. Beauchemin and J. L. Barron. The Computation of Optical Flow. ACM Comput.<br />

Surv., 27(3):433–466, 1995. ISSN 0360-0300.<br />

D. L. Bimler, J. Kirkland, and K. A. Jameson. Quantifying variations in personal color<br />

spaces: Are there sex differences in color vision? Color Research & Application, 2:<br />

128–134, 2004.<br />

W. Burger and M. J. Burge. Digitale Bildverarbeitung. eXamen.press. Springer, 2005.<br />

C. Cédras and M. A. Shah. Motion Based Recognition: A Survey. Image and Vi-<br />

sion Computing, 13(2):129–155, March 1995. URL http://www.cc.gatech.edu/<br />

~jimmyd/summaries/cedras1995.html.<br />

J. Cohn, A. Zlochower, J.-J. J. Lien, and T. Kanade. Feature-Point Tracking by Op-<br />

tical Flow Discriminates Subtle Differences in Facial Expression. In Proceedings of<br />

the 3rd IEEE International Conference on Automatic Face and Gesture Recognition<br />

(FG ’98), pages 396 – 401, April 1998. URL http://www.ri.cmu.edu/pubs/pub_<br />

2075.html.<br />

R. Cutler and M. Turky. View-Based Interpretation of Real-Time Optical Flow for<br />

Gesture Recognition. http://citeseer.ist.psu.edu/cutler98viewbased.html,<br />

1998.<br />

A. Davison. Killer Game Programming in Java. O’Reilly, 1 edition, 2005. ISBN<br />

0-596-00730-2.<br />

F. Dellaert and R. Collins. Fast Image-Based Tracking by Selective Pixel Integration.<br />

In ICCV 99 Workshop on Frame-Rate Vision, September 1999. URL http://www.<br />

ri.cmu.edu/pubs/pub_3195_text.html.<br />

P. Ekman and W. Friesen. Facial Action Coding System: A Technique for the Mea-<br />

surement of Facial Movement. Consulting Psychologists Press, Palo Alto, 1978.<br />

I. Essa and A. Pentland. Motion-Based Recognition, volume 9 of Computational Imag-<br />

ing and Vision, chapter 12: Facial Expression Recognition Using Image Motion.<br />

Kluwer Academic Publishers, 1997. ISBN 0-7923-4618-1.<br />

B. Fisher, S. Perkins, A. Walker, and E. Wolfart. Hypermedia Image Processing Ref-<br />

erence. http://www.cee.hw.ac.uk/hipr/html/canny.html, Department of Arti-<br />

ficial Intelligence University of Edinburgh/UK, 1994.<br />

D. Gorodnichy. Facial Recognition in Video. In Proceedings of International As-<br />

sociation for Pattern Recognition (IAPR) International Conference on Audio- and<br />

Video-Based Biometric Person Authentication (AVBPA’03), LNCS 2688, pages 505–<br />

514, Guildford, United Kingdom, June 2003. NRC 47150. URL http://iit-iti.<br />

nrc-cnrc.gc.ca/publications/nrc-47150_e.html.<br />

T. Goto, M. Escher, C. Zanardi, and N. Magnenat-Thalmann. MPEG-4 Based<br />

Animation With Face Feature Tracking. In CAS ’99 (Eurographics workshop),<br />

pages 89–98, Milano, Italy, September 1999. MIRALab, Springer. URL http:<br />

//www.miralab.unige.ch/papers/15.pdf.<br />

W. Iverson. Mac OS X for Java Geeks. O'Reilly, April 2003. URL http://www.oreilly.com/catalog/macxjvgks/, http://www.oreilly.com/catalog/macxjvgks/chapter/ch10.pdf.

J. Ivins and J. Porrill. Everything You Always Wanted To Know About Snakes (But<br />

Were Afraid To Ask). Technical report, Artificial Intelligence Vision Research Unit<br />

University Of Sheffield, England S10 2TP, July 1993. URL http://www.computing.<br />

edu.au/~jim/psfiles/aivru86c.ps. AIVRU Technical Memo #86 (Revised June<br />

1995; March 2000).<br />

M. Jacob, T. Blu, and M. Unser. Efficient Energies and Algorithms for Parametric<br />

Snakes. IEEE Transactions on Image Processing, 13(9):1231–1244, September 2004.<br />

URL http://ip.beckman.uiuc.edu/publications.html.<br />

JMF. Java Media Framework API Guide, JMF 2.0 FCS edition, November 1999.

M. Kass, A. Witkin, and D. Terzopoulos. Snakes: Active Contour Models. International Journal of Computer Vision, 1:321–331, 1988.

V. Krüger, A. Happe, and G. Sommer. Affine Real-Time Face Tracking Using Gabor<br />

Wavelet Networks. In ICPR00: Proceedings of the International Conference on<br />

Pattern Recognition (ICPR00), volume 1, page 1127, Washington, DC, USA, 2000.<br />

IEEE Computer Society.<br />

B. Mackiewich. Intracranial Boundary Detection and Radio Frequency Correction in<br />

Magnetic Resonance Images. Master’s thesis, Simon Fraser University, August 1995.<br />

URL http://www.cs.sfu.ca/~stella/papers/blairthesis/main/main.html.<br />

B. S. Morse. Lecture 11: Shape Representation: Regions (Moments). http://bryan.<br />

cs.byu.edu/650/home/index.php, January 2004. Course material for ‘Computer<br />

Vision’ at Brigham Young University.<br />

R. Mukundan and K. R. Ramakrishnan. Moment Functions in Image Analysis. World<br />

Scientific, 1998.<br />

L. Rocha, L. Velho, and P. C. P. Carvalho. Image Moments-Based Structuring and<br />

Tracking of Objects. sibgrapi, 00:99, 2002.<br />

H. Sahbi and N. Boujemaa. Coarse to Fine Face Detection Based on Skin Color<br />

Adaption. In ECCV ’02: Proceedings of the International ECCV 2002 Work-<br />

shop Copenhagen on Biometric Authentication, pages 112–120, London, UK, 2002.<br />

Springer-Verlag. ISBN 3-540-43723-1. URL http://www-rocq.inria.fr/imedia/<br />

Articles/23590112.pdf.<br />

T. Smith. When to use and not to use the two-sample t-test. http://www.ubht.nhs.uk/R&D/RDSU/Statistical%20Tutorials/t-tests.pdf, November 2002. URL http://www.ubht.nhs.uk/R&D/RDSU/Statistical%20Tutorials/statistical_tutorials.htm. Research and Effectiveness Department (United Bristol Healthcare NHS Trust).


M. Sonka, V. Hlavac, and R. Boyle. Image Processing, Analysis, and Machine Vision.<br />

PWS Publishing, second edition, 1999.<br />

D. Terzopoulos and K. Waters. Analysis and Synthesis of Facial Image Sequences<br />

Using Physical and Anatomical Models. IEEE Trans. Pattern Anal. Mach. Intell.,<br />

15(6):569–579, 1993. ISSN 0162-8828.<br />

vidiator. Using FaceStation 2. http://www.vidiator.com/support/<br />

facestationdocs/index.html, 2004.<br />

H. Wu, Q. Chen, and M. Yachida. Face Detection From Color Images Using a Fuzzy<br />

Pattern Matching Method. IEEE Trans. Pattern Anal. Mach. Intell., 21(6):557–563,<br />

1999. ISSN 0162-8828.<br />

X. Xie and M. Mirmehdi. Geodesic Colour Active Contour Resistent to Weak<br />

Edges and Noise. In Proceedings of the 14th British Machine Vision Conference,<br />

pages 399–408. BMVA Press, September 2003. URL http://www.cs.bris.ac.uk/<br />

Publications/Papers/2000034.pdf.<br />

J. Zobel. Writing for Computer Science. Springer, 2 edition, 2004.<br />

Glossary<br />

ANOVA Analysis of Variance. A series of statistical procedures for examining<br />

differences in means and for partitioning variance, 74, 75<br />

API Application Programming Interface. A defined set of calling con-<br />

ventions allowing a software application to access a particular set of<br />

services, 36, 37, 39, 49<br />

AU Action Unit. A basic, visually distinguishable facial muscle action defined in the Facial Action Coding System, 9, 10, 68

BSP Binary Space Partitioning. A technique for the division of geomet-<br />

rical objects. It is mainly used in game engines of computer games,<br />

20, 35, 51, 54–56, 64, 67, 80<br />

CSV Comma Separated Values. A file format used as a portable represen-<br />

tation of a database, 64<br />

FACS Facial Action Coding System. A system for describing observable facial movements in terms of action units, developed by Ekman and Friesen, 9–11, 26, 27

FAP Facial Animation Parameters. Feature points used for facial anima-<br />

tion that are standardized in the MPEG-4 standard, 2<br />

FBX A platform-independent 3D authoring and interchange format., 23<br />

GUI Graphical User Interface. The front-end interface and navigation<br />

design of an application, 3, 50, 51, 54, 60, 64<br />


JAI The Java Advanced Imaging API. An optional package extending the<br />

Java 2 Platform, providing additional capabilities for running image<br />

processing applications and imaging applets in Java, 47–49, 62, 63,<br />

78–80<br />

JMF The Java Media Framework API. An optional package extending the<br />

Java 2 Platform that enables audio, video and other time-based media<br />

to be added to applications and applets built on Java technology, 3,<br />

36, 37, 40, 41, 49, 60–63, 80, 81<br />

JNI Java Native Interface. A programming framework that allows Java<br />

code running in the Java VM to call and be called by native appli-<br />

cations and libraries written in other languages, 36<br />

JVM Java Virtual Machine. A piece of software that converts Java byte-<br />

code into machine language and executes it, 39, 77, 78<br />

LoG Laplacian of Gaussian. A convolution operator using a Gaussian<br />

image smoothing and a second derivative Laplace operator, 45<br />

PCU Portable Control Unit. Desktop control of lightsource; external power<br />

supply replaces internal PC backpanel power supply, 23, 25<br />

QTJava Quicktime for Java. In a different context it is also: Java QT binding,<br />

39<br />

SDK Software Development Kit. A programming package that enables<br />

a programmer to develop applications for a specific platform. Java<br />

SDK versions below 1.2 and version 1.5 are called JDK, 23, 25, 40,<br />

60<br />

List of Figures<br />

0.1. Not standardized output of facial feature localization. . . . . . . . . . 2<br />

1.1. Basic facial feature tracking workflow. . . . . . . . . . . . . . . . . . . 7<br />

1.2. Optical flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8<br />

1.3. Standard face model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9<br />

1.4. Feature point displacements. . . . . . . . . . . . . . . . . . . . . . . . 10<br />

1.5. Control-theoretic mapping of optical flow. . . . . . . . . . . . . . . . . 10<br />

1.6. A closed snake. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12<br />

1.7. An example of the movement of a point in a snake. . . . . . . . . . . . 13<br />

1.8. Snakes and fiducial points used for muscle contraction estimation. . . 14<br />

1.9. Weak-edge leakage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15<br />

1.10. Example for moment calculations and shape representation. . . . . . . 16<br />

1.11. Object fitting by 2 k ellipses at each level. . . . . . . . . . . . . . . . . 20<br />

1.12. X-IST FaceTracker . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23<br />

1.13. VeeAnimator in action. . . . . . . . . . . . . . . . . . . . . . . . . . . . 24<br />

2.1. Overview of SplineSnake results. . . . . . . . . . . . . . . . . . . . . . 30<br />

2.2. SplineSnake interference. . . . . . . . . . . . . . . . . . . . . . . . . . . 30<br />

2.3. Overview of snake results. . . . . . . . . . . . . . . . . . . . . . . . . . 32<br />

3.1. Top view of camera layout used for recordings. . . . . . . . . . . . . . 41<br />

3.2. Two types of binary image regions applicable for the tracking algorithm. 43<br />

3.3. Function with intensity change, its first and second derivative. . . . . . 44<br />

3.4. Multi-stage canny edge detection process. . . . . . . . . . . . . . . . . 46<br />

3.5. Workflow of fetching individual pixels with Java2D. . . . . . . . . . . 47<br />

4.1. Class Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53<br />

4.2. Overview of the basic application workflow . . . . . . . . . . . . . . . 54<br />

4.3. Angle calculation for raster subdivision. . . . . . . . . . . . . . . . . . 60<br />


5.1. Tracking results for selective regions. . . . . . . . . . . . . . . . . . . . 66<br />

5.2. Tracking result for 6 area selections. . . . . . . . . . . . . . . . . . . . 67<br />

5.3. Good and bad results. . . . . . . . . . . . . . . . . . . . . . . . . . . . 68<br />

5.4. Positions of the corners of the mouth. . . . . . . . . . . . . . . . . . . 69<br />

5.5. Preprocessing of mouth region. . . . . . . . . . . . . . . . . . . . . . . 70<br />

5.6. d 2 of subsequent video frames. . . . . . . . . . . . . . . . . . . . . . . 72<br />

5.7. Tracking time consumption. . . . . . . . . . . . . . . . . . . . . . . . . 77<br />

5.8. Preprocessing time consumption. . . . . . . . . . . . . . . . . . . . . . 78<br />

List of Tables<br />

1.1. Comparison of commercial products . . . . . . . . . . . . . . . . . . . 25<br />

2.1. SplineSnake parameters. . . . . . . . . . . . . . . . . . . . . . . . . . . 29<br />

2.2. SplineSnake: Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31<br />

2.3. Bodier snake parameters . . . . . . . . . . . . . . . . . . . . . . . . . . 32<br />

2.4. Snake: Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33<br />

3.1. JMF 2.1.1 - Supported Video Formats . . . . . . . . . . . . . . . . . . 38<br />

5.1. Mean position of x/y coordinates. . . . . . . . . . . . . . . . . . . . . . 70<br />

5.2. Minima and maxima of x/y coordinates. . . . . . . . . . . . . . . . . . 71<br />

5.3. Sum, maximum and average of d 2 on subsequent frames. . . . . . . . . 71<br />

5.4. Correlation results. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72<br />

5.5. Two-tailed paired samples t-test on d 2 . . . . . . . . . . . . . . . . . . . 73<br />

5.6. Sum, maximum and average of d 2 of different tester’s values. . . . . . 74<br />

5.7. Extrema of d 2 between the algorithm and each of the manual testers. . 75<br />

5.8. Contrast tests of d 2 between the algorithm and each of the manual testers. 75<br />

5.9. Tamhane post-hoc test on d 2 . . . . . . . . . . . . . . . . . . . . . . . . 76<br />

List of Procedures<br />

4.1. createBSPTree(levels) . . . . . . . . . . . . . . . . . . . . . . . . . . 56<br />

4.2. addPoint(x, y, val) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57<br />

4.3. calculateValues() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58<br />

4.4. subdivide(levels) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59<br />

91


A. Appendix<br />

A.1. Evaluation Data: Coordinates of Corners of the Mouth<br />

man1 man2 algo<br />

frame x1 y1 x2 y2 x1 y1 x2 y2 x1 y1 x2 y2<br />

1 238 220 287 220 238 221 287 221 238 220 288 219<br />

2 239 220 287 220 238 221 287 221 237 220 288 219<br />

3 238 220 287 220 238 221 287 221 237 220 288 219<br />

4 238 220 287 220 238 221 287 221 237 220 288 219<br />

5 238 219 287 220 238 221 288 221 237 220 287 219<br />

6 239 219 285 220 240 221 287 221 239 219 287 219<br />

7 241 219 284 219 243 220 283 220 244 221 285 218<br />

8 246 220 282 220 246 220 282 220 245 221 285 219<br />

9 247 220 282 220 248 221 281 221 246 221 284 218<br />

10 247 221 282 221 249 221 281 221 246 220 282 219<br />

11 249 221 281 221 250 221 281 221 249 220 282 219<br />

12 249 221 281 221 250 222 280 222 250 221 282 219<br />

13 249 221 281 221 250 222 280 222 250 223 283 219<br />

14 248 221 281 221 250 222 280 222 250 223 282 219<br />

15 248 221 280 221 250 222 280 222 250 223 282 218<br />

16 249 221 280 221 250 222 280 222 251 224 282 218<br />

17 248 220 280 221 250 222 280 222 250 220 283 218<br />

18 248 221 280 221 250 222 280 222 250 221 283 218<br />

19 248 221 279 221 250 222 280 222 250 221 283 218<br />

20 249 221 281 221 250 222 280 222 250 224 283 218<br />

21 248 221 280 221 250 222 280 222 250 224 283 218<br />

22 248 221 280 221 250 222 280 222 250 221 283 218<br />

23 248 221 280 221 249 222 280 222 249 223 283 218<br />

24 249 221 280 221 249 222 280 222 248 222 283 218<br />

25 248 221 280 221 250 222 280 222 248 221 280 217<br />

26 248 221 280 221 250 222 279 222 249 221 281 218<br />

27 248 221 280 220 251 222 279 221 249 221 280 217<br />

28 249 221 280 221 250 222 280 221 249 220 280 217<br />

29 248 221 280 221 250 222 280 221 250 222 278 216<br />

30 248 221 280 221 250 222 280 221 250 222 280 217<br />

31 249 221 280 221 250 222 280 222 248 221 282 218<br />

32 249 220 280 221 250 222 280 222 251 223 279 217<br />

33 249 220 281 220 249 222 280 222 249 221 280 217<br />

34 247 220 281 220 248 221 281 222 246 221 281 216<br />

35 243 220 284 220 245 221 283 221 244 220 284 218<br />

36 241 219 286 219 242 220 284 220 244 220 287 218<br />

37 240 219 286 219 240 220 286 220 240 219 287 218<br />

38 239 218 286 218 239 219 287 219 238 219 288 218<br />

39 238 218 286 218 239 219 287 219 237 218 289 218<br />

40 239 218 286 219 238 219 287 219 237 218 288 218<br />

41 238 219 286 218 238 220 287 219 238 219 289 218<br />

42 239 219 286 219 238 220 287 219 237 219 289 218<br />

43 238 219 287 219 239 220 287 220 238 219 288 218<br />

44 238 219 286 219 239 220 287 220 237 219 288 218<br />

45 238 219 286 219 239 220 287 220 238 219 289 218<br />

46 239 219 286 219 240 220 287 220 238 219 288 219<br />

47 238 219 286 219 240 220 287 220 238 219 288 218<br />

48 239 219 286 219 240 220 287 220 238 220 288 218<br />

49 238 220 287 219 240 220 287 220 237 220 288 219<br />

50 238 220 287 219 240 220 287 220 238 220 287 219<br />

51 238 219 286 219 239 220 286 221 237 220 288 219<br />

52 238 219 286 219 238 221 286 221 238 220 288 219<br />

53 239 219 286 219 238 221 286 221 238 220 288 219<br />

54 239 219 287 219 238 221 286 221 238 220 288 219<br />

55 239 219 286 219 238 221 286 221 238 220 287 218<br />

56 239 219 286 220 238 221 286 221 237 220 287 219<br />

57 239 219 286 220 238 221 286 221 237 220 287 219<br />

58 239 219 286 220 238 221 286 221 238 220 287 219<br />

59 240 219 287 219 239 220 285 221 238 220 287 219<br />

60 240 219 285 219 240 220 285 221 240 220 286 218<br />


man1 man2 algo<br />

frame x1 y1 x2 y2 x1 y1 x2 y2 x1 y1 x2 y2<br />

61 243 219 283 219 243 220 283 221 243 220 283 217<br />

62 246 220 280 219 246 220 282 221 245 220 282 218<br />

63 246 220 280 220 247 221 281 221 246 221 281 218<br />

64 247 220 280 220 248 221 280 221 249 221 281 218<br />

65 247 220 279 220 248 222 280 222 249 223 279 217<br />

66 248 220 280 220 248 222 280 222 248 222 279 217<br />

67 248 220 279 220 248 222 280 222 248 221 279 217<br />

68 247 220 279 220 248 222 280 222 247 220 279 217<br />

69 247 220 279 220 248 222 280 222 249 220 280 217<br />

70 247 220 279 220 248 222 280 222 249 222 279 217<br />

71 247 219 279 220 248 222 280 222 248 222 281 217<br />

72 247 220 279 221 248 222 279 222 248 222 278 216<br />

73 246 220 279 220 248 221 279 221 248 219 278 219<br />

74 247 220 280 220 248 221 279 221 248 221 280 217<br />

75 247 219 279 220 248 221 279 221 247 220 280 217<br />

76 249 220 280 220 248 221 280 221 247 220 282 217<br />

77 248 220 279 220 247 221 280 221 247 220 281 217<br />

78 248 219 280 219 247 221 280 221 247 220 281 217<br />

79 248 220 280 220 247 221 280 221 247 220 282 217<br />

80 247 219 279 220 247 221 280 221 247 220 280 216<br />

81 247 219 279 220 247 221 280 221 247 220 281 217<br />

82 247 220 279 220 247 221 280 221 247 220 282 217<br />

83 247 220 280 220 247 221 280 221 247 220 282 217<br />

84 248 219 280 220 247 221 280 221 247 220 282 217<br />

85 247 219 280 220 247 221 280 221 248 219 281 217<br />

86 249 220 279 220 246 221 281 221 248 219 280 216<br />

87 249 220 280 220 246 221 281 221 250 221 280 216<br />

88 247 219 280 220 246 221 282 221 250 222 280 216<br />

89 245 219 282 220 244 220 283 220 245 220 283 217<br />

90 243 219 284 219 242 220 284 220 243 219 285 217<br />

91 240 219 286 219 241 220 286 220 241 219 286 217<br />

92 239 219 286 218 240 220 287 220 239 219 288 217<br />

93 238 219 286 218 238 220 287 220 237 219 289 217<br />

94 238 218 286 218 238 220 287 220 237 219 289 218<br />

95 237 218 287 218 237 220 287 220 236 218 289 218<br />

96 238 218 286 218 237 219 288 219 236 218 289 217<br />

97 238 218 287 218 237 219 288 219 236 219 289 217<br />

98 236 218 287 218 237 219 288 219 236 219 289 217<br />

99 237 218 287 218 237 219 288 219 236 219 289 218<br />

100 237 218 287 218 237 219 288 219 236 219 289 218<br />

101 237 218 287 218 237 219 288 219 236 219 289 218<br />

102 237 218 287 218 237 219 288 219 235 220 289 218<br />

103 237 218 287 219 237 219 288 219 236 219 289 218<br />

104 237 218 287 218 237 219 288 219 236 219 289 218<br />

105 237 218 287 218 237 219 288 219 236 219 289 218<br />

106 236 218 288 218 237 219 288 219 236 219 288 218<br />

107 236 218 287 219 237 219 288 219 236 219 289 218<br />

108 238 218 287 218 237 219 288 219 236 219 289 218<br />

109 238 218 286 218 237 219 288 219 236 219 289 217<br />

110 238 218 287 218 237 219 288 219 236 219 289 218<br />

111 237 218 287 218 237 219 288 219 236 219 289 218<br />

112 237 218 286 218 237 219 288 219 236 219 289 218<br />

113 237 219 287 218 237 219 288 219 236 219 289 218<br />

114 238 218 286 218 237 219 288 219 236 219 289 217<br />

115 237 218 286 218 237 220 288 220 236 219 289 218<br />

116 238 219 286 218 237 220 288 220 236 219 289 218<br />

117 238 219 286 218 238 220 288 220 236 219 289 218<br />

118 240 219 284 219 240 220 285 220 239 219 286 218<br />

119 242 219 282 219 242 220 283 220 241 220 285 218<br />

120 245 220 280 219 244 221 281 221 244 220 282 218<br />

121 245 220 279 220 245 221 279 221 245 221 278 218<br />

122 246 220 278 220 245 221 278 221 245 221 278 217<br />

123 246 220 279 220 245 221 278 221 246 219 279 217<br />

124 246 220 278 220 245 221 278 221 246 220 277 216<br />

125 247 220 278 220 245 221 278 221 246 220 277 216<br />

126 247 220 278 220 245 221 279 221 246 220 278 216<br />

127 247 220 278 220 245 221 279 221 246 219 277 216<br />

128 246 219 279 220 245 221 279 221 246 219 280 216<br />

129 246 219 279 220 245 221 279 221 247 219 280 216<br />

130 247 219 279 220 245 221 279 221 247 218 282 216<br />

131 247 219 279 219 246 221 279 221 247 218 281 216<br />

132 245 219 280 219 246 221 279 221 247 218 281 216<br />

133 248 219 279 219 246 221 279 221 247 218 281 216<br />

134 246 219 279 219 246 221 279 221 247 218 280 215<br />

135 247 219 279 219 246 221 279 221 246 219 277 215<br />

136 247 219 279 219 246 221 279 221 246 220 277 215<br />

A.2. Evaluation Data: Tracking Mouth Area Selection (60x20)

We performed 20 test runs with the implemented Java feature tracker, selecting a mouth region of 60x20 pixels. 10 runs were done with standard JVM options, and 10 with the following tuning: -verbose:gc -Xms64m -Xmx256m -XX:NewRatio=2 -XX:+UseConcMarkSweepGC. All times are given in milliseconds.
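
The per-frame values in the table below are plain wall-clock measurements in milliseconds. The following is a minimal, self-contained sketch of how such per-frame times can be collected with System.currentTimeMillis(); the processFrame() workload is only a stand-in for illustration and does not reproduce the actual tracking step.

// Minimal sketch of a per-frame timing loop; processFrame() is a placeholder
// workload over a 60x20 pixel region, not the real tracking algorithm.
public class FrameTimingSketch {

    // Stand-in workload: sum the intensities of a 60x20 region.
    static long processFrame(int[][] region) {
        long sum = 0;
        for (int y = 0; y < region.length; y++) {
            for (int x = 0; x < region[y].length; x++) {
                sum += region[y][x];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        int[][] region = new int[20][60];   // 60x20 selection, all zeros here
        int frames = 136;                   // the test sequence has 136 frames
        long[] millis = new long[frames];

        for (int i = 0; i < frames; i++) {
            long start = System.currentTimeMillis();
            processFrame(region);
            millis[i] = System.currentTimeMillis() - start;   // milliseconds per frame
        }
        System.out.println("first frame took " + millis[0] + " ms");
    }
}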

frame JVM tuned (10 runs) JVM standard options (10 runs)<br>

1 27 48 26 26 52 69 70 69 28 69 46 31 50 32 30 24 45 46 48 47<br />

2 19 13 25 33 13 13 15 33 13 33 29 31 29 29 28 30 33 17 20 29<br />

3 22 24 39 38 22 24 22 36 21 45 32 31 32 33 33 33 34 27 23 32<br />

4 8 8 11 12 8 8 7 12 8 12 10 10 11 10 9 10 10 8 8 12<br />

5 34 29 22 22 30 26 19 22 29 22 25 17 24 24 18 24 25 18 17 22<br />

6 7 7 10 8 12 10 6 8 11 11 7 7 8 7 8 7 7 18 8 7<br />

7 5 6 6 6 5 6 7 5 5 6 6 6 5 6 4 5 5 6 7 5<br />

8 20 18 6 6 36 7 7 6 7 8 6 6 17 6 6 17 5 6 6 6<br />

9 6 6 20 7 6 8 6 8 5 10 7 6 8 6 7 8 7 8 8 6<br />

10 10 6 9 11 10 12 9 10 10 11 10 9 9 19 9 10 9 9 9 9<br />

11 54 12 9 7 16 9 7 7 7 40 9 7 9 9 10 9 9 8 8 9<br />

12 6 6 7 5 4 8 6 4 5 7 5 6 5 5 6 4 5 7 5 5<br />

13 10 9 5 8 10 15 9 23 9 9 9 9 9 10 9 9 8 9 9 10<br />

14 28 6 6 6 7 7 4 8 5 5 8 5 10 6 6 8 6 7 7 6<br />

15 3 3 9 3 4 4 4 4 2 6 3 4 4 4 4 4 3 4 4 4<br />

16 5 16 4 3 3 4 3 4 18 4 4 3 4 3 4 4 4 3 3 4<br />

17 8 4 4 4 17 4 4 4 4 3 4 5 4 4 5 4 4 4 4 4<br />

18 3 3 19 2 2 2 2 3 3 5 3 2 3 2 3 3 2 3 2 3<br />

19 3 3 3 3 2 3 3 3 3 5 3 3 3 2 3 3 3 3 3 3<br />

20 5 3 4 3 17 2 2 2 4 4 4 4 6 4 4 3 4 4 4 4<br />

21 5 3 6 17 3 3 4 3 3 6 3 4 3 3 3 3 3 3 3 2<br />

22 4 3 4 4 4 4 18 3 4 5 4 4 4 3 5 4 4 4 4 4<br />

23 8 9 4 4 4 18 5 18 5 5 6 6 5 5 5 5 7 7 6 5<br />

24 5 6 6 5 5 4 6 5 4 19 5 6 6 6 6 6 5 4 5 6<br />

25 5 22 21 7 20 21 5 21 4 4 8 8 8 8 8 9 8 8 8 8<br />

26 5 4 2 4 2 3 4 2 3 2 2 2 2 2 2 3 3 2 2 3<br />

27 3 2 5 3 2 3 3 2 3 4 2 2 3 3 3 2 3 3 2 2<br />

28 3 2 3 2 2 2 3 7 3 2 3 2 2 3 2 3 3 3 3 2<br />

29 5 2 2 2 7 2 3 2 2 3 3 3 2 3 2 2 2 3 3 2<br />

30 3 3 56 3 15 5 16 3 16 25 3 3 3 3 3 3 3 3 3 3<br />

31 6 5 2 3 2 3 3 3 6 2 3 3 3 3 2 2 3 2 2 2<br />

32 6 3 2 3 3 4 9 4 3 4 3 4 3 3 3 3 3 3 3 3<br />

33 5 6 8 6 6 19 2 7 4 6 6 16 6 7 6 7 7 6 7 6<br />

34 3 3 3 3 2 3 3 3 16 3 3 6 3 3 3 3 3 3 3 3<br />

35 6 4 3 4 17 3 20 17 3 4 4 4 3 4 4 3 3 4 4 4<br />

36 3 3 6 5 3 3 2 3 17 6 4 3 4 4 3 3 3 4 4 4<br />

37 3 4 4 3 6 3 3 2 3 3 3 4 3 3 3 3 4 3 3 3<br />

38 5 3 3 3 3 3 3 3 3 18 3 2 3 3 3 3 3 4 3 3<br />

39 4 3 6 3 3 3 3 4 6 6 4 3 3 2 3 3 3 3 3 2<br />

40 6 3 19 3 4 3 3 4 3 3 3 3 3 4 3 4 2 3 3 3<br />

41 5 3 2 3 3 2 6 3 2 3 14 3 3 2 3 4 2 3 3 3<br />

42 3 3 5 3 3 4 2 3 3 5 3 3 2 3 3 3 3 3 3 3<br />

43 2 3 2 2 2 2 2 3 2 2 2 3 2 2 2 2 2 3 2 2<br />

44 17 16 2 3 3 3 3 3 3 3 3 2 3 2 2 3 3 3 3 2<br />

45 3 3 6 3 3 3 3 3 3 5 3 2 3 2 2 3 3 2 3 3<br />

46 3 3 4 2 3 3 4 2 3 2 3 3 2 3 3 3 2 2 2 2<br />

47 4 2 2 2 2 2 2 2 2 2 2 4 2 2 2 3 2 2 2 2<br />

48 2 2 4 15 2 2 3 2 2 4 2 3 2 3 2 2 2 2 2 2<br />

49 2 3 8 8 7 8 14 2 7 7 7 8 7 8 7 8 7 8 8 8<br />

50 4 2 2 2 2 3 2 2 2 2 2 2 2 2 2 2 2 2 2 3<br />

51 3 2 17 2 2 2 2 2 2 18 2 3 2 2 2 2 2 2 2 2<br />

52 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 1 2 2 2<br />

53 4 2 2 3 2 2 3 2 2 20 3 2 2 2 1 1 2 3 2 1<br />

54 6 3 5 3 3 2 2 3 3 5 3 3 3 3 2 3 3 3 2 3<br />

55 2 2 2 2 2 2 17 2 2 2 2 3 2 2 3 2 2 2 2 2<br />

56 5 2 1 2 2 2 3 2 3 2 2 3 1 2 2 3 2 2 2 2<br />

57 3 2 4 1 2 2 3 2 14 3 2 3 2 1 3 2 2 3 2 2<br />

58 2 15 16 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2<br />

59 4 2 2 2 2 2 2 2 1 4 1 2 2 2 2 2 2 2 2 2<br />

60 2 2 4 2 2 2 2 2 2 16 3 2 2 2 2 2 2 2 2 2<br />

61 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2<br />

62 6 4 18 2 3 3 2 4 3 3 3 3 2 3 2 3 2 3 3 2<br />

63 4 4 6 3 3 4 3 4 3 19 3 3 3 2 4 3 3 3 4 4<br />

64 3 2 3 3 16 15 2 3 15 2 2 2 3 3 3 2 2 3 3 3<br />

65 21 2 14 2 5 2 2 2 2 2 2 2 2 2 1 2 2 2 1 2<br />

66 2 2 4 2 2 16 2 2 2 4 2 2 2 2 1 2 2 1 2 2<br />

67 2 2 1 2 2 2 3 2 2 2 2 3 1 2 2 1 2 2 2 1<br />

68 4 2 2 2 2 2 1 2 2 2 3 2 2 1 1 2 2 1 2 2<br />

69 2 15 7 2 3 15 2 2 1 17 2 3 2 2 2 2 1 2 2 2<br />

70 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 1 1<br />

71 4 8 1 2 2 2 2 8 2 2 2 2 2 2 2 2 3 1 2 2<br />

72 2 3 5 3 3 2 2 2 3 17 3 1 3 3 3 2 3 3 2 3<br />

73 15 1 2 2 2 2 2 2 2 2 2 3 2 3 1 2 2 2 2 2<br />

74 4 1 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2<br />

75 16 16 5 4 3 3 2 3 3 6 3 7 3 3 3 3 3 3 3 3<br />

76 2 2 4 2 17 3 16 3 4 5 2 5 2 2 1 2 2 2 2 1<br />

77 4 5 2 2 2 2 2 2 2 1 4 2 4 1 4 4 8 4 5 4<br />

78 2 3 4 3 3 2 1 2 3 17 3 2 2 2 2 3 3 3 3 3<br />

79 2 2 1 2 1 2 2 1 2 3 2 1 2 2 2 3 2 2 2 2<br />

80 4 2 1 2 1 1 2 1 3 1 1 2 2 2 2 1 1 3 2 2<br />

81 3 3 5 6 2 3 2 3 2 5 3 2 3 3 2 3 3 3 3 2<br />

82 3 2 3 2 18 3 2 2 2 3 3 3 2 2 3 3 3 2 2 2<br />

83 4 2 2 3 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2<br />

84 14 2 4 1 2 2 2 1 2 4 2 2 2 2 2 2 2 2 2 2<br />

85 6 3 3 3 3 3 4 3 3 3 3 4 3 4 3 3 3 3 3 3<br />

86 4 2 2 2 2 15 8 2 1 2 2 2 2 1 2 2 2 2 2 2<br />

87 3 3 5 2 2 3 2 4 2 5 3 3 3 3 3 2 2 2 2 3<br />

88 2 2 2 1 1 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2<br />

89 19 3 3 4 3 4 3 4 3 5 3 3 3 3 4 3 3 4 3 5<br />

90 2 2 4 2 2 2 2 2 2 4 2 2 2 2 2 1 2 3 2 1<br />

91 3 3 5 3 3 3 3 3 3 3 4 3 3 4 3 3 3 3 4 3<br />

92 17 2 2 3 2 3 2 2 3 2 4 4 3 2 2 2 2 4 3 4<br />

93 1 1 3 2 2 7 2 2 2 4 2 2 2 2 2 2 2 2 1 2<br />

94 4 3 2 3 3 3 6 4 3 15 3 7 3 6 5 3 3 3 4 3<br />

95 4 2 2 2 2 2 2 2 3 2 2 2 2 3 2 2 2 2 2 2<br />

96 2 2 4 3 2 2 17 2 2 5 3 3 2 2 3 7 3 2 3 3<br />

97 4 3 3 5 4 3 3 3 3 3 4 3 3 3 3 3 3 3 4 3<br />

98 17 4 4 5 3 16 2 3 4 4 3 2 3 3 3 6 2 2 3 3<br />

99 2 15 4 2 2 2 1 2 2 4 2 2 2 2 2 2 2 2 2 2<br />

100 2 2 15 3 2 3 2 2 3 6 3 2 2 2 4 2 3 3 3 3<br />

101 6 3 3 3 2 4 3 2 3 3 5 3 3 5 4 5 4 4 4 4<br />

102 2 2 4 2 2 2 1 2 2 4 2 2 2 2 2 2 3 3 3 2<br />

103 3 3 3 3 3 3 3 2 3 3 5 2 2 2 2 3 3 6 5 3<br />

104 4 1 2 2 2 2 2 2 2 3 2 2 2 1 2 2 1 2 2 2<br />

105 2 2 4 1 2 2 2 2 2 4 2 2 2 2 2 2 1 2 2 2<br />

106 14 1 2 2 2 2 2 2 15 2 2 2 2 2 2 2 2 2 2 1<br />

107 4 2 2 2 2 2 2 1 2 1 2 2 2 2 2 2 2 2 2 1<br />

108 2 2 3 2 2 2 2 2 2 17 2 2 2 2 1 1 2 1 2 2<br />

109 2 2 2 2 2 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2<br />

110 4 2 2 2 3 1 2 2 2 2 2 2 1 2 2 2 2 2 3 1<br />

111 3 3 4 3 15 3 3 2 2 4 2 3 3 3 3 2 2 2 3 4<br />

112 2 1 1 2 1 2 16 2 2 2 2 2 1 1 2 2 2 2 2 1<br />

113 4 2 3 2 3 2 2 3 2 2 1 4 3 1 4 2 4 2 2 2<br />

114 2 2 4 1 1 2 2 2 2 4 1 2 2 2 2 3 2 2 2 1<br />

115 3 3 3 2 4 20 5 4 3 3 3 3 3 2 3 3 3 3 3 3<br />

116 5 3 3 4 3 3 18 3 4 3 4 4 4 5 5 3 3 4 3 3<br />

117 1 2 4 2 1 1 1 2 1 4 2 2 2 3 2 2 2 2 2 2<br />

118 2 2 6 2 2 2 1 2 2 2 2 2 1 1 2 2 2 2 2 1<br />

119 3 2 2 2 2 2 1 2 1 2 2 2 3 2 2 2 2 3 2 2<br />

120 16 3 5 3 3 3 15 3 3 5 3 2 3 3 3 3 4 4 4 3<br />

121 2 19 2 2 3 3 2 3 3 4 4 3 3 2 2 3 2 3 3 2<br />

122 4 2 2 2 1 1 3 2 2 2 2 3 2 2 2 1 2 2 2 2<br />

123 3 3 19 3 3 3 3 3 3 5 3 2 3 3 3 5 3 4 3 3<br />

124 3 16 3 3 3 3 2 3 3 5 3 3 3 3 3 4 3 4 3 3<br />

125 18 15 3 3 3 2 3 15 3 2 3 2 2 4 2 3 2 2 2 3<br />

126 2 2 4 2 2 1 3 2 1 4 2 3 2 2 2 1 2 1 2 1<br />

127 3 3 3 3 3 3 3 3 3 3 3 3 2 4 4 7 3 3 7 3<br />

128 47 2 3 3 3 3 2 3 3 2 2 2 3 2 3 3 2 3 3 3<br />

129 2 2 4 2 2 2 2 2 2 17 2 2 2 2 2 2 2 2 2 2<br />

130 4 3 4 3 3 4 2 3 3 4 3 2 3 4 3 3 5 4 4 4<br />

131 7 2 2 3 3 15 2 3 3 3 2 3 2 2 3 3 2 2 3 2<br />

132 2 1 15 2 2 2 2 2 1 3 2 3 2 2 2 2 2 2 3 2<br />

133 3 7 3 2 3 3 3 2 4 3 3 3 3 3 3 14 3 3 5 3<br />

134 6 4 3 4 17 16 2 16 17 3 3 2 4 3 3 5 4 5 4 3<br />

135 2 1 4 2 2 2 1 2 2 4 2 2 2 2 2 2 1 2 2 2<br />

136 2 2 3 2 2 15 2 1 14 2 2 2 2 2 2 1 2 1 2 2<br />

A.3. Evaluation Data: Tracking of Whole Area Selection (384x288)

We performed 20 test runs with the implemented Java feature tracker, selecting the complete image region of 384x288 pixels. 10 runs were done with standard JVM options, and 10 with the following tuning: -verbose:gc -Xms64m -Xmx256m -XX:NewRatio=2 -XX:+UseConcMarkSweepGC. All times are given in milliseconds.
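
The tuned runs correspond to starting the virtual machine with exactly the options listed above. Assuming a hypothetical main class named TrackingTimeTest (the actual test class name may differ), such a run would be launched roughly as follows:

java -verbose:gc -Xms64m -Xmx256m -XX:NewRatio=2 -XX:+UseConcMarkSweepGC TrackingTimeTest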

frame JVM tuned (10 runs) JVM standard options (10 runs)<br>

1 55 61 135 112 57 126 147 133 120 108 71 117 128 67 67 69 71 124 133 70<br />

2 35 75 66 60 28 75 33 53 31 69 59 59 33 37 38 58 34 58 37 33<br />

3 32 49 53 53 30 49 38 58 31 56 47 49 42 43 43 49 44 47 42 42<br />

4 31 44 57 48 39 52 29 49 37 53 43 44 44 43 43 38 44 38 44 43<br />

5 24 34 31 30 23 29 28 35 27 28 39 35 29 29 40 35 29 37 29 35<br />

6 34 48 27 28 24 27 25 29 38 27 33 33 31 29 36 33 32 33 28 32<br />

7 25 25 29 24 23 27 27 24 25 31 31 29 32 32 34 28 28 32 32 27<br />

8 26 33 42 29 26 26 27 27 26 23 26 30 30 36 30 29 29 29 29 29<br />

9 27 21 24 21 32 20 21 21 22 21 32 28 28 37 37 28 29 28 28 27<br />

10 30 25 30 28 24 29 27 32 28 23 127 126 127 142 128 124 133 124 125 124<br />

11 21 25 24 23 26 21 24 20 23 20 31 30 24 30 34 36 30 25 29 30<br />

12 25 24 25 24 23 26 25 23 24 24 30 25 24 31 30 32 31 28 24 33<br />

13 30 26 26 23 39 24 25 31 26 26 36 36 37 37 36 36 38 39 36 35<br />

14 25 31 26 43 17 24 23 22 18 22 123 134 122 126 121 127 123 128 126 123<br />

15 23 20 20 20 26 20 21 20 21 19 31 30 29 28 28 29 30 29 29 30<br />

16 23 23 24 28 25 22 23 22 24 22 29 30 31 30 22 31 30 29 35 30<br />

17 19 23 23 19 39 19 21 23 19 18 25 26 34 30 31 30 25 25 25 25<br />

18 24 23 18 27 18 17 23 18 23 18 123 123 138 123 140 122 123 123 127 130<br />

19 23 20 24 24 23 20 21 21 21 22 29 31 35 30 29 31 30 33 30 29<br />

20 29 25 24 22 23 24 27 23 27 24 56 31 22 30 29 30 30 30 24 34<br />

21 26 18 20 19 27 18 19 22 19 18 23 24 25 25 26 24 27 24 23 23<br />

22 19 23 18 22 19 17 18 18 17 28 123 123 122 123 130 123 122 123 123 124<br />

23 26 24 24 24 25 21 22 25 22 37 36 31 30 32 30 37 25 29 34 30<br />

24 25 24 26 23 26 23 26 24 25 25 31 31 30 34 31 23 25 30 29 31<br />

25 31 22 23 19 32 22 22 26 22 27 39 38 39 38 39 38 39 39 37 38<br />

26 21 22 19 18 17 24 20 30 18 17 123 123 133 126 122 125 125 133 123 124<br />

27 22 20 22 22 21 18 19 34 20 20 28 29 23 29 28 28 28 27 29 28<br />

28 20 17 29 26 29 21 25 22 17 25 21 29 21 21 21 22 23 21 21 20<br />

29 17 18 19 20 16 16 19 16 16 16 25 24 25 23 26 24 24 22 24 22<br />

30 21 19 23 17 16 16 18 17 18 16 142 125 135 129 123 138 134 133 145 136<br />

31 22 26 23 21 19 20 27 20 22 20 28 28 28 28 29 28 29 29 28 28<br />

32 25 20 22 29 21 25 16 22 17 25 20 29 20 22 20 22 21 22 22 32<br />

33 18 27 16 18 23 16 21 16 17 16 22 24 27 23 22 23 23 27 24 24<br />

34 17 24 23 33 29 18 20 22 20 17 137 135 122 126 123 124 135 124 124 134<br />

35 23 23 23 27 18 21 22 21 22 20 28 29 28 29 30 28 28 28 28 27<br />

36 24 19 22 28 33 21 16 28 16 21 20 21 22 20 22 21 23 21 20 20<br />

37 19 16 17 17 16 16 16 17 17 16 22 23 23 22 22 23 23 23 28 23<br />

38 17 22 18 18 20 16 20 16 20 17 135 123 164 134 125 135 142 133 121 134<br />

39 22 24 19 20 18 19 17 19 17 20 28 28 28 28 28 27 28 27 27 28<br />

40 22 23 17 21 22 17 17 16 17 16 20 21 21 28 27 21 28 29 21 21<br />

41 17 19 16 20 16 16 18 16 23 16 23 23 23 23 23 22 23 23 22 22<br />

42 16 22 18 18 16 18 23 22 24 23 124 130 134 121 133 123 134 133 124 140<br />

43 22 17 23 26 23 25 16 25 16 22 22 28 28 28 27 22 28 27 28 27<br />

44 17 19 20 17 20 16 21 17 21 16 20 22 27 21 20 21 21 27 28 20<br />

45 17 19 31 25 16 17 33 16 34 17 22 28 23 23 23 23 28 22 23 22<br />

46 19 17 20 37 19 19 16 19 17 19 123 129 134 133 134 134 128 122 134 136<br />

47 23 17 22 18 26 17 17 20 17 21 28 29 28 27 29 27 28 27 27 28<br />

48 27 20 18 18 20 17 19 17 19 17 20 22 21 33 20 21 21 27 28 21<br />

49 17 27 27 28 27 23 29 23 27 23 30 24 23 22 22 23 23 22 23 22<br />

50 23 20 23 22 22 22 17 22 16 22 124 123 123 122 126 122 122 123 134 140<br />

51 25 21 17 17 26 16 16 16 17 16 28 31 28 27 27 28 28 32 28 27<br />

52 17 19 20 18 17 16 19 17 19 16 28 29 27 28 29 28 29 27 21 33<br />

53 22 26 19 22 17 19 22 18 31 18 28 28 23 23 23 23 23 23 23 23<br />

54 20 16 21 21 19 20 17 21 17 21 123 123 126 123 129 122 126 136 134 123<br />

55 22 16 18 18 25 17 16 21 17 17 27 28 28 35 29 28 29 32 52 27<br />

56 20 22 20 22 16 21 23 17 20 21 27 28 28 28 28 21 28 34 22 28<br />

57 18 23 19 19 21 18 22 19 22 18 24 23 23 23 23 32 24 22 29 34<br />

58 20 25 21 21 23 21 17 20 16 20 136 129 128 130 123 129 138 128 130 124<br />

59 23 26 17 17 22 16 17 20 17 16 29 27 28 27 33 28 29 27 27 27<br />

60 17 17 21 20 16 16 20 21 20 16 28 28 33 28 21 28 34 28 27 30<br />

61 21 18 19 18 24 18 22 18 22 19 22 23 23 23 28 22 23 22 22 22<br />

62 26 20 24 20 20 20 17 23 16 20 134 123 122 129 123 130 135 123 138 134<br />

63 23 20 18 18 22 16 17 17 16 17 27 28 26 27 27 28 28 28 28 28<br />

64 17 17 17 20 16 17 19 16 19 17 21 28 29 28 27 27 27 28 21 28<br />

65 19 25 19 19 25 18 21 18 22 17 22 22 23 22 25 22 24 23 23 23<br />

66 20 23 25 21 19 20 16 21 17 21 137 135 139 124 134 122 141 129 123 138<br />

67 25 22 17 17 22 17 17 20 16 17 28 28 28 28 31 28 28 27 28 28<br />

68 20 17 17 21 20 17 23 16 23 16 29 29 30 29 34 29 30 30 29 29<br />

69 17 18 19 20 17 19 21 19 22 19 23 23 26 23 23 22 23 23 23 22<br />

70 21 21 24 21 19 20 17 37 16 20 124 126 122 123 122 122 123 122 124 125<br />

71 27 17 17 21 26 16 17 17 16 16 34 28 27 27 27 28 28 27 27 31<br />

72 20 21 17 17 20 16 19 17 19 17 29 30 30 29 29 29 30 30 28 29<br />

73 20 18 18 18 21 18 22 22 22 22 22 22 22 22 22 22 23 22 22 21<br />

74 20 22 22 22 24 22 17 21 17 21 135 141 134 134 135 134 125 133 139 136<br />

75 22 16 33 21 22 15 25 16 17 16 27 27 32 28 29 27 27 28 28 27<br />

76 17 21 17 22 17 16 19 24 20 17 27 28 27 32 21 32 28 22 27 28<br />

77 21 20 19 21 25 19 21 19 22 19 22 24 23 22 23 23 23 22 23 23<br />

78 20 25 26 30 19 26 17 21 16 21 124 124 135 123 131 138 134 123 124 130<br />

79 25 16 17 17 23 17 17 17 16 16 27 28 27 27 27 28 29 30 28 27<br />

80 17 22 22 21 16 17 20 25 20 16 30 30 30 29 29 29 30 29 29 29<br />

81 17 20 24 20 17 18 21 20 22 20 22 28 23 23 23 23 23 22 22 23<br />

82 20 22 22 23 19 25 16 22 16 25 124 125 129 123 129 124 134 136 123 123<br />

83 26 17 17 17 22 16 17 20 18 18 27 27 28 29 28 28 29 28 28 27<br />

84 16 27 21 23 16 17 19 22 20 21 28 30 29 30 29 29 30 29 30 29<br />

85 16 19 22 19 21 19 22 18 21 18 21 22 22 22 22 22 22 22 23 22<br />

86 19 24 17 16 19 15 16 16 16 16 142 138 135 136 136 134 133 140 136 135<br />

87 17 17 16 22 18 15 16 21 24 16 28 27 27 31 30 27 28 31 27 27<br />

88 17 18 28 18 20 17 19 21 18 17 21 28 21 20 21 20 21 21 21 20<br />

89 18 23 25 22 21 21 16 21 16 22 22 23 22 22 22 22 22 22 29 26<br />

90 22 16 20 19 21 15 17 16 16 16 134 135 137 141 135 145 138 133 129 123<br />

91 16 17 18 17 17 16 18 16 19 16 26 27 28 27 28 21 27 27 27 22<br />

92 17 17 17 20 16 18 20 18 20 18 19 21 21 20 21 20 20 20 20 20<br />

93 27 25 21 21 19 21 19 21 19 21 21 23 22 22 23 23 22 22 22 21<br />

94 17 22 17 16 16 16 16 15 16 21 135 135 122 139 135 140 135 135 133 134<br />

95 17 21 16 25 16 17 18 16 18 17 27 29 27 22 28 28 31 27 28 27<br />

96 18 22 23 19 18 18 23 19 20 18 20 21 20 20 20 20 21 20 20 28<br />

97 20 16 20 17 23 15 16 15 17 21 21 23 21 22 27 22 23 23 21 27<br />

98 17 16 16 16 20 15 16 19 16 15 136 135 135 122 122 139 141 133 128 134<br />

99 17 20 19 22 17 18 19 18 18 22 27 27 31 28 28 28 27 28 27 29<br />

100 18 27 20 20 19 20 21 20 21 19 27 29 32 27 28 28 28 28 27 27<br />

101 21 17 18 17 25 16 17 17 17 22 22 23 23 23 23 23 23 22 22 25<br />

102 21 16 17 17 19 16 17 20 17 16 135 133 123 138 134 122 123 124 137 124<br />

103 21 19 18 23 17 19 18 18 19 17 28 28 27 22 28 27 29 28 28 27<br />

104 19 24 21 36 18 20 21 21 20 19 29 30 30 30 30 29 29 29 30 29<br />

105 22 20 18 17 20 16 23 16 17 22 22 23 23 34 23 22 23 32 23 22<br />

106 17 16 16 17 17 17 16 20 16 16 135 129 135 127 138 122 123 123 123 141<br />

107 17 22 22 19 17 18 18 19 18 16 27 28 28 27 27 28 28 28 32 34<br />

108 20 24 21 21 18 20 20 20 21 20 29 30 29 29 29 34 30 29 29 29<br />

109 32 19 17 38 24 16 16 17 17 22 22 29 23 23 23 23 23 23 23 22<br />

110 19 16 17 17 17 16 16 21 16 17 124 126 136 135 125 123 135 140 135 127<br />

111 20 18 19 19 17 18 18 18 19 16 27 28 28 27 27 28 28 27 28 28<br />

112 19 20 20 21 19 20 20 22 20 18 29 30 29 29 29 29 29 29 29 29<br />

113 21 21 18 17 24 16 17 16 17 20 22 23 22 22 23 22 22 22 22 23<br />

114 17 16 18 17 17 17 16 20 17 16 136 136 129 135 129 135 123 140 124 127<br />

115 18 18 23 19 17 19 18 18 19 16 27 28 28 27 28 35 28 27 27 27<br />

116 22 31 21 20 19 20 21 21 21 18 29 34 30 29 29 29 30 29 29 29<br />

117 24 16 16 17 21 16 17 17 17 20 22 23 23 23 23 22 22 23 22 44<br />

118 17 16 17 17 16 16 16 21 16 29 135 138 135 142 138 135 134 138 140 126<br />

119 17 19 23 19 16 18 19 19 19 22 27 27 28 27 27 30 27 27 27 27<br />

120 19 24 21 21 18 20 20 20 21 18 29 29 28 29 29 29 29 29 29 33<br />

121 21 16 18 21 21 16 16 16 17 20 22 23 23 22 23 23 23 27 22 23<br />

122 16 17 16 18 17 16 16 20 17 17 134 134 123 124 135 139 136 123 124 126<br />

123 17 17 18 19 16 18 18 18 19 16 22 27 26 29 22 22 23 27 27 26<br />

124 18 21 27 22 17 21 22 21 22 18 20 21 20 27 20 20 21 27 20 19<br />

125 21 17 20 17 21 15 16 17 16 21 21 22 22 23 23 22 23 22 23 23<br />

126 17 20 21 17 19 16 16 24 16 16 136 134 134 135 133 133 141 133 134 133<br />

127 20 19 20 22 23 18 19 35 18 16 22 27 22 22 23 27 23 29 22 26<br />

128 19 15 17 18 18 16 17 16 16 18 20 20 20 20 21 20 27 20 20 20<br />

129 16 16 19 16 16 15 16 16 16 21 21 22 23 22 22 26 23 22 22 22<br />

130 20 18 21 18 17 18 18 18 17 16 136 136 133 134 135 135 133 122 134 136<br />

131 22 27 25 22 18 22 21 26 21 15 22 22 22 22 22 27 28 27 23 26<br />

132 23 22 16 24 21 15 16 15 16 19 20 24 27 20 20 20 21 21 20 23<br />

133 17 16 16 21 16 16 20 16 16 15 21 22 22 22 22 23 23 22 26 21<br />

134 17 18 23 19 16 18 18 18 19 15 135 141 133 140 134 134 135 140 135 134<br />

135 19 16 17 17 18 17 16 17 17 18 27 27 27 28 28 27 27 27 27 27<br />

136 20 17 17 17 23 17 17 17 16 21 27 27 27 32 30 27 29 31 31 27<br />

A.4. Evaluation Data: Canny Preprocessing

We performed 20 test runs with the class PreprocessingTimeTest, which is responsible for the non-sequential frame order in the table below. 10 runs were done with standard JVM options, and 10 with the following tuning: -verbose:gc -Xms64m -Xmx256m -XX:NewRatio=2 -XX:+UseConcMarkSweepGC. All times are given in milliseconds.

frame JVM tuned (10 runs) JVM standard options (10 runs)<br>

3 202 205 229 217 225 203 218 222 214 202 343 412 345 352 345 348 342 346 358 357<br />

4 157 164 173 156 156 157 161 156 153 160 242 237 248 231 251 245 258 250 244 247<br />

5 147 152 147 148 148 152 148 152 149 148 167 212 170 170 176 170 170 165 164 185<br />

6 151 144 144 152 145 145 145 145 155 144 227 228 236 229 235 236 226 224 230 232<br />

7 152 151 149 153 159 148 152 148 149 148 226 221 220 222 220 222 230 223 222 217<br />

8 155 154 155 153 150 156 161 151 155 152 234 218 210 207 210 209 215 213 210 207<br />

9 144 558 145 148 146 146 144 150 144 148 214 210 213 208 194 218 210 207 209 207<br />

10 147 163 147 152 148 155 147 148 148 152 224 212 230 217 197 219 211 206 216 221<br />

1 152 152 146 151 151 148 146 146 146 148 211 212 213 208 216 216 214 210 211 209<br />

11 230 367 226 231 230 229 227 231 245 230 209 208 214 211 211 211 234 213 210 214<br />

12 151 273 150 147 147 148 152 148 148 147 210 210 216 213 211 208 214 208 210 209<br />

13 145 229 145 150 145 146 145 152 151 149 214 212 212 213 212 227 215 215 213 221<br />

14 145 203 144 144 143 145 144 145 143 148 211 208 210 212 192 211 216 209 213 223<br />

15 146 166 147 149 152 147 146 153 145 148 208 210 207 206 185 206 216 216 206 207<br />

16 152 146 146 151 149 147 153 147 147 149 206 212 214 206 213 207 210 207 213 206<br />

17 146 211 146 143 148 143 145 142 143 143 214 212 212 213 214 214 188 206 222 217<br />

18 150 160 150 151 144 144 148 145 149 145 208 207 214 208 213 207 191 208 206 208<br />

19 146 172 144 145 144 151 146 152 145 146 207 206 207 212 212 211 221 211 206 207<br />

20 147 148 148 177 148 149 146 153 147 151 208 207 209 209 207 208 211 209 210 210<br />

21 141 154 141 146 147 142 141 141 144 142 217 212 218 214 218 213 223 207 242 213<br />

22 146 171 148 151 146 146 146 145 168 145 214 211 208 207 208 213 216 209 206 207<br />

23 148 159 147 149 148 143 145 143 146 143 208 187 186 213 207 206 212 212 221 207<br />

24 155 376 157 148 147 147 151 148 157 148 208 187 189 207 207 207 208 206 208 213<br />

25 151 365 147 152 149 154 148 153 149 149 230 205 226 213 214 214 210 210 211 215<br />

26 143 532 138 144 143 149 147 144 231 145 209 208 211 208 207 186 210 209 185 208<br />

27 209 372 283 223 219 210 210 209 435 210 220 208 208 215 209 185 216 217 194 194<br />

28 177 345 173 149 151 146 147 146 156 146 210 208 211 213 208 218 208 209 221 194<br />

29 157 331 165 149 149 148 154 149 167 148 214 218 216 214 216 217 213 209 213 192<br />

30 143 316 142 142 143 144 142 148 142 145 209 208 208 208 214 207 214 209 212 224<br />

31 145 291 146 146 147 147 147 150 147 152 211 209 210 208 208 208 212 225 214 209<br />

32 145 259 146 147 173 150 145 144 147 144 209 207 209 215 209 210 209 207 208 214<br />

33 149 255 152 153 153 148 147 151 171 148 231 220 218 216 216 215 209 218 214 216<br />

34 152 239 149 156 154 148 150 149 148 148 213 209 209 209 207 207 212 209 208 209<br />

35 148 234 144 143 144 143 149 144 143 144 209 209 214 208 209 208 215 214 211 209<br />

36 150 191 189 146 145 146 153 156 146 145 211 212 210 209 208 212 208 211 210 209<br />

37 145 163 207 145 145 152 145 150 146 149 220 216 216 217 215 212 211 209 213 217<br />

38 147 171 199 151 149 148 147 148 148 150 210 209 213 207 208 212 212 208 207 208<br />

39 142 246 143 146 142 147 142 141 148 145 208 208 208 207 209 208 213 217 207 208<br />

40 153 150 148 146 154 147 149 163 147 146 209 207 207 207 209 208 207 207 189 212<br />

41 144 144 143 149 144 146 151 144 142 150 214 216 218 215 215 219 217 207 193 214<br />

42 148 147 298 147 148 150 148 176 148 144 208 208 207 212 210 193 210 212 223 212<br />

43 212 148 155 217 220 219 213 220 213 213 207 208 208 207 212 186 218 213 208 208<br />

44 154 173 193 152 144 148 147 147 152 149 209 211 213 209 208 185 212 227 207 206<br />

45 147 146 144 145 150 145 146 146 145 145 217 216 192 215 217 225 208 208 221 214<br />

46 150 151 145 146 145 145 150 147 145 145 209 209 186 207 211 214 211 210 208 208<br />

47 148 148 146 152 152 152 148 153 174 148 207 209 209 212 208 228 214 212 207 212<br />

48 142 141 144 143 142 147 144 142 143 146 213 208 237 209 208 186 212 208 207 208<br />

49 145 145 151 148 146 146 145 144 145 145 222 215 222 213 194 188 207 216 215 213<br />

50 144 158 146 148 143 143 144 148 145 143 209 207 207 211 184 212 214 210 211 212<br />

51 152 194 146 148 152 147 150 148 149 149 209 215 210 208 199 214 213 215 208 209<br />

52 157 267 149 149 149 149 152 149 149 148 212 208 208 209 207 217 209 208 212 186<br />

53 148 266 167 142 143 142 143 147 143 144 214 217 216 219 214 209 208 208 216 189<br />

54 147 157 151 145 144 150 144 145 146 154 210 214 214 209 210 212 210 210 208 227<br />

55 144 151 146 152 155 148 145 139 152 147 212 209 209 210 208 212 213 230 208 208<br />

56 153 149 150 155 151 148 150 148 148 149 212 207 208 210 207 211 208 194 208 186<br />

57 147 139 208 143 143 142 143 147 142 143 215 217 218 215 217 209 214 185 216 187<br />

58 212 147 146 214 214 213 222 218 241 217 210 209 217 210 209 212 211 186 210 207<br />

59 145 145 142 144 144 146 146 144 146 149 209 216 209 218 209 210 220 245 209 209<br />

60 153 149 153 151 148 154 148 149 154 148 213 210 213 209 207 212 209 194 208 186<br />

61 150 149 171 153 150 149 148 148 150 148 218 217 217 216 211 209 208 186 224 187<br />

62 148 144 142 149 149 142 144 145 144 145 209 211 209 209 218 213 215 186 208 222<br />

63 150 146 144 145 152 150 150 148 146 145 212 209 209 208 210 210 191 237 208 186<br />

64 145 146 146 145 145 146 146 153 146 146 212 208 207 209 208 208 186 208 207 187<br />

65 148 149 153 149 148 154 148 151 149 152 214 217 217 215 212 209 210 208 216 217<br />

66 142 214 135 145 142 141 141 141 147 142 209 209 208 208 208 214 207 186 209 211<br />

67 147 149 143 151 146 150 146 145 146 145 208 214 214 213 209 212 230 188 213 211<br />

68 147 144 144 144 149 143 147 143 143 172 213 209 208 209 210 186 210 224 207 209<br />

69 153 149 148 147 148 148 153 149 147 151 215 216 213 215 211 187 222 209 220 210<br />

70 149 149 151 149 150 150 149 153 149 153 210 208 186 209 213 208 209 215 208 210<br />

71 143 146 182 151 143 149 144 143 146 144 227 214 187 209 211 208 215 211 210 214<br />

72 144 146 212 149 146 147 144 144 150 146 212 208 221 212 208 209 208 214 209 214<br />

73 149 145 145 150 146 144 146 150 145 145 220 216 212 219 189 208 213 210 215 208<br />

74 219 147 148 214 213 213 218 214 212 212 207 209 212 208 193 217 209 207 211 211<br />

75 143 143 145 141 142 143 142 148 143 144 209 210 215 208 219 209 214 210 208 211<br />

76 146 148 150 146 146 150 146 147 146 150 213 208 208 209 208 208 208 212 207 209<br />

77 149 154 149 144 148 149 148 148 149 149 220 220 216 221 246 219 217 217 225 216<br />

78 153 152 152 164 153 152 151 152 152 151 213 217 223 214 220 220 215 216 213 228<br />

79 162 183 152 158 156 152 156 153 153 152 218 214 213 216 217 213 228 215 212 213<br />

80 149 255 149 153 157 148 152 150 149 151 218 212 215 212 217 218 213 217 216 213<br />

81 150 157 152 150 149 150 153 158 151 151 219 234 217 220 217 213 214 214 220 214<br />

82 150 159 155 150 150 162 152 151 152 151 209 212 212 217 218 218 216 213 212 218<br />

83 151 152 153 156 153 152 151 152 166 153 191 214 214 214 199 214 219 216 213 215<br />

84 150 146 145 151 148 147 147 146 153 147 195 217 213 214 190 216 217 217 217 213<br />

85 151 146 146 147 146 145 147 148 146 147 226 216 215 217 192 210 208 209 217 219<br />

86 144 145 146 144 146 144 149 147 143 145 209 210 208 215 232 217 211 213 209 214<br />

87 147 148 153 147 149 149 147 147 148 153 213 210 209 208 219 212 214 211 211 210<br />

88 148 153 213 155 148 154 149 148 150 148 212 208 208 209 207 209 208 208 208 209<br />

89 210 143 146 215 216 208 208 214 214 209 214 218 218 218 211 209 208 209 215 248<br />

90 148 145 145 146 149 145 146 149 144 144 210 211 204 209 215 218 212 209 212 213<br />

91 150 166 146 145 146 149 155 148 145 144 209 209 186 208 208 209 220 215 209 187<br />

92 149 151 152 147 146 148 147 152 146 148 212 213 189 210 207 208 208 210 210 186<br />

93 141 148 148 142 141 148 142 142 143 146 194 220 218 215 216 209 209 209 219 229<br />

94 145 217 146 149 146 148 146 146 156 146 187 209 213 212 220 217 210 212 214 210<br />

95 145 143 143 148 143 143 143 144 144 143 213 210 209 213 208 207 213 213 208 209<br />

96 151 148 147 147 153 147 149 149 146 148 207 208 208 209 208 209 208 208 208 209<br />

97 148 149 148 154 149 148 154 149 148 150 228 215 216 216 212 213 246 214 221 208<br />

98 143 144 147 141 143 144 144 148 143 146 208 212 215 209 214 214 210 208 207 211<br />

99 150 145 149 149 145 150 145 146 146 146 211 188 213 211 209 211 224 212 209 214<br />

100 145 145 146 149 146 145 145 147 151 148 209 186 208 210 208 204 209 208 208 218<br />

101 156 148 148 148 149 148 149 148 148 149 219 228 211 194 217 192 208 209 217 209<br />

102 148 143 141 144 143 147 144 144 142 144 210 209 215 191 213 192 211 212 215 216<br />

103 146 148 216 146 145 146 150 148 145 148 209 217 215 224 209 216 214 215 213 209<br />

104 145 150 149 143 143 145 144 145 144 149 210 211 209 209 208 187 220 209 237 210<br />

105 216 149 149 220 216 221 216 216 217 216 215 217 212 215 212 186 208 210 214 209<br />

106 151 150 148 152 152 148 157 150 155 148 210 213 213 208 218 202 187 209 209 219<br />

107 147 143 143 152 148 142 144 144 143 143 212 210 215 209 212 211 264 211 209 212<br />

108 146 217 146 148 145 145 150 146 144 144 209 187 209 212 209 209 222 209 209 214<br />

109 150 149 151 148 146 147 146 150 147 146 228 190 216 218 214 212 240 213 219 209<br />

110 150 152 154 148 155 154 149 149 150 153 208 210 210 224 215 225 248 210 208 225<br />

111 146 146 147 150 147 146 145 145 156 146 213 213 214 211 213 216 231 221 213 213<br />

112 154 151 149 151 151 155 151 151 152 150 216 219 212 212 211 213 223 214 214 217<br />

113 153 149 181 150 149 149 149 150 148 149 224 216 216 240 215 212 220 213 218 213<br />

114 152 156 154 152 153 152 156 154 151 154 214 214 216 213 217 218 234 213 212 218<br />

115 150 156 156 150 151 152 151 151 151 156 215 213 225 216 215 212 271 225 218 215<br />

116 152 151 147 155 152 158 154 152 152 149 218 213 213 215 213 215 333 213 215 219<br />

117 150 149 148 160 152 150 150 149 153 148 220 221 222 232 195 220 225 215 219 218<br />

118 153 150 150 150 156 151 151 150 150 150 219 217 213 217 197 218 214 213 212 219<br />

119 158 153 221 153 153 152 165 154 152 153 214 214 214 213 219 213 405 221 212 213<br />

120 149 147 152 145 149 148 146 158 212 148 213 213 217 214 211 214 213 222 219 213<br />

121 215 149 151 217 215 222 215 216 150 221 202 220 222 221 215 213 220 214 219 215<br />

122 150 148 149 153 149 147 147 147 152 148 191 211 213 214 213 224 212 212 213 219<br />

123 150 218 147 141 152 153 149 147 147 147 218 188 210 209 186 210 209 217 209 214<br />

124 153 149 147 150 148 149 149 148 148 148 214 186 211 210 186 208 211 209 213 214<br />

125 143 144 150 142 143 144 146 146 142 146 215 224 232 215 225 209 217 253 215 209<br />

126 138 149 150 146 143 146 145 146 145 150 210 212 209 213 214 214 213 209 215 216<br />

127 151 147 146 153 146 150 146 145 146 145 210 210 219 214 210 213 208 223 208 209<br />

128 149 147 147 154 149 149 148 147 176 148 213 209 208 195 208 209 209 208 210 212<br />

129 145 142 142 143 149 142 145 143 149 143 215 217 218 195 211 210 214 214 214 212<br />

130 150 146 145 146 145 145 151 153 147 145 214 208 208 187 218 215 208 204 208 216<br />

131 149 145 147 142 146 145 143 154 143 145 211 209 217 226 208 208 207 191 213 209<br />

132 147 154 147 147 147 147 146 148 148 153 214 209 210 215 210 213 215 188 212 210<br />

133 147 148 149 151 150 153 153 148 150 148 226 237 235 222 218 221 224 229 219 225<br />

134 142 142 209 148 144 145 142 142 147 142 208 214 208 208 213 214 206 208 207 214<br />

135 150 145 145 146 151 151 146 146 145 146 213 209 220 209 209 209 208 217 213 216<br />

136 151 147 148 147 146 146 152 147 214 149 213 187 213 216 209 213 210 208 214 209<br />

2 214 154 153 214 214 214 214 219 151 215 228 190 216 194 187 214 217 213 227 211<br />

A.5. Evaluation Data: Sobel Preprocessing

We performed 20 test runs with the class PreprocessingTimeTest, which is responsible for the non-sequential frame order in the table below. 10 runs were done with standard JVM options, and 10 with the following tuning: -verbose:gc -Xms64m -Xmx256m -XX:NewRatio=2 -XX:+UseConcMarkSweepGC. All times are given in milliseconds.
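
Sobel preprocessing convolves each frame with the two 3x3 Sobel kernels and combines the horizontal and vertical responses into a gradient magnitude. The sketch below shows this computation in its generic textbook form on a grey-level image; it is an illustration only and not a copy of the implementation that produced the numbers in this table.

// Generic Sobel gradient-magnitude sketch for a grey-level image stored as
// image[y][x]; border pixels are left at zero for simplicity.
public class SobelSketch {

    static int[][] sobel(int[][] image) {
        int[][] gx = { {-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1} };   // horizontal kernel
        int[][] gy = { {-1, -2, -1}, {0, 0, 0}, {1, 2, 1} };   // vertical kernel
        int height = image.length;
        int width = image[0].length;
        int[][] result = new int[height][width];

        for (int y = 1; y < height - 1; y++) {
            for (int x = 1; x < width - 1; x++) {
                int sx = 0;
                int sy = 0;
                for (int j = -1; j <= 1; j++) {
                    for (int i = -1; i <= 1; i++) {
                        sx += gx[j + 1][i + 1] * image[y + j][x + i];
                        sy += gy[j + 1][i + 1] * image[y + j][x + i];
                    }
                }
                result[y][x] = (int) Math.round(Math.sqrt(sx * sx + sy * sy));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[][] image = new int[288][384];   // same resolution as the test sequence
        image[100][100] = 255;               // single bright pixel as test input
        int[][] edges = sobel(image);
        System.out.println("response next to the bright pixel: " + edges[100][99]);
    }
}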

frame JVM tuned (10 runs) JVM standard options (10 runs)<br>

3 2 2 2 2 2 3 2 2 2 2 2 2 2 2 1 2 1 2 2 5<br />

4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1<br />

5 1 1 1 1 1 1 1 0 1 1 1 1 0 1 1 1 0 1 1 1<br />

6 2 1 2 1 1 2 1 2 1 1 1 12 1 1 1 1 1 1 1 11<br />

7 6 5 5 5 4 5 5 5 5 4 4 5 15 4 5 5 4 5 5 4<br />

8 2 2 2 2 2 2 1 2 2 2 2 2 13 13 2 2 2 2 3 2<br />

9 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1<br />

10 1 1 0 1 1 1 0 1 1 1 1 1 0 1 1 11 1 1 1 1<br />

1 1 1 1 0 0 1 1 0 0 1 1 1 1 1 1 1 1 0 1 1<br />

11 1 1 0 1 0 1 1 1 0 1 1 0 0 1 1 1 1 1 0 1<br />

12 1 0 1 1 1 1 1 0 1 0 1 0 1 0 1 1 1 1 1 1<br />

13 1 1 1 1 1 1 1 1 1 0 1 2 1 1 1 1 1 2 1 1<br />

14 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1<br />

15 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1<br />

16 1 3 1 1 2 2 2 1 2 6 2 1 13 2 2 1 2 2 2 2<br />

17 1 1 1 0 1 1 1 1 1 1 1 5 4 1 1 1 0 0 1 1<br />

18 2 2 1 1 2 2 2 2 1 2 3 3 2 2 3 3 3 13 3 2<br />

19 1 2 1 1 2 1 2 1 1 3 1 1 1 1 1 1 1 1 1 1<br />

20 1 0 1 1 1 1 1 1 1 0 1 1 1 1 1 12 1 1 1 1<br />

21 1 0 1 1 1 0 1 0 1 1 1 0 1 1 1 1 1 1 1 0<br />

22 1 0 1 1 1 1 1 1 0 1 1 1 0 1 1 1 1 0 12 1<br />

23 2 1 1 2 2 2 1 2 1 1 2 1 1 2 13 2 2 1 1 1<br />

24 1 1 0 0 1 1 0 1 0 1 1 1 1 1 0 1 1 1 1 1<br />

25 1 1 1 1 1 1 1 1 1 2 1 1 1 5 1 1 1 1 1 1<br />

26 1 1 0 1 1 1 1 1 1 0 1 0 1 1 0 1 1 1 1 1<br />

27 1 0 1 1 1 1 1 1 1 1 1 1 0 1 1 1 0 1 0 1<br />

28 1 0 1 0 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 0<br />

29 1 1 1 1 1 1 1 0 0 1 1 1 0 0 1 0 0 1 1 1<br />

30 0 1 1 1 1 1 1 0 0 1 1 1 1 0 0 1 1 1 1 1<br />

31 1 0 0 1 1 1 1 1 0 0 1 0 0 1 0 1 1 1 1 1<br />

32 1 1 1 1 1 1 1 1 1 1 2 1 2 1 1 11 12 1 2 1<br />

33 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 0<br />

34 1 0 1 1 1 1 0 1 1 1 1 1 0 0 1 1 0 0 0 0<br />

35 1 1 1 0 0 1 1 0 1 1 1 1 0 0 1 1 1 0 1 1<br />

36 0 1 0 0 1 0 0 1 1 1 1 1 0 1 1 0 1 12 0 1<br />

37 4 1 1 1 0 1 0 0 1 1 0 0 1 0 1 1 1 0 0 0<br />

38 1 1 0 0 1 0 0 1 1 1 1 1 1 1 1 0 1 1 1 1<br />

39 1 1 1 0 1 0 1 1 1 0 1 1 1 1 1 0 1 1 1 1<br />

40 0 1 1 0 0 0 1 1 1 1 1 1 0 1 1 0 1 1 1 1<br />

41 1 1 1 1 0 0 1 1 1 0 1 0 5 0 1 1 0 1 1 0<br />

42 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 12 1 1 1 1<br />

43 1 1 1 0 0 0 0 0 0 1 0 1 1 1 11 1 1 1 1 1<br />

44 1 1 1 1 0 1 1 0 0 1 1 12 1 1 1 1 1 0 0 1<br />

45 1 1 1 1 1 0 1 1 1 0 1 0 0 1 1 1 1 0 1 1<br />

46 1 1 1 1 1 1 0 1 1 1 1 2 1 1 1 1 0 1 1 1<br />

47 1 0 0 1 1 1 1 1 1 0 1 2 1 1 0 1 1 1 1 1<br />

48 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1<br />

49 1 1 0 1 1 1 1 0 1 1 1 0 1 1 1 1 12 1 0 0<br />

50 1 1 1 1 1 0 1 0 1 0 1 1 1 1 1 0 1 1 1 1<br />

51 1 1 1 3 1 0 1 3 3 3 1 1 1 0 1 0 0 1 0 1<br />

52 1 3 2 1 3 3 3 1 1 8 0 1 1 1 1 1 0 1 0 0<br />

53 1 0 0 1 1 1 1 0 1 1 1 1 1 5 1 0 0 0 5 1<br />

54 1 1 0 0 0 1 0 1 1 0 0 1 0 0 1 1 1 1 0 1<br />

55 0 1 0 1 1 1 1 1 1 1 1 0 1 1 1 0 1 0 1 1<br />

56 1 1 1 0 0 1 1 0 0 1 0 1 1 0 1 1 1 1 1 1<br />

57 1 1 0 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1<br />

58 1 1 1 0 1 0 1 1 1 1 1 1 12 1 1 0 1 1 1 0<br />

59 1 0 1 0 0 0 1 1 1 1 1 1 1 1 1 0 1 0 1 0<br />

60 1 1 1 1 2 1 1 1 1 1 1 0 0 1 1 1 1 1 1 2<br />

61 1 0 1 1 1 1 1 1 1 0 0 1 0 1 2 0 12 1 1 0<br />

62 1 1 0 1 1 1 1 1 1 0 1 1 0 0 13 0 1 1 0 11<br />

63 1 1 1 1 1 0 0 1 1 1 0 0 1 1 0 0 1 0 1 1<br />

64 9 10 6 6 10 6 6 6 5 6 3 3 3 9 4 14 4 3 3 3<br />

65 0 1 0 1 0 0 1 1 1 1 1 1 1 0 1 0 1 1 0 0<br />

66 1 1 1 1 1 1 2 1 2 2 1 1 1 0 1 1 1 1 1 0<br />

67 1 1 1 1 1 1 0 1 0 1 1 0 1 0 0 1 0 1 1 1<br />

68 1 1 1 1 0 0 1 1 0 0 1 1 1 0 1 1 0 1 0 0<br />

69 3 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2<br />

70 1 1 1 1 1 1 1 1 1 0 5 1 1 1 1 1 1 1 1 0<br />

71 1 1 0 0 0 0 0 1 1 0 1 0 0 1 1 1 0 1 1 1<br />

72 1 1 1 1 1 2 1 1 2 1 1 1 1 1 1 4 1 1 0 0<br />

73 0 1 0 0 1 1 1 0 1 1 0 4 1 5 1 1 1 1 1 0<br />

74 1 1 0 0 1 0 0 0 1 1 4 0 1 1 1 1 1 1 16 4<br />

75 1 1 1 1 0 1 0 1 1 0 0 1 1 0 15 1 4 1 4 0<br />

76 1 0 1 0 0 0 1 1 1 0 1 1 4 1 1 1 1 1 1 0<br />

77 1 0 0 0 0 1 0 0 0 0 0 1 1 1 1 1 1 1 1 1<br />

78 1 1 1 0 1 0 1 0 1 1 1 1 2 1 1 1 0 1 1 1<br />

79 0 0 0 0 1 1 0 0 0 1 0 1 1 1 0 0 1 1 0 1<br />

80 6 1 1 1 1 1 0 0 1 0 1 1 1 1 0 1 1 0 0 1<br />

81 0 1 0 0 1 0 1 0 0 1 0 1 1 0 0 1 1 1 1 1<br />

82 1 0 1 0 1 1 1 0 0 1 0 1 1 1 0 0 0 0 1 0<br />

83 1 1 1 1 1 1 1 0 1 0 1 1 11 1 1 1 1 0 0 1<br />

84 1 0 1 1 1 1 0 0 0 0 1 1 1 1 1 0 0 1 0 1<br />

85 1 1 1 0 1 1 0 1 0 0 0 0 1 1 1 0 1 0 0 0<br />

86 0 0 0 0 1 1 0 1 0 0 0 0 1 1 1 1 1 0 1 1<br />

87 1 0 1 1 1 0 1 0 0 0 1 0 1 0 0 0 0 0 0 1<br />

88 1 1 1 1 1 0 1 1 0 1 1 1 0 1 1 0 1 1 0 1<br />

89 1 0 1 1 0 1 1 1 1 1 1 14 1 1 1 0 1 0 1 1<br />

90 0 1 0 0 0 1 1 1 1 1 0 12 1 1 1 1 0 1 1 1<br />

91 0 0 1 1 0 1 1 0 1 1 1 1 1 1 0 1 1 1 0 1<br />

92 1 1 0 0 1 1 1 0 1 1 1 1 1 1 0 0 0 0 0 1<br />

93 1 0 0 0 1 0 1 0 0 0 0 1 0 1 0 1 0 1 1 0<br />

94 1 1 1 0 1 0 1 1 1 1 1 0 0 1 0 1 1 0 1 1<br />

95 2 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 0 1<br />

96 0 1 1 1 1 1 1 0 1 1 1 11 0 1 1 1 0 1 1 0<br />

97 0 0 0 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0<br />

98 1 0 1 2 0 0 0 1 1 1 1 0 0 0 2 0 1 1 1 0<br />

99 1 1 0 0 1 1 0 0 0 1 1 0 1 1 1 1 1 1 11 1<br />

100 1 0 0 0 1 1 0 1 0 0 0 1 1 1 1 1 1 0 12 1<br />

101 0 1 0 1 0 1 1 0 0 1 1 0 1 0 1 1 0 1 1 1<br />

102 1 1 1 0 0 0 1 0 0 0 1 0 1 1 0 4 0 11 1 1<br />

103 0 0 1 1 1 1 0 1 1 0 1 3 1 14 0 0 0 3 0 1<br />

104 1 1 1 0 1 0 1 1 0 0 4 1 0 1 0 1 1 1 3 0<br />

105 0 1 1 1 0 0 1 1 1 1 0 0 3 0 7 1 4 0 0 1<br />

106 1 0 1 1 0 0 0 1 1 0 0 0 0 1 0 0 1 1 0 1<br />

107 1 2 1 1 1 2 1 1 1 2 1 2 2 2 2 1 1 1 1 1<br />

108 0 0 1 1 1 1 1 1 0 0 1 0 17 0 1 1 1 1 0 1<br />

109 1 1 0 0 1 1 1 0 1 1 1 1 1 0 1 1 1 0 1 1<br />

110 1 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 1 0 1<br />

111 0 2 2 1 2 1 1 1 0 1 1 1 1 0 0 0 0 1 1 1<br />

112 0 1 0 0 1 1 0 1 0 0 1 1 0 0 1 0 0 1 1 1<br />

113 1 1 1 0 0 1 0 1 1 0 1 0 1 0 3 0 2 1 1 1<br />

114 0 1 0 0 0 1 1 0 1 0 1 0 1 1 1 0 0 1 1 2<br />

115 1 1 1 1 1 1 0 1 0 1 1 0 1 1 1 0 0 1 1 0<br />

116 1 0 1 1 1 1 1 1 1 1 0 3 1 1 1 1 0 1 1 1<br />

117 1 1 0 0 0 0 1 1 1 0 2 0 0 1 0 1 0 2 13 1<br />

118 0 1 1 1 1 0 1 1 1 0 1 1 1 1 1 0 2 1 0 0<br />

119 1 0 1 1 1 1 0 1 0 0 1 0 1 0 1 1 1 1 1 0<br />

120 0 0 0 1 0 1 1 0 0 1 1 0 1 1 0 1 1 1 0 2<br />

121 1 1 1 1 1 9 1 0 0 1 1 0 0 0 3 0 3 1 1 1<br />

122 1 1 1 0 0 1 1 1 1 1 1 1 1 0 1 0 0 1 1 0<br />

123 0 1 0 1 0 0 1 1 0 0 1 0 1 1 1 1 2 0 1 1<br />

124 0 1 1 0 0 1 1 1 1 1 1 1 1 0 1 0 0 1 1 1<br />

125 1 0 0 1 0 0 1 1 0 1 1 1 2 0 1 1 1 1 0 0<br />

126 1 1 0 1 0 0 1 0 0 1 1 1 1 1 0 1 1 1 1 1<br />

127 3 1 0 1 0 1 0 0 1 1 0 0 1 1 1 2 1 1 0 1<br />

128 1 0 0 1 1 1 0 0 1 1 0 4 1 2 0 1 1 2 1 11<br />

129 0 0 0 1 0 1 0 0 1 0 2 3 1 1 1 1 1 0 14 0<br />

130 0 1 0 1 1 1 0 0 1 0 1 0 3 0 3 0 2 1 0 1<br />

131 1 0 1 0 0 1 0 1 1 1 0 1 12 1 1 1 1 1 1 1<br />

132 1 0 0 1 1 0 0 1 0 0 0 12 0 1 1 0 0 0 1 1<br />

133 0 0 1 1 0 0 1 1 1 0 0 1 1 0 1 1 1 1 1 1<br />

134 1 1 0 1 1 1 1 0 1 0 0 0 1 0 1 1 1 1 1 1<br />

135 1 1 0 1 1 0 0 1 1 0 1 0 0 1 0 0 11 0 1 1<br />

136 3 0 1 1 1 1 1 1 1 0 1 1 0 0 1 1 0 11 0 1<br />

2 1 1 1 1 1 0 0 0 1 1 0 0 1 0 1 1 0 1 0 0<br />
