
Moment-based Facial Feature Tracking using Java

Diploma Thesis in the Degree Program
iTec - Information and Communication Engineering

Manuela Hutter
011 0109 038

Supervisor: Dipl.-Ing. (FH) Walter Ritter

Dornbirn, August 2005

Fachhochschule Vorarlberg GmbH


Eidesstattliche Erklärung (Statutory Declaration)

I hereby declare on my word of honor that I have written this thesis independently. All thoughts taken directly or indirectly from external sources are marked as such. This thesis has not been submitted to any other examination authority and has not been published before.

Manuela Hutter (Dornbirn, August 2005)

Acknowledgments

Many people helped me with this work in one way or another. I especially thank Walter Ritter, my supervisor, for his patience and help in all concerns. I am grateful to my introductory supervisor Miglena Dontschewa, who provided the initial idea for this thesis and enthused me for it; to Guido Kempter, who supported me in a difficult phase of the project and assisted my statistical analyses; and to Avinash Manian, who gave me a helping hand with the data analysis in SPSS (thanks for your patience).

I thank Colin Gregory-Moores and Lisa Newman for helping me with the basic structure of my English writing; Regine Bolter, the head of the study program, for giving me important hints to get on the right track; and Justin Zobel for writing the most helpful book about "writing for computer science" [Zobel, 2004]. Thanks to Wolfgang Mähr for proofreading and making helpful suggestions, and to my brother Matthias Hutter for collecting statistical data. Last but not least, thanks to my parents, Christine and Josef Hutter, for their personal and financial commitment.

For scientific work with ethical awareness and without animal abuse.

The use of registered names, trademarks etc. in this material does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.



Zusammenfassung

Diese Diplomarbeit stellt ein plattform-unabhängiges, in Java entwickeltes Programm für die Gesichtsbewegungserkennung vor. Trackingalgorithmen, die markante Punkte im menschlichen Gesicht lokalisieren und verfolgen, sind eine wichtige Grundlage für viele unterschiedliche, darauf aufbauende Anwendungen: in der 3D-Modellanimation werden Punkte für die Gesichtsanimation eines Charakters benötigt; Analysen von menschlichen Emotionen verwenden die Punkte für automatische Klassifikation der Gesichtsmimik; und alternative Benutzerschnittstellen können Gesichtsbewegungen als Basis ihrer Funktionsweise benutzen. Zahlreiche Forschungsarbeiten beschreiben Bemühungen im Bereich der Erkennung von Gesichtsbewegungen. Trotzdem sind praktikable Lösungen selten. Es wurde nur eine Anwendung auf dem Markt gefunden, welche das Trackingproblem in Echtzeit und ohne physische Markierungen auf dem untersuchten Gesicht löst; sie funktioniert allerdings nur auf Windows-Plattformen. Die entwickelte Java-Anwendung kann auf allen Plattformen verwendet werden, auf denen eine 'Java Virtual Machine' installiert ist. Sie benutzt eine Trackingmethode, die auf Bildmomenten und einer 'Binary Space Partitioning'-Datenstruktur basiert, und einen Canny-Kantendetektor für die Datenaufbereitung verwendet. Die Software arbeitet mit Video-Eingangsdaten, ohne Markierungen auf dem betrachteten Gesicht. Sie hat eine modulare Programmstruktur, welche die Verwendung und den Austausch von externen Bibliotheken zulässt. Derzeit werden das 'Java Media Framework' für die Extrahierung der Video-Frames, und entweder 'Java2D' oder 'Java Advanced Imaging' für die Bildaufbereitung verwendet. Das Programm kann relevante Merkmale in vorausgewählten Bildregionen finden. Obwohl die extrahierten Punkte nicht mit standardisierten Gesichtsparametern wie den MPEG-4 'Facial Animation Parameters' übereinstimmen, zeigen zwei untersuchte Beispielpunkte bemerkenswerte Korrelationen von bis zu 98% im Vergleich zu manuell ermittelten Punkten; das Erkennen von Gesichtsmerkmalen auf einem vorverarbeiteten Bild dauert zirka 5 ms. Nachdem der Tracking-Prozess abgeschlossen ist, können die gefundenen Punkte in einer Ausgabedatei gespeichert werden, um sie für nachfolgende übergeordnete Aufgaben verfügbar zu machen.



Abstract

In this thesis, we present a platform-independent program for facial feature tracking, implemented in Java. Facial feature tracking algorithms, which locate and pursue distinctive points in a human face, are an important basis for many different high-level tasks: 3D model animation needs feature points for moving the model's facial features; programs that analyze human emotions use the points for automatic emotion recognition; and facial movements may provide a basis for alternative user interfaces. Numerous papers describe research efforts in the field of facial feature tracking. Nevertheless, practicable solutions are rare. We found only one application on the market that solves the tracking task in realtime and without physical markers on the tracked face; however, it only works on Windows platforms. The implemented Java tracking program can be used on all platforms that have a 'Java Virtual Machine' installed. It uses a tracking method based on image moments and a 'Binary Space Partitioning' data structure; the input data is prepared by a Canny edge detection mechanism. The software works on video input, without markers on the processed face. It has a modular program structure that allows for the use and interchange of external libraries. Currently, it uses the 'Java Media Framework' for video frame extraction, and either 'Java2D' or 'Java Advanced Imaging' for preprocessing. The program is able to find relevant feature points in preselected image regions. While the extracted points are not in accordance with point definition standards like the MPEG-4 'Facial Animation Parameters', two tested sample points show remarkable correlations of up to 98% in comparison to manually ascertained points; the computation time of feature points on a preprocessed image region lies around 5 ms. After the tracking process, the extracted points can be saved to an output file in order to make them available for subsequent higher-level tasks.



Contents

Introduction

1. State of the Art
   1.1. Overview
   1.2. Basic Tracking Process
   1.3. Algorithms
        1.3.1. Optical Flow Techniques
        1.3.2. Active Contours (Snakes)
        1.3.3. Image Moments
   1.4. Commercial Implementations
   1.5. Summary

2. Algorithms in Consideration
   2.1. Overview
   2.2. Testing Method
        2.2.1. Testing Tool
        2.2.2. Input Data
   2.3. Testing Snake Algorithms
   2.4. Testing Image Moments
   2.5. Summary

3. Input Data and Its Preparation
   3.1. Overview
   3.2. Prerequisites
        3.2.1. Data Format Prerequisites
        3.2.2. Video Quality
        3.2.3. Video Samples
   3.3. Preparation
        3.3.1. Overview
        3.3.2. Edge Detection Algorithms
        3.3.3. Edge Detector Realization
        3.3.4. Further Improvements
   3.4. Summary

4. Programming
   4.1. Overview
   4.2. Architecture
        4.2.1. Structure
        4.2.2. Basic Application Flow
        4.2.3. Tracking Algorithm
   4.3. Implementation Process
        4.3.1. Working Environment
        4.3.2. Difficulties
   4.4. Summary

5. Evaluation
   5.1. Overview
   5.2. Program Abilities
   5.3. Tracking Quality
        5.3.1. Test Data
        5.3.2. Technique
        5.3.3. Statistical Evaluation
   5.4. Time Usage
        5.4.1. Technique
        5.4.2. Statistical Evaluation
   5.5. Summary

Conclusions

Bibliography

Glossary

A. Appendix
   A.1. Evaluation Data: Coordinates of Corners of the Mouth
   A.2. Evaluation Data: Tracking Mouth Area Selection (60x20)
   A.3. Evaluation Data: Tracking of Whole Area Selection (384x288)
   A.4. Evaluation Data: Canny Preprocessing
   A.5. Evaluation Data: Sobel Preprocessing


Introduction

In this thesis, we develop a modular Java application that is able to detect and track facial features in video input data. The program finds distinctive points in the first frame of the input video and tracks them in subsequent video frames. Such facial feature trackers are needed in different fields of computer vision. One area of application includes facial expression recognition, classification and detection of emotional states, where feature points can be used for an automated recognition process. Another area of application is information-based encoding and compression; extracted feature information could, for example, be used for low-bandwidth video chats where only the movement information has to be transmitted. In terms of model acquisition and animation, feature points are needed for moving the character's facial features. Facial movements may also provide a basis for alternative user interfaces that, for example, allow handicapped people to operate computers with facial expressions.

This project started because there is no freely available, open, and platform-independent program that is able to track feature points in an input video. Various papers deal with the tracking of facial movements, using different techniques to localize and follow facial features. These works describe their approach in a more or less transparent way; they mostly mention that they implemented an example prototype, but they do not provide implementation details or even source code. We assume that they mainly use C or C++ as programming languages; at least no paper was found that explicitly uses Java for feature tracking. Despite the number of available approaches, only one solution, the commercial product VeeAnimator, which was developed for 3D model animation, currently solves the facial movement tracking task without the need for physical markers on the tracked face. In contrast to this application, we wanted to implement a free feature tracker that is based on a comprehensible tracking method, functions on different platforms, and comes with disclosed sources and documentation. It should help to understand tracking algorithms and to evaluate the feasibility of implementing a feature tracker in Java.


In this paper, we first investigate and evaluate various movement tracking algorithms according to cost factors, their comprehensibility, and their practicability. Existing tracking algorithms work with a range of different approaches. Techniques based on active contours can track deformable objects. However, they require manual initialization, and, due to their complexity, implementation effort as well as computation times may be too high. Local approaches, such as optical flow, surpass active contours concerning their complexity and computation times. Still, they have some limitations, as they may for instance compute more pixels than strictly necessary. Other algorithms designed for basic shape tracking, like the moment-based approach by Rocha et al. [2002], may not be applicable for facial movement tracking because they require clear, unfrayed objects in the input images.

Besides the comprehensibility and practicability of the algorithms, we also evaluate the quality of the obtained output. The extracted feature points may either follow defined standards, or they may not be consistent and well-positioned enough to fit these norms. Figure 0.1 shows a possible tracking output, where the feature points are valid in the sense that they hold essential positions on the contour of the mouth. However, one feature point cannot permanently be defined as a certain facial feature point (the right corner of the mouth, for example), since its position may not be close enough to a predefined feature location, or the point's position may unexpectedly change in subsequent video frames.

Figure 0.1.: Non-standardized output of facial feature localization. The tracked points (a) may not correlate with standardized points like the MPEG-4 Facial Animation Parameters (FAP) (b) [Antunes Abrantes and Pereira, 1999].


For evaluation and decision making, we used modified versions of existing Java code that implement the algorithms in consideration. We then selected one algorithm that scored well in the evaluation process to be translated into Java for use in the final program. The chosen approach is based on a shape tracking algorithm by Rocha et al. [2002]. It uses image moment calculations and a 'Binary Space Partitioning' data structure (described in Section 1.3.3) in order to find object positions and orientations. The algorithm is straightforward and shows small computation times. However, the output data may not conform to standards.

The quality and the condition of the input data are very important factors for the algorithm to function and produce good results. To enhance these factors, we define certain prerequisites, select an appropriate Java library for the technical realization, and describe the preparation of the data. We use the Java Media Framework (JMF) for video frame acquisition, and a Java2D-based Canny edge detection mechanism for the data preprocessing.

The resulting Java program is able to execute simple tracking tasks on preselected facial regions. For convenience, the users can perform the preselection and control the video flow using a Graphical User Interface (GUI). Because this work only deals with the basic tracking process and leaves out important procedures (like the determination of face position and orientation), the problem area was reduced and special prerequisites were added (as described in Section 3.2). The architecture of the Java tracker has a modular design with 3 exchangeable components: the part responsible for the video frame acquisition, the preprocessing mechanism, and the tracking implementation (see Section 4.2).

The evaluation of the implemented application shows that the bottleneck in terms of computation times is the currently used Canny preprocessing technique. Investigations on two exemplary feature points, the corners of the mouth, demonstrate that the moment-based tracking algorithm produces results similar to manually ascertained points. Variations are mainly caused by ragged edges in the preprocessed images.



This paper is split up into five chapters:


Chapter 1 informs about the basic tracking process and the current state of research on face localization and movement tracking. It gives an overview of commercial solutions and established algorithms in that field and compares their ranges of application as well as their strengths and weaknesses.

Chapter 2 goes into detail on the preselected algorithms and explains the choice of the moment-based approach for implementation in the final program. It describes the sample implementations that we used for decision making, and reviews the test runs that we made in order to come to a decision.

Chapter 3 defines the required input data format for the program, the prerequisites, and the preparation of videos for the tracking process. We compare techniques to read videos and split them up into single images, and we select the most appropriate option for our aims. For that purpose, we define video information constraints, like a convenient and constant position and orientation of the face in all video frames. After having the right input data, with the face in the right place, we need to transform the images into edge images to make them applicable to the tracking algorithm.

Chapter 4 describes the code development of the feature tracker. It illustrates the program architecture with its general structure and the basic application workflow. In that context, we also describe the implementation of the tracking algorithm in detail. In the second part of this chapter, we briefly outline the working environment and state difficulties that arose during the implementation process.

Chapter 5 shows the achieved results of the Java feature tracker and evaluates them according to the correctness of two calculated feature points, the corners of the mouth, and the required preprocessing and calculation time.



1. State of the Art

1.1. Overview

In order to select the most appropriate facial feature tracking method for the Java implementation, we inspect the basic tracking process and look at the way it is implemented by different algorithms. We group these algorithms into 3 categories: optical flow techniques, active contour models (snakes), and moment-based shape tracking. For every group, we first state general definitions and properties in order to explain the mode of operation. We then give examples of approaches that use an algorithm of this group for facial feature tracking, and evaluate their feasibility for the Java feature tracker. In this context, we also introduce two commercial solutions, which unfortunately do not provide technical background information.

Facial feature tracking is not to be mistaken for face tracking, which aims to locate the complete face inside a video sequence, characterizing its position and orientation, but often not evaluating further details inside the tracked face. It may, however, be a preparatory step for, or mixed with, facial feature tracking. Several authors have dealt with the face tracking problem [Krüger et al., 2000; Sahbi and Boujemaa, 2002; Wu et al., 1999].

Within the field of computer vision, "recognition of the facial expressions is a very complex and interesting subject where there have been numerous research efforts" [Goto et al., 1999]. Most of these works are based on a similar process workflow, with videos and/or video streams as input data (see Section 1.2). However, they differ in their complexity, their initialization strategy and output quality, and in the way they deduce facial feature points and movements. As the goal is to find a straightforward tracking algorithm, we examine existing algorithms according to cost factors, their comprehensibility, and their practicability. One group of techniques, based on snakes, can track deformable objects, but requires manual initialization and is computationally expensive. It may also require extensive training and implementation periods. Other approaches, such as optical flow, surpass more complex methods concerning computation speed, but "provide a low-level problem characterization and suffer from some drawbacks" [Rocha et al., 2002]: they might, for example, "integrate more pixel than strictly necessary" [Dellaert and Collins, 1999]. Basic shape tracking methods may not be applicable for facial movement tracking because of their need for clear, singular objects in the input images. All examined works provide a theoretical and mathematical description of the developed algorithm, but they lack an illustration of the programmatic implementation, flow diagrams, or code snippets. The commercial solutions provide very little technical background information; the tracking methodology is not disclosed. We found two products that are able to track facial features, and only one of them solves this task without the need for physical markers on the tracked face.

1.2. Basic Tracking Process

Most of the work in facial feature tracking is based on a similar basic process workflow. As illustrated in Figure 1.1, the process works on input data from standard video or web cameras. This data may underlie some constraints, such as the distance between the tracked face and the camera, or lighting conditions (described in Section 3.2 for the Java feature tracker). The core tracking procedure consists of a number of steps. Most feature trackers have a preprocessing step, where the video data is prepared to be applicable to the tracking algorithm. The image sequences may be smoothed, converted into a different color model, or object details may be accentuated (see Section 3.3). As facial feature tracking methods mostly assume a certain size or orientation of the tracked objects, the face has to be detected, located, and possibly transformed. Due to time constraints, this project does not deal with face localization and bridges this gap with input data prerequisites. Having the preprocessed facial data in the right position in the image sequence, the feature tracking algorithm can set to work. Some algorithms, like the moment-based Java tracker or snake-based methods, require pre-selection of feature points or feature regions. In the case of the implemented Java program, this step could be automated in further development steps. After the tracking procedure, all methods provide more or less standardized facial parameters, 2D or 3D, depending on the tracker application. These parameters may then be used for facial model animation, or for high-level face processing tasks such as facial expression recognition, face classification, or face identification.
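To make this workflow more tangible, the following sketch shows how the stages could be decomposed into Java interfaces. The type and method names (FrameSource, Preprocessor, FeatureTracker, TrackingPipeline) are illustrative assumptions for this chapter, not the classes of the implemented tracker, which are described in Chapter 4.

```java
import java.awt.Point;
import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.util.List;

// Hypothetical decomposition of the workflow illustrated in Figure 1.1.
interface FrameSource {
    boolean hasMoreFrames();
    BufferedImage nextFrame();                 // video / webcam frame acquisition
}

interface Preprocessor {
    BufferedImage prepare(BufferedImage frame); // e.g. smoothing, edge detection
}

interface FeatureTracker {
    // locate feature points inside preselected regions of a prepared frame
    List<Point> track(BufferedImage preparedFrame, List<Rectangle> featureRegions);
}

// A minimal driver loop tying the stages together.
class TrackingPipeline {
    private final FrameSource source;
    private final Preprocessor preprocessor;
    private final FeatureTracker tracker;

    TrackingPipeline(FrameSource s, Preprocessor p, FeatureTracker t) {
        this.source = s; this.preprocessor = p; this.tracker = t;
    }

    void run(List<Rectangle> featureRegions) {
        while (source.hasMoreFrames()) {
            BufferedImage prepared = preprocessor.prepare(source.nextFrame());
            List<Point> featurePoints = tracker.track(prepared, featureRegions);
            // hand the extracted points to an output file or a higher-level task
            System.out.println(featurePoints);
        }
    }
}
```

Such a decomposition also mirrors the modular, exchangeable components mentioned in the Introduction: each stage can be swapped without touching the others.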

Figure 1.1.: Basic facial feature tracking workflow. All non-grey elements are part of the implemented Java feature tracker.

Gorodnichy [2003] has illustrated a similar "hierarchy of face processing tasks". He does not state input and output data, but goes into detail on face localization as a preliminary step, and he lists a range of higher-level tasks. Facial feature tracking is not mentioned in his illustration; he seems to include this procedure in a step called "Face Localization (precise)".

In the following section, we describe a set of well-established tracking methodologies and their use in the illustrated facial feature tracking workflow.



1.3. Algorithms

1.3.1. Optical Flow Techniques

Definitions and Properties

Optical flow is a concept for considering the motion of objects within a visual representation, where the motion is typically represented as vectors originating or terminating at pixels in a digital image sequence. Every pixel in an optical flow image is represented by a motion vector that indicates the direction and the intensity of motion at this point. The work of Beauchemin and Barron [1995] extensively describes optical flow techniques. Figure 1.2, taken from their work, illustrates the computation of optical flow.

Figure 1.2.: One frame of an image sequence (a) and its optical flow (b) [Beauchemin and Barron, 1995].

An optical flow algorithm "estimates the 2D flow field from image intensities" [Cutler and Turky, 1998]. In the survey of Cédras and Shah [1995], the methods are divided into four classes: differential methods, region-based matching, energy-based, and phase-based techniques:

"Differential methods compute the velocity from spatiotemporal derivates of image intensity. Methods for the computation of first order and second order derivates were devised, although estimates from second order approaches are usually poor and sparse. In region-based matching, the velocity is defined as the shift yielding the best fit between image regions, according to some similarity or distance measure. Energy-based (or frequency-based) methods compute optical flow using the output from the energy of velocity-tuned filters in the Fourier domain, while phase-based methods define velocity in terms of the phase behavior of band-pass filter output, for example the zero crossing techniques."
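As a rough illustration of the region-based matching idea, velocity as the shift that yields the best fit between image regions, the following Java sketch searches a small neighborhood for the displacement with the smallest sum of absolute differences. It is a simplified assumption of how such a matcher could look, not the method used by any of the works cited here.

```java
import java.awt.Point;

// Minimal region-based matching: find the shift of a small block between two
// grayscale frames (given as 2D intensity arrays) that minimizes the sum of
// absolute differences (SAD). The caller must ensure that the block and the
// search window stay inside both images. Purely illustrative.
class BlockMatcher {
    static Point flowAt(int[][] prev, int[][] next, int cx, int cy,
                        int blockRadius, int searchRadius) {
        int bestDx = 0, bestDy = 0;
        long bestSad = Long.MAX_VALUE;
        for (int dy = -searchRadius; dy <= searchRadius; dy++) {
            for (int dx = -searchRadius; dx <= searchRadius; dx++) {
                long sad = 0;
                for (int y = -blockRadius; y <= blockRadius; y++) {
                    for (int x = -blockRadius; x <= blockRadius; x++) {
                        int a = prev[cy + y][cx + x];
                        int b = next[cy + y + dy][cx + x + dx];
                        sad += Math.abs(a - b);
                    }
                }
                if (sad < bestSad) { bestSad = sad; bestDx = dx; bestDy = dy; }
            }
        }
        return new Point(bestDx, bestDy);   // motion vector at (cx, cy)
    }
}
```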

Optical Flow for Feature Tracking

Cohn et al. An example of how optical flow techniques are used in facial feature tracking is described by Cohn et al. [1998]. In their work, they manually select feature points in the first frame. Each of these points is then defined as the center of a 13x13 pixel flow window. The position of all feature points is normalized by automatically mapping them to a standard face model based on three facial feature points: the medial canthi of both eyes and the uppermost point of the philtrum (see Figure 1.3).

Figure 1.3.: Standard face model according to Cohn et al. [1998]. Medial canthus: inner corner of the eye; philtrum: vertical groove in the upper lip.

A hierarchical optical flow method is used to automatically track feature points in the image sequence. The displacement of each feature point is calculated by subtracting its normalized position in the first frame from its current normalized position. The resulting flow vectors are concatenated to produce a 12-dimensional displacement vector in the brow region, a 16-dimensional displacement vector in the eye region, a 12-dimensional displacement vector in the nose region, and a 20-dimensional vector in the mouth region (see Figure 1.4). The technique is based on the Facial Action Coding System (FACS), a widespread method for measuring and describing facial behaviors developed by Ekman and Friesen [1978] in the 1970s. Facial activities are described in terms of a set of small, basic actions, each called an Action Unit (AU). The AUs are based on the anatomy of the face and occur as the result of one or more muscle actions.

Figure 1.4.: Feature point displacements. Change from neutral expression (AU 0) to brow raise, eye widening, and mouth stretched wide open (AU 1+2+5+27). Lines trailing from the feature points represent displacement vectors due to the expression [Cohn et al., 1998].

Essa and Pentland The work of Essa and Pentland [1997] describes another facial feature tracking method based on optical flow. They base their work on a self-developed encoding system that extends FACS. They analyzed image sequences of facial expressions and probabilistically characterized the facial muscle activation associated with each expression. This is achieved using a detailed physics-based dynamic model of the skin and muscles coupled with optical flow in a feedback-controlled framework. They call this analysis a control-theoretic approach; it produces muscle-based representations of facial motion (Figure 1.5 shows an example).

Figure 1.5.: A motion field for the expression of a smile from optical flow computation (a), mapped to a face model using the control-theoretic approach (b) [Essa and Pentland, 1997].

Evaluation

The approach by Cohn et al. uses the FACS for feature tracking. This system has been widely used for controlling computer animation, but was not originally developed for this purpose. The intended goal was to "create a reliable means for skilled human scorers to determine the category or categories in which to fit each facial behavior" (http://face-and-emotion.com/dataface/facs/description.jsp). Essa and Pentland [1997] state in their work that

"it is widely recognized that the lack of temporal and detailed spatial information (both local and global) is a significant limitation to the FACS model. [...] Additionally, the heuristic 'dictionary' of facial actions originally developed for FACS-based coding of emotion has proven to be difficult to adapt to machine recognition of facial expression".

The results of the method show that the accuracy is between 83% and 92% compared to previous tests and results of human testers, depending on the region. The authors find that one reason for the lack of 100% agreement is "the inherent subjectivity of human FACS coding, which attenuates the reliability of human FACS codes" [Cohn et al., 1998]. Two other possible reasons were the "restricted number of optical flow feature windows and the reliance on a single computer vision method". In contrast to this approach by Cohn et al., the solution by Essa and Pentland specifically deals with facial expression recognition. Their work describes a complete tracking framework, which includes a physics-based dynamic model for skin and muscle description, something that is not intended for the Java tracker. Neither work mentions complexity or computation times for the tracking process.

1.3.2. Active Contours (Snakes)

Definitions and Properties

Active contour models, commonly called snakes, are energy-minimizing curves that deform to fit image features. Snakes, first introduced by Kass et al. [1988], "lock on to nearby minima in the potential energy generated by processing an image. (This energy is minimized by iterative gradient descent [...]) In addition, internal (smoothing) forces produce tension and stiffness that constrain the behavior of the models; external forces may be specified by a supervising process or a human user" [Ivins and Porrill, 1993]. Figure 1.6 shows the basic functionality of a closed snake.

Figure 1.6.: A closed snake. The snake's ends are joined so that it forms a closed loop. Over a series of time steps the snake moves into alignment with the nearest salient feature [Ivins and Porrill, 1993].

Snakes are applied to a range of different image processing problems. They support the detection of lines and edges, but can also be used for stereo matching or for segmenting image sequences. Snakes have often been used in medical research applications, and motion tracking systems use them to model moving objects. The main limitations of the models are that they "usually only incorporate edge information (ignoring other image characteristics) possibly combined with some prior expectation of shape; and that they must be initialized close to the feature of interest if they are to avoid being trapped by other local minima" [Ivins and Porrill, 1993].¹

¹ An overview of John Ivins' publications about snakes is available at http://www.computing.edu.au/~jim/snakes.html

A snake (V) is an ordered collection of n points in the image plane:

V = \{v_1, \ldots, v_n\}, \qquad v_i = (x_i, y_i), \quad i = 1, \ldots, n \tag{1.1}

The points in the contour iteratively approach the boundary of an object through the solution of an energy minimizing problem. For each point in the neighborhood of vi, an energy term is computed

E_i = \alpha E_{int}(v_i) + \beta E_{ext}(v_i) \tag{1.2}

where Eint(vi) is an energy function dependent on the shape of the contour, and Eext(vi) is an energy function dependent on the image properties, such as the gradient, near point vi. α and β are constants providing the relative weighting of the energy terms. Ei, Eint, and Eext are calculated using matrices. The value at the center of each matrix corresponds to the contour energy at point vi. Other values in the matrices correspond (spatially) to the energy at each point in the neighborhood of vi. Each point vi is moved to the point v'i corresponding to the location of the minimum value in Ei. This process is illustrated in Figure 1.7. If the energy functions are chosen correctly, the contour V should approach the object boundary and stop when it reaches it.
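A minimal greedy variant of this update step could look as follows in Java. The energy terms, the 3x3 search neighborhood, and the weighting constants are placeholder assumptions; the caller is expected to keep the candidate points inside the image.

```java
import java.awt.Point;

// One greedy snake iteration: every contour point moves to the position in its
// 3x3 neighborhood with the lowest combined energy E = alpha*Eint + beta*Eext.
// The energy terms below are simple placeholders, not the ones from Kass et al.
class GreedySnake {
    static double internalEnergy(Point prev, Point cur, Point next) {
        // continuity term: squared distance to the midpoint of the two neighbors
        double mx = (prev.x + next.x) / 2.0, my = (prev.y + next.y) / 2.0;
        return (cur.x - mx) * (cur.x - mx) + (cur.y - my) * (cur.y - my);
    }

    static double externalEnergy(double[][] gradientMagnitude, Point p) {
        // strong gradients (edges) are attractive, hence negative energy
        return -gradientMagnitude[p.y][p.x];
    }

    static void iterate(Point[] contour, double[][] gradientMagnitude,
                        double alpha, double beta) {
        int n = contour.length;
        for (int i = 0; i < n; i++) {
            Point prev = contour[(i - 1 + n) % n];   // closed snake
            Point next = contour[(i + 1) % n];
            Point best = contour[i];
            double bestE = Double.MAX_VALUE;
            for (int dy = -1; dy <= 1; dy++) {
                for (int dx = -1; dx <= 1; dx++) {
                    Point cand = new Point(contour[i].x + dx, contour[i].y + dy);
                    double e = alpha * internalEnergy(prev, cand, next)
                             + beta * externalEnergy(gradientMagnitude, cand);
                    if (e < bestE) { bestE = e; best = cand; }
                }
            }
            contour[i] = best;   // move v_i to the neighborhood minimum v'_i
        }
    }
}
```

Repeating iterate() until no point moves any more corresponds to the convergence behavior sketched in Figure 1.6.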

Figure 1.7.: An example of the movement of a point v_i in a snake. The point v'_i is the location of minimum energy due to a large gradient at that point [Mackiewich, 1995].

Snakes for Feature Tracking

The work of Terzopoulos and Waters [1993] describes a hybrid method, where shape models and snakes take part in the tracking process. Face models are set up, which are then tracked by snakes. The approach incorporates many complex procedures, described by the authors as follows:

"An approach to the analysis of dynamic facial images for the purposes of estimating and resynthesizing dynamic facial expressions is presented. The approach exploits a sophisticated generative model of the human face originally developed for realistic facial animation. The face model, which may be simulated and rendered at interactive rates on a graphics workstation, incorporates a physics-based synthetic facial tissue and a set of anatomically motivated facial muscle actuators. The estimation of dynamical facial muscle contractions from video sequences of expressive human faces is considered. An estimation technique that uses deformable contour models (snakes) to track the nonrigid motions of facial features in video images is developed. The technique estimates muscle actuator controls with sufficient accuracy to permit the face model to resynthesize transient expressions."

Figure 1.8 illustrates how snakes are used in this work.

Figure 1.8.: Snakes and fiducial points used for muscle contraction estimation: neutral expression (a) and surprise expression (b).

Evaluation

Snakes are mostly used in combination with other methods, as they require pre-initialization close to the feature of interest. A big disadvantage of the snake algorithm is that it is easily misled if the edge is discontinuous. Xie and Mirmehdi [2003] call this characteristic a weak edge:

"Despite their significant advantages, geometric snakes only use local information and suffer from sensitivity to local minima. Hence, they are attracted to noisy pixels and also fail to recognize weaker edges for lack of a better global view of the image. The constant flow term can speed up convergence and push the snake into concavities easily when gradient values at object boundaries are large. But when the object boundary is indistinct or has gaps, it can also force the snake to pass through the boundary."

They developed an improved snake algorithm, called RAGS, that is able to overcome this problem. It works with an "extra diffused region force which delivers useful global information about the object boundary and helps prevent the snake from stepping through" [Xie and Mirmehdi, 2003]. Figure 1.9 shows the improvement achieved with RAGS.

Figure 1.9.: Weak-edge leakage. A regular snake leaks out of a weak edge (a); the RAGS snake converges properly using its extra region force (b).

Snakes have great potential to work well in a tracking environment. However, the weak-edge leakage problem and the complexity of the algorithm argue against the use of snakes.

1.3.3. Image Moments

Definitions and Properties

In order to define its basic position, size, and orientation, a binary or greyscale image object can be approximated by a best-fitting ellipse. This ellipse is defined by the centroid, the major and minor axes, and the angle of the major axis with the x-axis. These values are calculated using image moment functions. Figure 1.10 shows example moment calculations for a binary image object (the black pixels in the illustration); a, b and θ, and the resulting ellipse are illustrated in the image. The following paragraphs derive and explain the functions necessary for the calculation of the best-fitting ellipse.

m00 = 5
m10 = 15, m01 = 15
m20 = 49, m02 = 47, m11 = 43
c = (3, 3)
θ = 31.7°, a = 1.84, b = 0.70

Figure 1.10.: Example for moment calculations and shape representation. The image ellipse is represented by the semi-major axis a, the semi-minor axis b and the orientation angle θ.

General Moment Definition

A grayscale image can be seen as a two-dimensional density distribution function, written in the form of f(x, y), where the function value represents the intensity of a pixel at the position (x, y). A general definition of two-dimensional (p + q) order moments is then given by the following equation:

\Phi_{pq} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \Psi_{pq}(x, y)\, f(x, y)\, dx\, dy, \qquad p, q = 0, 1, 2, 3, \ldots \tag{1.3}

where Ψpq is a continuous function of (x, y), known as the moment weighting kernel or the basis set. The indices p, q usually denote the degrees of the coordinates (x, y), as defined inside the function Ψ. For example, a zeroth order moment is given by p = 0 and q = 0. Applied to an image, the intensity function f(x, y) is bounded, and therefore the integrals in equation 1.3 are finite. In consequence, the general two-dimensional moment function can also be written in the form

\Phi_{pq} = \iint_{\zeta} \Psi_{pq}(x, y)\, f(x, y)\, dx\, dy, \qquad p, q = 0, 1, 2, 3, \ldots \tag{1.4}

where ζ represents the image region, that is, the set of foreground pixels in the image. Detailed moment function descriptions can be found in the book "Moment Functions in Image Analysis" [Mukundan and Ramakrishnan, 1998].

Geometric Moments

"Geometric moments are the simplest among moment functions, with the kernel function defined as a product of the pixel coordinates." [Mukundan and Ramakrishnan, 1998, p. 9]. Compared with more complex weighting kernels, geometric moments are easy to perform and implement. They are also called Cartesian moments, or regular moments. Equation 1.5 shows the two-dimensional geometric moment function, referred to as mpq.

m_{pq} = \iint_{\zeta} x^p y^q f(x, y)\, dx\, dy, \qquad p, q = 0, 1, 2, 3, \ldots \tag{1.5}

In this equation, the basis set is defined as x^p y^q (compare to equation 1.3). As the number of values in the image region is discrete and finite, the integral can be replaced by a summation to make it easier to compute. The equation can then be written as

m_{pq} = \sum_{A} x^p y^q f(x, y), \qquad p, q = 0, 1, 2, 3, \ldots \tag{1.6}

where A is the set of pixels in the image region.

Moments that are calculated from a binary (or silhouette) image are called silhouette moments. The pixels of a binary image can only adopt the values 0 and 1. If a pixel is part of an image region, it is set to 1. If it belongs to the background, its value is 0. For silhouette moments, the image region ζ only contains the pixels with value 1, and the equation can be written in the form

m_{pq} = \iint_{\zeta} x^p y^q\, dx\, dy, \qquad p, q = 0, 1, 2, 3, \ldots \tag{1.7}
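On a discrete binary image, the silhouette moments up to second order therefore reduce to plain sums over the foreground pixels. The following Java sketch, which assumes the image is given as a boolean foreground mask, illustrates this; it is not the implementation used in the final program.

```java
// Raw geometric (silhouette) moments up to second order for a binary image,
// given as a boolean mask where true marks a foreground pixel (f(x, y) = 1).
class SilhouetteMoments {
    double m00, m10, m01, m20, m11, m02;

    static SilhouetteMoments compute(boolean[][] mask) {
        SilhouetteMoments m = new SilhouetteMoments();
        for (int y = 0; y < mask.length; y++) {
            for (int x = 0; x < mask[y].length; x++) {
                if (!mask[y][x]) continue;      // background pixels contribute nothing
                m.m00 += 1;                     // zeroth order: area
                m.m10 += x;   m.m01 += y;       // first order
                m.m20 += (double) x * x;        // second order
                m.m11 += (double) x * y;
                m.m02 += (double) y * y;
            }
        }
        return m;
    }
}
```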

Shape Representation Using Moments

A set of low order moments can be used to describe the shape of image regions. Geometrical properties like the image area, the center of mass and the orientation can be defined by using moments of zeroth, first and second order. The moment of zeroth order (m00) represents the total intensity of an image. If the image is binary, m00 represents the image area, that is the number of foreground pixels. The intensity centroid can be calculated by combining the first order moments m10, m01 with the moment of order zero. The first order moments "provide the intensity moment about the y-axis and x-axis of the image" [Mukundan and Ramakrishnan, 1998, p. 12]. For example, m10 on a silhouette image sums up all the x-coordinates of the image region. The centroid c = (xc, yc) is given by

x_c = \frac{m_{10}}{m_{00}}, \qquad y_c = \frac{m_{01}}{m_{00}} \tag{1.8}

For a silhouette image, c represents the geometrical center of the image region, also called the center of mass.

Central moments shift the reference system to the centroid to make the moment calculations independent of the image area position. They are defined as

\mu_{pq} = \iint_{\zeta} (x - x_c)^p (y - y_c)^q f(x, y)\, dx\, dy, \qquad p, q = 0, 1, 2, 3, \ldots \tag{1.9}

As the image region remains unchanged during the transformation and the pixel coordinates are distributed in equal shares on both sides of the new reference system, we have

\mu_{00} = m_{00}; \qquad \mu_{10} = \mu_{01} = 0. \tag{1.10}

According to equation 1.9, the image area is traversed twice for central moment calculations, as the centroid has to be determined before µpq can be calculated. The work of Rocha et al. [2002] avoids this double traversal. It uses the following equations for the calculation of the second order central moments:

\mu_{20} = \frac{m_{20}}{m_{00}} - x_c^2 \tag{1.11}

\mu_{11} = \frac{m_{11}}{m_{00}} - x_c y_c \tag{1.12}

\mu_{02} = \frac{m_{02}}{m_{00}} - y_c^2 \tag{1.13}

The second order moments are "a measure of variance of the image intensity distribution about the origin. The central moments µ20, µ02 give the variances about the mean (centroid). The covariance is given by µ11." [Mukundan and Ramakrishnan, 1998, p. 12]. The second order central moments can also be seen as moments of inertia with the coordinate axes moved to have the intensity centroid as their origin. If these so-called principal axes of inertia are used as the reference system, they make the product of inertia component (µ11) vanish. The moments of inertia (µ20, µ02) of the image about this reference system are then called the principal moments of inertia. We can use these moments to provide useful descriptors of shape. The work of Morse [2004] gives a good description of these techniques:

"Suppose that for a binary shape we let the pixels outside the shape have value 0 and the pixels inside the shape value 1. The moments µ20 and µ02 are thus the variances of x and y respectively. The moment µ11 is the covariance between x and y [...]. You can use the covariance to determine the orientation of the shape."

The covariance matrix C is

C = \begin{pmatrix} \mu_{20} & \mu_{11} \\ \mu_{11} & \mu_{02} \end{pmatrix} \tag{1.14}

By finding the eigenvalues and eigenvectors of C and looking at the ratio of the eigenvalues, we can determine the eccentricity, or elongation, of the shape. The direction of elongation can then be derived from the direction of the eigenvector whose corresponding eigenvalue has the largest absolute value.

The eigenvalues of C are defined as

I_1 = \frac{(\mu_{20} + \mu_{02}) + \sqrt{(\mu_{20} - \mu_{02})^2 + 4\mu_{11}^2}}{2}, \qquad I_2 = \frac{(\mu_{20} + \mu_{02}) - \sqrt{(\mu_{20} - \mu_{02})^2 + 4\mu_{11}^2}}{2} \tag{1.15}

The semi-major axis a and the semi-minor axis b can then be calculated as

a = \sqrt{3 I_1}; \qquad b = \sqrt{3 I_2}. \tag{1.16}

These axis calculations are derived from the paper by Rocha et al. [2002]. Other authors described a and b differently ([Mukundan and Ramakrishnan, 1998, p. 14], [Sonka et al., 1999, p. 258]). During the implementation phase, testing results were most appropriate with the usage of the stated formulas.
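Combining equations 1.8 and 1.11 to 1.16, the best-fitting ellipse can be derived from the raw moments as in the following sketch. It reuses the hypothetical SilhouetteMoments holder from the sketch above (both are assumed to live in the same package); the orientation angle computed at the end corresponds to equation 1.17 below, using atan2 so that the case µ20 = µ02 does not divide by zero.

```java
// Best-fitting ellipse of a binary region, derived from its silhouette moments
// following equations 1.8 and 1.11-1.17. Class and field names are illustrative.
class MomentEllipse {
    double xc, yc;      // centroid
    double a, b;        // semi-major / semi-minor axis
    double theta;       // orientation of the major axis with the x-axis, in radians

    static MomentEllipse from(SilhouetteMoments m) {
        MomentEllipse e = new MomentEllipse();
        e.xc = m.m10 / m.m00;                               // eq. 1.8
        e.yc = m.m01 / m.m00;
        double mu20 = m.m20 / m.m00 - e.xc * e.xc;          // eq. 1.11
        double mu11 = m.m11 / m.m00 - e.xc * e.yc;          // eq. 1.12
        double mu02 = m.m02 / m.m00 - e.yc * e.yc;          // eq. 1.13
        double root = Math.sqrt((mu20 - mu02) * (mu20 - mu02) + 4 * mu11 * mu11);
        double i1 = ((mu20 + mu02) + root) / 2;             // eigenvalues, eq. 1.15
        double i2 = ((mu20 + mu02) - root) / 2;
        e.a = Math.sqrt(3 * i1);                            // eq. 1.16
        e.b = Math.sqrt(3 * i2);
        e.theta = 0.5 * Math.atan2(2 * mu11, mu20 - mu02);  // cf. eq. 1.17
        return e;
    }
}
```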

The orientation angle θ of one of the principal axes of inertia with the x-axis is given by

\theta = \frac{1}{2} \tan^{-1}\left( \frac{2\mu_{11}}{\mu_{20} - \mu_{02}} \right) \tag{1.17}

Image Moments for Feature Tracking

The work of Rocha et al. [2002] introduces a moment-based object tracking method where the object in the binary image is approximated by best-fitting ellipses. Binary Space Partitioning (BSP), a method for recursively subdividing a space into convex sets by hyperplanes, is used for the approximation. Each node of the BSP tree represents a part of the image object, described by its best-fitting ellipse.

Figure 1.11.: Object fitting by 2^k ellipses at each level. Construction of the BSP tree at level 0 (a), level 1 (b), and the result of level 3 (c) [Rocha et al., 2002].

As illustrated in Figure 1.11, the algorithm starts by calculating the ellipse of the root node (level 0). Then, the image region is divided along the minor axis, and the child nodes are created, each incorporating the pixels on one side of the splitting axis. This subdivision is repeated until a certain predefined tree depth is reached where the ellipses sufficiently approximate the image shape (see (c) in Figure 1.11).
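The following Java sketch outlines this recursive subdivision under the simplifying assumption that the object is given as a list of foreground pixel coordinates and that the split assigns pixels according to the sign of their projection onto the major-axis direction, which separates the two sides of the minor axis. It reuses the hypothetical classes from the sketches above and is not the implementation of Rocha et al. [2002].

```java
import java.awt.Point;
import java.util.ArrayList;
import java.util.List;

// Sketch of the BSP-style subdivision: fit an ellipse to a pixel set, split the
// set along the ellipse's minor axis through the centroid, and recurse until
// the requested tree depth is reached.
class BspEllipseNode {
    MomentEllipse ellipse;
    BspEllipseNode left, right;

    static BspEllipseNode build(List<Point> pixels, int depth) {
        if (pixels.isEmpty()) return null;
        BspEllipseNode node = new BspEllipseNode();
        node.ellipse = fitEllipse(pixels);
        if (depth == 0) return node;

        // The splitting line is the minor axis: it passes through the centroid
        // and is perpendicular to the major-axis direction theta.
        double dirX = Math.cos(node.ellipse.theta);
        double dirY = Math.sin(node.ellipse.theta);
        List<Point> lower = new ArrayList<>(), upper = new ArrayList<>();
        for (Point p : pixels) {
            double proj = (p.x - node.ellipse.xc) * dirX + (p.y - node.ellipse.yc) * dirY;
            (proj < 0 ? lower : upper).add(p);
        }
        node.left = build(lower, depth - 1);
        node.right = build(upper, depth - 1);
        return node;
    }

    private static MomentEllipse fitEllipse(List<Point> pixels) {
        // Accumulate silhouette moments directly from the point list.
        SilhouetteMoments m = new SilhouetteMoments();
        for (Point p : pixels) {
            m.m00 += 1;
            m.m10 += p.x;              m.m01 += p.y;
            m.m20 += (double) p.x * p.x;
            m.m11 += (double) p.x * p.y;
            m.m02 += (double) p.y * p.y;
        }
        return MomentEllipse.from(m);
    }
}
```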

The approach by Rocha et al. [2002] was designed for basic shape tracking purposes, with only one simple object in the image region. It has not yet been used and evaluated for more complex tasks, such as facial feature tracking. As stated by the authors, "problems that we did not address in this paper are occlusion, tracking of multiple objects and motion discontinuities. Future work will go in these directions" [Rocha et al., 2002]. We did not find any further papers that base their work on this moment tracking algorithm.

Evaluation

Despite its simple approach, the proposed ellipse approximation method of Rocha et al. [2002] surprises with the quality of the achieved results. The paper is written in a very legible way, and the results are illustrated graphically. The work therefore presages a straightforward implementation. The algorithm has not yet been tested on multiple objects, but with an appropriate region selection on the preprocessed face images, we can simplify the object structures in order to make them applicable to the tracking procedure. The paper does not state processing times, but the design of the algorithm suggests short operating times.

1.4. Commercial Implementations

Overview

The number of facial movement tracking software products on the market is still very limited. During our inquiry, we found two products: X-IST FaceTracker by the German company noDNA (http://www.nodna.com/FaceTracker.26.0.html) and VeeAnimator by Vidiator Technology (USA, http://www.vidiator.com/facestation.php). Both of them keep the technical specification short and do not provide information on which tracking methods and algorithms have been used.

The basic operating sequence is the same for the two systems, even though they differ in some key factors. Both of them take video streams as input data, are able to process and transfer in realtime, and provide data for proprietary 3D animation software. However, only VeeAnimator can operate without physical markers on the tracked person. They also differ in the scope of supply, hardware requirements and integration with proprietary 3D animation software, where the German product is ahead. Still surprising is the fact that VeeAnimator, which gets by without any physical markers, costs about a fourth of the price of the X-IST FaceTracker.

X-IST FaceTracker

The X-IST FaceTracker is characterized by a head-mounted video camera, required facial markers and lighting conditions, and the support for a range of different 3D animation formats. In contrast to the VeeAnimator, the X-IST FaceTracker uses its own proprietary headset for video recording (see Figure 1.12). The camera on this headset is near-infrared sensitive, with PAL or NTSC video output and adjustable camera focus. It has a near-infrared dimmable light source built in. Currently, X-IST works on Microsoft Windows 2000; it will be available for Windows XP Professional in the future.

It works with infrared-reflective markers on the face of the tracked person, which are then recognized by the tracking software. To detect these markers correctly, the studio environment has to be kept in fluorescent (cold) light, without daylight or other warm light sources such as halogen or light bulbs. It provides drivers for 3D animation programs (Alias Mocap, Famous3D, 3ds Max, FBX), a Portable Control Unit (PCU) and a Software Development Kit (SDK) for 3rd party integration. The package with the headset system and the provided software costs € 6,999, without required additional hardware and drivers.

Figure 1.12.: The X-IST FaceTracker. With the provided headset (on the left) it is possible to create facial animations.

VeeAnimator (formerly FaceStation)

VeeAnimator stands out with the ability to track in realtime, without the use of physical markers and with standard hardware components, which makes the tracking process simple in execution.

It is "a suite of software applications that allow you to animate heads and faces in Discreet's 3ds max or Alias|Wavefront Maya" [vidiator, 2004] that uses a normal video camera. The camera does not have to be head-mounted and, in contrast to the X-IST FaceTracker, whole head movements are recorded. The software places 22 virtual markers at key positions on the face. The movement of these markers is then 'tracked' from each video frame to the next to generate facial animation data. This data is used to animate a model in the 3D animation package.

At any given video frame, the face is analyzed into a mixture of 16 different facial expression elements (including smile, frown, lip pucker, vowel sounds, raised eyebrows, closed eyelids). These facial expression elements can then be used for animation, for example to drive a set of morph targets with the defined expressions. The software additionally provides audio (speech) analysis tools that can be used to refine lip movements. The big advantage of VeeAnimator is that it does not need any additional hardware or special lighting. Soft diffused illumination on the actor's face, from whatever light source, is sufficient for the program to work satisfactorily. Figure 1.13 shows a tracking example with this software, taken from the VeeAnimator demonstration video².

Figure 1.13.: VeeAnimator in action. The tracked feature points (a); the real-life person (right) and its virtual-reality equivalent during realtime tracking (b).

VeeAnimator contains 4 parts: FaceLifter tracks prerecorded computer video files, FaceTracker does realtime tracking on video streams, FaceDriver is the 3ds Max or Maya plug-in component, and the Avatar Editor creates fully textured head models.

Comparison<br />

Table 1.1 on page 25 gives a summarizing overview over the mentioned two programs.<br />

They differ in a lot of points, especially in prerequisites and the required hardware.<br />

Especially the comparison of the number of supported feature points of these two ap-<br />

plications is interesting.<br />

2 http://www.vidiator.com/demos/facestation/FSDemoFinal_small.wmv<br />

| | X-IST FaceTracker V 4.5 | VeeAnimator |
|---|---|---|
| Components – Included | Software package, headset system, cables and marker tape | Software package |
| Components – Required | PCI framegrabber card | Ordinary digital video camera, ‘Alias Maya 3D’ or ‘Autodesk 3ds Max’ |
| Components – Optional | Drivers/converters for 3rd party animation software; PCU; SDK | – |
| Requirements – Software | Windows 2000 (XP in progress) | Windows 2000/XP; Maya 4.5 or 5.0 / 3ds Max 4.26, 4.3, 5.0, 6.0 |
| Requirements – Clock rate | ≥800 MHz | prerecorded: ≥700 MHz, realtime: ≥2.0 GHz |
| Requirements – Hardware | 20 GB HD, 128 MB RAM, 2D graphics card XGA, 1 PCI slot | 200 MB HD, Maya 3D / 3ds Max requirements |
| Feature points | up to 36 (typically 15) | 22 |
| Physical markers | Yes | No |
| Environment | no daylight/warm light; fluorescent (cold) light only | soft diffused illumination |
| Tracking rates | 25/50 fps (PAL), 30/60 fps (NTSC) | 30/60 fps (NTSC) |
| Price | €6,999.00 | $1,995.00 |

Table 1.1.: Comparison of commercial products


1.5. Summary<br />


Many researchers have already developed facial feature tracking algorithms, describing their work with different levels of detail. The examined approaches based on optical flow use the FACS, which suffers from major drawbacks because of its lack of spatial information. Other methods that work with snakes combine various tracking and location techniques and are hence more complex. Moreover, snakes can suffer from the leaking edge problem, which worsens the result dramatically. The investigated image moment technique is straightforward, but may not be applicable to complex tracking tasks. None of the papers states which programming language was used, and sources are not freely available on the Internet. We therefore assume that all works are implemented in a platform-dependent language like C++, which may have advantages in processing time, but imposes constraints on portability and ease of use.



2. Algorithms in Consideration<br />

2.1. Overview<br />

In Chapter 1 we summarized different approaches for facial feature tracking. We showed that FACS-based methods have difficulties because they were not originally developed for machine recognition. The moment-based approach turned out to be surprisingly straightforward and comprehensible. Other algorithms are computationally expensive or do not seem straightforward to implement. Based on their description in papers and their predicted practicability, we selected two tracking procedures for a closer examination: active contour models (snakes) and moment-based tracking. Both approaches need manual or automated initialization in the first video frame. Snakes need an initial contour and therefore exact feature points for processing; the moment-based solution works on the complete picture, but needs initialization of feature regions because it can only recognize single objects. The required preprocessing steps for the two methods are also similar. Both work on binary edge images, but will presumably produce better results on grayscale edge images, where the edge intensity varies and therefore also weaker edges can be handled. The two algorithms mainly differ in their implementation complexity and processing time. These factors are investigated in this chapter.

2.2. Testing Method<br />

2.2.1. Testing Tool<br />

For testing the practicability and performance of the algorithms in consideration, we have used and extended Java code examples which already implement the required functionality. These examples are programmed as plugins for the Java-based image processing tool ImageJ. It is a public domain program, available at http://rsb.info.nih.gov/ij/.


On the homepage, the program is described as follows:

“ImageJ is [...] inspired by NIH Image for the Macintosh. It runs, either as an online applet or as a downloadable application, on any computer with a Java 1.1 or later virtual machine. Downloadable distributions are available for Windows, Mac OS, Mac OS X and Linux. [...]

ImageJ was designed with an open architecture that provides extensibility via Java plugins. Custom acquisition, analysis and processing plugins can be developed using ImageJ’s built in editor and Java compiler. User-written plugins make it possible to solve almost any image processing or analysis problem.”

At the time of inquiry, ImageJ was available in version 1.33, which had errors when working with Java 1.5 on Linux¹ and was therefore used with Java 1.4.2. The recent ImageJ version 1.34 works fine with Java 1.5.

2.2.2. Input Data<br />

For the following tests we used a binary edge image of a human face. We extracted a video frame that shows a face in neutral position in the middle of the image. This gives us clearly identifiable facial features, represented by edges, which eases the selection of feature regions and therefore the correct comparison of the output. Moreover, we approximate the test situation to the conditions of the final Java tracking program. In order to transform the video frame into the correct format, we converted the color image into a grayscale image and processed it with a Canny edge detector.

In the following sections we describe the results of the ImageJ feature tracking plugins.

¹ A java.lang.NullPointerException is thrown during image window initialization.



2.3. Testing Snake Algorithms<br />


We have found two ImageJ plugins that implement snake algorithms, both of which work on grayscale images: Jacob’s SplineSnake implementation, and the snake plugin by Boudier.

SplineSnake The SplineSnake implementation of Jacob et al. [2004] allows the user to select any required image region by drawing a path onto the source image. Points on this path, which have a preset distance between each other, are called knots and are the initialization for the snake algorithm. Additionally, the user can specify constraint knots that have to be passed by the final snake. All adjustable parameters are described at http://ip.beckman.uiuc.edu/Software/SplineSnake/usage.html; the values in Table 2.1 are directly used by the snake algorithm:

| Parameter | Default |
|---|---|
| Image energy: a proper linear combination of gradient and region energies can result in better convergence. The right combination depends on the image. | “100% Region” |
| Maximum number of iterations. | 2000 |
| Size of one step during optimization. | 2.0 |
| Accuracy to which the snake is optimized. | 0.01 |
| Smoothing radius of the image smoothing procedure that is computed before running the snake algorithm. | 1.5 |
| Spring weight: specifies how the constraint knots are weighted. | 0.75 |

Table 2.1.: SplineSnake parameters.

For testing, we have drawn a nearly rectangular path around the mouth, with a knot distance of 5 pixels. During the testing process, we varied the step size and the number of iterations. Satisfying results were possible with a step size of 10 and 200 iterations. With 50 iterations (as in Figure 2.1 (b) and (c)), we were not able to approximate the mouth contour closely enough. A step size of more than 10 did not enhance the process. The results are illustrated in Figure 2.1.

Figure 2.1.: Overview of SplineSnake results. The initial selection (a), SplineSnake with step size 1.0 and 50 iterations (b), step size 10.0 and 50 iterations (c), and SplineSnake with step size 10.0 and 200 iterations (d).

SplineSnake cannot omit small sources of interference in its processing. A tracking example with distracting pixels is illustrated in Figure 2.2.

Figure 2.2.: SplineSnake interference. The final snake (the inner red line) is not able to ignore single interfering pixels on the right side of the upper lip contour.

The plugin delivers information about the processing time and the resulting snake knots. Table 2.2 shows the results of the SplineSnake test cycles. For these results, we tested with different manually drawn mouth selections, 200 snake iterations and a step size of 10.0. Other values were left at their defaults. The average processing time over 20 cycles was 2.42 seconds, with 26.3 resulting knots and 4.35 curve-describing samples per knot.


For a tracking program that has to work close to realtime, these tracking times are not acceptable. However, we have to note that the tracking times of subsequent video frames could be reduced by initializing the snake with the parameters of the preceding frame. The initial snake would then be close to the final snake, and therefore fewer cycles (presumably < 10) would have to be processed.

| no. | knots | samples/knot | time [s] |
|---|---|---|---|
| 1 | 29 | 4 | 2.280 |
| 2 | 27 | 4 | 2.587 |
| 3 | 29 | 4 | 3.846 |
| 4 | 25 | 4 | 2.674 |
| 5 | 32 | 3 | 4.133 |
| 6 | 27 | 5 | 2.844 |
| 7 | 24 | 5 | 3.684 |
| 8 | 17 | 6 | 0.616 |
| 9 | 23 | 5 | 1.353 |
| 10 | 27 | 4 | 2.207 |
| 11 | 25 | 4 | 1.579 |
| 12 | 25 | 5 | 0.849 |
| 13 | 25 | 3 | 2.027 |
| 14 | 27 | 5 | 2.051 |
| 15 | 25 | 4 | 1.543 |
| 16 | 25 | 4 | 2.662 |
| 17 | 22 | 4 | 1.233 |
| 18 | 35 | 5 | 4.131 |
| 19 | 29 | 4 | 3.263 |
| 20 | 28 | 5 | 2.833 |
| avg | 26.3 | 4.35 | 2.420 |

Table 2.2.: SplineSnake: Results


Boudier Snake Plugin The second ImageJ plugin for snakes is written by Thomas Boudier. It is available at http://www.snv.jussieu.fr/~wboudier/softs/snake.html. For testing, we used the default parameters listed in Table 2.3.

| Parameter | Value |
|---|---|
| Gradient threshold | 20 |
| Regularization | 0.10 |
| Number of iterations | 200 |
| Step result show | 5 |
| Alpha-Canny-Deriche | 1.00 |

Table 2.3.: Boudier snake parameters

For comparison with the SplineSnake plugin, we chose to use a rectangular initial selection. As illustrated in Figure 2.3, the success of the snake procedure greatly depends on this initial selection. During testing, a change of the selection by one pixel resulted in extreme outgrowths of the resulting snake.

Figure 2.3.: Overview of snake results. 1: a selection of (231, 208, 59, 16) (a) delivers good results (b); 2: an enlargement of the region width by 1 pixel (a) has significant negative effects (b).

Table 2.4 shows testing results obtained with this snake plugin. The values specified represent the rectangular selection on the edge image around the mouth region

(position on the x/y-axes, width and height). A checkmark in the last column indicates whether the result is satisfying (that is, the snake bounds the mouth region) or leaked out over a big part of the displayed face.

As the results show, the plugin delivers a successful result in only about one third of the test cases. In the last row we took the average of all selections as snake initialization values, which also led to a negative outcome. If the region selection is closer to the mouth contour, for example with an elliptical selection, the algorithm works more reliably.

| no. | x | y | w | h | result |
|---|---|---|---|---|---|
| 1 | 236 | 215 | 54 | 9 | ✗ |
| 2 | 236 | 210 | 54 | 16 | ✗ |
| 3 | 235 | 210 | 55 | 15 | ✗ |
| 4 | 234 | 207 | 56 | 20 | ✗ |
| 5 | 233 | 211 | 57 | 17 | ✗ |
| 6 | 233 | 211 | 54 | 18 | ✗ |
| 7 | 233 | 208 | 60 | 18 | ✗ |
| 8 | 233 | 208 | 58 | 17 | ✗ |
| 9 | 232 | 209 | 61 | 21 | ✗ |
| 10 | 232 | 209 | 61 | 19 | ✗ |
| 11 | 232 | 209 | 60 | 17 | ✓ |
| 12 | 231 | 208 | 64 | 21 | ✗ |
| 13 | 231 | 208 | 62 | 19 | ✓ |
| 14 | 231 | 208 | 60 | 16 | ✗ |
| 15 | 231 | 208 | 59 | 16 | ✓ |
| avg | 233 | 209 | 58 | 17 | ✗ |

Table 2.4.: Snake: Results



2.4. Testing Image Moments<br />


An ImageJ moment calculation implementation was found at http://rsb.info.nih.gov/ij/plugins/moments.html, which was apparently integrated into ImageJ version 1.34². The plugin calculates image moments from rectangular image selections up to the 4th order, and calculates the elongation and orientation of objects. The implementation allows a mapping of image intensity values before the moments are calculated. For that purpose, it uses the equation

p_{i,j} = f · (p_{i,j} − c)   (2.1)

where p_{i,j} is the intensity value of the pixel. The factor f and the cutoff c can be specified manually in the user interface. This mapping allows the user to specify a background color other than black (by setting the cutoff accordingly), and to process images with a different color range (by changing the factor). The plugin provides tabular output of the moment calculations; the results are not illustrated in the image. The calculations and the provided source code still give a good overview of how the moment calculations work. The implementation is straightforward and very comprehensible. It executes the following steps:
It executes the following steps:<br />

Step 1: Compute moments of order 0 and 1.

Step 2: Compute coordinates of the centroid.

Step 3: Compute moments of orders 2, 3, and 4.

Step 4: Normalize 2nd moments and compute the variance around the centroid.

Step 5: Normalize 3rd and 4th order moments and compute the skewness (symmetry) and kurtosis (peakedness) around the centroid.

Step 6: Compute orientation and eccentricity.

Source: Awcock [Awcock, 1995, pp. 162–165]
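To make these steps more concrete, the following minimal sketch (our own illustration, not the plugin's source code) computes the raw moments up to the 2nd order for a pixel region and derives the centroid and the orientation of the best-fitting ellipse. Because the central moments are derived algebraically from the raw moments, a single pass over the pixels suffices here; class and variable names are chosen freely.

```java
// Minimal sketch (not the ImageJ plugin source): moments up to 2nd order,
// centroid and orientation for a binary or grayscale pixel region.
public class MomentSketch {

    /** pixels[y][x] holds the (possibly mapped) intensity value, 0 = background. */
    public static double[] analyze(int[][] pixels) {
        double m00 = 0, m10 = 0, m01 = 0, m11 = 0, m20 = 0, m02 = 0;
        for (int y = 0; y < pixels.length; y++) {
            for (int x = 0; x < pixels[y].length; x++) {
                double v = pixels[y][x];
                if (v == 0) continue;                 // skip background pixels
                m00 += v;
                m10 += x * v;        m01 += y * v;
                m11 += x * y * v;
                m20 += x * x * v;    m02 += y * y * v;
            }
        }
        if (m00 == 0) return null;                    // empty region, nothing to analyze
        double xc = m10 / m00, yc = m01 / m00;        // centroid (step 2)
        double mu20 = m20 / m00 - xc * xc;            // central moments (step 4)
        double mu02 = m02 / m00 - yc * yc;
        double mu11 = m11 / m00 - xc * yc;
        double theta = 0.5 * Math.atan2(2 * mu11, mu20 - mu02); // orientation (step 6)
        return new double[] { xc, yc, theta };
    }
}
```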

In the case of a moment-based facial feature tracker, moment calculations above the 2nd order are not necessary. Steps 5 and 6 can therefore be left out. Note that the image pixels have to be traversed twice (in step 1 and step 3), which increases the complexity of g(n) = O(n) for a region with n foreground pixels by a factor of 2.

² Measurements in ImageJ (‘Analyze→Set Measurements...’ and ‘Analyze→Measure’)


For testing purposes, we changed the plugin code so that it displays processing time information. This information shows calculation times between 10 ms and < 1 ms for a 60x20 pixel selection. The more often the plugin is executed, the less calculation time is needed; JVM caching may be responsible for that behavior. This time information cannot be compared one-to-one to the data produced by the snake code. The plugin calculates moments of higher orders, which are not necessary for a feature tracker. Still, this does not influence the complexity of the algorithm, as no additional traversal of the image pixels is necessary. The complexity will change with the implementation of the BSP tree structure, as the moment information has to be calculated for every tree node. It is then g(n) = O(log(d) · n) for a tree depth of d. Still, the processing times are far shorter than those of the snake plugin, and we assume that this will also be the case if the Java tracker is based on moments.

2.5. Summary<br />

We have shown that the performance and reliability of snake algorithms strongly depend on the adjustment of their parameters and the initialization of the snake knots. The accurate selection of the image area and snake parameters proved problematic and challenging during the test phase. Small changes of region selections caused incomprehensibly large differences in the processing results. Calculation times of about 2.5 seconds per execution seem to be too high for an application that aims to work close to realtime. The time could be reduced in subsequent frames by initializing the snake knots with the knots of the previous frame; then the major execution time would only accrue in the first video frame. In contrast, moment calculations proved to be straightforward, fast and comprehensible. In addition to the results of our tests, the paper about tracking with moment calculations [Rocha et al., 2002] describes an exact course of action, which gives a clear path on how to proceed and therefore eases future work. For that reason we decided to implement a moment-based Java feature tracker.

The next chapter describes the necessary prerequisites and preparations that have to be made so that a moment-based tracking algorithm can work satisfactorily.



3. Input Data and Its Preparation<br />

3.1. Overview<br />

In order to be applicable to the selected tracking algorithm, the input video has to be read and transformed into a proper format and quality. In this chapter, we divide this procedure into two steps. First, we read the input data and extract the video frames. For this, we specify prerequisites according to the reading technique used and the presumed video quality, and select sample input videos to be used in the development process. In the second step, we process the data in order to enhance the image features. We discuss the requirements of the tracking algorithm, and state how we meet these requirements with edge detection algorithms. Well-established algorithms are explained, as well as the Java libraries that can facilitate this step.

3.2. Prerequisites<br />

3.2.1. Data Format Prerequisites<br />

Media API Selection<br />

In order to specify the video format for the program, we have to know which Application Programming Interface (API) is used for the media handling. The API should be able to incorporate time-based media into the implemented Java application. It should be a platform-independent, pure Java library, to avoid using the Java Native Interface (JNI) to call into native media APIs. We investigated the following possibilities:

- JMF, developed by Sun Microsystems and IBM
- QuickTime for Java, developed by Apple
- MPEG-4 Toolkit, developed by IBM

Implementing a media import from scratch is not considered an option, as the complexity of the task and the necessary implementation time do not meet the time constraints of the project.


Java Media Framework The JMF 2.0 API is Sun Microsystems’ freely available API that enables the presentation of time-based media. It provides support for capturing and storing media data, controlling the type of processing, and performing custom processing on media data streams. In addition, JMF 2.0 defines a plug-in API that enables the programmer to customize and extend JMF functionality. The current JMF 2.1.1e Reference Implementation supports the media types and formats listed in Table 3.1¹. The list of formats is limited and, due to the latest JMF release date in March 2001, does not contain currently well-established formats like MPEG-4; MPEG-1 is only supported in the platform-specific performance packs. Therefore the pure Java version of JMF is not able to decode MPEG-1 videos. Moreover, different authors, for example Davison [2005], describe the framework as buggy. Searching for JMF on Sourceforge² only returns a handful of sparsely active projects that are dealing with JMF’s video functionality. However, sticking with JMF despite its limited collection of supported media formats and codecs is, according to Adamson (http://www.oreillynet.com/pub/wlg/2933), still the most practical all-Java option.

The JMF API Guide [JMF] describes the basic working model as follows:

“A data source encapsulates the media stream much like a video tape and a player provides processing and control mechanisms similar to a VCR. Playing and capturing audio and video with JMF requires the appropriate input and output devices such as microphones, cameras, speakers, and monitors.”

A lot of implementation examples are freely available for JMF, for example in Sun’s JMF Forum (http://forum.java.sun.com/forum.jspa?forumID=28). An ImageJ plugin for JMF is available at http://rsb.info.nih.gov/ij/plugins/jmf-player.html.

¹ found at http://java.sun.com/products/java-media/jmf/2.1.1/formats.html
² One of the largest collections of Open Source software: http://www.sourceforge.net

| Media Type | Cross Platform Version | Solaris/Linux Performance Pack | Windows Performance Pack |
|---|---|---|---|
| AVI (.avi) | read/write | read/write | read/write |
| Cinepak | D | D,E | D |
| MJPEG (422) | D | D,E | D,E |
| RGB | D,E | D,E | D,E |
| YUV | D,E | D,E | D,E |
| VCM³ | - | - | D,E |
| HotMedia (.mvr) | read only | read only | read only |
| IBM HotMedia | D | D | D |
| MPEG-1 Video (.mpg) | - | read only | read only |
| Multiplexed System stream | - | D | D |
| Video-only stream | - | D | D |
| QuickTime (.mov) | read only | read only | read only |
| Cinepak | D | D,E | D |
| H.261 | - | D | D |
| H.263 | D | D,E | D,E |
| JPEG (420, 422, 444) | D | D,E | D,E |
| RGB | D,E | D,E | D,E |

D: format can be decoded and presented
E: media stream can be encoded in the format
read: media type can be used as input (read from a file)
write: media type can be generated as output (written to a file)

Table 3.1.: JMF 2.1.1 - Supported Video Formats

³ VCM - Windows’ Video Compression Manager support. Tested for these formats: IV41, IV51, VGPX, WINX, YV12, I263, CRAM, MPG4.



QuickTime for Java QuickTime for Java (QTJava) brings together the QuickTime movie player and the Java programming language. As a result, it is possible for Java applications to play QuickTime movies, edit and create them, capture audio and video, and perform 2D and 3D animations. QTJava provides a basic set of functionality across all platforms that support Java and QuickTime. It is currently in version 6.4, which works with Java 1.4.1. “The previous version of QTJava supported J2SE 1.4.1, but only on Windows”⁴. QTJava wins in the supported media types, as it can play all types supported by the current QuickTime version. These formats include MPEG-4, Flash 5, H.261, H.263, H.264, DV and DVC Pro NTSC, DV PAL and DVC Pro PAL. Iverson describes the media playback handling with QTJava in his book “Mac OS X for Java Geeks” [Iverson, 2003, p. 154]. He praises the “rich range of supported media types”, but claims the API to be “still relatively C-like”.

QTJava consists of two layers⁵:

- A core layer which provides the ability to access the complete QuickTime API
- An application framework for easy integration into Java applications. It includes:
  1. Integration of QuickTime with the Java Runtime. This includes sharing display space between Java and QuickTime and sharing events from Java with QuickTime.
  2. A set of classes that simplifies the effort required to perform common tasks while providing an extensible framework that application developers can customize to meet their specific requirements.

The Java method calls are claimed to add very little overhead to the native call; they do parameter marshalling and check the result of the native call for any error conditions. The major limitation of QTJava is that QuickTime is only supported on Windows and Mac platforms. As this project aims to develop platform-independent software that runs on all platforms providing a Java Virtual Machine (JVM) 1.4.1 or higher, this library cannot be used. Moreover, we want to avoid requiring additional programs to be installed in order to make the Java tracker work correctly.

⁴ see http://developer.apple.com/quicktime/qtjava/
⁵ as described in an Apple developer article at http://developer.apple.com/quicktime/qtjava/overview.html


IBM Toolkit for MPEG-4 The IBM Toolkit for MPEG-4 is currently in version 1.2.4, which is usable with Java 1.1 up to 1.5. It consists of a set of Java classes and APIs with five sample applications: three cross-platform playback applications, and two tools for generating MPEG-4 content for use with MPEG-4-compliant devices. These applications are the following:

- AVgen: a simple, easy-to-use GUI tool for creating audio/video-only content for ISMA- or 3GPP-compliant devices
- XMTBatch: a tool for creating rich MPEG-4 content beyond simple audio and video
- M4Play: an MPEG-4 client playback application
- M4Applet for ISMA: a Java player applet for ISMA-compliant content
- M4Applet for HTTP: a Java applet for MPEG-4 content played back over HTTP

Since the toolkit is Java based, the client applications and the content creation applications are cross-platform and will run on any Java-supporting platform. Its minimum requirement is a Java SDK with Swing; for higher performance and more capabilities, SDK version 1.4 or above is recommended. More details can be found at the project homepage⁶. The major disadvantage of the IBM Toolkit is that it is not freely available. It is possible to download a 90-day trial license; commercial licenses cost from $500 upwards. Furthermore, it is limited to the playback of MPEG-4 movies, which decreases the range of possible input data.

MPEG-4 Video for JMF The MPEG-4 Video for JMF is a freely available plug-in that enables decoding of MPEG-4 videos in Java, independent of the IBM Toolkit for MPEG-4. This plug-in allows for the decoding of MPEG-4 video streams created with any encoder that supports the MPEG-4 Simple Profile. The decoder of MPEG-4 Video for JMF can be used on any JMF-enabled platform. In order to function, it needs JMF 2.1.1 and all the JMF requirements. “The implementation is 100% pure Java and has undergone special optimizations to ensure adequate performance” (http://www.alphaworks.ibm.com/tech/mpeg-4).

⁶ see http://www.alphaworks.ibm.com/tech/tk4mpeg4. Implementation demos are available at http://www.research.ibm.com/mpeg4/Demos/index.htm


Selection The JMF was selected for the Java motion tracker, as it is the only freely available solution that works on a Java-enabled machine without further requirements. We are aware that data format and implementation problems could arise due to the development status of the library. If MPEG-4 support is needed at a later point, the IBM plug-in can be used to enhance the JMF functionality.
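As an illustration of how frames could be obtained with JMF, the following sketch grabs a single frame from a video file via the FrameGrabbingControl and converts it to an AWT image. The file name is a placeholder, and whether a player exposes this control depends on the media type and platform; the FrameExtractor described later may well be built differently (for example with a pass-through codec, as in Sun's FrameAccess example).

```java
import java.awt.Image;
import javax.media.Buffer;
import javax.media.Manager;
import javax.media.MediaLocator;
import javax.media.Player;
import javax.media.control.FrameGrabbingControl;
import javax.media.format.VideoFormat;
import javax.media.util.BufferToImage;

// Sketch: grab a single frame from a video with JMF.
public class JmfFrameGrabSketch {
    public static void main(String[] args) throws Exception {
        Player player = Manager.createRealizedPlayer(
                new MediaLocator("file:sample.mpg"));    // hypothetical input file
        player.start();
        Thread.sleep(1000);                               // let playback reach a frame

        FrameGrabbingControl fgc = (FrameGrabbingControl)
                player.getControl("javax.media.control.FrameGrabbingControl");
        if (fgc != null) {                                // control may be unavailable for some media
            Buffer buf = fgc.grabFrame();                 // current video frame
            BufferToImage converter =
                    new BufferToImage((VideoFormat) buf.getFormat());
            Image frame = converter.createImage(buf);     // AWT image for preprocessing
            System.out.println("Grabbed frame: " + frame);
        }
        player.close();
    }
}
```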

3.2.2. Video Quality<br />

Due to the selected tracking algorithm and the missing preliminary tracking stages, the Java tracker places certain requirements on the input videos. For the desired quality of the tracking algorithm, it is necessary that the frame sequence is continuous, which should be given if the video is directly recorded with about 25 frames per second. The lighting should be diffused soft light, frontal on the tracked face. As discussed in Section 1.2, the Java tracker leaves out the face tracking step. Therefore the face has to be in the middle of the image, and its size and orientation have to stay almost constant.

3.2.3. Video Samples<br />

For testing purposes, we used video material from the University of Tübingen (http://vdb.kyb.tuebingen.mpg.de). On their homepage, they describe the setup for video recording:

“The video cameras were arranged in a semi-circle around the subject at a distance of roughly 1.3m [as shown in Figure 3.1]. Each camera was centered on the subject and leveled. The cameras recorded 25 frames/sec in full PAL video resolution (786*576, non-interlaced). In order to facilitate the recovery of rigid head motion, the subject wore a headplate with 6 green foam markers attached to it.

Figure 3.1.: Top view of camera layout used for recordings (taken from http://vdb.kyb.tuebingen.mpg.de).


Each recording contains one isolated action unit, repeated three times, with a (close to) neutral expression in between. For each action unit, there are six video files (one for each camera). Each video file has identical length and starts at exactly the same time. The videos were converted from raw single chip CCD data to RGB using a Bayer filter, then encoded as MPEG1 using mpeg2enc.”

The chosen MPEG-1 format has a well-defined specification with little or no unsupported variations and few incompatibilities between encoders and decoders. It is freely distributable, and players are available on all platforms. According to our constraints, we used camera positions C and D, as they provide faces in an almost frontal view.

3.3. Preparation<br />

3.3.1. Overview<br />

After having decided how to read videos and extract frames, we now have to prepare the data in such a way that the picture information is applicable to the tracking algorithm and robust against small perturbations in the input data. The feature extraction process should be stable against small changes in illumination, viewing direction, and deformations of the objects in the environment. Otherwise, if small changes in any of these quantities lead to large changes in the position of facial feature points, the interpretation of these points would be difficult.

As described in Section 1.3.3, the moment-based tracking approach works by determining the figure’s position and orientation. Therefore the algorithm needs

- a binary or grayscale image and
- well-defined and at most sparsely ragged image areas that are silhouetted against a defined background.

It has to be assured that the input data satisfies these needs. We identified two types of applicable preprocessed images: filled image regions and unfilled edge images. Figure 3.2 shows an example of these two types on a mouth region.


Figure 3.2.: Two types of binary image regions applicable for the tracking algorithm. A binarized mouth region, displayed as a filled figure (a) and with detected edges (b).

Tests on these examples show that the moment-based calculation of the best fitting ellipse yields similar results for both image types: the orientation angle θ differs by 0.1% between the filled region and the edge image, and the ellipse axes a and b vary by 3.75 and 0.77 pixels in a total area of 60x20 pixels.

A disadvantage of filled regions is that the number of foreground pixels that have to be processed by a tracking program is considerably larger than in edge images. Moreover, the tracked centroids will be located in the middle of the lips, which makes it impossible to track the facial feature contours. In order to create the region images, thresholding could be used as a preparation technique. However, it has to be done differently for each image region, as, for example, the mouth region has a different color and hue distribution than the eye regions. In contrast, an edge detection mechanism significantly reduces the number of foreground pixels, and the subsequent tracking algorithm can presumably locate points directly on the contour. The amount of data present in the edge map is reduced compared to the original image, which leads to better performance of the overall system. Edge detection is the most common method for feature extraction in machine vision, and the number of edge detection algorithms is enormous. We therefore decided to use an edge detection mechanism for the preprocessing of the input frames.



3.3.2. Edge Detection Algorithms<br />


Edge detectors are “used to locate changes in the intensity function; edges are pixels where this function (brightness) changes abruptly” [Sonka et al., 1999, p. 77]. The purpose is to convert the large array of brightness values that comprise an image into a compact, symbolic code. The goal is to determine the location of brightness discontinuities in the image. In order to detect such brightness changes in the intensity function, edge detector algorithms mostly approximate the first or second derivative of the image function (see Figure 3.3).

Figure 3.3.: Function f(x) with an intensity change, its first derivative f′(x), and second derivative f′′(x).

We selectively inspected four edge detection algorithms which are commonly used and promise satisfying results: the Prewitt and Sobel operators, the Laplacian of Gaussian, and the Canny edge detector. In the next sections, we briefly describe the different algorithms.

Prewitt and Sobel Edge Detectors The Prewitt and Sobel operators use filters for the estimation of local gradients that approximate the first derivative. The gradient is estimated in 8 possible directions (for a 3x3 convolution mask).


The first three masks for the Prewitt operator are

$$h_1 = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 0 & 0 \\ -1 & -1 & -1 \end{bmatrix}, \quad h_2 = \begin{bmatrix} 0 & 1 & 1 \\ -1 & 0 & 1 \\ -1 & -1 & 0 \end{bmatrix}, \quad h_3 = \begin{bmatrix} -1 & 0 & 1 \\ -1 & 0 & 1 \\ -1 & 0 & 1 \end{bmatrix}. \qquad (3.1)$$

Accordingly, the Sobel operators are defined as

$$h_1 = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix}, \quad h_2 = \begin{bmatrix} 0 & 1 & 2 \\ -1 & 0 & 1 \\ -2 & -1 & 0 \end{bmatrix}, \quad h_3 = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}. \qquad (3.2)$$

The other masks can be determined by rotating the matrices of equations 3.1 and 3.2. The Sobel and Prewitt filters are very similar; Sobel puts more weight on the central row and column. Its simplicity and good results make the Sobel operator a popular edge detection mechanism. The main disadvantage of the first derivative operators is “their dependence on the size of the object and sensitivity to noise” [Sonka et al., 1999, p. 83].

Laplacian of Gaussian The Laplacian of Gaussian (LoG) combines the Laplace convolution operator with Gaussian smoothing. The Laplace operator approximates the second derivative, which only returns the gradient magnitude and not the direction. For 4-neighborhoods and 8-neighborhoods, the 3x3 masks are defined as

$$h_1 = \begin{bmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{bmatrix}, \quad h_2 = \begin{bmatrix} 1 & 1 & 1 \\ 1 & -8 & 1 \\ 1 & 1 & 1 \end{bmatrix}. \qquad (3.3)$$

If the Laplace operator is used separately, it responds doubly to some edges in the image. Together with the Gaussian smoothing, it is able to retrieve good results. The advantage of this approach compared to classical first derivative edge operators is that a larger area surrounding the current pixel is taken into account.

Canny Edge Detector Canny’s aim was to discover the optimal edge detection algorithm. His definition of an optimal algorithm consists of three criteria:


- good detection
  A low error rate. Occurring image edges are not dismissed by the algorithm.
- good localization
  Well localized edges, at the same position as the occurring edges.
- minimal response
  A given edge is marked only once, and image noise does not create false edges.

The Canny operator works in a multi-stage process. First, the image is smoothed by Gaussian convolution. Then a simple 2D first derivative operator is applied to the smoothed image to create edges in regions of the image with high first spatial derivatives. In this step, the gradient magnitude is calculated in both the x and y direction and is thereafter combined into one edge image. The algorithm then tracks along the edges, a process known as non-maximal suppression. The tracking process is controlled by two thresholds, T1 and T2, with T1 > T2. Tracking only begins if the starting point has a value higher than T1. Tracking then continues in both directions until the intensity value falls below T2. This method helps to ensure that noisy edges are not broken up into multiple edge fragments. The final step uses heuristic thresholding to keep only edge information and eliminate data that was wrongly identified. Figure 3.4 shows the multi-stage edge detection process.

Figure 3.4.: Multi-stage Canny edge detection process.

According to this process, the effect of the Canny operator is influenced by three parameters: the width of the Gaussian convolution mask and the thresholds T1 and T2. Increasing the width of the Gaussian mask “reduces the detector’s sensitivity to noise, at the expense of losing some of the finer detail in the image. The localization error in the detected edges also increases slightly as the Gaussian width is increased” [Fisher et al., 1994]. Example illustrations of Canny edge detection results with different convolution masks, in comparison to other edge detection mechanisms, can be found in the work of Burger and Burge [2005, pp. 111–125].
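To illustrate the two-threshold idea, the following sketch implements only the hysteresis step on an already computed (and non-maximum-suppressed) gradient magnitude map: edges may start only at pixels above T1 and are then grown along 8-connected neighbours above T2. It is a simplified illustration, not the implementation used later in the tracker.

```java
import java.util.Stack;

// Sketch of Canny's hysteresis thresholding: strong pixels (> t1) start an edge,
// connected weak pixels (> t2) are added while following the edge.
public class HysteresisSketch {
    public static boolean[][] threshold(double[][] mag, double t1, double t2) {
        int h = mag.length, w = mag[0].length;
        boolean[][] edge = new boolean[h][w];
        Stack<int[]> stack = new Stack<int[]>();
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                if (mag[y][x] <= t1 || edge[y][x]) continue;
                edge[y][x] = true;                       // strong starting point
                stack.push(new int[] { x, y });
                while (!stack.isEmpty()) {
                    int[] p = stack.pop();
                    for (int dy = -1; dy <= 1; dy++) {
                        for (int dx = -1; dx <= 1; dx++) {
                            int nx = p[0] + dx, ny = p[1] + dy;
                            if (nx >= 0 && ny >= 0 && nx < w && ny < h
                                    && !edge[ny][nx] && mag[ny][nx] > t2) {
                                edge[ny][nx] = true;     // weak pixel connected to an edge
                                stack.push(new int[] { nx, ny });
                            }
                        }
                    }
                }
            }
        }
        return edge;
    }
}
```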



3.3.3. Edge Detector Realization<br />


In order to implement the edge detection mechanisms in Java, we need backing frameworks that provide image information and ease the image processing. We therefore used Java2D, which is part of the Java 2 Platform Standard Edition, and the Java Advanced Imaging (JAI) library, which is freely available from the Java Sun homepage. In the following paragraphs, we briefly describe how these libraries can be used for edge detection.

Java2D Edge Detectors Java2D does not provide a predefined edge detection mechanism. In order to implement an edge detector, the actual pixel values of an AWT image have to be processed. There are two ways to access these individual pixel values. The image manipulation features in AWT are primarily aimed at modifying individual pixels as they pass through a ‘filter’. A stream of pixel data is sent out by an ImageProducer, passes through the ImageFilter, and on to an ImageConsumer. The pre-defined ImageFilter subclass for processing individual pixels is the RGBImageFilter. As the data is pushed out by the producer, this model is known as the push model. An alternative approach is to use the PixelGrabber class to collect all the pixel data from an image into an array, where it can then be conveniently processed. In this case, use must also be made of MemoryImageSource to funnel the changed array’s data as a stream to a specified ImageConsumer. Figure 3.5 shows an overview of the two pixel acquisition methods.

Figure 3.5.: Workflow of fetching individual pixels with Java2D.
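The ‘pull’ variant with PixelGrabber could look like the following sketch: all pixels are collected into an int array, processed there, and handed back through a MemoryImageSource. The per-pixel operation is only a placeholder for an actual filter such as a gradient operator.

```java
import java.awt.Image;
import java.awt.Toolkit;
import java.awt.image.MemoryImageSource;
import java.awt.image.PixelGrabber;

// Sketch of pixel access with PixelGrabber and MemoryImageSource.
public class PixelGrabberSketch {
    public static Image process(Image src, int width, int height) throws InterruptedException {
        int[] pixels = new int[width * height];
        PixelGrabber grabber = new PixelGrabber(src, 0, 0, width, height, pixels, 0, width);
        grabber.grabPixels();                       // blocks until all pixels are delivered

        for (int i = 0; i < pixels.length; i++) {
            // placeholder per-pixel operation, e.g. thresholding or gradient filtering
            pixels[i] = pixels[i] | 0xff000000;     // force full alpha
        }
        return Toolkit.getDefaultToolkit().createImage(
                new MemoryImageSource(width, height, pixels, 0, width));
    }
}
```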


JAI Edge Detectors JAI provides the GradientMagnitude operation, an edge detector that computes the magnitude of the image gradient vector in two orthogonal directions. It performs two convolution operations on the source image, detecting edges in the horizontal and vertical direction. The algorithm then calculates the gradient norm of the two intermediate images.

The result of the GradientMagnitude operation may be defined as

$$dst[x][y][b] = \sqrt{(S_H(x, y, b))^2 + (S_V(x, y, b))^2} \qquad (3.4)$$

where S_H(x, y, b) and S_V(x, y, b) are the horizontal and vertical gradient images generated from band b of the source image by correlating it with the supplied orthogonal gradient masks. The default masks for GradientMagnitude perform a Sobel edge enhancement.
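A minimal sketch of this operation (assuming a hypothetical input file name) could look as follows; the two KernelJAI constants are the default Sobel masks mentioned above.

```java
import java.awt.image.BufferedImage;
import java.awt.image.renderable.ParameterBlock;
import javax.media.jai.JAI;
import javax.media.jai.KernelJAI;
import javax.media.jai.RenderedOp;

// Sketch: JAI "GradientMagnitude" with the default Sobel masks.
public class JaiGradientSketch {
    public static void main(String[] args) {
        RenderedOp source = JAI.create("fileload", "frame.png"); // hypothetical input frame

        ParameterBlock pb = new ParameterBlock();
        pb.addSource(source);
        pb.add(KernelJAI.GRADIENT_MASK_SOBEL_HORIZONTAL);        // S_H mask
        pb.add(KernelJAI.GRADIENT_MASK_SOBEL_VERTICAL);          // S_V mask
        RenderedOp edges = JAI.create("gradientmagnitude", pb);

        BufferedImage result = edges.getAsBufferedImage();       // hand over to further processing
        System.out.println("Edge image: " + result.getWidth() + "x" + result.getHeight());
    }
}
```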

3.3.4. Further Improvements<br />

The Canny edge detection mechanism gives good results for the preprocessing. Still, this method could be improved by face-specific preprocessing. Different image features have different edge intensities. The eye region, for example, provides a wide range of hard gradient transitions. The mouth region does not have these clear boundaries, and especially the lower edge of the lower lip is often lost. An improved preprocessing algorithm could assume the image regions of facial features and treat these regions with a different intensity of the Gaussian smoothing. With this technique, the mouth region edges could be improved, and interferences in less important regions, like the cheeks, could be suppressed. Another possibility would be to add feature-specific checks after the edge detection process. These checks could then determine whether an important facial edge is missing and could close the gap by reanalyzing the input data. A straightforward way to improve the results for the mouth region could be to weight the red channel higher during grayscale image production, as the most recognizable difference between the mouth and the adjacent skin is in the intensity of the red channel.
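A sketch of such a red-weighted conversion is given below; the weights are illustrative and would have to be tuned, they are not taken from the thesis.

```java
// Sketch of a red-weighted grayscale conversion for packed ARGB pixels.
public class RedWeightedGray {
    public static int[] toGray(int[] rgbPixels) {
        int[] gray = new int[rgbPixels.length];
        for (int i = 0; i < rgbPixels.length; i++) {
            int r = (rgbPixels[i] >> 16) & 0xff;
            int g = (rgbPixels[i] >> 8) & 0xff;
            int b = rgbPixels[i] & 0xff;
            // emphasize the red channel so mouth/skin differences survive the conversion
            gray[i] = Math.min(255, (int) (0.6 * r + 0.3 * g + 0.1 * b));
        }
        return gray;
    }
}
```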



3.4. Summary<br />


In order to fulfill the requirements of the tracking algorithm, the input data has to be read and prepared in a proper way. We therefore examined different APIs for grabbing frames from video input data, and decided to work with JMF, as it is available on different platforms, is free, and promises to be practicable. We also defined basic preconditions for the video data, and selected MPEG-1 sample data for testing purposes. To make the video frames applicable to the subsequent tracking algorithm, they are processed with an edge detection mechanism; here we decided to look at both Java2D- and JAI-based functionality, preferably using a Canny edge detection method, as it shows the best results. In the next chapter we go into the details of the code development of the Java tracker.



4. Programming<br />

4.1. Overview<br />

After selecting the necessary libraries and methodologies, we can now describe the development of the Java tracker. We present the architecture and state implementation details. We chose a modular program structure to ease the exchange of components and to separate the performed tasks. The graphical representation of tracking information is strictly separated from the data representation. All these design decisions are described in Section 4.2. Additionally, we explain the basic application flow and the implementation of the tracking algorithm. Section 4.3 then gives an insight into the implementation process. It describes the working environment and states problems that arose during the coding phase.

4.2. Architecture<br />

4.2.1. Structure<br />

The Java feature tracker is split up into five packages, grouped by the tasks of the contained classes. These packages are the GUI, the graphical data representation layer, the domain layer, the data storage layer, and the controlling and connecting classes. Figure 4.1 illustrates the implemented classes and their packages; the following paragraphs describe the functionality of each group.

Controlling and connecting classes<br />

The controlling and connecting classes are responsible for establishing and managing the communication between the other packages, and therefore also manage the instantiation of important facade objects. For that purpose, the package has to provide constant values for the exchange of states between packages.


The Main class, the startup object for the program, is part of this package. It creates the main GUI object and the TrackerController, and connects the two instances by exchanging a StateChangedListener. The TrackerController is an interface that receives commands and propagates them to the underlying domain classes. The TrackerFacade is the implementing class that accomplishes this task. The StateChangedListener is an interface that manages the communication from lower layers to the user interface and logfiles.
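The following sketch illustrates this wiring. The method names on the interfaces are assumptions for illustration (the thesis only names the openVideo, playVideo and process commands and the listener mechanism), and the stub facade only stands in for the real TrackerFacade and TrackerWindow.

```java
// Sketch of the controller/listener wiring; names and signatures are illustrative.
interface StateChangedListener {
    void stateChanged(String message);               // assumed callback signature
}

interface TrackerController {
    void openVideo(String fileName);                 // assumed signatures for the named commands
    void process();
    void setStateChangedListener(StateChangedListener listener);
}

class TrackerFacadeStub implements TrackerController {
    private StateChangedListener listener;
    public void openVideo(String fileName) { notifyState("opened " + fileName); }
    public void process() { notifyState("tracking current frame"); }
    public void setStateChangedListener(StateChangedListener l) { listener = l; }
    private void notifyState(String msg) {
        if (listener != null) listener.stateChanged(msg);   // propagate state to GUI/log
    }
}

public class MainSketch {
    public static void main(String[] args) {
        TrackerController controller = new TrackerFacadeStub();
        controller.setStateChangedListener(new StateChangedListener() {
            public void stateChanged(String message) {       // the real TrackerWindow logs or displays this
                System.out.println("state: " + message);
            }
        });
        controller.openVideo("sample.mpg");                   // hypothetical video file
        controller.process();
    }
}
```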

Graphical User Interface<br />

The GUI holds the Swing interface that is presented to the user. It is implemented in the TrackerWindow class, which is created and linked to the remaining application. The TrackerWindow additionally encapsulates an implementation of the StateChangedListener interface. This subclass is responsible for logging application messages or printing them to the user interface.

Graphical Data Representation<br />

The graphical data representation provides functionality to generate a drawable representation of the BSP tracking tree, and presents all video and tracking information to the user. When the main GUI class requests the video visualization and tracking component from the TrackerController, a TrackerComponent interface is returned. It is implemented by the TrackerPanel, which manages the current image selections, the correct display of the current video frame, and the according tracking data. For each selection, the TrackerComponent holds a BSPTree2D, the visual equivalent of a BSPTree, and the Extrema2D, that is, the leaf nodes with minimum or maximum x- or y-values. Both classes implement the interface Drawable, which eases the presentation of the shapes onto a Graphics context. They are compositions of Figure2D objects, an interface which is implemented by Points2D in the case of Extrema2D, and by BSPFigure2D in the case of BSPTree2D. The latter holds drawing information for the ellipses, centroids, and major and minor axes, represented by the subclasses of BSPFigure2D, separately for leaves and inner nodes of the BSPTree.



Domain Layer<br />


The domain layer encapsulates the core functionality of the Java feature tracker. It consists of three main parts: the classes responsible for video frame extraction, for preprocessing of frames, and for the tracking process itself. In each of these parts, the responsible class is created by a factory which returns an instance of a specific interface or abstract class. The TrackerFacade then communicates with these interfaces. For the extraction of the video frames, the FrameExtractorFactory creates an instance of the FrameExtractor interface. For preprocessing, the PreprocessorFactory creates the Preprocessor. The core element of the program, the feature tracking implementation, is created by the RegionTrackerFactory, which returns a subclass of the RegionTracker. This class builds the tree of BSPNodes and returns the root node.
Storage Layer<br />

The storage layer is responsible for storing important tracked data for further usage, such as evaluation or higher-level tasks. The data is saved to a TrackedRegion and returned to the TrackerFacade. The facade then forwards this information to the implementation of the TrackedDataController, which manages the received data and files it to a specified location.

Figure 4.1.: Class diagram of the Java feature tracker (packages gui, domain, bsp and data; classes include Main, TrackerController, TrackerFacade, StateChangedListener, TrackerWindow, TrackerComponent, TrackerPanel, BSPTree2D, BSPFigure2D, Figure2D, Extrema2D, Points2D, Drawable, FrameExtractorFactory, FrameExtractor, JMFSnapper, PreprocessorFactory, Preprocessor, CannyEdgeDetector, JAIEdgeDetector, RegionTrackerFactory, RegionTracker, RGBRegionTracker, BinaryInvRegionTracker, BSPNode, TrackedDataController, TrackedDataHandler and TrackedRegion; the frame extraction and preprocessing classes use JMF and JAI).


4.2.2. Basic Application Flow<br />


After describing the classes of the Java feature tracker, we now look at how the important classes communicate during runtime. The basic workflow is mainly managed by the TrackerController. As illustrated in Figure 4.2, this class receives all calls from the GUI, such as the openVideo, playVideo or process commands. The TrackerController then propagates the command to the responsible class.

During the openVideo command, the FrameExtractor is called, which extracts frames from the video input data. If a new image frame is available, the StateChangedListener is notified, and this object then updates the TrackerComponent, the visual component that displays video frames in the user interface. The tracker component is also responsible for calling the Preprocessor and requesting the preprocessed image. Other video playback functions trigger a similar process.

process is called to execute the actual feature tracking. For that purpose, the call is redirected to the RegionTracker, which then creates the BSP tree for the current video frame. The TrackedDataController is responsible for saving the tracked information to a file.

Figure 4.2.: Overview of the basic application workflow



4.2.3. Tracking Algorithm<br />


The basic application workflow, as described in the previous section, has one core element, the RegionTracker. It is responsible for producing feature points out of a preprocessed image. It therefore creates a hierarchical BSP tree, where each node holds shape information for a certain part of the image. This process is based on the work of Rocha et al. [2002], as described in Section 1.3.3. Most of the tracking procedure is implemented in the class BSPNode, which is responsible for the creation of a BSP tree. The basic procedure of the BSPNode works in three steps:

1. Add image foreground pixels to the BSPNode (see procedure 4.2).
2. Calculate orientation values of the added points (see procedure 4.3). After this step, the flag isCalculated is set to true, so that subsequent procedures can verify that the calculation step has not been left out.
3. Create a BSP tree by subdividing the current node (see procedure 4.4). Return the current node as root.

In order to save image information, the BSPNode holds three arrays: X and Y for the position of a point P(x, y), and an array V for the image intensity value of the point. If the algorithm only works with binary images, the intensity values in V are set to 1. The number of pixels that have already been added to the node is held in ipix. The image moments are named m_pq with p and q set to 0, 1, 2 (see the moment calculations in procedure 4.2). Additionally, every BSPNode holds a reference to the StateChangedListener (called listener) to propagate information or errors to the user, and to a TrackedRegion (called trackedRegion) to permanently save tracking information. The tracking procedure is started by the class RegionTracker, which currently traverses all pixels of a rectangular image raster and adds all foreground pixels (that is, pixels with a non-zero intensity value) to the root node (see procedure 4.1).

55


Functions of Class RegionTracker<br />


The abstract class RegionTracker is responsible for triggering the feature tracking process. In its method createBSPTree, the class creates the root node of the BSP tree and carries out all steps that are necessary for this node. The class is subclassed by the BinaryInvRegionTracker, which works with binary images. It takes the intensity value of the image pixel at band 0 of the raster and inverts it. (In the preprocessed image, all pixels that belong to the image area are black (= 0); after inverting, the value of an image area pixel is 1.)

Procedure 4.1: createBSPTree(levels)

1. Create a new BSPNode N with the following parameters:
   (a) The image raster r. It contains the pixels of a certain region of the image.
   (b) The maximum number of non-zero pixels n_max, in this case the number of pixels in the raster (width_r · height_r).
   (c) A listener and a trackedRegion for feedback and data storage purposes.
2. For each position (i, j) in the raster, do the following:
   (a) Let (x_min, y_min) be the position of the upper left raster point in the image. Fetch r_{x_min+i, y_min+j}, the pixel value at position (x_min + i, y_min + j) in the raster.
   (b) If this value is not 0, add point (i, j) to the node N.
3. If at least one point was added to the root node, do:
   (a) Call function calculateValues() of node N.
   (b) Call function subdivide(levels−1) of node N.
4. Return N
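A sketch of this traversal against a java.awt.image.Raster is shown below; a plain list of points stands in for the BSPNode here, and the real implementation would instead call addPoint on the root node and then calculateValues() and subdivide() as in steps 3 and 4.

```java
import java.awt.image.Raster;
import java.util.ArrayList;
import java.util.List;

// Sketch of the pixel traversal of procedure 4.1 (BinaryInvRegionTracker variant):
// band 0 of the preprocessed binary image is inverted, non-zero pixels are collected.
public class RasterTraversalSketch {
    public static List<int[]> collectForegroundPixels(Raster raster) {
        List<int[]> points = new ArrayList<int[]>();
        int xmin = raster.getMinX(), ymin = raster.getMinY();
        for (int j = 0; j < raster.getHeight(); j++) {
            for (int i = 0; i < raster.getWidth(); i++) {
                int val = 1 - raster.getSample(xmin + i, ymin + j, 0); // invert band 0
                if (val != 0) {
                    points.add(new int[] { i, j, val });               // would be root.addPoint(i, j, val)
                }
            }
        }
        return points;
    }
}
```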

56


Functions of Class BSPNode<br />


After having described the initial function calls to the BSPNode, we now look at the inside of these functions. The three substantial functions of the BSPNode are addPoint, calculateValues and subdivide. By splitting the tracking process up into these methods, it is possible to add region points independently to the current node. During the initialization of a new child node, the points that will be added to this node are not yet known. Moreover, the independent calculation of localization values allows the method to be executed only if it is really needed, and allows for additional checks between calculation and subdivision.

Procedure 4.2: addPoint(x, y, val)

1. Add the point to the arrays: X_ipix ← x, Y_ipix ← y and V_ipix ← val
2. Increase the number of pixels: ipix ← ipix + 1
3. Add to the moments:
   (a) m00 ← m00 + val
   (b) m10 ← m10 + x · val
   (c) m01 ← m01 + y · val
   (d) m11 ← m11 + x · y · val
   (e) m20 ← m20 + x² · val
   (f) m02 ← m02 + y² · val


Procedure 4.3: calculateValues()

1. Check if m00 is 0. If so, no pixels are set in this node and all further calculations are skipped. Return false in this case.
2. Calculate the image centroid c(x_c, y_c):
   (a) x_c ← m10 / m00
   (b) y_c ← m01 / m00
3. Calculate the second order central moments µ20, µ11 and µ02:
   (a) µ20 ← m20/m00 − x_c²
   (b) µ11 ← m11/m00 − x_c·y_c
   (c) µ02 ← m02/m00 − y_c²
4. Calculate θ. Two special cases have to be treated separately: µ11 = 0 and m20 = m02. The treatment of these cases was determined by program test runs.
   (a) If µ11 is 0, set θ as follows:
       – If µ02 < µ20: θ ← 0
       – Else: θ ← π/2
   (b) If m20 = m02, do:
       – If µ11 < 0: θ ← π/4
       – Else: θ ← 3π/4
   (c) Else, do the default calculation:
       θ ← arctan(2µ11 / (µ20 − µ02)) / 2
       Note that Math.atan2(y, x) should be used instead of Math.atan(y/x) in Java. Otherwise, a sign error could lead to an angle rotated by 90°.
5. Set the flag isCalculated to true.
6. Return true.
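The centroid, central moment and angle calculations of this procedure can be sketched in Java as shown below; the class and method names are illustrative and not the project's actual BSPNode code.

public class OrientationSketch {

    // Returns the orientation angle of the best-fitting ellipse, or Double.NaN
    // if the region is empty (m00 == 0).
    static double orientation(double m00, double m10, double m01,
                              double m11, double m20, double m02) {
        if (m00 == 0) {
            return Double.NaN;                       // no pixels in this node
        }
        double xc = m10 / m00;                        // centroid
        double yc = m01 / m00;
        double mu20 = m20 / m00 - xc * xc;            // central moments
        double mu11 = m11 / m00 - xc * yc;
        double mu02 = m02 / m00 - yc * yc;

        if (mu11 == 0) {                              // special case 1
            return (mu02 < mu20) ? 0.0 : Math.PI / 2;
        }
        if (m20 == m02) {                             // special case 2
            return (mu11 < 0) ? Math.PI / 4 : 3 * Math.PI / 4;
        }
        // default case: atan2 keeps the correct quadrant
        return Math.atan2(2 * mu11, mu20 - mu02) / 2;
    }

    public static void main(String[] args) {
        // two pixels at (2,3) and (4,3): the major axis is horizontal, theta = 0
        System.out.println(orientation(2, 6, 6, 18, 20, 18));
    }
}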


Procedure 4.4: subdivide(levels)

1. Check if flag isCalculated is true. Otherwise, stop the subdivision.
2. If the current node is a leaf node, that is if no levels to compute are left (levels = 0), add the point to the trackedRegion and exit.
3. Create the child nodes C1 and C2. The constructor parameter n_max (the maximum number of non-zero pixels) is set to the number of non-zero pixels of the current node (ipix).
4. Calculate the orthogonal angle to θ (the angle of the x-axis with the minor axis): θ⊥ = (θ + π/2) mod π
5. Iterate over all points. For every point P = (X_i, Y_i) with the intensity value V_i, proceed as described (a minimal sketch of this assignment step follows the procedure):
   (a) If the point is the current centroid (X_i = x_c and Y_i = y_c), add P to both child nodes.
   (b) Otherwise, divide the image area along the minor axis of the best fitting ellipse. For this, the reference system is shifted to have the centroid as the origin. Then the angle θ_P′, the angle of the shifted point P′ with the x-axis, is computed. The difference between θ_P′ and θ⊥ is then taken to decide if the point is added to C1 or C2 (see Figure 4.3):
       1. Calculate θ_P′. The y-value of P′ is mirrored along the x-axis to correspond to the standard Cartesian coordinate system: θ_P′ ← π/2 − atan2(x_p − x_c, −(y_p − y_c))
       2. Calculate the difference angle β: β ← θ_P′ − θ⊥
       3. Verify that β is between −π and +π: if β ≤ −π, then β ← 2π + β
       4. If β ≤ 0 or β = π, then add point P to C1.
       5. If β ≥ 0, then add point P to C2.
   (c) Check that the current point was added to at least one child node.
6. For both child nodes C1 and C2, call the function calculateValues(). If it returns true, call the function subdivide(levels-1). Otherwise, set the child node to null. (As it is an empty node, it is not used any more.)
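A minimal Java sketch of the point-assignment step (5b) is given below. It returns a single child per point and therefore simplifies the "add to both children" case for β = 0; all names are illustrative only.

public class SplitSketch {

    // theta: orientation of the major axis; (xc, yc): centroid; (xp, yp): point.
    // Returns -1 for child C1 and +1 for child C2 (0 if the point is the centroid,
    // which the procedure adds to both children).
    static int assignChild(double theta, double xc, double yc, double xp, double yp) {
        if (xp == xc && yp == yc) {
            return 0;                                        // centroid: both children
        }
        double thetaPerp = (theta + Math.PI / 2) % Math.PI;  // minor-axis angle
        // mirror y so the angle refers to a standard Cartesian system
        double thetaP = Math.PI / 2 - Math.atan2(xp - xc, -(yp - yc));
        double beta = thetaP - thetaPerp;
        if (beta <= -Math.PI) {
            beta += 2 * Math.PI;                             // keep beta in (-pi, pi]
        }
        return (beta <= 0 || beta == Math.PI) ? -1 : 1;
    }

    public static void main(String[] args) {
        // horizontal major axis (theta = 0), centroid at (3, 3): the splitting
        // (minor) axis is vertical, so these two points fall into different children
        System.out.println(assignChild(0, 3, 3, 5, 3));   // -1 (child C1)
        System.out.println(assignChild(0, 3, 3, 1, 3));   // +1 (child C2)
    }
}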


Figure 4.3.: Angle calculation for raster subdivision. The best-fitting ellipse and the corresponding splitting axis are defined for an image area (a). To decide which side of the splitting axis point P belongs to, the coordinate system is shifted to have the centroid C as its origin (b).

4.3. Implementation Process

After showing the architecture, the basic workflow, and details of the tracking algorithm, we now describe the process of implementation. We state the working environment for the development as well as the problems that arose during the implementation of the previously described architecture.

4.3.1. Working Environment

The programming was done on a SuSE Linux platform, with Eclipse SDK in version 3.1.0 for the Java development. NetBeans 4.1 was used for building the GUI. Code was written to be compatible with Java 1.4, but it was tested with both Java 1.5.0_01 and 1.4.2_08 on Linux, and 1.4.2 on Windows. The JMF was used in the 2.1.1e Linux performance pack version (and the Windows performance pack for Windows testing), as the reading of the MPEG-1 files does not work with the pure-Java cross-platform version of JMF. Poseidon for UML Community Edition 3.0.1 was used for the initial class design and code generation.

4.3.2. Difficulties

During the development process of the Java movement tracker, we had to deal with<br />

some difficulties. The major drawbacks and delays originated in three main parts of the<br />

architecture: the implementation of the tracker algorithm, the video frame extraction,<br />

and the preprocessing methods.<br />

Algorithm

The main problem during implementation of the tracking algorithm was the correct calculation of the orientation angle, and finding a straightforward way to subdivide the current image area into two child areas along the minor axis. The calculation of the angle θ required special treatment because the method sometimes delivers the sought angle rotated by 90°. We found a hint that this problem can arise if the inverse tangent is calculated with Math.atan instead of Math.atan2. Even though we implemented this change, the problem is still not solved in all cases. Furthermore, there are two special cases where the standard angle calculation formula does not work: if µ11 = 0, or if µ20 = µ02. We solved this problem by manually testing various cases with different angles for these exceptions. Hence, we came up with values for θ that deliver satisfactory results, although we did not prove these values formally. For the area splitting, we first worked with linear equations in the form y = kx + d, and in the point-vector form. Both methods required additional conversions using the (inverse) tangent. After some test runs we came up with the solution described in Procedure 4.4. It is based on a shift of the coordinate system to have the centroid as its origin. Then, the difference between θ⊥ and the angle of the shifted point P′ with the x-axis is used for decision making (see Figure 4.3 for an illustration).
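The following snippet illustrates the described effect with arbitrary example values for the central moments (they are not taken from the tracker): Math.atan only sees the ratio of numerator and denominator, whereas Math.atan2 also takes their signs into account, so the two results differ by 90 degrees here.

public class AtanVsAtan2 {
    public static void main(String[] args) {
        double mu11 = 0.5;
        double mu20 = 1.0;
        double mu02 = 3.0;          // mu20 - mu02 is negative in this example

        double thetaAtan  = Math.atan(2 * mu11 / (mu20 - mu02)) / 2;
        double thetaAtan2 = Math.atan2(2 * mu11, mu20 - mu02) / 2;

        // atan only sees the ratio -0.5 and loses the sign of the denominator
        System.out.println(Math.toDegrees(thetaAtan));    // about -13.28 degrees
        System.out.println(Math.toDegrees(thetaAtan2));   // about  76.72 degrees
    }
}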

Frame Extraction

For the video frame extraction, we currently use a modified version of the sample class FrameAccess.java that is provided in the JMF guide [JMF, p. 54]. However, this code completely traverses the video, and start/stop functionality is only possible with rough workarounds. Other possibilities, like buffering all images in the cache, are not feasible due to limited memory.


Caching images as files is too slow and generates too much file IO. We then tried a solution based on the class Seek.java¹. It uses the FramePositioningControl helper class to access single video frames. However, this code did not work with our input videos (with both the Linux and Windows JMF performance pack versions), as it returns 0 as the total number of video frames. A ray of hope is the JMFSnapper implementation provided by Davison (http://fivedots.coe.psu.ac.th/~ad/jg/ch283/index.html). It is described in a draft chapter of the book “Killer Game Programming in Java” [Davison, 2005] and presents a solution that does not use the FramePositioningControl. It works fine and is fast, but it is not yet completely integrated into the framework of the Java feature tracker. This would be a possibility for further enhancements.
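For reference, a minimal sketch of the FramePositioningControl-based approach used by Seek.java is given below. It assumes that the media actually reports its frame count and supports frame grabbing, which, as described above, was not the case for our MPEG-1 input videos; the file name is a placeholder.

import javax.media.Buffer;
import javax.media.Manager;
import javax.media.MediaLocator;
import javax.media.Player;
import javax.media.control.FrameGrabbingControl;
import javax.media.control.FramePositioningControl;

public class FrameSeekSketch {
    public static void main(String[] args) throws Exception {
        Player player = Manager.createRealizedPlayer(
                new MediaLocator("file:input.mpg"));      // placeholder file name

        FramePositioningControl fpc = (FramePositioningControl)
                player.getControl("javax.media.control.FramePositioningControl");
        FrameGrabbingControl fgc = (FrameGrabbingControl)
                player.getControl("javax.media.control.FrameGrabbingControl");

        if (fpc == null || fgc == null) {
            System.err.println("Required controls not supported for this media.");
            return;
        }
        fpc.seek(10);                    // jump to frame 10
        Buffer frame = fgc.grabFrame();  // raw frame data, to be converted to an image
        System.out.println("grabbed buffer format: " + frame.getFormat());
        player.close();
    }
}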

Edge Detection

The first aim was to implement the Canny edge detection algorithm using JAI. Descriptions on how to proceed were vague and did not give enough help for the coding. We found a project, called Beeegle, that uses a JAI Canny implementation, but the downloadable source code is incomplete and defective (http://beeegle.nl/modules/sections/index.php?op=listarticles&secid=2). Consultation with the authors showed that the code is not used any more and will therefore not be updated or corrected. Hence, we reverted to an implementation that is provided in a Java forum (http://forums.java.sun.com/thread.jspa?threadID=546211&start=45&tstart=1) and adapted it to fit into the program architecture. Later on we tried to implement a second edge detection mechanism, a simple Sobel operator provided by the JAI framework. If the method processes an image that is fetched with JAI's fileload method, the operator is very fast and delivers good results. Integrated into the Java feature tracker, however, the operator did not work correctly. The process was very slow and delivered a binary image, even though a grayscale image was expected. After inquiry, we found out that the image type of the BufferedImage differs in the two cases. In the latter case, the image is fetched from an AWT Image, which returns the type TYPE_3BYTE_BGR. Then the method getAsBufferedImage of the PlanarImage requires 1 to 3 seconds for processing. This problem could not be solved for the time being.

¹ An official JMF solution provided on the Java Sun homepage (http://java.sun.com/products/java-media/jmf/2.1.1/solutions/Seek.java).
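For reference, a minimal sketch of the JAI-based Sobel variant is given below; the operator and kernel names follow JAI 1.1, and the file name is a placeholder.

import java.awt.image.BufferedImage;
import javax.media.jai.JAI;
import javax.media.jai.KernelJAI;
import javax.media.jai.PlanarImage;

public class SobelSketch {
    public static void main(String[] args) {
        // fast case described in the text: image loaded via JAI's fileload operator
        PlanarImage src = JAI.create("fileload", "frame.png");

        PlanarImage edges = JAI.create("gradientmagnitude", src,
                KernelJAI.GRADIENT_MASK_SOBEL_HORIZONTAL,
                KernelJAI.GRADIENT_MASK_SOBEL_VERTICAL);

        // this conversion is the step that became very slow when the source
        // was a TYPE_3BYTE_BGR image fetched from the video instead of a file
        BufferedImage result = edges.getAsBufferedImage();
        System.out.println("edge image: " + result.getWidth() + "x" + result.getHeight());
    }
}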

4.4. Summary

The architecture of the Java feature tracker is split up into 5 packages, showing a modular structure with 3 exchangeable parts responsible for frame extraction, preprocessing and feature tracking. Because of the factory-based construction of these parts, each of them can be replaced, which allows for a comparison of different approaches. This is not only important for future enhancements, but also for the development process, where problematic code could easily be exchanged. Difficulties mainly arose in 3 major areas: angle calculation and subdivision needed special treatment and increased testing; JMF was not as convenient as it claims to be; and JAI is not yet used for preprocessing, even though the basic JAI edge detection mechanism would be faster.

5. Evaluation<br />

5.1. Overview<br />

To evaluate the quality of the developed Java feature tracker, we first look at the<br />

basic program functionalities and then focus on two main aspects: the quality of the<br />

extracted feature points, and the time consumption. In a first step, we evaluate the<br />

abilities that users have via the user interface. Then we rate extracted feature points of<br />

a mouth region, and compare the elapsed time for the preprocessing and the tracking<br />

methods in different circumstances.<br />

5.2. Program Abilities

The developed Java feature tracker is able to find facial feature points of manually preselected feature areas in a sequence of video frames. In a series of steps, the users first open a video and set selections for the areas to process on either the original or the preprocessed video frame. By selecting the process button, the program starts the creation of the BSP tree, memorizes the result and presents it frame by frame in the GUI. After running through the video frames, the users can save the feature points to a Comma Separated Values (CSV) file.

The users can select or deselect various tracking information for visualization. They can look at algorithm-specific data like the 16 tracked centroid points and the corresponding ellipses and ellipse axes. Four (or more) of these 16 points, those with the largest or smallest x/y-values, are then called features. These feature points can be viewed separately in the GUI. The user interface allows for basic flow control of the video playback, but has some limitations. Play, stop and the display of the next frame work fine. However, pausing a video is faulty, and the display of the previous frame is not implemented. Moreover, the program architecture provides customizing options, like the color and stroke size of the feature display, or the selection of the preprocessor and the frame extractor. A GUI for these features is not implemented yet, but can be added with little additional effort.

5.3. Tracking Quality

First observations of the tracked feature points show that preselections on single areas<br />

deliver acceptable results. The most accurate output was achieved for an eyebrow<br />

selection. Mouth selections have good results except for the lowest point of the lower<br />

lip, where the preprocessing is not able to find a continuous edge. Since there is no link<br />

between feature calculations of two subsequent frames, the location of this point may<br />

flip horizontally from the left side of the mouth in frame n to the right side in frame<br />

(n + 1). Eye and nose selections require exact area preselection, as nearby edges may<br />

disturb the tracking process. However, in contrast to the snake algorithm, a selection<br />

of a larger area without disturbing pixels does not change the result, as background<br />

pixels (0-value pixels in binary images) do not have an influence on the calculation.<br />

Calculated points of the implemented Java feature tracker do not necessarily match<br />

with standardized feature points, as the algorithm has no knowledge about the under-<br />

lying image section. It therefore processes every image region selection in an equal way,<br />

without knowing if the produced feature point is, for example, the corner of the mouth.<br />

Figures 5.1 and 5.2 show feature points that were produced by the Java feature tracker. All test runs illustrated in Figure 5.1 produced satisfactory outcomes. Images (a) and (b) have only minor deviations; the left corner of the mouth in image (a), for example, is slightly too low (that is, its y-value is too high). Images (c), (d), (e), and (f) show the problem with the lowest point of the lower lip. The point is either on the left or on the right side of the desired centered position. Results on other face regions are illustrated in (g) and (h). Figure 5.2 shows a whole-face feature tracking process, which works with 6 image selections. As illustrated in (b), the 16 tracked points per selected region correctly approximate the contour of the underlying facial feature; (d) shows the Canny preprocessed image, with the discontinuity of the lower line of the lower lip.


Figure 5.1.: Tracking results for selective regions.


Figure 5.2.: Tracking result for 6 area selections: 2x eyebrow, 2x eye, nose, mouth. The produced features are in (a) and all centroids of the leaf-nodes in the BSP tree in (b). The preselected image areas are visible in (c); (d) shows the outcome in the preprocessed Canny edge image.

5.3.1. Test Data

For the statistical evaluation, we have selected a video that shows a mouth movement.<br />

From our test data (described in Section 3.2.3) we have chosen the recording of AU<br />

23, described as Lip Tightener, performed by the facial muscle Orbicularis oris 1 .<br />

5.3.2. Technique

For the mouth region testing, we perform a mouth region selection of 60x20 pixels, starting at point (233, 219). We examine the corners of the mouth, as these two points are best comparable and straightforward to examine. The left point (from the viewer's perspective) is called point1, its coordinates are (x1, y1); the right point is called point2, with the coordinates (x2, y2). Three different parties have collected data: two human testers manually examined the two features. The first tester is the developer of the Java tracker, female, 21 years old (called man1 from now on). Tester 2 is an unbiased male, 15 years old (called man2). The third input comes from the data extracted by the algorithm (called algo). Figure 5.3 shows two resulting frames: frame number 106 with almost congruent results (0 or 1 pixel difference), and frame number 88 with the most dissimilar tracking points (up to 5 pixels difference) in the described test case.

Figure 5.3.: Good and bad results: a frame with almost identical tracking points (a), and the frame with the biggest differences (b). A white pixel is the selection by man1, blue by man2, and green is calculated by algo.

For the statistical calculations and diagram extraction we used Gnumeric in version 1.2.13, OpenOffice.org 2.0 beta, and SPSS 11.

¹ An overview of the action unit descriptions can be found at http://www.cs.cmu.edu/afs/cs/project/face/www/facs.htm; the recent AU manual is available at http://face-and-emotion.com/dataface/facs/new_version.jsp

5.3.3. Statistical Evaluation

According to the test case description in Section 5.3.2, we performed a test run with the implemented Java feature tracker and collected the manual captures of the two human testers. The output is three different sources for both the x- and y-value of each corner of the mouth. Figure 5.4 illustrates the result of the data collection (see Appendix A.1 for all values).

Figure 5.4.: Positions of the corners of the mouth (point1 and point2).

In order to examine the correctness of the tracked feature points, we focus on four aspects: First, we look at the absolute values and compare coordinate positions as well as curve progressions. Then we examine the quality of the program output relative to the manual tracking data, where we look at the curve progression and the relationship between the curves.


Coordinate Position Looking at the means over all x/y values, we see that especially the algorithm-calculated coordinates of point2 differ from the manual selections (see Table 5.1). The x-value is too high (too far right in the image region), and the y-value too low (too high in the image region).

        x1       y1       x2       y2
algo    243.03   220.03   284.07   217.59
man1    243.2    219.43   282.75   219.51
man2    243.2    219.43   282.75   219.51

Table 5.1.: Mean position of x/y coordinates.

The source of this inaccuracy is most likely to be found in the preprocessing. As illustrated in Figure 5.5, the mouth contours of the preprocessed image are ragged and discontinuous. This image also shows why the value of y2 is too high: a false edge outside the corner of the mouth is visible in the preprocessed image.

Figure 5.5.: Preprocessing of mouth region. The original image selection (a), and the preprocessed version that produced ragged edges (b).

Looking at the minima and maxima over all tracked frames, we see that the algo-<br />

rithm data has more outliers than the manually determined data. For example, the<br />

y-coordinate of point2 has a minimum of 215 where the manual testers reach values<br />

of 218 and 219. The maximum of this point is not higher, as the algorithm generally<br />

produces too low y-values (see Table 5.2).<br />

70


5. Evaluation<br />

              Minima                   Maxima
        x1    y1    x2    y2     x1    y1    x2    y2
algo    210   218   277   215    251   224   289   219
man1    236   218   278   218    249   221   288   221
man2    237   219   278   219    251   222   288   222

Table 5.2.: Minima and maxima of x/y coordinates.

We assume that both the average and the min/max values would improve with a clearer and more continuous edge detection.

Curve Progression As a next step, we want to examine if the curve progression is continuous, or if the coordinate values change strongly from one frame to the next. We therefore determine the squared difference between two subsequent video frames (see Figure 5.6) and calculate the sum over all 136 frames, as well as the maximum and mean value (see Table 5.3). The curve progression of algo tends to be more erratic and oscillating than that of man1 and man2. The maximum oscillation is in every case produced by algo. Taking the sum over all values, algo has significantly higher values. For the x-coordinates, man2 delivers significantly better results; man1 is closer to the values of the algorithm. In terms of the y-coordinates, algo is significantly worse than both manual testers (see Table 5.3 for the numbers).

               x1                   y1                   x2                   y2
       man1  man2  algo     man1  man2  algo     man1  man2  algo     man1  man2  algo
sum    197   116   220      29    16    124      108   78    211      27    16    54
max    25    9     25       1     1     16       9     16    16       1     1     9
avg    1.46  0.86  1.63     0.21  0.12  0.92     0.8   0.58  1.56     0.2   0.12  0.4

Table 5.3.: Sum, maximum and average of d² on subsequent frames.

Figure 5.6.: d² of subsequent video frames for both x and y coordinates of point1 and point2.
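The per-frame squared differences summarized above can be computed with a few lines of Java; the following sketch uses made-up coordinate values for illustration.

public class FrameDiffSketch {
    public static void main(String[] args) {
        int[] x1 = {238, 239, 238, 241, 246};    // hypothetical x1 values per frame

        int sum = 0, max = 0;
        for (int f = 1; f < x1.length; f++) {
            int d = x1[f] - x1[f - 1];
            int d2 = d * d;                      // squared difference to previous frame
            sum += d2;
            max = Math.max(max, d2);
        }
        double avg = (double) sum / (x1.length - 1);
        System.out.println("sum=" + sum + " max=" + max + " avg=" + avg);
    }
}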

Curve Relationship In order to determine how the data of algo, man1 and man2 are related to each other, we first calculate the correlation of the curves in Figure 5.4.

Correlations          x1        y1        x2        y2
man1 – man2           0.97148   0.84148   0.97423   0.84805
man1 – algo           0.97919   0.7235    0.96173   -0.073
algo – man2           0.98083   0.67658   0.95887   -0.17251
standard deviation    0.00499   0.08496   0.00816   0.56269

Table 5.4.: Correlation results.
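The values in Table 5.4 are plain Pearson correlation coefficients between two coordinate series; a minimal sketch with made-up series could look as follows.

public class CorrelationSketch {

    static double pearson(double[] a, double[] b) {
        int n = a.length;
        double meanA = 0, meanB = 0;
        for (int i = 0; i < n; i++) { meanA += a[i]; meanB += b[i]; }
        meanA /= n;
        meanB /= n;
        double cov = 0, varA = 0, varB = 0;
        for (int i = 0; i < n; i++) {
            cov  += (a[i] - meanA) * (b[i] - meanB);
            varA += (a[i] - meanA) * (a[i] - meanA);
            varB += (b[i] - meanB) * (b[i] - meanB);
        }
        return cov / Math.sqrt(varA * varB);
    }

    public static void main(String[] args) {
        // made-up per-frame values of one coordinate for two testers
        double[] man1 = {238, 239, 238, 241, 246, 247};
        double[] algo = {238, 237, 237, 244, 245, 246};
        System.out.println(pearson(man1, algo));
    }
}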


These correlations show good results for the algorithm with respect to the manual testers. The x-value of point1 shows correlations around 97-98%, and a standard deviation lower than 0.5%. Only the y-value of point2 does not have satisfactory results: the manual testing values do not correlate with the output of the algorithm, and the standard deviation is over 56%. Table 5.4 shows all results. In addition to the correlation, we look at the similarity by calculating the square of the frame-by-frame difference between the testers and performing a t-test on this data. Table 5.5 shows the result of this approach.

Table 5.5.: Two-tailed paired samples t-test on d².

The output of the t-test has to be treated with caution, as the precondition of having normally distributed values is not fulfilled [Smith, 2002] (both the Kolmogorov-Smirnov and the Shapiro-Wilk test return significances below 0.000). The reason why we still calculated the t-test is that the curves are predominantly bell-shaped, and that we expect a normal distribution with a greater sample size. Still, the t-test inspects the x/y-coordinates individually, but does not deal with the overall quality of the two feature points. The Analysis of Variance (ANOVA) calculation in the following paragraph tries to fill this gap.

Relative Point Position In order to compare the position of the automatically detected feature points with the points determined by the manual testers, we take a closer look at the (squared) distance between the curves of different testers. Table 5.6 shows the sum of all d² values, as well as the minima and maxima over all tracked frames. The distances between the curve of one manual tester and the algorithm are significantly higher than between the manual testers. man1 is still closer to algo than man2.

man1 – man2    x1     y1     x2     y2
sum            189    251    114    224
maximum        9      9      4      4
average        1.39   1.85   0.84   1.65

man1 – algo    x1     y1     x2     y2
sum            223    166    453    779
maximum        9      9      16     25
average        1.64   1.22   3.33   5.73

algo – man2    x1     y1     x2     y2
sum            192    183    323    1593
maximum        16     9      9      36
average        1.41   1.35   2.38   11.71

Table 5.6.: Sum, maximum and average of d² of different testers' values.


For examining the overall performance of the two algorithm-tracked feature points with respect to the manual testers, we calculate ANOVA for the d² values described above. In order to clean up the data and make it more likely to be normally distributed, we calculated the extrema and removed them from the data set. Table 5.7 shows the output of this extrema calculation. Note that point2 has fewer extrema (except for the x-coordinate of d²(man1 − algo)), but with higher values (up to ≥ 36). We therefore assume that point1 has more outliers than point2.

                    x1         y1         x2         y2
d²(man1 − algo)     –          16 ≥ 4     24 ≥ 9     2 ≥ 25
d²(man2 − algo)     22 ≥ 4     22 ≥ 4     13 ≥ 9     5 ≥ 36

Table 5.7.: Extrema of d² between the algorithm and each of the manual testers.

After removing the extrema from the data list, we recalculate ANOVA on the cleaned-up data set. According to a test of homogeneity of variances, the new data set is not significantly homogeneous (with a significance of 0.000). In order to evaluate the overall performance of the feature points, we calculate a contrast test where we divide the d² results into 2 groups and compare their mean values. We perform three groupings: all values of man1 (d² of x1, y1, x2, and y2) compared to all values of man2; all d² of x-coordinates (of both testers) compared to the d² of y-coordinates; and the distances of point1 compared to the distances of point2. The results are illustrated in Table 5.8. It shows that man2 has a bigger spread than man1; its difference to the algorithm is larger. The difference between the means of all y-values and those of all x-values is 11.417, so it is likely that there is an overestimation in the vertical direction. The spread between all point1 values and all point2 values is 16.529, so point2 is more likely to be overestimated.

Contrast                                  Spread    Std. Error    t
avg(val_man2) – avg(val_man1)             3.997     0.9399        4.253
avg(val_ys) – avg(val_xs)                 11.417    0.9399        12.147
avg(val_point2s) – avg(val_point1s)       16.529    0.9399        17.586

Table 5.8.: Contrast tests of d² between the algorithm and each of the manual testers.


After looking at the overall success of the feature points, we finally create the overestimation table 5.9, where we can compare the d² of one coordinate to the value of each other coordinate. The table shows that d²(y2) of man2-algo has the biggest difference to all other values; it has the largest spread and therefore varies most during the feature tracking process. x1 and y1 have the same spread, so their error rates are likely to be similar.

man1-algo man2-algo<br />

d 2 (x1) d 2 (y1) d 2 (x2) d 2 (y2) d 2 (x1) d 2 (y1) d 2 (x2) d 2 (y2)<br />

man1- d 2 (x1) – 1.081 -3.801 1.062 1.062 -9.147<br />

algo d 2 (y1) -1.081 – -1.495 -4.882 -1.116 -10.228<br />

d 2 (x2) 1.495 – -3.387 1.440 1.440 -8.733<br />

d 2 (y2) 3.801 4.882 3.387 – 4.826 4.826 3.766 -5.346<br />

man2- d 2 (x1) -1.026 -1.440 -4.826 – -1.061 -10.172<br />

algo d 2 (y1) -1.026 -1.440 -4.826 – -1.061 -10.172<br />

d 2 (x2) 1.116 -3.766 1.061 1.061 – -9.111<br />

d 2 (y2) 9.147 10.228 8.733 5.346 10.172 10.172 9.111 –<br />

Table 5.9.: Tamhane post-hoc test on d². Mean difference (row − column) where it is significant at the 0.05 level.

5.4. Time Usage

5.4.1. Technique

For testing, we use the same input data as in Section 5.3.2. We investigate the time consumption of the two methods that are mainly responsible for the tracking process and may consume the most time. The first procedure is used for returning the feature points (see Procedure 4.1 on page 56), the second is responsible for the edge detection. For testing the tracking method, we use the standard program, whereas in the case of the preprocessing we use a test class, called PreprocessingTimeTest, as it minimizes additional overhead. The time information is then extracted from the log file.
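The timing itself amounts to simple wall-clock measurements around the calls in question, written to the log; a minimal sketch is given below, with a dummy workload standing in for the actual tracking call.

public class TimingSketch {
    public static void main(String[] args) {
        long start = System.currentTimeMillis();

        doWork();   // placeholder for: tracker.createBSPTree(raster, levels)

        long elapsed = System.currentTimeMillis() - start;
        System.out.println("tracking took " + elapsed + " ms");
    }

    static void doWork() {
        double x = 0;
        for (int i = 0; i < 1000000; i++) {
            x += Math.sqrt(i);      // dummy workload
        }
    }
}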

5.4.2. Statistical Evaluation

Feature Tracking

For the comparison of the implemented feature tracking algorithm, we examine two test cases: a complete image region selection and a 60x20 pixel mouth region selection. During the first test run, we found that the results are disturbed by garbage collection latencies. Hence, we tune the JVM parameters to allow for parallel processing during garbage collection. The JVM parameters used are: -verbose:gc -Xms64m -Xmx256m -XX:NewRatio=2 -XX:+UseConcMarkSweepGC. Figure 5.7 shows the tracking time output for both region selections (all tracking time information can be found in Appendix A.2 and A.3).
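For reference, such a tuned run corresponds to a command line of the following form; the main class and video file names are placeholders only.

java -verbose:gc -Xms64m -Xmx256m -XX:NewRatio=2 -XX:+UseConcMarkSweepGC \
     featuretracker.Main input.mpg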

Figure 5.7.: Tracking time consumption for the complete region selection (a) and a 60x20 mouth selection (b).


Remarkably, the JVM tuning considerably improves the tracking of the complete region, but has the opposite effect on the smaller selection. The tracking times lie between 3.94 ms (5.2 ms JVM tuned) for the mouth region, and 51.76 ms (21.4 ms JVM tuned) for the whole-image selection.

Preprocessing

To test the preparation of images, we compare two preprocessing implementations: the currently used Canny edge detector, and a standard JAI edge detection mechanism, which is based on the Sobel operator. As in the testing of the tracking method, we perform the tests with both standard and tuned JVM options. The results are illustrated in Figure 5.8 (all tracking values can be found in Appendix A.4 and A.5).

Figure 5.8.: Preprocessing time consumption for the Canny (a) and the JAI Sobel (b) edge detection.

The figures show that the currently used edge detection mechanism is considerably slower than the JAI operator. The average processing time for Canny is 212.91 ms (156.42 ms with JVM tuning), in contrast to 1.27 ms (0.86 ms JVM tuned) for Sobel. Unfortunately, the excellent values of the JAI Sobel operator cannot be transferred to the Java feature tracker. In both the test class and the final program the algorithm uses a BufferedImage. In the case of the test class, the type of this image is TYPE_INT_RGB; in the final program it is TYPE_3BYTE_BGR, as the image is directly fetched from the video and not read from a saved .png image. In case of the latter type, the necessary image conversion requires processing times of more than 1 second.

5.5. Summary

The evaluation showed that the developed Java feature tracker is able to deliver reasonable results. Important feature points can be located; differences in mean coordinate positions are below 2 pixels, and correlations of the produced feature points reach values of up to 98%. The results differ per inspected feature point. For example, the left corner of the mouth showed a more accurate position than the right corner. The most significant improvements can be reached by improving the preprocessing. This can be done by producing continuous, non-ragged feature contours that do not respond to lighting changes, and by omitting intensity changes in the image that do not belong to facial features. This step would also improve the oscillating behavior that can be noticed in the current version of the Java tracker. Another step would be to add geometrical transformations to link subsequent feature points, or to omit outliers by calculating more accurate values from the neighboring frames. Performance-wise, the bottleneck of the tracking process is the preprocessing. It currently needs about 4 times as much time as the actual feature tracking. A JAI-based edge detector would be faster for the image preparation, but further inquiry is necessary to find a workaround for the extremely time-consuming image type conversions.

Conclusions<br />

Overview<br />

The feature tracking solution described in this work is completely based on Java and finds facial feature points in preselected image regions of input videos. In this work, we explained the selection of algorithms and libraries as well as implementation and evaluation details. The main advantages of this solution are a (theoretical) platform independence, low hardware requirements and little tracking effort.

Java turns out to be, with some constraints, very practicable as a programming language for feature tracking problems. The time consumption is reasonably small and the implementation is convenient, as Java libraries are available for video processing and imaging tasks. Still, JMF, which was used for video frame extraction, does not keep all promises, as jumping between frames appears to be challenging or does not even work in the proposed way. JAI for image processing seems to be fast, but conversions between image types are incomprehensibly time-consuming and could not be bypassed for the time being. For this reason, we alternatively used a Java2D-based implementation.

The applied algorithm, proposed by Rocha et al. [2002], shows the possibility of solving a complex task by splitting it up into smaller and therefore easier problems. The creation of a BSP tree, with nodes that hold object position, size and orientation, is straightforward and understandable. Nevertheless, we had to study different sources to verify equation definitions. The implementation is largely trouble-free, as the approach is clearly specified in the underlying paper. However, problems arose during the calculation of the orientation angle, where the method sometimes returns an angle rotated by 90°. We have not yet found a solution for this difficulty. Still, the algorithm delivers good results for the feature tracking task. The accuracy of feature points often lies around 90%. Errors in the feature determination mostly have their origin in preprocessing discrepancies.

Future Work

According to the listed problems, the major improvements of the current solution would be revised JMF code, an enhanced preprocessing method, and resolved tracking algorithm difficulties. Frame-by-frame traversal of the video could be enhanced by adding the previousFrame() functionality. Improved edge detection should make all important edges visible, like the lower line of the lower lip. Enhanced processing times could be reached by exclusively preprocessing the selected image regions. Moreover, individual tracking cases with wrong object orientations have to be revised.

In comparison to the commercial VeeAnimator, the Java feature tracker is inferior in a number of aspects. Image regions need manual initialization; the tracker currently does not work with streaming media and is not able to track in real time. The feature points still flutter when observed over a series of frames, they are not standardized, and they do not deliver 3D information. The program could be changed to that effect by using geometric transformations between the shape information of subsequent frames (as described in the paper of Rocha et al. [2002]). JMF could be used to open video streams or could be replaced by alternative libraries. Automatic preselection of image areas could be introduced, for example by using facial shape models. These models could then be used to map the tracked points to a standardized image feature model by selecting feature points according to their proximity to model points. For simple 3D information, the z-axis could be set to standard values.

Summary

The current program shows a straightforward and comprehensible feature tracking solution that provides basic tracking procedures. In case of further developments, for example by the Open Source community, the project could become a free and independent alternative in the field of feature tracking, facilitating high-level face processing tasks. Having Java as a basis, it would be suitable for professional 3D animations or future user interfaces on different platforms. Moreover, it could be used for teaching and for further investigations in the field of computer vision. With this work we have outlined the possibilities for future developments.

Bibliography<br />

G. Antunes Abrantes and F. Pereira. MPEG-4 Facial Animation Technology: Survey,<br />

Implementation and Results. IEEE Transactions on Circuits and Systems for Video<br />

Technology, 9(2):290–305, March 1999.<br />

T. Awcock. Applied Image Processing. McGraw-Hill Companies, August 1995.<br />

S. S. Beauchemin and J. L. Barron. The Computation of Optical Flow. ACM Comput.<br />

Surv., 27(3):433–466, 1995. ISSN 0360-0300.<br />

D. L. Bimler, J. Kirkland, and K. A. Jameson. Quantifying variations in personal color<br />

spaces: Are there sex differences in color vision? Color Research & Application, 2:<br />

128–134, 2004.<br />

W. Burger and M. J. Burge. Digitale Bildverarbeitung. eXamen.press. Springer, 2005.<br />

C. Cédras and M. A. Shah. Motion Based Recognition: A Survey. Image and Vi-<br />

sion Computing, 13(2):129–155, March 1995. URL http://www.cc.gatech.edu/<br />

~jimmyd/summaries/cedras1995.html.<br />

J. Cohn, A. Zlochower, J.-J. J. Lien, and T. Kanade. Feature-Point Tracking by Op-<br />

tical Flow Discriminates Subtle Differences in Facial Expression. In Proceedings of<br />

the 3rd IEEE International Conference on Automatic Face and Gesture Recognition<br />

(FG ’98), pages 396 – 401, April 1998. URL http://www.ri.cmu.edu/pubs/pub_<br />

2075.html.<br />

R. Cutler and M. Turky. View-Based Interpretation of Real-Time Optical Flow for<br />

Gesture Recognition. http://citeseer.ist.psu.edu/cutler98viewbased.html,<br />

1998.<br />

A. Davison. Killer Game Programming in Java. O’Reilly, 1 edition, 2005. ISBN<br />

0-596-00730-2.<br />

F. Dellaert and R. Collins. Fast Image-Based Tracking by Selective Pixel Integration.<br />

In ICCV 99 Workshop on Frame-Rate Vision, September 1999. URL http://www.<br />

ri.cmu.edu/pubs/pub_3195_text.html.<br />

P. Ekman and W. Friesen. Facial Action Coding System: A Technique for the Mea-<br />

surement of Facial Movement. Consulting Psychologists Press, Palo Alto, 1978.<br />

I. Essa and A. Pentland. Motion-Based Recognition, volume 9 of Computational Imag-<br />

ing and Vision, chapter 12: Facial Expression Recognition Using Image Motion.<br />

Kluwer Academic Publishers, 1997. ISBN 0-7923-4618-1.<br />

B. Fisher, S. Perkins, A. Walker, and E. Wolfart. Hypermedia Image Processing Ref-<br />

erence. http://www.cee.hw.ac.uk/hipr/html/canny.html, Department of Arti-<br />

ficial Intelligence University of Edinburgh/UK, 1994.<br />

D. Gorodnichy. Facial Recognition in Video. In Proceedings of International As-<br />

sociation for Pattern Recognition (IAPR) International Conference on Audio- and<br />

Video-Based Biometric Person Authentication (AVBPA’03), LNCS 2688, pages 505–<br />

514, Guildford, United Kingdom, June 2003. NRC 47150. URL http://iit-iti.<br />

nrc-cnrc.gc.ca/publications/nrc-47150_e.html.<br />

T. Goto, M. Escher, C. Zanardi, and N. Magnenat-Thalmann. MPEG-4 Based<br />

Animation With Face Feature Tracking. In CAS ’99 (Eurographics workshop),<br />

pages 89–98, Milano, Italy, September 1999. MIRALab, Springer. URL http:<br />

//www.miralab.unige.ch/papers/15.pdf.<br />

W. Iverson. Mac OS X for Java Geeks. O'Reilly, April 2003. URL http://www.oreilly.com/catalog/macxjvgks/, http://www.oreilly.com/catalog/macxjvgks/chapter/ch10.pdf.

J. Ivins and J. Porrill. Everything You Always Wanted To Know About Snakes (But<br />

Were Afraid To Ask). Technical report, Artificial Intelligence Vision Research Unit<br />

University Of Sheffield, England S10 2TP, July 1993. URL http://www.computing.<br />

edu.au/~jim/psfiles/aivru86c.ps. AIVRU Technical Memo #86 (Revised June<br />

1995; March 2000).<br />

M. Jacob, T. Blu, and M. Unser. Efficient Energies and Algorithms for Parametric<br />

Snakes. IEEE Transactions on Image Processing, 13(9):1231–1244, September 2004.<br />

URL http://ip.beckman.uiuc.edu/publications.html.<br />

JMF. Java Media Framework API Guide, JMF 2.0 FCS edition, November 1999.

M. Kass, A. Witkin, and D. Terzopoulos. Snakes: Active Contour Models. International Journal of Computer Vision, 1:321–331, 1988.

V. Krüger, A. Happe, and G. Sommer. Affine Real-Time Face Tracking Using Gabor<br />

Wavelet Networks. In ICPR00: Proceedings of the International Conference on<br />

Pattern Recognition (ICPR00), volume 1, page 1127, Washington, DC, USA, 2000.<br />

IEEE Computer Society.<br />

B. Mackiewich. Intracranial Boundary Detection and Radio Frequency Correction in<br />

Magnetic Resonance Images. Master’s thesis, Simon Fraser University, August 1995.<br />

URL http://www.cs.sfu.ca/~stella/papers/blairthesis/main/main.html.<br />

B. S. Morse. Lecture 11: Shape Representation: Regions (Moments). http://bryan.<br />

cs.byu.edu/650/home/index.php, January 2004. Course material for ‘Computer<br />

Vision’ at Brigham Young University.<br />

R. Mukundan and K. R. Ramakrishnan. Moment Functions in Image Analysis. World<br />

Scientific, 1998.<br />

L. Rocha, L. Velho, and P. C. P. Carvalho. Image Moments-Based Structuring and<br />

Tracking of Objects. sibgrapi, 00:99, 2002.<br />

H. Sahbi and N. Boujemaa. Coarse to Fine Face Detection Based on Skin Color<br />

Adaption. In ECCV ’02: Proceedings of the International ECCV 2002 Work-<br />

shop Copenhagen on Biometric Authentication, pages 112–120, London, UK, 2002.<br />

Springer-Verlag. ISBN 3-540-43723-1. URL http://www-rocq.inria.fr/imedia/<br />

Articles/23590112.pdf.<br />

T. Smith. When to use and not to use the two-sample t-test. http://www.ubht.nhs.uk/R&D/RDSU/Statistical%20Tutorials/t-tests.pdf, November 2002. URL http://www.ubht.nhs.uk/R&D/RDSU/Statistical%20Tutorials/statistical_tutorials.htm. Research and Effectiveness Department (United Bristol Healthcare NHS Trust).


M. Sonka, V. Hlavac, and R. Boyle. Image Processing, Analysis, and Machine Vision.<br />

PWS Publishing, second edition, 1999.<br />

D. Terzopoulos and K. Waters. Analysis and Synthesis of Facial Image Sequences<br />

Using Physical and Anatomical Models. IEEE Trans. Pattern Anal. Mach. Intell.,<br />

15(6):569–579, 1993. ISSN 0162-8828.<br />

vidiator. Using FaceStation 2. http://www.vidiator.com/support/<br />

facestationdocs/index.html, 2004.<br />

H. Wu, Q. Chen, and M. Yachida. Face Detection From Color Images Using a Fuzzy<br />

Pattern Matching Method. IEEE Trans. Pattern Anal. Mach. Intell., 21(6):557–563,<br />

1999. ISSN 0162-8828.<br />

X. Xie and M. Mirmehdi. Geodesic Colour Active Contour Resistent to Weak<br />

Edges and Noise. In Proceedings of the 14th British Machine Vision Conference,<br />

pages 399–408. BMVA Press, September 2003. URL http://www.cs.bris.ac.uk/<br />

Publications/Papers/2000034.pdf.<br />

J. Zobel. Writing for Computer Science. Springer, 2 edition, 2004.<br />

Glossary<br />

ANOVA Analysis of Variance. A series of statistical procedures for examining<br />

differences in means and for partitioning variance, 74, 75<br />

API Application Programming Interface. A defined set of calling con-<br />

ventions allowing a software application to access a particular set of<br />

services, 36, 37, 39, 49<br />

AU Action Unit. A basic, visually distinguishable facial muscle action defined in the Facial Action Coding System, 9, 10, 68

BSP Binary Space Partitioning. A technique for the division of geomet-<br />

rical objects. It is mainly used in game engines of computer games,<br />

20, 35, 51, 54–56, 64, 67, 80<br />

CSV Comma Separated Values. A file format used as a portable represen-<br />

tation of a database, 64<br />

FACS Facial Action Coding System. A system for describing observable facial movements in terms of action units, developed by Ekman and Friesen, 9–11, 26, 27

FAP Facial Animation Parameters. Feature points used for facial anima-<br />

tion that are standardized in the MPEG-4 standard, 2<br />

FBX A platform-independent 3D authoring and interchange format., 23<br />

GUI Graphical User Interface. The front-end interface and navigation<br />

design of an application, 3, 50, 51, 54, 60, 64<br />


JAI The Java Advanced Imaging API. An optional package extending the<br />

Java 2 Platform, providing additional capabilities for running image<br />

processing applications and imaging applets in Java, 47–49, 62, 63,<br />

78–80<br />

JMF The Java Media Framework API. An optional package extending the<br />

Java 2 Platform that enables audio, video and other time-based media<br />

to be added to applications and applets built on Java technology, 3,<br />

36, 37, 40, 41, 49, 60–63, 80, 81<br />

JNI Java Native Interface. A programming framework that allows Java<br />

code running in the Java VM to call and be called by native appli-<br />

cations and libraries written in other languages, 36<br />

JVM Java Virtual Machine. A piece of software that converts Java byte-<br />

code into machine language and executes it, 39, 77, 78<br />

LoG Laplacian of Gaussian. A convolution operator using a Gaussian<br />

image smoothing and a second derivative Laplace operator, 45<br />

PCU Portable Control Unit. Desktop control of lightsource; external power<br />

supply replaces internal PC backpanel power supply, 23, 25<br />

QTJava Quicktime for Java. In a different context it is also: Java QT binding,<br />

39<br />

SDK Software Development Kit. A programming package that enables<br />

a programmer to develop applications for a specific platform. Java<br />

SDK versions below 1.2 and version 1.5 are called JDK, 23, 25, 40,<br />

60<br />

List of Figures<br />

0.1. Not standardized output of facial feature localization. . . . . . . . . . 2<br />

1.1. Basic facial feature tracking workflow. . . . . . . . . . . . . . . . . . . 7<br />

1.2. Optical flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8<br />

1.3. Standard face model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9<br />

1.4. Feature point displacements. . . . . . . . . . . . . . . . . . . . . . . . 10<br />

1.5. Control-theoretic mapping of optical flow. . . . . . . . . . . . . . . . . 10<br />

1.6. A closed snake. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12<br />

1.7. An example of the movement of a point in a snake. . . . . . . . . . . . 13<br />

1.8. Snakes and fiducial points used for muscle contraction estimation. . . 14<br />

1.9. Weak-edge leakage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15<br />

1.10. Example for moment calculations and shape representation. . . . . . . 16<br />

1.11. Object fitting by 2 k ellipses at each level. . . . . . . . . . . . . . . . . 20<br />

1.12. X-IST FaceTracker . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23<br />

1.13. VeeAnimator in action. . . . . . . . . . . . . . . . . . . . . . . . . . . . 24<br />

2.1. Overview of SplineSnake results. . . . . . . . . . . . . . . . . . . . . . 30<br />

2.2. SplineSnake interference. . . . . . . . . . . . . . . . . . . . . . . . . . . 30<br />

2.3. Overview of snake results. . . . . . . . . . . . . . . . . . . . . . . . . . 32<br />

3.1. Top view of camera layout used for recordings. . . . . . . . . . . . . . 41<br />

3.2. Two types of binary image regions applicable for the tracking algorithm. 43<br />

3.3. Function with intensity change, its first and second derivative. . . . . . 44<br />

3.4. Multi-stage canny edge detection process. . . . . . . . . . . . . . . . . 46<br />

3.5. Workflow of fetching individual pixels with Java2D. . . . . . . . . . . 47<br />

4.1. Class Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53<br />

4.2. Overview of the basic application workflow . . . . . . . . . . . . . . . 54<br />

4.3. Angle calculation for raster subdivision. . . . . . . . . . . . . . . . . . 60<br />


5.1. Tracking results for selective regions. . . . . . . . . . . . . . . . . . . . 66<br />

5.2. Tracking result for 6 area selections. . . . . . . . . . . . . . . . . . . . 67<br />

5.3. Good and bad results. . . . . . . . . . . . . . . . . . . . . . . . . . . . 68<br />

5.4. Positions of the corners of the mouth. . . . . . . . . . . . . . . . . . . 69<br />

5.5. Preprocessing of mouth region. . . . . . . . . . . . . . . . . . . . . . . 70<br />

5.6. d 2 of subsequent video frames. . . . . . . . . . . . . . . . . . . . . . . 72<br />

5.7. Tracking time consumption. . . . . . . . . . . . . . . . . . . . . . . . . 77<br />

5.8. Preprocessing time consumption. . . . . . . . . . . . . . . . . . . . . . 78<br />

List of Tables<br />

1.1. Comparison of commercial products . . . . . . . . . . . . . . . . . . . 25<br />

2.1. SplineSnake parameters. . . . . . . . . . . . . . . . . . . . . . . . . . . 29<br />

2.2. SplineSnake: Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31<br />

2.3. Bodier snake parameters . . . . . . . . . . . . . . . . . . . . . . . . . . 32<br />

2.4. Snake: Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33<br />

3.1. JMF 2.1.1 - Supported Video Formats . . . . . . . . . . . . . . . . . . 38<br />

5.1. Mean position of x/y coordinates. . . . . . . . . . . . . . . . . . . . . . 70<br />

5.2. Minima and maxima of x/y coordinates. . . . . . . . . . . . . . . . . . 71<br />

5.3. Sum, maximum and average of d 2 on subsequent frames. . . . . . . . . 71<br />

5.4. Correlation results. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72<br />

5.5. Two-tailed paired samples t-test on d 2 . . . . . . . . . . . . . . . . . . . 73<br />

5.6. Sum, maximum and average of d 2 of different tester’s values. . . . . . 74<br />

5.7. Extrema of d 2 between the algorithm and each of the manual testers. . 75<br />

5.8. Contrast tests of d 2 between the algorithm and each of the manual testers. 75<br />

5.9. Tamhane post-hoc test on d 2 . . . . . . . . . . . . . . . . . . . . . . . . 76<br />

List of Procedures<br />

4.1. createBSPTree(levels) . . . . . . . . . . . . . . . . . . . . . . . . . . 56<br />

4.2. addPoint(x, y, val) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57<br />

4.3. calculateValues() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58<br />

4.4. subdivide(levels) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59<br />

91


A. Appendix<br />

A.1. Evaluation Data: Coordinates of Corners of the Mouth<br />

man1 man2 algo<br />

frame x1 y1 x2 y2 x1 y1 x2 y2 x1 y1 x2 y2<br />

1 238 220 287 220 238 221 287 221 238 220 288 219<br />

2 239 220 287 220 238 221 287 221 237 220 288 219<br />

3 238 220 287 220 238 221 287 221 237 220 288 219<br />

4 238 220 287 220 238 221 287 221 237 220 288 219<br />

5 238 219 287 220 238 221 288 221 237 220 287 219<br />

6 239 219 285 220 240 221 287 221 239 219 287 219<br />

7 241 219 284 219 243 220 283 220 244 221 285 218<br />

8 246 220 282 220 246 220 282 220 245 221 285 219<br />

9 247 220 282 220 248 221 281 221 246 221 284 218<br />

10 247 221 282 221 249 221 281 221 246 220 282 219<br />

11 249 221 281 221 250 221 281 221 249 220 282 219<br />

12 249 221 281 221 250 222 280 222 250 221 282 219<br />

13 249 221 281 221 250 222 280 222 250 223 283 219<br />

14 248 221 281 221 250 222 280 222 250 223 282 219<br />

15 248 221 280 221 250 222 280 222 250 223 282 218<br />

16 249 221 280 221 250 222 280 222 251 224 282 218<br />

17 248 220 280 221 250 222 280 222 250 220 283 218<br />

18 248 221 280 221 250 222 280 222 250 221 283 218<br />

19 248 221 279 221 250 222 280 222 250 221 283 218<br />

20 249 221 281 221 250 222 280 222 250 224 283 218<br />

21 248 221 280 221 250 222 280 222 250 224 283 218<br />

22 248 221 280 221 250 222 280 222 250 221 283 218<br />

23 248 221 280 221 249 222 280 222 249 223 283 218<br />

24 249 221 280 221 249 222 280 222 248 222 283 218<br />

25 248 221 280 221 250 222 280 222 248 221 280 217<br />

26 248 221 280 221 250 222 279 222 249 221 281 218<br />

27 248 221 280 220 251 222 279 221 249 221 280 217<br />

28 249 221 280 221 250 222 280 221 249 220 280 217<br />

29 248 221 280 221 250 222 280 221 250 222 278 216<br />

30 248 221 280 221 250 222 280 221 250 222 280 217<br />

31 249 221 280 221 250 222 280 222 248 221 282 218<br />

32 249 220 280 221 250 222 280 222 251 223 279 217<br />

33 249 220 281 220 249 222 280 222 249 221 280 217<br />

34 247 220 281 220 248 221 281 222 246 221 281 216<br />

35 243 220 284 220 245 221 283 221 244 220 284 218<br />

36 241 219 286 219 242 220 284 220 244 220 287 218<br />

37 240 219 286 219 240 220 286 220 240 219 287 218<br />

38 239 218 286 218 239 219 287 219 238 219 288 218<br />

39 238 218 286 218 239 219 287 219 237 218 289 218<br />

40 239 218 286 219 238 219 287 219 237 218 288 218<br />

41 238 219 286 218 238 220 287 219 238 219 289 218<br />

42 239 219 286 219 238 220 287 219 237 219 289 218<br />

43 238 219 287 219 239 220 287 220 238 219 288 218<br />

44 238 219 286 219 239 220 287 220 237 219 288 218<br />

45 238 219 286 219 239 220 287 220 238 219 289 218<br />

46 239 219 286 219 240 220 287 220 238 219 288 219<br />

47 238 219 286 219 240 220 287 220 238 219 288 218<br />

48 239 219 286 219 240 220 287 220 238 220 288 218<br />

49 238 220 287 219 240 220 287 220 237 220 288 219<br />

50 238 220 287 219 240 220 287 220 238 220 287 219<br />

51 238 219 286 219 239 220 286 221 237 220 288 219<br />

52 238 219 286 219 238 221 286 221 238 220 288 219<br />

53 239 219 286 219 238 221 286 221 238 220 288 219<br />

54 239 219 287 219 238 221 286 221 238 220 288 219<br />

55 239 219 286 219 238 221 286 221 238 220 287 218<br />

56 239 219 286 220 238 221 286 221 237 220 287 219<br />

57 239 219 286 220 238 221 286 221 237 220 287 219<br />

58 239 219 286 220 238 221 286 221 238 220 287 219<br />

59 240 219 287 219 239 220 285 221 238 220 287 219<br />

60 240 219 285 219 240 220 285 221 240 220 286 218<br />


man1 man2 algo<br />

frame x1 y1 x2 y2 x1 y1 x2 y2 x1 y1 x2 y2<br />

61 243 219 283 219 243 220 283 221 243 220 283 217<br />

62 246 220 280 219 246 220 282 221 245 220 282 218<br />

63 246 220 280 220 247 221 281 221 246 221 281 218<br />

64 247 220 280 220 248 221 280 221 249 221 281 218<br />

65 247 220 279 220 248 222 280 222 249 223 279 217<br />

66 248 220 280 220 248 222 280 222 248 222 279 217<br />

67 248 220 279 220 248 222 280 222 248 221 279 217<br />

68 247 220 279 220 248 222 280 222 247 220 279 217<br />

69 247 220 279 220 248 222 280 222 249 220 280 217<br />

70 247 220 279 220 248 222 280 222 249 222 279 217<br />

71 247 219 279 220 248 222 280 222 248 222 281 217<br />

72 247 220 279 221 248 222 279 222 248 222 278 216<br />

73 246 220 279 220 248 221 279 221 248 219 278 219<br />

74 247 220 280 220 248 221 279 221 248 221 280 217<br />

75 247 219 279 220 248 221 279 221 247 220 280 217<br />

76 249 220 280 220 248 221 280 221 247 220 282 217<br />

77 248 220 279 220 247 221 280 221 247 220 281 217<br />

78 248 219 280 219 247 221 280 221 247 220 281 217<br />

79 248 220 280 220 247 221 280 221 247 220 282 217<br />

80 247 219 279 220 247 221 280 221 247 220 280 216<br />

81 247 219 279 220 247 221 280 221 247 220 281 217<br />

82 247 220 279 220 247 221 280 221 247 220 282 217<br />

83 247 220 280 220 247 221 280 221 247 220 282 217<br />

84 248 219 280 220 247 221 280 221 247 220 282 217<br />

85 247 219 280 220 247 221 280 221 248 219 281 217<br />

86 249 220 279 220 246 221 281 221 248 219 280 216<br />

87 249 220 280 220 246 221 281 221 250 221 280 216<br />

88 247 219 280 220 246 221 282 221 250 222 280 216<br />

89 245 219 282 220 244 220 283 220 245 220 283 217<br />

90 243 219 284 219 242 220 284 220 243 219 285 217<br />

91 240 219 286 219 241 220 286 220 241 219 286 217<br />

92 239 219 286 218 240 220 287 220 239 219 288 217<br />

93 238 219 286 218 238 220 287 220 237 219 289 217<br />

94 238 218 286 218 238 220 287 220 237 219 289 218<br />

95 237 218 287 218 237 220 287 220 236 218 289 218<br />

96 238 218 286 218 237 219 288 219 236 218 289 217<br />

97 238 218 287 218 237 219 288 219 236 219 289 217<br />

98 236 218 287 218 237 219 288 219 236 219 289 217<br />

99 237 218 287 218 237 219 288 219 236 219 289 218<br />

100 237 218 287 218 237 219 288 219 236 219 289 218<br />

101 237 218 287 218 237 219 288 219 236 219 289 218<br />

102 237 218 287 218 237 219 288 219 235 220 289 218<br />

103 237 218 287 219 237 219 288 219 236 219 289 218<br />

104 237 218 287 218 237 219 288 219 236 219 289 218<br />

105 237 218 287 218 237 219 288 219 236 219 289 218<br />

106 236 218 288 218 237 219 288 219 236 219 288 218<br />

107 236 218 287 219 237 219 288 219 236 219 289 218<br />

108 238 218 287 218 237 219 288 219 236 219 289 218<br />

109 238 218 286 218 237 219 288 219 236 219 289 217<br />

110 238 218 287 218 237 219 288 219 236 219 289 218<br />

111 237 218 287 218 237 219 288 219 236 219 289 218<br />

112 237 218 286 218 237 219 288 219 236 219 289 218<br />

113 237 219 287 218 237 219 288 219 236 219 289 218<br />

114 238 218 286 218 237 219 288 219 236 219 289 217<br />

115 237 218 286 218 237 220 288 220 236 219 289 218<br />

116 238 219 286 218 237 220 288 220 236 219 289 218<br />

117 238 219 286 218 238 220 288 220 236 219 289 218<br />

118 240 219 284 219 240 220 285 220 239 219 286 218<br />

119 242 219 282 219 242 220 283 220 241 220 285 218<br />

120 245 220 280 219 244 221 281 221 244 220 282 218<br />

121 245 220 279 220 245 221 279 221 245 221 278 218<br />

122 246 220 278 220 245 221 278 221 245 221 278 217<br />

123 246 220 279 220 245 221 278 221 246 219 279 217<br />

124 246 220 278 220 245 221 278 221 246 220 277 216<br />

125 247 220 278 220 245 221 278 221 246 220 277 216<br />

126 247 220 278 220 245 221 279 221 246 220 278 216<br />

127 247 220 278 220 245 221 279 221 246 219 277 216<br />

128 246 219 279 220 245 221 279 221 246 219 280 216<br />

129 246 219 279 220 245 221 279 221 247 219 280 216<br />

130 247 219 279 220 245 221 279 221 247 218 282 216<br />

131 247 219 279 219 246 221 279 221 247 218 281 216<br />

132 245 219 280 219 246 221 279 221 247 218 281 216<br />

133 248 219 279 219 246 221 279 221 247 218 281 216<br />

134 246 219 279 219 246 221 279 221 247 218 280 215<br />

135 247 219 279 219 246 221 279 221 246 219 277 215<br />

136 247 219 279 219 246 221 279 221 246 220 277 215<br />

A.2. Evaluation Data: Tracking Mouth Area Selection (60x20)

We performed 20 test runs with the implemented Java feature tracker, selecting a mouth region of 60x20 pixels. 10 runs were done with standard JVM options, and 10 with the following tuning: -verbose:gc -Xms64m -Xmx256m -XX:NewRatio=2 -XX:+UseConcMarkSweepGC. All times are given in milliseconds.
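
The per-frame values in the table below are plain wall-clock measurements in milliseconds. The following is a minimal, self-contained sketch of how such per-frame times can be collected with System.currentTimeMillis(); the processFrame() workload is only a stand-in for illustration and does not reproduce the actual tracking step.

// Minimal sketch of a per-frame timing loop; processFrame() is a placeholder
// workload over a 60x20 pixel region, not the real tracking algorithm.
public class FrameTimingSketch {

    // Stand-in workload: sum the intensities of a 60x20 region.
    static long processFrame(int[][] region) {
        long sum = 0;
        for (int y = 0; y < region.length; y++) {
            for (int x = 0; x < region[y].length; x++) {
                sum += region[y][x];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        int[][] region = new int[20][60];   // 60x20 selection, all zeros here
        int frames = 136;                   // the test sequence has 136 frames
        long[] millis = new long[frames];

        for (int i = 0; i < frames; i++) {
            long start = System.currentTimeMillis();
            processFrame(region);
            millis[i] = System.currentTimeMillis() - start;   // milliseconds per frame
        }
        System.out.println("first frame took " + millis[0] + " ms");
    }
}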

frame JVM tuned (10 runs) JVM standard options (10 runs)<br>

1 27 48 26 26 52 69 70 69 28 69 46 31 50 32 30 24 45 46 48 47<br />

2 19 13 25 33 13 13 15 33 13 33 29 31 29 29 28 30 33 17 20 29<br />

3 22 24 39 38 22 24 22 36 21 45 32 31 32 33 33 33 34 27 23 32<br />

4 8 8 11 12 8 8 7 12 8 12 10 10 11 10 9 10 10 8 8 12<br />

5 34 29 22 22 30 26 19 22 29 22 25 17 24 24 18 24 25 18 17 22<br />

6 7 7 10 8 12 10 6 8 11 11 7 7 8 7 8 7 7 18 8 7<br />

7 5 6 6 6 5 6 7 5 5 6 6 6 5 6 4 5 5 6 7 5<br />

8 20 18 6 6 36 7 7 6 7 8 6 6 17 6 6 17 5 6 6 6<br />

9 6 6 20 7 6 8 6 8 5 10 7 6 8 6 7 8 7 8 8 6<br />

10 10 6 9 11 10 12 9 10 10 11 10 9 9 19 9 10 9 9 9 9<br />

11 54 12 9 7 16 9 7 7 7 40 9 7 9 9 10 9 9 8 8 9<br />

12 6 6 7 5 4 8 6 4 5 7 5 6 5 5 6 4 5 7 5 5<br />

13 10 9 5 8 10 15 9 23 9 9 9 9 9 10 9 9 8 9 9 10<br />

14 28 6 6 6 7 7 4 8 5 5 8 5 10 6 6 8 6 7 7 6<br />

15 3 3 9 3 4 4 4 4 2 6 3 4 4 4 4 4 3 4 4 4<br />

16 5 16 4 3 3 4 3 4 18 4 4 3 4 3 4 4 4 3 3 4<br />

17 8 4 4 4 17 4 4 4 4 3 4 5 4 4 5 4 4 4 4 4<br />

18 3 3 19 2 2 2 2 3 3 5 3 2 3 2 3 3 2 3 2 3<br />

19 3 3 3 3 2 3 3 3 3 5 3 3 3 2 3 3 3 3 3 3<br />

20 5 3 4 3 17 2 2 2 4 4 4 4 6 4 4 3 4 4 4 4<br />

21 5 3 6 17 3 3 4 3 3 6 3 4 3 3 3 3 3 3 3 2<br />

22 4 3 4 4 4 4 18 3 4 5 4 4 4 3 5 4 4 4 4 4<br />

23 8 9 4 4 4 18 5 18 5 5 6 6 5 5 5 5 7 7 6 5<br />

24 5 6 6 5 5 4 6 5 4 19 5 6 6 6 6 6 5 4 5 6<br />

25 5 22 21 7 20 21 5 21 4 4 8 8 8 8 8 9 8 8 8 8<br />

26 5 4 2 4 2 3 4 2 3 2 2 2 2 2 2 3 3 2 2 3<br />

27 3 2 5 3 2 3 3 2 3 4 2 2 3 3 3 2 3 3 2 2<br />

28 3 2 3 2 2 2 3 7 3 2 3 2 2 3 2 3 3 3 3 2<br />

29 5 2 2 2 7 2 3 2 2 3 3 3 2 3 2 2 2 3 3 2<br />

30 3 3 56 3 15 5 16 3 16 25 3 3 3 3 3 3 3 3 3 3<br />

31 6 5 2 3 2 3 3 3 6 2 3 3 3 3 2 2 3 2 2 2<br />

32 6 3 2 3 3 4 9 4 3 4 3 4 3 3 3 3 3 3 3 3<br />

33 5 6 8 6 6 19 2 7 4 6 6 16 6 7 6 7 7 6 7 6<br />

34 3 3 3 3 2 3 3 3 16 3 3 6 3 3 3 3 3 3 3 3<br />

35 6 4 3 4 17 3 20 17 3 4 4 4 3 4 4 3 3 4 4 4<br />

36 3 3 6 5 3 3 2 3 17 6 4 3 4 4 3 3 3 4 4 4<br />

37 3 4 4 3 6 3 3 2 3 3 3 4 3 3 3 3 4 3 3 3<br />

38 5 3 3 3 3 3 3 3 3 18 3 2 3 3 3 3 3 4 3 3<br />

39 4 3 6 3 3 3 3 4 6 6 4 3 3 2 3 3 3 3 3 2<br />

40 6 3 19 3 4 3 3 4 3 3 3 3 3 4 3 4 2 3 3 3<br />

41 5 3 2 3 3 2 6 3 2 3 14 3 3 2 3 4 2 3 3 3<br />

42 3 3 5 3 3 4 2 3 3 5 3 3 2 3 3 3 3 3 3 3<br />

43 2 3 2 2 2 2 2 3 2 2 2 3 2 2 2 2 2 3 2 2<br />

44 17 16 2 3 3 3 3 3 3 3 3 2 3 2 2 3 3 3 3 2<br />

45 3 3 6 3 3 3 3 3 3 5 3 2 3 2 2 3 3 2 3 3<br />

46 3 3 4 2 3 3 4 2 3 2 3 3 2 3 3 3 2 2 2 2<br />

47 4 2 2 2 2 2 2 2 2 2 2 4 2 2 2 3 2 2 2 2<br />

48 2 2 4 15 2 2 3 2 2 4 2 3 2 3 2 2 2 2 2 2<br />

49 2 3 8 8 7 8 14 2 7 7 7 8 7 8 7 8 7 8 8 8<br />

50 4 2 2 2 2 3 2 2 2 2 2 2 2 2 2 2 2 2 2 3<br />

51 3 2 17 2 2 2 2 2 2 18 2 3 2 2 2 2 2 2 2 2<br />

52 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 1 2 2 2<br />

53 4 2 2 3 2 2 3 2 2 20 3 2 2 2 1 1 2 3 2 1<br />

54 6 3 5 3 3 2 2 3 3 5 3 3 3 3 2 3 3 3 2 3<br />

55 2 2 2 2 2 2 17 2 2 2 2 3 2 2 3 2 2 2 2 2<br />

56 5 2 1 2 2 2 3 2 3 2 2 3 1 2 2 3 2 2 2 2<br />

57 3 2 4 1 2 2 3 2 14 3 2 3 2 1 3 2 2 3 2 2<br />

58 2 15 16 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2<br />

59 4 2 2 2 2 2 2 2 1 4 1 2 2 2 2 2 2 2 2 2<br />

60 2 2 4 2 2 2 2 2 2 16 3 2 2 2 2 2 2 2 2 2<br />

61 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2<br />

62 6 4 18 2 3 3 2 4 3 3 3 3 2 3 2 3 2 3 3 2<br />

63 4 4 6 3 3 4 3 4 3 19 3 3 3 2 4 3 3 3 4 4<br />

64 3 2 3 3 16 15 2 3 15 2 2 2 3 3 3 2 2 3 3 3<br />

65 21 2 14 2 5 2 2 2 2 2 2 2 2 2 1 2 2 2 1 2<br />

66 2 2 4 2 2 16 2 2 2 4 2 2 2 2 1 2 2 1 2 2<br />

67 2 2 1 2 2 2 3 2 2 2 2 3 1 2 2 1 2 2 2 1<br />

68 4 2 2 2 2 2 1 2 2 2 3 2 2 1 1 2 2 1 2 2<br />

69 2 15 7 2 3 15 2 2 1 17 2 3 2 2 2 2 1 2 2 2<br />

70 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 1 1<br />

71 4 8 1 2 2 2 2 8 2 2 2 2 2 2 2 2 3 1 2 2<br />

72 2 3 5 3 3 2 2 2 3 17 3 1 3 3 3 2 3 3 2 3<br />

73 15 1 2 2 2 2 2 2 2 2 2 3 2 3 1 2 2 2 2 2<br />

74 4 1 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2<br />

75 16 16 5 4 3 3 2 3 3 6 3 7 3 3 3 3 3 3 3 3<br />

76 2 2 4 2 17 3 16 3 4 5 2 5 2 2 1 2 2 2 2 1<br />

77 4 5 2 2 2 2 2 2 2 1 4 2 4 1 4 4 8 4 5 4<br />

78 2 3 4 3 3 2 1 2 3 17 3 2 2 2 2 3 3 3 3 3<br />

79 2 2 1 2 1 2 2 1 2 3 2 1 2 2 2 3 2 2 2 2<br />

80 4 2 1 2 1 1 2 1 3 1 1 2 2 2 2 1 1 3 2 2<br />

81 3 3 5 6 2 3 2 3 2 5 3 2 3 3 2 3 3 3 3 2<br />

82 3 2 3 2 18 3 2 2 2 3 3 3 2 2 3 3 3 2 2 2<br />

83 4 2 2 3 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2<br />

84 14 2 4 1 2 2 2 1 2 4 2 2 2 2 2 2 2 2 2 2<br />

85 6 3 3 3 3 3 4 3 3 3 3 4 3 4 3 3 3 3 3 3<br />

86 4 2 2 2 2 15 8 2 1 2 2 2 2 1 2 2 2 2 2 2<br />

87 3 3 5 2 2 3 2 4 2 5 3 3 3 3 3 2 2 2 2 3<br />

88 2 2 2 1 1 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2<br />

89 19 3 3 4 3 4 3 4 3 5 3 3 3 3 4 3 3 4 3 5<br />

90 2 2 4 2 2 2 2 2 2 4 2 2 2 2 2 1 2 3 2 1<br />

91 3 3 5 3 3 3 3 3 3 3 4 3 3 4 3 3 3 3 4 3<br />

92 17 2 2 3 2 3 2 2 3 2 4 4 3 2 2 2 2 4 3 4<br />

93 1 1 3 2 2 7 2 2 2 4 2 2 2 2 2 2 2 2 1 2<br />

94 4 3 2 3 3 3 6 4 3 15 3 7 3 6 5 3 3 3 4 3<br />

95 4 2 2 2 2 2 2 2 3 2 2 2 2 3 2 2 2 2 2 2<br />

96 2 2 4 3 2 2 17 2 2 5 3 3 2 2 3 7 3 2 3 3<br />

97 4 3 3 5 4 3 3 3 3 3 4 3 3 3 3 3 3 3 4 3<br />

98 17 4 4 5 3 16 2 3 4 4 3 2 3 3 3 6 2 2 3 3<br />

99 2 15 4 2 2 2 1 2 2 4 2 2 2 2 2 2 2 2 2 2<br />

100 2 2 15 3 2 3 2 2 3 6 3 2 2 2 4 2 3 3 3 3<br />

101 6 3 3 3 2 4 3 2 3 3 5 3 3 5 4 5 4 4 4 4<br />

102 2 2 4 2 2 2 1 2 2 4 2 2 2 2 2 2 3 3 3 2<br />

103 3 3 3 3 3 3 3 2 3 3 5 2 2 2 2 3 3 6 5 3<br />

104 4 1 2 2 2 2 2 2 2 3 2 2 2 1 2 2 1 2 2 2<br />

105 2 2 4 1 2 2 2 2 2 4 2 2 2 2 2 2 1 2 2 2<br />

106 14 1 2 2 2 2 2 2 15 2 2 2 2 2 2 2 2 2 2 1<br />

107 4 2 2 2 2 2 2 1 2 1 2 2 2 2 2 2 2 2 2 1<br />

108 2 2 3 2 2 2 2 2 2 17 2 2 2 2 1 1 2 1 2 2<br />

109 2 2 2 2 2 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2<br />

110 4 2 2 2 3 1 2 2 2 2 2 2 1 2 2 2 2 2 3 1<br />

111 3 3 4 3 15 3 3 2 2 4 2 3 3 3 3 2 2 2 3 4<br />

112 2 1 1 2 1 2 16 2 2 2 2 2 1 1 2 2 2 2 2 1<br />

113 4 2 3 2 3 2 2 3 2 2 1 4 3 1 4 2 4 2 2 2<br />

114 2 2 4 1 1 2 2 2 2 4 1 2 2 2 2 3 2 2 2 1<br />

115 3 3 3 2 4 20 5 4 3 3 3 3 3 2 3 3 3 3 3 3<br />

116 5 3 3 4 3 3 18 3 4 3 4 4 4 5 5 3 3 4 3 3<br />

117 1 2 4 2 1 1 1 2 1 4 2 2 2 3 2 2 2 2 2 2<br />

118 2 2 6 2 2 2 1 2 2 2 2 2 1 1 2 2 2 2 2 1<br />

119 3 2 2 2 2 2 1 2 1 2 2 2 3 2 2 2 2 3 2 2<br />

120 16 3 5 3 3 3 15 3 3 5 3 2 3 3 3 3 4 4 4 3<br />

121 2 19 2 2 3 3 2 3 3 4 4 3 3 2 2 3 2 3 3 2<br />

122 4 2 2 2 1 1 3 2 2 2 2 3 2 2 2 1 2 2 2 2<br />

123 3 3 19 3 3 3 3 3 3 5 3 2 3 3 3 5 3 4 3 3<br />

124 3 16 3 3 3 3 2 3 3 5 3 3 3 3 3 4 3 4 3 3<br />

125 18 15 3 3 3 2 3 15 3 2 3 2 2 4 2 3 2 2 2 3<br />

126 2 2 4 2 2 1 3 2 1 4 2 3 2 2 2 1 2 1 2 1<br />

127 3 3 3 3 3 3 3 3 3 3 3 3 2 4 4 7 3 3 7 3<br />

128 47 2 3 3 3 3 2 3 3 2 2 2 3 2 3 3 2 3 3 3<br />

129 2 2 4 2 2 2 2 2 2 17 2 2 2 2 2 2 2 2 2 2<br />

130 4 3 4 3 3 4 2 3 3 4 3 2 3 4 3 3 5 4 4 4<br />

131 7 2 2 3 3 15 2 3 3 3 2 3 2 2 3 3 2 2 3 2<br />

132 2 1 15 2 2 2 2 2 1 3 2 3 2 2 2 2 2 2 3 2<br />

133 3 7 3 2 3 3 3 2 4 3 3 3 3 3 3 14 3 3 5 3<br />

134 6 4 3 4 17 16 2 16 17 3 3 2 4 3 3 5 4 5 4 3<br />

135 2 1 4 2 2 2 1 2 2 4 2 2 2 2 2 2 1 2 2 2<br />

136 2 2 3 2 2 15 2 1 14 2 2 2 2 2 2 1 2 1 2 2<br />

A.3. Evaluation Data: Tracking of Whole Area Selection (384x288)

We performed 20 test runs with the implemented Java feature tracker, selecting the complete image region of 384x288 pixels. 10 runs were done with standard JVM options, and 10 with the following tuning: -verbose:gc -Xms64m -Xmx256m -XX:NewRatio=2 -XX:+UseConcMarkSweepGC. All times are given in milliseconds.
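
The tuned runs correspond to starting the virtual machine with exactly the options listed above. Assuming a hypothetical main class named TrackingTimeTest (the actual test class name may differ), such a run would be launched roughly as follows:

java -verbose:gc -Xms64m -Xmx256m -XX:NewRatio=2 -XX:+UseConcMarkSweepGC TrackingTimeTest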

frame JVM tuned (10 runs) JVM standard options (10 runs)<br>

1 55 61 135 112 57 126 147 133 120 108 71 117 128 67 67 69 71 124 133 70<br />

2 35 75 66 60 28 75 33 53 31 69 59 59 33 37 38 58 34 58 37 33<br />

3 32 49 53 53 30 49 38 58 31 56 47 49 42 43 43 49 44 47 42 42<br />

4 31 44 57 48 39 52 29 49 37 53 43 44 44 43 43 38 44 38 44 43<br />

5 24 34 31 30 23 29 28 35 27 28 39 35 29 29 40 35 29 37 29 35<br />

6 34 48 27 28 24 27 25 29 38 27 33 33 31 29 36 33 32 33 28 32<br />

7 25 25 29 24 23 27 27 24 25 31 31 29 32 32 34 28 28 32 32 27<br />

8 26 33 42 29 26 26 27 27 26 23 26 30 30 36 30 29 29 29 29 29<br />

9 27 21 24 21 32 20 21 21 22 21 32 28 28 37 37 28 29 28 28 27<br />

10 30 25 30 28 24 29 27 32 28 23 127 126 127 142 128 124 133 124 125 124<br />

11 21 25 24 23 26 21 24 20 23 20 31 30 24 30 34 36 30 25 29 30<br />

12 25 24 25 24 23 26 25 23 24 24 30 25 24 31 30 32 31 28 24 33<br />

13 30 26 26 23 39 24 25 31 26 26 36 36 37 37 36 36 38 39 36 35<br />

14 25 31 26 43 17 24 23 22 18 22 123 134 122 126 121 127 123 128 126 123<br />

15 23 20 20 20 26 20 21 20 21 19 31 30 29 28 28 29 30 29 29 30<br />

16 23 23 24 28 25 22 23 22 24 22 29 30 31 30 22 31 30 29 35 30<br />

17 19 23 23 19 39 19 21 23 19 18 25 26 34 30 31 30 25 25 25 25<br />

18 24 23 18 27 18 17 23 18 23 18 123 123 138 123 140 122 123 123 127 130<br />

19 23 20 24 24 23 20 21 21 21 22 29 31 35 30 29 31 30 33 30 29<br />

20 29 25 24 22 23 24 27 23 27 24 56 31 22 30 29 30 30 30 24 34<br />

21 26 18 20 19 27 18 19 22 19 18 23 24 25 25 26 24 27 24 23 23<br />

22 19 23 18 22 19 17 18 18 17 28 123 123 122 123 130 123 122 123 123 124<br />

23 26 24 24 24 25 21 22 25 22 37 36 31 30 32 30 37 25 29 34 30<br />

24 25 24 26 23 26 23 26 24 25 25 31 31 30 34 31 23 25 30 29 31<br />

25 31 22 23 19 32 22 22 26 22 27 39 38 39 38 39 38 39 39 37 38<br />

26 21 22 19 18 17 24 20 30 18 17 123 123 133 126 122 125 125 133 123 124<br />

27 22 20 22 22 21 18 19 34 20 20 28 29 23 29 28 28 28 27 29 28<br />

28 20 17 29 26 29 21 25 22 17 25 21 29 21 21 21 22 23 21 21 20<br />

29 17 18 19 20 16 16 19 16 16 16 25 24 25 23 26 24 24 22 24 22<br />

30 21 19 23 17 16 16 18 17 18 16 142 125 135 129 123 138 134 133 145 136<br />

31 22 26 23 21 19 20 27 20 22 20 28 28 28 28 29 28 29 29 28 28<br />

32 25 20 22 29 21 25 16 22 17 25 20 29 20 22 20 22 21 22 22 32<br />

33 18 27 16 18 23 16 21 16 17 16 22 24 27 23 22 23 23 27 24 24<br />

34 17 24 23 33 29 18 20 22 20 17 137 135 122 126 123 124 135 124 124 134<br />

35 23 23 23 27 18 21 22 21 22 20 28 29 28 29 30 28 28 28 28 27<br />

36 24 19 22 28 33 21 16 28 16 21 20 21 22 20 22 21 23 21 20 20<br />

37 19 16 17 17 16 16 16 17 17 16 22 23 23 22 22 23 23 23 28 23<br />

38 17 22 18 18 20 16 20 16 20 17 135 123 164 134 125 135 142 133 121 134<br />

39 22 24 19 20 18 19 17 19 17 20 28 28 28 28 28 27 28 27 27 28<br />

40 22 23 17 21 22 17 17 16 17 16 20 21 21 28 27 21 28 29 21 21<br />

41 17 19 16 20 16 16 18 16 23 16 23 23 23 23 23 22 23 23 22 22<br />

42 16 22 18 18 16 18 23 22 24 23 124 130 134 121 133 123 134 133 124 140<br />

43 22 17 23 26 23 25 16 25 16 22 22 28 28 28 27 22 28 27 28 27<br />

44 17 19 20 17 20 16 21 17 21 16 20 22 27 21 20 21 21 27 28 20<br />

45 17 19 31 25 16 17 33 16 34 17 22 28 23 23 23 23 28 22 23 22<br />

46 19 17 20 37 19 19 16 19 17 19 123 129 134 133 134 134 128 122 134 136<br />

47 23 17 22 18 26 17 17 20 17 21 28 29 28 27 29 27 28 27 27 28<br />

48 27 20 18 18 20 17 19 17 19 17 20 22 21 33 20 21 21 27 28 21<br />

49 17 27 27 28 27 23 29 23 27 23 30 24 23 22 22 23 23 22 23 22<br />

50 23 20 23 22 22 22 17 22 16 22 124 123 123 122 126 122 122 123 134 140<br />

51 25 21 17 17 26 16 16 16 17 16 28 31 28 27 27 28 28 32 28 27<br />

52 17 19 20 18 17 16 19 17 19 16 28 29 27 28 29 28 29 27 21 33<br />

53 22 26 19 22 17 19 22 18 31 18 28 28 23 23 23 23 23 23 23 23<br />

54 20 16 21 21 19 20 17 21 17 21 123 123 126 123 129 122 126 136 134 123<br />

55 22 16 18 18 25 17 16 21 17 17 27 28 28 35 29 28 29 32 52 27<br />

56 20 22 20 22 16 21 23 17 20 21 27 28 28 28 28 21 28 34 22 28<br />

57 18 23 19 19 21 18 22 19 22 18 24 23 23 23 23 32 24 22 29 34<br />

58 20 25 21 21 23 21 17 20 16 20 136 129 128 130 123 129 138 128 130 124<br />

59 23 26 17 17 22 16 17 20 17 16 29 27 28 27 33 28 29 27 27 27<br />

60 17 17 21 20 16 16 20 21 20 16 28 28 33 28 21 28 34 28 27 30<br />

61 21 18 19 18 24 18 22 18 22 19 22 23 23 23 28 22 23 22 22 22<br />

62 26 20 24 20 20 20 17 23 16 20 134 123 122 129 123 130 135 123 138 134<br />

63 23 20 18 18 22 16 17 17 16 17 27 28 26 27 27 28 28 28 28 28<br />

64 17 17 17 20 16 17 19 16 19 17 21 28 29 28 27 27 27 28 21 28<br />

65 19 25 19 19 25 18 21 18 22 17 22 22 23 22 25 22 24 23 23 23<br />

66 20 23 25 21 19 20 16 21 17 21 137 135 139 124 134 122 141 129 123 138<br />

67 25 22 17 17 22 17 17 20 16 17 28 28 28 28 31 28 28 27 28 28<br />

68 20 17 17 21 20 17 23 16 23 16 29 29 30 29 34 29 30 30 29 29<br />

69 17 18 19 20 17 19 21 19 22 19 23 23 26 23 23 22 23 23 23 22<br />

70 21 21 24 21 19 20 17 37 16 20 124 126 122 123 122 122 123 122 124 125<br />

71 27 17 17 21 26 16 17 17 16 16 34 28 27 27 27 28 28 27 27 31<br />

72 20 21 17 17 20 16 19 17 19 17 29 30 30 29 29 29 30 30 28 29<br />

73 20 18 18 18 21 18 22 22 22 22 22 22 22 22 22 22 23 22 22 21<br />

74 20 22 22 22 24 22 17 21 17 21 135 141 134 134 135 134 125 133 139 136<br />

75 22 16 33 21 22 15 25 16 17 16 27 27 32 28 29 27 27 28 28 27<br />

76 17 21 17 22 17 16 19 24 20 17 27 28 27 32 21 32 28 22 27 28<br />

77 21 20 19 21 25 19 21 19 22 19 22 24 23 22 23 23 23 22 23 23<br />

78 20 25 26 30 19 26 17 21 16 21 124 124 135 123 131 138 134 123 124 130<br />

79 25 16 17 17 23 17 17 17 16 16 27 28 27 27 27 28 29 30 28 27<br />

80 17 22 22 21 16 17 20 25 20 16 30 30 30 29 29 29 30 29 29 29<br />

81 17 20 24 20 17 18 21 20 22 20 22 28 23 23 23 23 23 22 22 23<br />

82 20 22 22 23 19 25 16 22 16 25 124 125 129 123 129 124 134 136 123 123<br />

83 26 17 17 17 22 16 17 20 18 18 27 27 28 29 28 28 29 28 28 27<br />

84 16 27 21 23 16 17 19 22 20 21 28 30 29 30 29 29 30 29 30 29<br />

85 16 19 22 19 21 19 22 18 21 18 21 22 22 22 22 22 22 22 23 22<br />

86 19 24 17 16 19 15 16 16 16 16 142 138 135 136 136 134 133 140 136 135<br />

87 17 17 16 22 18 15 16 21 24 16 28 27 27 31 30 27 28 31 27 27<br />

88 17 18 28 18 20 17 19 21 18 17 21 28 21 20 21 20 21 21 21 20<br />

89 18 23 25 22 21 21 16 21 16 22 22 23 22 22 22 22 22 22 29 26<br />

90 22 16 20 19 21 15 17 16 16 16 134 135 137 141 135 145 138 133 129 123<br />

91 16 17 18 17 17 16 18 16 19 16 26 27 28 27 28 21 27 27 27 22<br />

92 17 17 17 20 16 18 20 18 20 18 19 21 21 20 21 20 20 20 20 20<br />

93 27 25 21 21 19 21 19 21 19 21 21 23 22 22 23 23 22 22 22 21<br />

94 17 22 17 16 16 16 16 15 16 21 135 135 122 139 135 140 135 135 133 134<br />

95 17 21 16 25 16 17 18 16 18 17 27 29 27 22 28 28 31 27 28 27<br />

96 18 22 23 19 18 18 23 19 20 18 20 21 20 20 20 20 21 20 20 28<br />

97 20 16 20 17 23 15 16 15 17 21 21 23 21 22 27 22 23 23 21 27<br />

98 17 16 16 16 20 15 16 19 16 15 136 135 135 122 122 139 141 133 128 134<br />

99 17 20 19 22 17 18 19 18 18 22 27 27 31 28 28 28 27 28 27 29<br />

100 18 27 20 20 19 20 21 20 21 19 27 29 32 27 28 28 28 28 27 27<br />

101 21 17 18 17 25 16 17 17 17 22 22 23 23 23 23 23 23 22 22 25<br />

102 21 16 17 17 19 16 17 20 17 16 135 133 123 138 134 122 123 124 137 124<br />

103 21 19 18 23 17 19 18 18 19 17 28 28 27 22 28 27 29 28 28 27<br />

104 19 24 21 36 18 20 21 21 20 19 29 30 30 30 30 29 29 29 30 29<br />

105 22 20 18 17 20 16 23 16 17 22 22 23 23 34 23 22 23 32 23 22<br />

106 17 16 16 17 17 17 16 20 16 16 135 129 135 127 138 122 123 123 123 141<br />

107 17 22 22 19 17 18 18 19 18 16 27 28 28 27 27 28 28 28 32 34<br />

108 20 24 21 21 18 20 20 20 21 20 29 30 29 29 29 34 30 29 29 29<br />

109 32 19 17 38 24 16 16 17 17 22 22 29 23 23 23 23 23 23 23 22<br />

110 19 16 17 17 17 16 16 21 16 17 124 126 136 135 125 123 135 140 135 127<br />

111 20 18 19 19 17 18 18 18 19 16 27 28 28 27 27 28 28 27 28 28<br />

112 19 20 20 21 19 20 20 22 20 18 29 30 29 29 29 29 29 29 29 29<br />

113 21 21 18 17 24 16 17 16 17 20 22 23 22 22 23 22 22 22 22 23<br />

114 17 16 18 17 17 17 16 20 17 16 136 136 129 135 129 135 123 140 124 127<br />

115 18 18 23 19 17 19 18 18 19 16 27 28 28 27 28 35 28 27 27 27<br />

116 22 31 21 20 19 20 21 21 21 18 29 34 30 29 29 29 30 29 29 29<br />

117 24 16 16 17 21 16 17 17 17 20 22 23 23 23 23 22 22 23 22 44<br />

118 17 16 17 17 16 16 16 21 16 29 135 138 135 142 138 135 134 138 140 126<br />

119 17 19 23 19 16 18 19 19 19 22 27 27 28 27 27 30 27 27 27 27<br />

120 19 24 21 21 18 20 20 20 21 18 29 29 28 29 29 29 29 29 29 33<br />

121 21 16 18 21 21 16 16 16 17 20 22 23 23 22 23 23 23 27 22 23<br />

122 16 17 16 18 17 16 16 20 17 17 134 134 123 124 135 139 136 123 124 126<br />

123 17 17 18 19 16 18 18 18 19 16 22 27 26 29 22 22 23 27 27 26<br />

124 18 21 27 22 17 21 22 21 22 18 20 21 20 27 20 20 21 27 20 19<br />

125 21 17 20 17 21 15 16 17 16 21 21 22 22 23 23 22 23 22 23 23<br />

126 17 20 21 17 19 16 16 24 16 16 136 134 134 135 133 133 141 133 134 133<br />

127 20 19 20 22 23 18 19 35 18 16 22 27 22 22 23 27 23 29 22 26<br />

128 19 15 17 18 18 16 17 16 16 18 20 20 20 20 21 20 27 20 20 20<br />

129 16 16 19 16 16 15 16 16 16 21 21 22 23 22 22 26 23 22 22 22<br />

130 20 18 21 18 17 18 18 18 17 16 136 136 133 134 135 135 133 122 134 136<br />

131 22 27 25 22 18 22 21 26 21 15 22 22 22 22 22 27 28 27 23 26<br />

132 23 22 16 24 21 15 16 15 16 19 20 24 27 20 20 20 21 21 20 23<br />

133 17 16 16 21 16 16 20 16 16 15 21 22 22 22 22 23 23 22 26 21<br />

134 17 18 23 19 16 18 18 18 19 15 135 141 133 140 134 134 135 140 135 134<br />

135 19 16 17 17 18 17 16 17 17 18 27 27 27 28 28 27 27 27 27 27<br />

136 20 17 17 17 23 17 17 17 16 21 27 27 27 32 30 27 29 31 31 27<br />

A.4. Evaluation Data: Canny Preprocessing

We performed 20 test runs with the class PreprocessingTimeTest, which is responsible for the non-sequential frame order in the table below. 10 runs were done with standard JVM options, and 10 with the following tuning: -verbose:gc -Xms64m -Xmx256m -XX:NewRatio=2 -XX:+UseConcMarkSweepGC. All times are given in milliseconds.

frame JVM tuned (10 runs) JVM standard options (10 runs)<br>

3 202 205 229 217 225 203 218 222 214 202 343 412 345 352 345 348 342 346 358 357<br />

4 157 164 173 156 156 157 161 156 153 160 242 237 248 231 251 245 258 250 244 247<br />

5 147 152 147 148 148 152 148 152 149 148 167 212 170 170 176 170 170 165 164 185<br />

6 151 144 144 152 145 145 145 145 155 144 227 228 236 229 235 236 226 224 230 232<br />

7 152 151 149 153 159 148 152 148 149 148 226 221 220 222 220 222 230 223 222 217<br />

8 155 154 155 153 150 156 161 151 155 152 234 218 210 207 210 209 215 213 210 207<br />

9 144 558 145 148 146 146 144 150 144 148 214 210 213 208 194 218 210 207 209 207<br />

10 147 163 147 152 148 155 147 148 148 152 224 212 230 217 197 219 211 206 216 221<br />

1 152 152 146 151 151 148 146 146 146 148 211 212 213 208 216 216 214 210 211 209<br />

11 230 367 226 231 230 229 227 231 245 230 209 208 214 211 211 211 234 213 210 214<br />

12 151 273 150 147 147 148 152 148 148 147 210 210 216 213 211 208 214 208 210 209<br />

13 145 229 145 150 145 146 145 152 151 149 214 212 212 213 212 227 215 215 213 221<br />

14 145 203 144 144 143 145 144 145 143 148 211 208 210 212 192 211 216 209 213 223<br />

15 146 166 147 149 152 147 146 153 145 148 208 210 207 206 185 206 216 216 206 207<br />

16 152 146 146 151 149 147 153 147 147 149 206 212 214 206 213 207 210 207 213 206<br />

17 146 211 146 143 148 143 145 142 143 143 214 212 212 213 214 214 188 206 222 217<br />

18 150 160 150 151 144 144 148 145 149 145 208 207 214 208 213 207 191 208 206 208<br />

19 146 172 144 145 144 151 146 152 145 146 207 206 207 212 212 211 221 211 206 207<br />

20 147 148 148 177 148 149 146 153 147 151 208 207 209 209 207 208 211 209 210 210<br />

21 141 154 141 146 147 142 141 141 144 142 217 212 218 214 218 213 223 207 242 213<br />

22 146 171 148 151 146 146 146 145 168 145 214 211 208 207 208 213 216 209 206 207<br />

23 148 159 147 149 148 143 145 143 146 143 208 187 186 213 207 206 212 212 221 207<br />

24 155 376 157 148 147 147 151 148 157 148 208 187 189 207 207 207 208 206 208 213<br />

25 151 365 147 152 149 154 148 153 149 149 230 205 226 213 214 214 210 210 211 215<br />

26 143 532 138 144 143 149 147 144 231 145 209 208 211 208 207 186 210 209 185 208<br />

27 209 372 283 223 219 210 210 209 435 210 220 208 208 215 209 185 216 217 194 194<br />

28 177 345 173 149 151 146 147 146 156 146 210 208 211 213 208 218 208 209 221 194<br />

29 157 331 165 149 149 148 154 149 167 148 214 218 216 214 216 217 213 209 213 192<br />

30 143 316 142 142 143 144 142 148 142 145 209 208 208 208 214 207 214 209 212 224<br />

31 145 291 146 146 147 147 147 150 147 152 211 209 210 208 208 208 212 225 214 209<br />

32 145 259 146 147 173 150 145 144 147 144 209 207 209 215 209 210 209 207 208 214<br />

33 149 255 152 153 153 148 147 151 171 148 231 220 218 216 216 215 209 218 214 216<br />

34 152 239 149 156 154 148 150 149 148 148 213 209 209 209 207 207 212 209 208 209<br />

35 148 234 144 143 144 143 149 144 143 144 209 209 214 208 209 208 215 214 211 209<br />

36 150 191 189 146 145 146 153 156 146 145 211 212 210 209 208 212 208 211 210 209<br />

37 145 163 207 145 145 152 145 150 146 149 220 216 216 217 215 212 211 209 213 217<br />

38 147 171 199 151 149 148 147 148 148 150 210 209 213 207 208 212 212 208 207 208<br />

39 142 246 143 146 142 147 142 141 148 145 208 208 208 207 209 208 213 217 207 208<br />

40 153 150 148 146 154 147 149 163 147 146 209 207 207 207 209 208 207 207 189 212<br />

41 144 144 143 149 144 146 151 144 142 150 214 216 218 215 215 219 217 207 193 214<br />

42 148 147 298 147 148 150 148 176 148 144 208 208 207 212 210 193 210 212 223 212<br />

43 212 148 155 217 220 219 213 220 213 213 207 208 208 207 212 186 218 213 208 208<br />

44 154 173 193 152 144 148 147 147 152 149 209 211 213 209 208 185 212 227 207 206<br />

45 147 146 144 145 150 145 146 146 145 145 217 216 192 215 217 225 208 208 221 214<br />

46 150 151 145 146 145 145 150 147 145 145 209 209 186 207 211 214 211 210 208 208<br />

47 148 148 146 152 152 152 148 153 174 148 207 209 209 212 208 228 214 212 207 212<br />

48 142 141 144 143 142 147 144 142 143 146 213 208 237 209 208 186 212 208 207 208<br />

49 145 145 151 148 146 146 145 144 145 145 222 215 222 213 194 188 207 216 215 213<br />

50 144 158 146 148 143 143 144 148 145 143 209 207 207 211 184 212 214 210 211 212<br />

51 152 194 146 148 152 147 150 148 149 149 209 215 210 208 199 214 213 215 208 209<br />

52 157 267 149 149 149 149 152 149 149 148 212 208 208 209 207 217 209 208 212 186<br />

53 148 266 167 142 143 142 143 147 143 144 214 217 216 219 214 209 208 208 216 189<br />

54 147 157 151 145 144 150 144 145 146 154 210 214 214 209 210 212 210 210 208 227<br />

55 144 151 146 152 155 148 145 139 152 147 212 209 209 210 208 212 213 230 208 208<br />

56 153 149 150 155 151 148 150 148 148 149 212 207 208 210 207 211 208 194 208 186<br />

57 147 139 208 143 143 142 143 147 142 143 215 217 218 215 217 209 214 185 216 187<br />

58 212 147 146 214 214 213 222 218 241 217 210 209 217 210 209 212 211 186 210 207<br />

59 145 145 142 144 144 146 146 144 146 149 209 216 209 218 209 210 220 245 209 209<br />

60 153 149 153 151 148 154 148 149 154 148 213 210 213 209 207 212 209 194 208 186<br />

61 150 149 171 153 150 149 148 148 150 148 218 217 217 216 211 209 208 186 224 187<br />

62 148 144 142 149 149 142 144 145 144 145 209 211 209 209 218 213 215 186 208 222<br />

63 150 146 144 145 152 150 150 148 146 145 212 209 209 208 210 210 191 237 208 186<br />

64 145 146 146 145 145 146 146 153 146 146 212 208 207 209 208 208 186 208 207 187<br />

65 148 149 153 149 148 154 148 151 149 152 214 217 217 215 212 209 210 208 216 217<br />

66 142 214 135 145 142 141 141 141 147 142 209 209 208 208 208 214 207 186 209 211<br />

67 147 149 143 151 146 150 146 145 146 145 208 214 214 213 209 212 230 188 213 211<br />

68 147 144 144 144 149 143 147 143 143 172 213 209 208 209 210 186 210 224 207 209<br />

69 153 149 148 147 148 148 153 149 147 151 215 216 213 215 211 187 222 209 220 210<br />

70 149 149 151 149 150 150 149 153 149 153 210 208 186 209 213 208 209 215 208 210<br />

71 143 146 182 151 143 149 144 143 146 144 227 214 187 209 211 208 215 211 210 214<br />

72 144 146 212 149 146 147 144 144 150 146 212 208 221 212 208 209 208 214 209 214<br />

73 149 145 145 150 146 144 146 150 145 145 220 216 212 219 189 208 213 210 215 208<br />

74 219 147 148 214 213 213 218 214 212 212 207 209 212 208 193 217 209 207 211 211<br />

75 143 143 145 141 142 143 142 148 143 144 209 210 215 208 219 209 214 210 208 211<br />

76 146 148 150 146 146 150 146 147 146 150 213 208 208 209 208 208 208 212 207 209<br />

77 149 154 149 144 148 149 148 148 149 149 220 220 216 221 246 219 217 217 225 216<br />

78 153 152 152 164 153 152 151 152 152 151 213 217 223 214 220 220 215 216 213 228<br />

79 162 183 152 158 156 152 156 153 153 152 218 214 213 216 217 213 228 215 212 213<br />

80 149 255 149 153 157 148 152 150 149 151 218 212 215 212 217 218 213 217 216 213<br />

81 150 157 152 150 149 150 153 158 151 151 219 234 217 220 217 213 214 214 220 214<br />

82 150 159 155 150 150 162 152 151 152 151 209 212 212 217 218 218 216 213 212 218<br />

83 151 152 153 156 153 152 151 152 166 153 191 214 214 214 199 214 219 216 213 215<br />

84 150 146 145 151 148 147 147 146 153 147 195 217 213 214 190 216 217 217 217 213<br />

85 151 146 146 147 146 145 147 148 146 147 226 216 215 217 192 210 208 209 217 219<br />

86 144 145 146 144 146 144 149 147 143 145 209 210 208 215 232 217 211 213 209 214<br />

87 147 148 153 147 149 149 147 147 148 153 213 210 209 208 219 212 214 211 211 210<br />

88 148 153 213 155 148 154 149 148 150 148 212 208 208 209 207 209 208 208 208 209<br />

89 210 143 146 215 216 208 208 214 214 209 214 218 218 218 211 209 208 209 215 248<br />

90 148 145 145 146 149 145 146 149 144 144 210 211 204 209 215 218 212 209 212 213<br />

91 150 166 146 145 146 149 155 148 145 144 209 209 186 208 208 209 220 215 209 187<br />

92 149 151 152 147 146 148 147 152 146 148 212 213 189 210 207 208 208 210 210 186<br />

93 141 148 148 142 141 148 142 142 143 146 194 220 218 215 216 209 209 209 219 229<br />

94 145 217 146 149 146 148 146 146 156 146 187 209 213 212 220 217 210 212 214 210<br />

95 145 143 143 148 143 143 143 144 144 143 213 210 209 213 208 207 213 213 208 209<br />

96 151 148 147 147 153 147 149 149 146 148 207 208 208 209 208 209 208 208 208 209<br />

97 148 149 148 154 149 148 154 149 148 150 228 215 216 216 212 213 246 214 221 208<br />

98 143 144 147 141 143 144 144 148 143 146 208 212 215 209 214 214 210 208 207 211<br />

99 150 145 149 149 145 150 145 146 146 146 211 188 213 211 209 211 224 212 209 214<br />

100 145 145 146 149 146 145 145 147 151 148 209 186 208 210 208 204 209 208 208 218<br />

101 156 148 148 148 149 148 149 148 148 149 219 228 211 194 217 192 208 209 217 209<br />

102 148 143 141 144 143 147 144 144 142 144 210 209 215 191 213 192 211 212 215 216<br />

103 146 148 216 146 145 146 150 148 145 148 209 217 215 224 209 216 214 215 213 209<br />

104 145 150 149 143 143 145 144 145 144 149 210 211 209 209 208 187 220 209 237 210<br />

105 216 149 149 220 216 221 216 216 217 216 215 217 212 215 212 186 208 210 214 209<br />

106 151 150 148 152 152 148 157 150 155 148 210 213 213 208 218 202 187 209 209 219<br />

107 147 143 143 152 148 142 144 144 143 143 212 210 215 209 212 211 264 211 209 212<br />

108 146 217 146 148 145 145 150 146 144 144 209 187 209 212 209 209 222 209 209 214<br />

109 150 149 151 148 146 147 146 150 147 146 228 190 216 218 214 212 240 213 219 209<br />

110 150 152 154 148 155 154 149 149 150 153 208 210 210 224 215 225 248 210 208 225<br />

111 146 146 147 150 147 146 145 145 156 146 213 213 214 211 213 216 231 221 213 213<br />

112 154 151 149 151 151 155 151 151 152 150 216 219 212 212 211 213 223 214 214 217<br />

113 153 149 181 150 149 149 149 150 148 149 224 216 216 240 215 212 220 213 218 213<br />

114 152 156 154 152 153 152 156 154 151 154 214 214 216 213 217 218 234 213 212 218<br />

115 150 156 156 150 151 152 151 151 151 156 215 213 225 216 215 212 271 225 218 215<br />

116 152 151 147 155 152 158 154 152 152 149 218 213 213 215 213 215 333 213 215 219<br />

117 150 149 148 160 152 150 150 149 153 148 220 221 222 232 195 220 225 215 219 218<br />

118 153 150 150 150 156 151 151 150 150 150 219 217 213 217 197 218 214 213 212 219<br />

119 158 153 221 153 153 152 165 154 152 153 214 214 214 213 219 213 405 221 212 213<br />

120 149 147 152 145 149 148 146 158 212 148 213 213 217 214 211 214 213 222 219 213<br />

121 215 149 151 217 215 222 215 216 150 221 202 220 222 221 215 213 220 214 219 215<br />

122 150 148 149 153 149 147 147 147 152 148 191 211 213 214 213 224 212 212 213 219<br />

123 150 218 147 141 152 153 149 147 147 147 218 188 210 209 186 210 209 217 209 214<br />

124 153 149 147 150 148 149 149 148 148 148 214 186 211 210 186 208 211 209 213 214<br />

125 143 144 150 142 143 144 146 146 142 146 215 224 232 215 225 209 217 253 215 209<br />

126 138 149 150 146 143 146 145 146 145 150 210 212 209 213 214 214 213 209 215 216<br />

127 151 147 146 153 146 150 146 145 146 145 210 210 219 214 210 213 208 223 208 209<br />

128 149 147 147 154 149 149 148 147 176 148 213 209 208 195 208 209 209 208 210 212<br />

129 145 142 142 143 149 142 145 143 149 143 215 217 218 195 211 210 214 214 214 212<br />

130 150 146 145 146 145 145 151 153 147 145 214 208 208 187 218 215 208 204 208 216<br />

131 149 145 147 142 146 145 143 154 143 145 211 209 217 226 208 208 207 191 213 209<br />

132 147 154 147 147 147 147 146 148 148 153 214 209 210 215 210 213 215 188 212 210<br />

133 147 148 149 151 150 153 153 148 150 148 226 237 235 222 218 221 224 229 219 225<br />

134 142 142 209 148 144 145 142 142 147 142 208 214 208 208 213 214 206 208 207 214<br />

135 150 145 145 146 151 151 146 146 145 146 213 209 220 209 209 209 208 217 213 216<br />

136 151 147 148 147 146 146 152 147 214 149 213 187 213 216 209 213 210 208 214 209<br />

2 214 154 153 214 214 214 214 219 151 215 228 190 216 194 187 214 217 213 227 211<br />

A.5. Evaluation Data: Sobel Preprocessing

We performed 20 test runs with the class PreprocessingTimeTest, which is responsible for the non-sequential frame order in the table below. 10 runs were done with standard JVM options, and 10 with the following tuning: -verbose:gc -Xms64m -Xmx256m -XX:NewRatio=2 -XX:+UseConcMarkSweepGC. All times are given in milliseconds.
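
Sobel preprocessing convolves each frame with the two 3x3 Sobel kernels and combines the horizontal and vertical responses into a gradient magnitude. The sketch below shows this computation in its generic textbook form on a grey-level image; it is an illustration only and not a copy of the implementation that produced the numbers in this table.

// Generic Sobel gradient-magnitude sketch for a grey-level image stored as
// image[y][x]; border pixels are left at zero for simplicity.
public class SobelSketch {

    static int[][] sobel(int[][] image) {
        int[][] gx = { {-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1} };   // horizontal kernel
        int[][] gy = { {-1, -2, -1}, {0, 0, 0}, {1, 2, 1} };   // vertical kernel
        int height = image.length;
        int width = image[0].length;
        int[][] result = new int[height][width];

        for (int y = 1; y < height - 1; y++) {
            for (int x = 1; x < width - 1; x++) {
                int sx = 0;
                int sy = 0;
                for (int j = -1; j <= 1; j++) {
                    for (int i = -1; i <= 1; i++) {
                        sx += gx[j + 1][i + 1] * image[y + j][x + i];
                        sy += gy[j + 1][i + 1] * image[y + j][x + i];
                    }
                }
                result[y][x] = (int) Math.round(Math.sqrt(sx * sx + sy * sy));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[][] image = new int[288][384];   // same resolution as the test sequence
        image[100][100] = 255;               // single bright pixel as test input
        int[][] edges = sobel(image);
        System.out.println("response next to the bright pixel: " + edges[100][99]);
    }
}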

frame JVM tuned (10 runs) JVM standard options (10 runs)<br>

3 2 2 2 2 2 3 2 2 2 2 2 2 2 2 1 2 1 2 2 5<br />

4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1<br />

5 1 1 1 1 1 1 1 0 1 1 1 1 0 1 1 1 0 1 1 1<br />

6 2 1 2 1 1 2 1 2 1 1 1 12 1 1 1 1 1 1 1 11<br />

7 6 5 5 5 4 5 5 5 5 4 4 5 15 4 5 5 4 5 5 4<br />

8 2 2 2 2 2 2 1 2 2 2 2 2 13 13 2 2 2 2 3 2<br />

9 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1<br />

10 1 1 0 1 1 1 0 1 1 1 1 1 0 1 1 11 1 1 1 1<br />

1 1 1 1 0 0 1 1 0 0 1 1 1 1 1 1 1 1 0 1 1<br />

11 1 1 0 1 0 1 1 1 0 1 1 0 0 1 1 1 1 1 0 1<br />

12 1 0 1 1 1 1 1 0 1 0 1 0 1 0 1 1 1 1 1 1<br />

13 1 1 1 1 1 1 1 1 1 0 1 2 1 1 1 1 1 2 1 1<br />

14 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1<br />

15 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1<br />

16 1 3 1 1 2 2 2 1 2 6 2 1 13 2 2 1 2 2 2 2<br />

17 1 1 1 0 1 1 1 1 1 1 1 5 4 1 1 1 0 0 1 1<br />

18 2 2 1 1 2 2 2 2 1 2 3 3 2 2 3 3 3 13 3 2<br />

19 1 2 1 1 2 1 2 1 1 3 1 1 1 1 1 1 1 1 1 1<br />

20 1 0 1 1 1 1 1 1 1 0 1 1 1 1 1 12 1 1 1 1<br />

21 1 0 1 1 1 0 1 0 1 1 1 0 1 1 1 1 1 1 1 0<br />

22 1 0 1 1 1 1 1 1 0 1 1 1 0 1 1 1 1 0 12 1<br />

23 2 1 1 2 2 2 1 2 1 1 2 1 1 2 13 2 2 1 1 1<br />

24 1 1 0 0 1 1 0 1 0 1 1 1 1 1 0 1 1 1 1 1<br />

25 1 1 1 1 1 1 1 1 1 2 1 1 1 5 1 1 1 1 1 1<br />

26 1 1 0 1 1 1 1 1 1 0 1 0 1 1 0 1 1 1 1 1<br />

27 1 0 1 1 1 1 1 1 1 1 1 1 0 1 1 1 0 1 0 1<br />

28 1 0 1 0 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 0<br />

29 1 1 1 1 1 1 1 0 0 1 1 1 0 0 1 0 0 1 1 1<br />

30 0 1 1 1 1 1 1 0 0 1 1 1 1 0 0 1 1 1 1 1<br />

31 1 0 0 1 1 1 1 1 0 0 1 0 0 1 0 1 1 1 1 1<br />

32 1 1 1 1 1 1 1 1 1 1 2 1 2 1 1 11 12 1 2 1<br />

33 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 0<br />

34 1 0 1 1 1 1 0 1 1 1 1 1 0 0 1 1 0 0 0 0<br />

35 1 1 1 0 0 1 1 0 1 1 1 1 0 0 1 1 1 0 1 1<br />

36 0 1 0 0 1 0 0 1 1 1 1 1 0 1 1 0 1 12 0 1<br />

37 4 1 1 1 0 1 0 0 1 1 0 0 1 0 1 1 1 0 0 0<br />

38 1 1 0 0 1 0 0 1 1 1 1 1 1 1 1 0 1 1 1 1<br />

39 1 1 1 0 1 0 1 1 1 0 1 1 1 1 1 0 1 1 1 1<br />

40 0 1 1 0 0 0 1 1 1 1 1 1 0 1 1 0 1 1 1 1<br />

41 1 1 1 1 0 0 1 1 1 0 1 0 5 0 1 1 0 1 1 0<br />

42 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 12 1 1 1 1<br />

43 1 1 1 0 0 0 0 0 0 1 0 1 1 1 11 1 1 1 1 1<br />

44 1 1 1 1 0 1 1 0 0 1 1 12 1 1 1 1 1 0 0 1<br />

45 1 1 1 1 1 0 1 1 1 0 1 0 0 1 1 1 1 0 1 1<br />

46 1 1 1 1 1 1 0 1 1 1 1 2 1 1 1 1 0 1 1 1<br />

47 1 0 0 1 1 1 1 1 1 0 1 2 1 1 0 1 1 1 1 1<br />

48 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1<br />

49 1 1 0 1 1 1 1 0 1 1 1 0 1 1 1 1 12 1 0 0<br />

50 1 1 1 1 1 0 1 0 1 0 1 1 1 1 1 0 1 1 1 1<br />

51 1 1 1 3 1 0 1 3 3 3 1 1 1 0 1 0 0 1 0 1<br />

52 1 3 2 1 3 3 3 1 1 8 0 1 1 1 1 1 0 1 0 0<br />

53 1 0 0 1 1 1 1 0 1 1 1 1 1 5 1 0 0 0 5 1<br />

54 1 1 0 0 0 1 0 1 1 0 0 1 0 0 1 1 1 1 0 1<br />

55 0 1 0 1 1 1 1 1 1 1 1 0 1 1 1 0 1 0 1 1<br />

56 1 1 1 0 0 1 1 0 0 1 0 1 1 0 1 1 1 1 1 1<br />

57 1 1 0 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1<br />

58 1 1 1 0 1 0 1 1 1 1 1 1 12 1 1 0 1 1 1 0<br />

59 1 0 1 0 0 0 1 1 1 1 1 1 1 1 1 0 1 0 1 0<br />

60 1 1 1 1 2 1 1 1 1 1 1 0 0 1 1 1 1 1 1 2<br />

61 1 0 1 1 1 1 1 1 1 0 0 1 0 1 2 0 12 1 1 0<br />

62 1 1 0 1 1 1 1 1 1 0 1 1 0 0 13 0 1 1 0 11<br />

63 1 1 1 1 1 0 0 1 1 1 0 0 1 1 0 0 1 0 1 1<br />

64 9 10 6 6 10 6 6 6 5 6 3 3 3 9 4 14 4 3 3 3<br />

65 0 1 0 1 0 0 1 1 1 1 1 1 1 0 1 0 1 1 0 0<br />

66 1 1 1 1 1 1 2 1 2 2 1 1 1 0 1 1 1 1 1 0<br />

67 1 1 1 1 1 1 0 1 0 1 1 0 1 0 0 1 0 1 1 1<br />

68 1 1 1 1 0 0 1 1 0 0 1 1 1 0 1 1 0 1 0 0<br />

69 3 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2<br />

70 1 1 1 1 1 1 1 1 1 0 5 1 1 1 1 1 1 1 1 0<br />

71 1 1 0 0 0 0 0 1 1 0 1 0 0 1 1 1 0 1 1 1<br />

72 1 1 1 1 1 2 1 1 2 1 1 1 1 1 1 4 1 1 0 0<br />

73 0 1 0 0 1 1 1 0 1 1 0 4 1 5 1 1 1 1 1 0<br />

74 1 1 0 0 1 0 0 0 1 1 4 0 1 1 1 1 1 1 16 4<br />

75 1 1 1 1 0 1 0 1 1 0 0 1 1 0 15 1 4 1 4 0<br />

76 1 0 1 0 0 0 1 1 1 0 1 1 4 1 1 1 1 1 1 0<br />

77 1 0 0 0 0 1 0 0 0 0 0 1 1 1 1 1 1 1 1 1<br />

78 1 1 1 0 1 0 1 0 1 1 1 1 2 1 1 1 0 1 1 1<br />

79 0 0 0 0 1 1 0 0 0 1 0 1 1 1 0 0 1 1 0 1<br />

80 6 1 1 1 1 1 0 0 1 0 1 1 1 1 0 1 1 0 0 1<br />

81 0 1 0 0 1 0 1 0 0 1 0 1 1 0 0 1 1 1 1 1<br />

82 1 0 1 0 1 1 1 0 0 1 0 1 1 1 0 0 0 0 1 0<br />

83 1 1 1 1 1 1 1 0 1 0 1 1 11 1 1 1 1 0 0 1<br />

84 1 0 1 1 1 1 0 0 0 0 1 1 1 1 1 0 0 1 0 1<br />

85 1 1 1 0 1 1 0 1 0 0 0 0 1 1 1 0 1 0 0 0<br />

86 0 0 0 0 1 1 0 1 0 0 0 0 1 1 1 1 1 0 1 1<br />

87 1 0 1 1 1 0 1 0 0 0 1 0 1 0 0 0 0 0 0 1<br />

88 1 1 1 1 1 0 1 1 0 1 1 1 0 1 1 0 1 1 0 1<br />

89 1 0 1 1 0 1 1 1 1 1 1 14 1 1 1 0 1 0 1 1<br />

90 0 1 0 0 0 1 1 1 1 1 0 12 1 1 1 1 0 1 1 1<br />

91 0 0 1 1 0 1 1 0 1 1 1 1 1 1 0 1 1 1 0 1<br />

92 1 1 0 0 1 1 1 0 1 1 1 1 1 1 0 0 0 0 0 1<br />

93 1 0 0 0 1 0 1 0 0 0 0 1 0 1 0 1 0 1 1 0<br />

94 1 1 1 0 1 0 1 1 1 1 1 0 0 1 0 1 1 0 1 1<br />

95 2 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 0 1<br />

96 0 1 1 1 1 1 1 0 1 1 1 11 0 1 1 1 0 1 1 0<br />

97 0 0 0 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0<br />

98 1 0 1 2 0 0 0 1 1 1 1 0 0 0 2 0 1 1 1 0<br />

99 1 1 0 0 1 1 0 0 0 1 1 0 1 1 1 1 1 1 11 1<br />

100 1 0 0 0 1 1 0 1 0 0 0 1 1 1 1 1 1 0 12 1<br />

101 0 1 0 1 0 1 1 0 0 1 1 0 1 0 1 1 0 1 1 1<br />

102 1 1 1 0 0 0 1 0 0 0 1 0 1 1 0 4 0 11 1 1<br />

103 0 0 1 1 1 1 0 1 1 0 1 3 1 14 0 0 0 3 0 1<br />

104 1 1 1 0 1 0 1 1 0 0 4 1 0 1 0 1 1 1 3 0<br />

105 0 1 1 1 0 0 1 1 1 1 0 0 3 0 7 1 4 0 0 1<br />

106 1 0 1 1 0 0 0 1 1 0 0 0 0 1 0 0 1 1 0 1<br />

107 1 2 1 1 1 2 1 1 1 2 1 2 2 2 2 1 1 1 1 1<br />

108 0 0 1 1 1 1 1 1 0 0 1 0 17 0 1 1 1 1 0 1<br />

109 1 1 0 0 1 1 1 0 1 1 1 1 1 0 1 1 1 0 1 1<br />

110 1 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 1 0 1<br />

111 0 2 2 1 2 1 1 1 0 1 1 1 1 0 0 0 0 1 1 1<br />

112 0 1 0 0 1 1 0 1 0 0 1 1 0 0 1 0 0 1 1 1<br />

113 1 1 1 0 0 1 0 1 1 0 1 0 1 0 3 0 2 1 1 1<br />

114 0 1 0 0 0 1 1 0 1 0 1 0 1 1 1 0 0 1 1 2<br />

115 1 1 1 1 1 1 0 1 0 1 1 0 1 1 1 0 0 1 1 0<br />

116 1 0 1 1 1 1 1 1 1 1 0 3 1 1 1 1 0 1 1 1<br />

117 1 1 0 0 0 0 1 1 1 0 2 0 0 1 0 1 0 2 13 1<br />

118 0 1 1 1 1 0 1 1 1 0 1 1 1 1 1 0 2 1 0 0<br />

119 1 0 1 1 1 1 0 1 0 0 1 0 1 0 1 1 1 1 1 0<br />

120 0 0 0 1 0 1 1 0 0 1 1 0 1 1 0 1 1 1 0 2<br />

121 1 1 1 1 1 9 1 0 0 1 1 0 0 0 3 0 3 1 1 1<br />

122 1 1 1 0 0 1 1 1 1 1 1 1 1 0 1 0 0 1 1 0<br />

123 0 1 0 1 0 0 1 1 0 0 1 0 1 1 1 1 2 0 1 1<br />

124 0 1 1 0 0 1 1 1 1 1 1 1 1 0 1 0 0 1 1 1<br />

125 1 0 0 1 0 0 1 1 0 1 1 1 2 0 1 1 1 1 0 0<br />

126 1 1 0 1 0 0 1 0 0 1 1 1 1 1 0 1 1 1 1 1<br />

127 3 1 0 1 0 1 0 0 1 1 0 0 1 1 1 2 1 1 0 1<br />

128 1 0 0 1 1 1 0 0 1 1 0 4 1 2 0 1 1 2 1 11<br />

129 0 0 0 1 0 1 0 0 1 0 2 3 1 1 1 1 1 0 14 0<br />

130 0 1 0 1 1 1 0 0 1 0 1 0 3 0 3 0 2 1 0 1<br />

131 1 0 1 0 0 1 0 1 1 1 0 1 12 1 1 1 1 1 1 1<br />

132 1 0 0 1 1 0 0 1 0 0 0 12 0 1 1 0 0 0 1 1<br />

133 0 0 1 1 0 0 1 1 1 0 0 1 1 0 1 1 1 1 1 1<br />

134 1 1 0 1 1 1 1 0 1 0 0 0 1 0 1 1 1 1 1 1<br />

135 1 1 0 1 1 0 0 1 1 0 1 0 0 1 0 0 11 0 1 1<br />

136 3 0 1 1 1 1 1 1 1 0 1 1 0 0 1 1 0 11 0 1<br />

2 1 1 1 1 1 0 0 0 1 1 0 0 1 0 1 1 0 1 0 0<br />
