Diploma Thesis - Fachhochschule Vorarlberg
Moment-based<br />
Facial Feature Tracking<br />
using Java<br />
<strong>Diploma Thesis</strong> in the Degree Program<br />
iTec - Information and Communication Engineering<br />
Manuela Hutter<br />
011 0109 038<br />
Supervisor: Dipl.-Ing. (FH) Walter Ritter<br />
Dornbirn, August 2005<br />
<strong>Fachhochschule Vorarlberg</strong> GmbH
Statutory Declaration<br />
I hereby declare on my word of honor that I have written this thesis independently. Thoughts taken directly or indirectly from external sources are marked as such. The thesis has not previously been submitted to any other examination authority, nor has it been published.<br />
Manuela Hutter (Dornbirn, August 2005)<br />
Acknowledgments<br />
Many people helped me with this work in one way or another. I especially thank<br />
Walter Ritter, my supervisor, for his patience and help in all concerns. I am grateful<br />
to my introductory supervisor Miglena Dontschewa, who provided the initial idea for<br />
this thesis and enthused me for it; to Guido Kempter, who supported me in a diffi-<br />
cult phase of the project and assisted my statistical analyses; and to Avinash Manian,<br />
who gave me a helping hand with the data analysis in SPSS (thanks for your patience).<br />
I thank Colin Gregory-Moores and Lisa Newman for helping me with the basic structure<br />
of my English writing; Regine Bolter, the head of the study program, for giving me<br />
important hints to get on the right track; and Justin Zobel for writing the most helpful<br />
book about “writing for computer science”[Zobel, 2004]. Thanks to Wolfgang Mähr<br />
for proofreading and making helpful suggestions, and to my brother Matthias Hutter<br />
for collecting statistical data. Last but not least, thanks to my parents, Christine and<br />
Josef Hutter, for their personal and financial commitment.<br />
For scientific work with ethical awareness and without animal abuse.<br />
The use of registered names, trademarks etc. in this material does not imply, even in the absence of a specific statement, that<br />
such names are exempt from the relevant protective laws and regulations and therefore free for general use.<br />
Zusammenfassung<br />
This diploma thesis presents a platform-independent program for facial movement tracking, developed in Java. Tracking algorithms that locate and follow distinctive points in the human face are an important basis for many different applications built on them: 3D model animation needs feature points for animating a character's face; analyses of human emotions use the points for automatic classification of facial expressions; and alternative user interfaces can use facial movements as the basis of their operation. Numerous research papers describe efforts in the field of facial movement tracking. Nevertheless, practicable solutions are rare. Only one application was found on the market that solves the tracking problem in real time and without physical markers on the examined face; however, it works only on Windows platforms. The developed Java application can be used on all platforms on which a ‘Java Virtual Machine’ is installed. It uses a tracking method based on image moments and a ‘Binary Space Partitioning’ data structure, and employs a Canny edge detector for data preparation. The software works on video input data, without markers on the observed face. It has a modular program structure that allows external libraries to be used and exchanged. Currently, the ‘Java Media Framework’ is used for extracting the video frames, and either ‘Java2D’ or ‘Java Advanced Imaging’ for image preparation. The program can find relevant features in preselected image regions. Although the extracted points do not conform to standardized facial parameters such as the MPEG-4 ‘Facial Animation Parameters’, two examined sample points show remarkable correlations of up to 98% compared with manually determined points; detecting facial features in a preprocessed image takes about 5 ms. After the tracking process has finished, the points found can be saved to an output file in order to make them available for subsequent higher-level tasks.<br />
Abstract<br />
In this thesis, we present a platform-independent program for facial feature tracking,<br />
implemented in Java. Facial feature tracking algorithms, which locate and pursue dis-<br />
tinctive points in a human face, are an important basis for many different high-level<br />
tasks: 3D model animation needs feature points for moving the model’s facial fea-<br />
tures; programs that analyze human emotions use the points for automatic emotion<br />
recognition; and facial movements may provide a basis for alternative user interfaces.<br />
Numerous papers describe research efforts in the field of facial feature tracking. Nevertheless, practicable solutions are rare. We found only one application on the market that solves the tracking task in real time and without physical markers on the tracked face; however, it works only on Windows platforms. The implemented Java tracking program can be used on all platforms that have a ‘Java Virtual Machine’ installed. It uses a tracking method based on image moments and a ‘Binary Space Partitioning’ data structure; the input data is prepared by a Canny edge detection mechanism.<br />
The software works on video input, without markers on the processed face. It has<br />
a modular program structure that allows for the use and interchange of external li-<br />
braries. Currently, it uses the ‘Java Media Framework’ for video frame extraction,<br />
and either ‘Java2D’ or ‘Java Advanced Imaging’ for preprocessing. The program is<br />
able to find relevant feature points in preselected image regions. While the extracted points do not conform to point definition standards like the MPEG-4 ‘Facial Animation Parameters’, two tested sample points show remarkable correlations of up to 98% in comparison to manually ascertained points; the computation time for the feature points of a preprocessed image region lies around 5 ms. After the tracking process,<br />
the extracted points can be saved to an output file in order to make them available<br />
for subsequent higher level tasks.<br />
Contents<br />
Introduction 1<br />
1. State of the Art 5<br />
1.1. Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5<br />
1.2. Basic Tracking Process . . . . . . . . . . . . . . . . . . . . . . . . . . . 6<br />
1.3. Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8<br />
1.3.1. Optical Flow Techniques . . . . . . . . . . . . . . . . . . . . . . 8<br />
1.3.2. Active Contours (Snakes) . . . . . . . . . . . . . . . . . . . . . 12<br />
1.3.3. Image Moments . . . . . . . . . . . . . . . . . . . . . . . . . . . 16<br />
1.4. Commercial Implementations . . . . . . . . . . . . . . . . . . . . . . . 22<br />
1.5. Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26<br />
2. Algorithms in Consideration 27<br />
2.1. Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27<br />
2.2. Testing Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27<br />
2.2.1. Testing Tool . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27<br />
2.2.2. Input Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28<br />
2.3. Testing Snake Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . 29<br />
2.4. Testing Image Moments . . . . . . . . . . . . . . . . . . . . . . . . . . 34<br />
2.5. Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35<br />
3. Input Data and Its Preparation 36<br />
3.1. Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36<br />
3.2. Prerequisites . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36<br />
3.2.1. Data Format Prerequisites . . . . . . . . . . . . . . . . . . . . . 36<br />
3.2.2. Video Quality . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41<br />
3.2.3. Video Samples . . . . . . . . . . . . . . . . . . . . . . . . . . . 41<br />
3.3. Preparation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42<br />
3.3.1. Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42<br />
3.3.2. Edge Detection Algorithms . . . . . . . . . . . . . . . . . . . . 44<br />
3.3.3. Edge Detector Realization . . . . . . . . . . . . . . . . . . . . . 47<br />
3.3.4. Further Improvements . . . . . . . . . . . . . . . . . . . . . . . 48<br />
3.4. Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49<br />
4. Programming 50<br />
4.1. Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50<br />
4.2. Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50<br />
4.2.1. Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50<br />
4.2.2. Basic Application Flow . . . . . . . . . . . . . . . . . . . . . . 54<br />
4.2.3. Tracking Algorithm . . . . . . . . . . . . . . . . . . . . . . . . 55<br />
4.3. Implementation Process . . . . . . . . . . . . . . . . . . . . . . . . . . 60<br />
4.3.1. Working Environment . . . . . . . . . . . . . . . . . . . . . . . 60<br />
4.3.2. Difficulties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61<br />
4.4. Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63<br />
5. Evaluation 64<br />
5.1. Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64<br />
5.2. Program Abilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64<br />
5.3. Tracking Quality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65<br />
5.3.1. Test Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68<br />
5.3.2. Technique . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68<br />
5.3.3. Statistical Evaluation . . . . . . . . . . . . . . . . . . . . . . . 69<br />
5.4. Time Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77<br />
5.4.1. Technique . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77<br />
5.4.2. Statistical Evaluation . . . . . . . . . . . . . . . . . . . . . . . 77<br />
5.5. Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79<br />
Conclusions 80<br />
Bibliography 82<br />
Glossary 86<br />
A. Appendix 92<br />
A.1. Evaluation Data: Coordinates of Corners of the Mouth . . . . . . . . . 92<br />
A.2. Evaluation Data: Tracking Mouth Area Selection (60x20) . . . . . . . 94<br />
A.3. Evaluation Data: Tracking of Whole Area Selection (384x288) . . . . . 96<br />
A.4. Evaluation Data: Canny Preprocessing . . . . . . . . . . . . . . . . . . 98<br />
A.5. Evaluation Data: Sobel Preprocessing . . . . . . . . . . . . . . . . . . 100<br />
Introduction<br />
In this thesis, we develop a modular Java application that is able to detect and track<br />
facial features in video input data. The program finds distinctive points in the first<br />
frame of the input video and tracks them in subsequent video frames. Such facial fea-<br />
ture trackers are needed in different fields of computer vision. One area of application<br />
includes facial expression recognition, classification and detection of emotional states,<br />
where feature points can be used for an automated recognition process. Another area<br />
of application is information-based encoding and compression; extracted feature information could, for example, be used for low-bandwidth video chats where only the movement information has to be transmitted. In terms of model acquisition and animation, feature points are needed for moving the character’s facial features. Facial<br />
movements may also provide a basis for alternative user interfaces that, for example,<br />
allow handicapped people to operate computers with facial expressions.<br />
This project started because there is no freely available, platform-independent program with disclosed sources that can track feature points in an input video. Various papers deal with the tracking of facial movements, using different techniques to localize and follow facial features. These works describe their approaches with varying degrees of transparency; most mention that they implemented an example prototype, but they do not provide implementation details, let alone source code. We assume that they mainly use C or C++ as programming languages; at least, we found no paper that explicitly uses Java for feature tracking. Despite the number of available approaches,<br />
only one solution, the commercial product VeeAnimator, which was developed for 3D<br />
model animation, currently solves the facial movement tracking task without the need<br />
of physical markers on the tracked face. In contrast to this application, we wanted<br />
to implement a free feature tracker, with the aim to be based on a comprehensible<br />
tracking method, to function on different platforms, and with disclosed sources and<br />
documentation. It should help to understand tracking algorithms, and evaluate the<br />
feasibility to implement a feature tracker in Java.<br />
In this thesis, we first investigate and evaluate various movement tracking algorithms<br />
according to cost factors and their comprehensibility and practicability. Existing track-<br />
ing algorithms work with a range of different approaches. Techniques based on active<br />
contours can track deformable objects. However, they require manual initialization,<br />
and, due to their complexity, implementation as well as computation times may be too<br />
high. Local approaches, such as optical flow, surpass active contours concerning their<br />
complexity and computation times. Still, they have some limitations, as they may<br />
for instance compute more pixels than strictly necessary. Other algorithms designed for basic shape tracking, like the moment-based approach by Rocha et al. [2002], may not be applicable to facial movement tracking because they require clear, unfrayed objects in the input images.<br />
Besides the comprehensibility and practicability of the algorithms, we also evaluate<br />
the quality of the obtained output. The extracted feature points may either follow<br />
defined standards, or they may not be consistent and well-positioned enough to fit into<br />
these norms. Figure 0.1 shows a possible tracking output, where the feature points<br />
are valid in the sense that they occupy essential positions on the contour of the mouth.<br />
However, one feature point cannot permanently be defined as a certain facial feature<br />
point (the right corner of the mouth, for example), since its position may not be close<br />
enough to a predefined feature location, or the point’s position may unexpectedly<br />
change in subsequent video frames.<br />
Figure 0.1.: Not standardized output of facial feature localization. The tracked points<br />
(a) may not correlate with standardized points like the MPEG-4 Facial<br />
Animation Parameters (FAP) (b) [Antunes Abrantes and Pereira, 1999].<br />
For evaluation and decision making, we used modified versions of existing Java code that implement the algorithms under consideration. We then selected one algorithm that scored well in the evaluation process to be translated into Java for use in the final program. The chosen approach is based on a shape tracking algorithm by Rocha et al. [2002]. It uses image moment calculations and a ‘Binary Space Partitioning’ data structure (described in Section 1.3.3) to find object positions and orientations. The algorithm is straightforward and shows small computation times. However, the output data may not conform to standards.<br />
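The moment calculations at the core of this method can be sketched in a few lines. The following class is our own simplified illustration of raw and central image moments on a binary edge image; the class and method names, and the exact formulation, are not taken from the thesis sources or from Rocha et al.:<br />

```java
/** Raw and central image moments on a binary (edge) image.
 *  A simplified illustration of the idea behind moment-based tracking;
 *  class and method names are ours, not taken from the thesis sources. */
public class ImageMoments {

    /** Raw moment m_pq = sum over all pixels of x^p * y^q * I(x, y). */
    public static double rawMoment(int[][] img, int p, int q) {
        double m = 0;
        for (int y = 0; y < img.length; y++)
            for (int x = 0; x < img[y].length; x++)
                m += Math.pow(x, p) * Math.pow(y, q) * img[y][x];
        return m;
    }

    /** Centroid (m10/m00, m01/m00) of the bright pixels. */
    public static double[] centroid(int[][] img) {
        double m00 = rawMoment(img, 0, 0);
        return new double[] { rawMoment(img, 1, 0) / m00,
                              rawMoment(img, 0, 1) / m00 };
    }

    /** Orientation from second-order central moments:
     *  theta = 0.5 * atan2(2 * mu11, mu20 - mu02). */
    public static double orientation(int[][] img) {
        double[] c = centroid(img);
        double mu11 = 0, mu20 = 0, mu02 = 0;
        for (int y = 0; y < img.length; y++)
            for (int x = 0; x < img[y].length; x++) {
                double dx = x - c[0], dy = y - c[1];
                mu11 += dx * dy * img[y][x];
                mu20 += dx * dx * img[y][x];
                mu02 += dy * dy * img[y][x];
            }
        return 0.5 * Math.atan2(2 * mu11, mu20 - mu02);
    }
}
```

The centroid and orientation of an edge region are exactly the kind of quantities such a method uses to follow a shape from frame to frame.<br />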
The quality and the condition of the input data are very important factors for the<br />
algorithm to function and produce good results. To enhance these factors, we define<br />
certain prerequisites, select an appropriate Java library for the technical realization,<br />
and describe the preparation of the data. We therefore use the Java Media Frame-<br />
work (JMF) for video frame acquisition, and a Java2D based Canny edge detection<br />
mechanism for the data preprocessing.<br />
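The full Canny mechanism chains several stages: Gaussian smoothing, gradient computation, non-maximum suppression, and hysteresis thresholding. As an illustration of the gradient stage alone, a Sobel operator on a grayscale intensity array could look as follows; this is our own sketch, not the Java2D-based implementation used in the program:<br />

```java
/** Minimal gradient-magnitude stage of an edge detector (Sobel kernels).
 *  A full Canny detector would add Gaussian smoothing, non-maximum
 *  suppression, and hysteresis thresholding on top of this stage. */
public class SobelStage {

    /** Returns per-pixel gradient magnitudes; the one-pixel border is left 0. */
    public static double[][] gradientMagnitude(double[][] gray) {
        int h = gray.length, w = gray[0].length;
        double[][] mag = new double[h][w];
        for (int y = 1; y < h - 1; y++) {
            for (int x = 1; x < w - 1; x++) {
                double gx = -gray[y-1][x-1] - 2*gray[y][x-1] - gray[y+1][x-1]
                          +  gray[y-1][x+1] + 2*gray[y][x+1] + gray[y+1][x+1];
                double gy = -gray[y-1][x-1] - 2*gray[y-1][x] - gray[y-1][x+1]
                          +  gray[y+1][x-1] + 2*gray[y+1][x] + gray[y+1][x+1];
                mag[y][x] = Math.hypot(gx, gy);
            }
        }
        return mag;
    }
}
```

A vertical step edge produces a strong response along the step and none in flat regions, which is the raw material the later Canny stages thin into one-pixel edges.<br />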
The resulting Java program is able to execute simple tracking tasks on preselected facial regions. For convenience, the user can perform the preselection and control the video flow using a Graphical User Interface (GUI). Because this work only deals with the basic tracking process and leaves out important procedures (like the determination of face position and orientation), the problem area was reduced and special prerequisites were added (as described in Section 3.2). The architecture of the Java tracker has a modular design with three exchangeable components: the part responsible for video frame acquisition, the preprocessing mechanism, and the tracking implementation (see Section 4.2).<br />
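The three exchangeable components could be captured by interfaces along the following lines. This is only a sketch of the modular idea; the interface and method names are invented here and are not taken from the thesis sources:<br />

```java
import java.awt.Point;
import java.awt.image.BufferedImage;
import java.util.List;

/** Sketch of three exchangeable pipeline components. Interface and
 *  method names are invented for illustration. */
interface FrameSource {                    // e.g. backed by the Java Media Framework
    BufferedImage nextFrame();             // null when the video is exhausted
}

interface Preprocessor {                   // e.g. a Java2D- or JAI-based Canny detector
    BufferedImage toEdgeImage(BufferedImage frame);
}

interface FeatureTracker {                 // e.g. the moment-based BSP tracker
    List<Point> track(BufferedImage edgeImage, List<Point> previousPoints);
}
```

Because each component hides its library behind an interface, JMF, Java2D, or JAI can be swapped without touching the tracking code.<br />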
The evaluation of the implemented application shows that the bottleneck in terms of computation time is the currently used Canny preprocessing technique. Investigations on two exemplary feature points, the corners of the mouth, demonstrate that the moment-based tracking algorithm produces results similar to manually ascertained points. Variations are mainly caused by ragged edges in the preprocessed images.<br />
This thesis is split into five chapters:<br />
Chapter 1 describes the basic tracking process and the current state of research on face localization and movement tracking. It gives an overview of commercial solutions and established algorithms in the field and compares their range of application and their strengths and weaknesses.<br />
Chapter 2 goes into detail on the preselected algorithms and explains the choice of the moment-based approach for the final program. It describes the sample implementations that we used for decision making and reviews the test runs that led to that decision.<br />
Chapter 3 defines the required input data format for the program, the prerequisites<br />
and the preparation of videos for the tracking process. We compare techniques to<br />
read videos and split them up into single images, and we select the most appropriate<br />
option for our aims. For that purpose, we define video information constraints, like a<br />
convenient and constant position and orientation of the face in all video frames. Once we have the right input data, with the face in the right place, we need to transform the images into edge images so that the tracking algorithm can be applied.<br />
Chapter 4 describes the code development of the feature tracker. It illustrates the<br />
program architecture with its general structure and the basic application workflow. In<br />
that context, we also describe the implementation of the tracking algorithm in detail.<br />
In the second part of this chapter, we briefly outline the working environment and state difficulties that arose during the implementation process.<br />
Chapter 5 presents the results achieved with the Java feature tracker and evaluates them according to the correctness of two calculated feature points, the corners of the mouth, and according to the required preprocessing and calculation time.<br />
1. State of the Art<br />
1.1. Overview<br />
In order to select the most appropriate facial feature tracking method for the Java<br />
implementation, we inspect the basic tracking process and look at the way it is imple-<br />
mented by different algorithms. We group these algorithms into 3 categories: optical<br />
flow techniques, active contour models (snakes), and moment-based shape tracking.<br />
For every group, we first state general definitions and properties in order to explain<br />
the mode of operation. We then give examples of approaches that use an algorithm of<br />
this group for facial feature tracking, and evaluate their feasibility for the Java feature<br />
tracker. In this context, we also introduce two commercial solutions, which unfortu-<br />
nately do not provide technical background information.<br />
Facial feature tracking is not to be confused with face tracking, which aims to locate<br />
the complete face inside a video sequence, characterizing its position and orientation,<br />
but often not evaluating further details inside the tracked face. It may however be<br />
a preparatory step for, or mixed with, facial feature tracking. Several authors have<br />
dealt with the face tracking problem ([Krüger et al., 2000; Sahbi and Boujemaa, 2002;<br />
Wu et al., 1999]).<br />
Within the field of computer vision, “recognition of the facial expressions is a very<br />
complex and interesting subject where there have been numerous research efforts”<br />
[Goto et al., 1999]. Most of these works are based on a similar process workflow, with<br />
videos and/or video streams as input data (see Section 1.2). However, they differ in<br />
their complexity, the initialization strategy and output quality, and in how they derive facial feature points and movements. As our goal is to find a straightforward tracking algorithm, we examine existing algorithms according to cost factors,<br />
their comprehensibility and practicability. One group of techniques, based on snakes,<br />
can track deformable objects, but requires manual initialization and is computationally<br />
expensive. It may also require extensive training and implementation periods. Other<br />
approaches, such as optical flow, surpass more complex methods concerning compu-<br />
tation speed, but “provide a low-level problem characterization and suffer from some<br />
drawbacks” [Rocha et al., 2002]: They might, for example, “integrate more pixel than<br />
strictly necessary” [Dellaert and Collins, 1999]. Basic shape tracking methods may<br />
not be applicable for facial movement tracking because of their need of clear, singular<br />
objects in the input images. All examined works provide a theoretical and mathematical description of the developed algorithm, but they lack an illustration of the programmatic implementation, flow diagrams, or code snippets. The commercial solutions provide very little technical background information; the tracking methodology is not disclosed. We found two products that are able to track facial features; only one of them solves this task without the need for physical markers on the tracked face.<br />
1.2. Basic Tracking Process<br />
Most of the work in facial feature tracking is based on a similar basic process workflow.<br />
As illustrated in Figure 1.1, the process works on input data from standard video or<br />
web cameras. This data may be subject to certain constraints, such as the distance between the tracked face and the camera, or lighting conditions (described in Section 3.2 for<br />
the Java feature tracker). The core tracking procedure consists of a number of steps.<br />
Most feature trackers have a preprocessing step, where the video data is prepared to<br />
be applicable to the tracking algorithm. The image sequences may be smoothed, con-<br />
verted into a different color model, or object details may be accentuated (see Section<br />
3.3). As facial feature tracking methods mostly assume a certain size or orientation<br />
of the tracked objects, the face has to be detected, located, and possibly transformed.<br />
Due to time constraints, this project does not deal with face localization and bridges<br />
this gap with input data prerequisites. Having the preprocessed facial data in the<br />
right position in the image sequence, the feature tracking algorithm can set to work.<br />
Some algorithms, like the moment-based Java tracker or snake-based methods, require pre-selection of feature points or feature regions. In the case of the implemented Java program, this step could be automated in future development steps. After the tracking<br />
procedure, all methods provide more or less standardized facial parameters, 2D or<br />
3D, depending on the tracker application. These parameters may then be used for<br />
facial model animation, or for high level face processing tasks such as facial expression<br />
recognition, face classification or face identification.<br />
Figure 1.1.: Basic facial feature tracking workflow. All non-grey elements are part of<br />
the implemented Java feature tracker.<br />
Gorodnichy [2003] has illustrated a similar “hierarchy of face processing tasks”. He<br />
does not state input and output data, but goes into detail with the face localization<br />
as a preliminary step, and he lists a range of higher level tasks. Facial feature tracking<br />
is not mentioned in his illustration; he seems to include this procedure in a step called “Face Localization (precise)”.<br />
In the following section, we describe a set of well-established tracking methodologies<br />
and their practice in the illustrated facial feature tracking workflow.<br />
1.3. Algorithms<br />
1.3.1. Optical Flow Techniques<br />
Definitions and Properties<br />
Optical flow is a concept for considering the motion of objects within a visual represen-<br />
tation, where the motion is typically represented as vectors originating or terminating<br />
at pixels in a digital image sequence. Every pixel in an optical flow image is represented by a motion vector that indicates the direction and the intensity of motion at this point. The work of Beauchemin and Barron [1995] extensively describes optical<br />
flow techniques. Figure 1.2, taken from their work, illustrates the computation of<br />
optical flow.<br />
Figure 1.2.: One frame of an image sequence (a) and its optical flow (b) [Beauchemin and Barron, 1995].<br />
An optical flow algorithm “estimates the 2D flow field from image intensities” [Cutler and Turk, 1998]. In the survey of Cédras and Shah [1995], the methods are divided into four classes: differential methods, region-based matching, energy-based, and phase-based techniques:<br />
“Differential methods compute the velocity from spatiotemporal derivatives of image intensity. Methods for the computation of first order and second order derivatives were devised, although estimates from second order approaches are usually poor and sparse. In region-based matching, the<br />
velocity is defined as the shift yielding the best fit between image regions,<br />
according to some similarity or distance measure.<br />
8
1. State of the Art<br />
Energy-based (or frequency-based) methods compute optical flow using<br />
the output from the energy of velocity-tuned filters in the Fourier domain,<br />
while phase-based methods define velocity in terms of the phase behavior<br />
of band-pass filter output, for example the zero crossing techniques.”<br />
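Of these four classes, region-based matching is the most direct to sketch: the motion vector of a patch is the shift that minimizes a distance measure between two frames. The following minimal Java illustration uses the sum of squared differences (SSD); it is our own sketch, not code from any of the cited works:<br />

```java
/** Minimal region-based matching: find the shift of a patch between two
 *  frames by minimizing the sum of squared differences (SSD).
 *  Illustrative sketch only; no sub-pixel refinement, and the caller must
 *  keep the patch plus search radius inside both frames. */
public class BlockMatcher {

    /** Returns {dx, dy} within +/-radius that best matches the size x size
     *  patch at (px, py) of frame a inside frame b. */
    public static int[] flowVector(double[][] a, double[][] b,
                                   int px, int py, int size, int radius) {
        int bestDx = 0, bestDy = 0;
        double bestSsd = Double.MAX_VALUE;
        for (int dy = -radius; dy <= radius; dy++) {
            for (int dx = -radius; dx <= radius; dx++) {
                double ssd = 0;
                for (int y = 0; y < size; y++)
                    for (int x = 0; x < size; x++) {
                        double d = a[py + y][px + x] - b[py + dy + y][px + dx + x];
                        ssd += d * d;
                    }
                if (ssd < bestSsd) { bestSsd = ssd; bestDx = dx; bestDy = dy; }
            }
        }
        return new int[] { bestDx, bestDy };
    }
}
```

Applied to every pixel (or every feature window, as in the 13x13 windows below), such vectors form exactly the flow field of Figure 1.2.<br />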
Optical Flow for Feature Tracking<br />
Cohn et al. An example of how optical flow techniques are used in facial feature<br />
tracking is described by Cohn et al. [1998]. In their work, they manually select feature<br />
points in the first frame. Each of these points is then defined as the center of a 13x13<br />
pixel flow window. The position of all feature points is normalized by automatically<br />
mapping them to a standard face model based on three facial feature points: the<br />
medial canthi of both eyes and the uppermost point of the philtrum (see Figure 1.3).<br />
Figure 1.3.: Standard face model according to Cohn et al. [1998]. Medial canthus:<br />
inner corner of the eye, philtrum: vertical groove in the upper lip.<br />
A hierarchical optical flow method is used to automatically track feature points in<br />
the image sequence. The displacement of each feature point is calculated by subtract-<br />
ing its normalized position in the first frame from its current normalized position.<br />
The resulting flow vectors are concatenated to produce a 12-dimensional displacement<br />
vector in the brow region, a 16-dimensional displacement vector in the eye region, a<br />
12-dimensional displacement vector in the nose region, and a 20-dimensional vector in<br />
the mouth region (see Figure 1.4). The technique is based on the Facial Action Coding<br />
System (FACS), a widespread method for measuring and describing facial behaviors<br />
developed by Ekman and Friesen [1978] in the 1970s. Facial activities are described in<br />
terms of a set of small, basic actions, each called an Action Unit (AU). The AUs are<br />
based on the anatomy of the face and occur as the result of one or more muscle actions.<br />
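The displacement bookkeeping described above is straightforward: subtracting the normalized first-frame positions from the current positions and concatenating the results yields the per-region vectors. A minimal sketch (our own illustration, not code by Cohn et al.):<br />

```java
/** Sketch of per-region displacement vectors: concatenated (dx, dy) pairs
 *  of a region's normalized feature points. Our own illustration, not
 *  code from Cohn et al. */
public class DisplacementVector {

    /** first and current hold one {x, y} normalized position per feature
     *  point; e.g. six brow points yield a 12-dimensional vector. */
    public static double[] displacement(double[][] first, double[][] current) {
        double[] d = new double[first.length * 2];
        for (int i = 0; i < first.length; i++) {
            d[2 * i]     = current[i][0] - first[i][0];
            d[2 * i + 1] = current[i][1] - first[i][1];
        }
        return d;
    }
}
```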
Figure 1.4.: Feature point displacements. Change from neutral expression (AU 0) to brow raise, eye widening, and mouth stretched wide open (AU 1+2+5+27). Lines trailing from the feature points represent displacement vectors due to expression [Cohn et al., 1998].<br />
Essa and Pentland The work of Essa and Pentland [1997] describes another facial feature tracking method based on optical flow. They base their work on a self-developed encoding system “extending FACS”. They analyzed image sequences of facial expressions and probabilistically characterized the facial muscle activation associated with each expression. This is achieved using a detailed physics-based dynamic model of the skin and muscles coupled with optical flow in a feedback-controlled framework. They call this a control-theoretic approach; it produces muscle-based representations of facial motion (Figure 1.5 shows an example).<br />
Figure 1.5.: A motion field for the expression of smile from optical flow computation<br />
(a) mapped to a face model using the control-theoretic approach (b) [Essa<br />
and Pentland, 1997].<br />
Evaluation<br />
The approach by Cohn et al. uses the FACS for feature tracking. This system has<br />
been widely used for controlling computer animation, but was not originally developed for this purpose. The intended goal was to “create a reliable means for skilled<br />
human scorers to determine the category or categories in which to fit each facial be-<br />
havior” (http://face-and-emotion.com/dataface/facs/description.jsp). Essa<br />
and Pentland [1997] state in their work, that<br />
“it is widely recognized that the lack of temporal and detailed spatial in-<br />
formation (both local and global) is a significant limitation to the FACS<br />
model. [...] Additionally, the heuristic ‘dictionary’ of facial actions origi-<br />
nally developed for FACS-based coding of emotion has proven to be difficult<br />
to adapt to machine recognition of facial expression”.<br />
The results of the method show an accuracy between 83% and 92% compared to previous tests and the results of human testers, depending on the facial region. The authors find that one reason for the lack of 100% agreement is “the inherent subjectivity of human FACS coding, which attenuates the reliability of human FACS codes” [Cohn et al., 1998]. Two other possible reasons were the “restricted number of optical flow feature windows and the reliance on a single computer vision method”. In contrast to this approach by Cohn et al., the solution by Essa and Pentland specifically deals with facial expression recognition. The work describes a complete tracking framework, which includes a physics-based dynamic model of skin and muscles, something that is not intended for the Java tracker. Neither paper mentions the complexity or computation time of the tracking process.
1.3.2. Active Contours (Snakes)<br />
Definitions and Properties<br />
Active contour models, commonly called snakes, are energy-minimizing curves that<br />
deform to fit image features. Snakes, first introduced by Kass et al. [1988], “lock on to<br />
nearby minima in the potential energy generated by processing an image. (This energy<br />
is minimized by iterative gradient descent [...]) In addition, internal (smoothing) forces<br />
produce tension and stiffness that constrain the behavior of the models; external forces<br />
may be specified by a supervising process or a human user” [Ivins and Porrill, 1993].<br />
Figure 1.6 shows the basic functionality of a closed snake.<br />
Figure 1.6.: A closed snake. The snake’s ends are joined so that it forms a closed<br />
loop. Over a series of time steps the snake moves into alignment with the<br />
nearest salient feature [Ivins and Porrill, 1993].<br />
Snakes are applied to a range of different image processing problems. They sup-<br />
port the detection of lines and edges, but can also be used for stereo matching or for<br />
segmenting image sequences. Snakes have often been used in medical research appli-<br />
cations, and motion tracking systems use them to model moving objects. The main<br />
limitations of the models are that they “usually only incorporate edge information<br />
(ignoring other image characteristics) possibly combined with some prior expectation<br />
of shape; and that they must be initialized close to the feature of interest if they are<br />
to avoid being trapped by other local minima” [Ivins and Porrill, 1993].¹
¹ An overview of John Ivins’ publications about snakes is available at http://www.computing.edu.au/~jim/snakes.html
A snake (V) is an ordered collection of n points in the image plane:
V = {v1, . . . , vn}, vi = (xi, yi), i = 1, . . . , n (1.1)
The points in the contour iteratively approach the boundary of an object through the<br />
solution of an energy minimizing problem. For each point in the neighborhood of vi,<br />
an energy term is computed<br />
Ei = αEint(vi) + βEext(vi) (1.2)<br />
where Eint(vi) is an energy function dependent on the shape of the contour, and<br />
Eext(vi) is an energy function dependent on the image properties, such as the gradient,<br />
near point vi. α and β are constants providing the relative weighting of the energy<br />
terms. Ei, Eint, and Eext are calculated using matrices. The value at the center of each<br />
matrix corresponds to the contour energy at point vi. Other values in the matrices<br />
correspond (spatially) to the energy at each point in the neighborhood of vi. Each<br />
point vi is moved to the point vi′, corresponding to the location of the minimum value in Ei. This process is illustrated in Figure 1.7. If the energy functions are chosen correctly, the contour V should approach the object boundary and stop once it has reached it.
Figure 1.7.: An example of the movement of a point vi in a snake. The point vi′ is the location of minimum energy due to a large gradient at that point [Mackiewich, 1995].
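This greedy update — evaluate the energy of equation 1.2 at every position in a small neighborhood and move the point to the minimum — can be sketched in a few lines of Java. The sketch below is our own illustration, not one of the implementations discussed later: the names are hypothetical, a plain continuity term stands in for Eint, and the negative gradient magnitude stands in for Eext.

```java
import java.awt.Point;
import java.util.List;

/** One greedy-snake iteration: each point moves to the minimum-energy
 *  position in its 3x3 neighborhood (cf. eq. 1.2 and Figure 1.7). */
public class GreedySnake {
    static final double ALPHA = 1.0, BETA = 1.0;   // weights alpha, beta of eq. 1.2

    /** External energy: negative gradient magnitude, so strong edges attract. */
    static double eExt(double[][] gradMag, int x, int y) {
        return -gradMag[y][x];
    }

    /** Internal energy: squared distance to the previous contour point
     *  (a simple continuity term standing in for tension/stiffness). */
    static double eInt(Point prev, int x, int y) {
        double dx = x - prev.x, dy = y - prev.y;
        return dx * dx + dy * dy;
    }

    /** Moves every contour point once; returns true if any point moved. */
    static boolean iterate(List<Point> contour, double[][] gradMag) {
        boolean moved = false;
        for (int i = 0; i < contour.size(); i++) {
            Point prev = contour.get((i - 1 + contour.size()) % contour.size());
            Point v = contour.get(i);
            Point best = v;
            double bestE = Double.MAX_VALUE;
            for (int dy = -1; dy <= 1; dy++)
                for (int dx = -1; dx <= 1; dx++) {
                    int nx = v.x + dx, ny = v.y + dy;
                    if (ny < 0 || ny >= gradMag.length || nx < 0 || nx >= gradMag[0].length)
                        continue;
                    double e = ALPHA * eInt(prev, nx, ny) + BETA * eExt(gradMag, nx, ny);
                    if (e < bestE) { bestE = e; best = new Point(nx, ny); }
                }
            if (!best.equals(v)) { contour.set(i, best); moved = true; }
        }
        return moved;
    }
}
```

One would call iterate() repeatedly until it returns false or an iteration limit is reached; real implementations replace the two energy terms with the tension, stiffness, and image terms described above.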
Snakes for Feature Tracking<br />
The work of Terzopoulos and Waters [1993] describes a hybrid method in which shape models and snakes both take part in the tracking process. Face models are set up, which are then tracked by snakes. The approach incorporates many complex procedures, described by the authors as follows:
“An approach to the analysis of dynamic facial images for the purposes<br />
of estimating and resynthesizing dynamic facial expressions is presented.<br />
The approach exploits a sophisticated generative model of the human face<br />
originally developed for realistic facial animation. The face model which<br />
may be simulated and rendered at interactive rates on a graphics work-<br />
station, incorporates a physics-based synthetic facial tissue and a set of<br />
anatomically motivated facial muscle actuators. The estimation of dynam-<br />
ical facial muscle contractions from video sequences of expressive human<br />
faces is considered. An estimation technique that uses deformable contour<br />
models (snakes) to track the nonrigid motions of facial features in video<br />
images is developed. The technique estimates muscle actuator controls<br />
with sufficient accuracy to permit the face model to resynthesize transient<br />
expressions.”<br />
Figure 1.8 illustrates how snakes are used in this work.<br />
Figure 1.8.: Snakes and fiducial points used for muscle contraction estimation: neutral<br />
expression (a) and surprise expression (b)<br />
Evaluation<br />
Snakes are mostly used in combination with other methods, as they require initialization close to the feature of interest. A big disadvantage of the snake algorithm is that it is easily misled if the edge is discontinuous. Xie and Mirmehdi [2003] call this characteristic a weak edge:
“Despite their significant advantages, geometric snakes only use local in-<br />
formation and suffer from sensitivity to local minima. Hence, they are<br />
attracted to noisy pixels and also fail to recognize weaker edges for lack<br />
of a better global view of the image. The constant flow term can speed<br />
up convergence and push the snake into concavities easily when gradient<br />
values at object boundaries are large. But when the object boundary is<br />
indistinct or has gaps, it can also force the snake to pass through the<br />
boundary.”<br />
They developed an improved edge algorithm, called RAGS, that is able to overcome this problem. It works with an “extra diffused region force which delivers useful global information about the object boundary and helps prevent the snake from stepping through” [Xie and Mirmehdi, 2003]. Figure 1.9 shows the improvement with RAGS.
Figure 1.9.: Weak-edge leakage. A regular snake leaks out of a weak edge (a); RAGS<br />
snake converges properly using its extra region force (b).<br />
Snakes have great potential to work well in a tracking environment. However, the weak-edge leakage problem and the complexity of the algorithm argue against the use of snakes.
1.3.3. Image Moments<br />
Definitions and Properties<br />
In order to define its basic position, size, and orientation, a binary or grayscale image object can be approximated by a best-fitting ellipse. This ellipse is defined by the centroid, the major and minor axis, and the angle of the major axis with the x-axis. These values are calculated using image moment functions. Figure 1.10 shows example moment calculations for a binary image object (the black pixels in the illustration). The values a, b and θ, and the resulting ellipse are illustrated in the image. The following paragraphs derive and explain the functions necessary for the calculation of the best-fitting ellipse.
m00 = 5
m10 = 15, m01 = 15
m20 = 49, m02 = 47, m11 = 43
c = (3, 3)
θ = 31.7°, a = 1.84, b = 0.70
Figure 1.10.: Example of moment calculations and shape representation. The image ellipse is represented by the semi-major axis a, the semi-minor axis b and the orientation angle θ.
General Moment Definition A grayscale image can be seen as a two-dimensional density distribution function, written in the form f(x, y), where the function value represents the intensity of a pixel at the position (x, y). A general definition of the two-dimensional moments of order (p + q) is then given by the following equation:
Φpq = ∫−∞^∞ ∫−∞^∞ Ψpq(x, y) f(x, y) dx dy,  p, q = 0, 1, 2, 3, . . .  (1.3)
where Ψpq is a continuous function of (x, y), known as the moment weighting kernel or the basis set. The indices p, q usually denote the degrees of the coordinates (x, y), as defined inside the function Ψ. For example, a zeroth order moment is given by
p = 0 and q = 0. Applied to an image, the intensity function f(x, y) is bounded,<br />
and therefore the integrals in equation 1.3 are finite. In consequence, the general<br />
two-dimensional moment function can also be written in the form<br />
Φpq = ∫∫ζ Ψpq(x, y) f(x, y) dx dy,  p, q = 0, 1, 2, 3, . . .  (1.4)
where ζ represents the image region, that is, the set of foreground pixels in the image. Detailed moment function descriptions can be found in the book “Moment Functions in Image Analysis” [Mukundan and Ramakrishnan, 1998].
Geometric Moments “Geometric moments are the simplest among moment functions, with the kernel function defined as a product of the pixel coordinates.” [Mukundan and Ramakrishnan, 1998, p. 9]. Compared with more complex weighting kernels, geometric moments are easy to compute and implement. They are also called Cartesian moments, or regular moments. Equation 1.5 shows the two-dimensional geometric moment function, referred to as mpq.
mpq = ∫∫ζ x^p y^q f(x, y) dx dy,  p, q = 0, 1, 2, 3, . . .  (1.5)
In this equation, the basis set is defined as x^p y^q (compare to equation 1.3).
As the number of values in the image region is discrete and finite, the integral can be replaced by a summation to make it easier to compute. The equation can then be written as
mpq = Σ(x,y)∈A x^p y^q f(x, y),  p, q = 0, 1, 2, 3, . . .  (1.6)
where A is the set of pixels in the image region.
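The discrete form of equation 1.6 translates directly into code. The sketch below is our own illustration (hypothetical names, assuming the binary image is given as an int array with foreground pixels set to 1); it accumulates all raw moments up to second order in a single pass and derives the centroid of equation 1.8. Fed a pixel set with the same moments as the Figure 1.10 example, it reproduces m00 = 5, m10 = m01 = 15 and c = (3, 3).

```java
/** Raw geometric moments of a binary image up to second order,
 *  computed by direct summation over the foreground pixels (eq. 1.6). */
public class GeometricMoments {
    public double m00, m10, m01, m20, m11, m02;

    public GeometricMoments(int[][] img) {
        for (int y = 0; y < img.length; y++) {
            for (int x = 0; x < img[y].length; x++) {
                if (img[y][x] == 0) continue;   // background contributes nothing
                m00 += 1;                       // zeroth order: area
                m10 += x;                       // first order
                m01 += y;
                m20 += (double) x * x;          // second order
                m11 += (double) x * y;
                m02 += (double) y * y;
            }
        }
    }

    /** Centroid c = (m10/m00, m01/m00), eq. 1.8. */
    public double[] centroid() {
        return new double[] { m10 / m00, m01 / m00 };
    }
}
```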
Moments that are calculated from a binary (or silhouette) image are called silhouette<br />
moments. The pixels of a binary image can only adopt the values 0 and 1. If a pixel<br />
is part of an image region, it is set to 1. If it belongs to the background, its value is<br />
0. For silhouette moments, the image region ζ only contains the pixels with value 1,<br />
and the equation can be written in the form
mpq = ∫∫ζ x^p y^q dx dy,  p, q = 0, 1, 2, 3, . . .  (1.7)
Shape Representation Using Moments
A set of low order moments can be used to describe the shape of image regions.<br />
Geometrical properties like the image area, the center of mass and the orientation<br />
can be defined by using moments of zeroth, first and second order. The moment<br />
of zeroth order (m00) represents the total intensity of an image. If the image is<br />
binary, m00 represents the image area, that is the number of foreground pixels. The<br />
intensity centroid can be calculated by combining first order moments m10, m01 with<br />
the moment of order zero. The first order moments “provide the intensity moment<br />
about the y-axis and x-axis of the image” [Mukundan and Ramakrishnan, 1998, p.<br />
12]. For example, m10 on a silhouette image sums up all the x-coordinates of the<br />
image region. The centroid c = (xc, yc) is given by<br />
xc = m10/m00, yc = m01/m00. (1.8)
For a silhouette image, c represents the geometrical center of the image region, also<br />
called the center of mass.<br />
Central moments shift the reference system to the centroid to make the moment calculations independent of the image area position. They are defined as
µpq = ∫∫ζ (x − xc)^p (y − yc)^q f(x, y) dx dy,  p, q = 0, 1, 2, 3, . . .  (1.9)
As the image region remains unchanged during the transformation and the pixel coordinates are distributed equally on both sides of the new reference system, we have
µ00 = m00; µ10 = µ01 = 0. (1.10)<br />
According to equation 1.9, the image area is traversed twice for central moment calculations, as the centroid must be determined before µpq can be calculated. The work of Rocha et al. [2002] avoids this double traversal. It uses the following equations for the calculation of the second order central moments:
µ20 = m20/m00 − xc²  (1.11)
µ11 = m11/m00 − xc·yc  (1.12)
µ02 = m02/m00 − yc²  (1.13)
The second order moments are “a measure of variance of the image intensity distri-<br />
bution about the origin. The central moments µ20, µ02 give the variances about the<br />
mean (centroid). The covariance is given by µ11.” [Mukundan and Ramakrishnan,<br />
1998, p. 12]. The second order central moments can also be seen as moments of inertia<br />
with the coordinate axes moved to have the intensity centroid as their origin. If these<br />
so-called principal axes of inertia are used as the reference system, they make the<br />
product of inertia component (µ11) vanish. The moments of inertia (µ20, µ02) of the<br />
image about this reference system are then called the principal moments of inertia.<br />
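Equations 1.11–1.13 reduce to a handful of arithmetic operations once the raw moments are known; no second pass over the image is required. A small sketch (illustrative names, raw moments passed in directly):

```java
/** Second-order central moments from raw moments (eqs. 1.11-1.13);
 *  the single-pass formulation of Rocha et al. [2002]. */
public class CentralMoments {
    public final double mu20, mu11, mu02;

    public CentralMoments(double m00, double m10, double m01,
                          double m20, double m11, double m02) {
        double xc = m10 / m00;          // centroid (eq. 1.8)
        double yc = m01 / m00;
        mu20 = m20 / m00 - xc * xc;     // variance about the mean in x
        mu11 = m11 / m00 - xc * yc;     // covariance
        mu02 = m02 / m00 - yc * yc;     // variance about the mean in y
    }
}
```

For the raw moments of the Figure 1.10 example (m00 = 5, m10 = m01 = 15, m20 = 49, m11 = 43, m02 = 47) this yields µ20 = 0.8, µ11 = −0.4, µ02 = 0.4.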
We can use these moments to provide useful descriptors of shape. The work of Morse<br />
[2004] gives a good description of these techniques:<br />
“Suppose that for a binary shape we let the pixels outside the shape have<br />
value 0 and the pixels inside the shape value 1. The moments µ20 and<br />
µ02 are thus the variances of x and y respectively. The moment µ11 is the<br />
covariance between x and y [...] . You can use the covariance to determine<br />
the orientation of the shape.”<br />
The covariance matrix C is<br />
C = ( µ20  µ11
      µ11  µ02 )  (1.14)
By finding the eigenvalues and eigenvectors of C and looking at the ratio of the eigenvalues, we can determine the eccentricity, or elongation, of the shape. The direction of elongation can then be derived from the direction of the eigenvector whose corresponding eigenvalue has the largest absolute value.
The eigenvalues of C are defined as
I1 = ((µ20 + µ02) + √((µ20 − µ02)² + 4µ11²)) / 2
I2 = ((µ20 + µ02) − √((µ20 − µ02)² + 4µ11²)) / 2  (1.15)
The semi-major axis a and the semi-minor axis b can then be calculated as
a = √(3·I1); b = √(3·I2). (1.16)
These axis calculations are derived from the paper by Rocha et al. [2002]. Other authors describe a and b differently ([Mukundan and Ramakrishnan, 1998, p. 14], [Sonka et al., 1999, p. 258]). During the implementation phase, testing results were most appropriate with the stated formulas.
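Equations 1.15 and 1.16, together with the orientation angle of equation 1.17, can be collected into one short routine. The sketch below is our own illustration; note that it uses atan2 instead of a plain arctangent so that the case µ20 = µ02 needs no special handling.

```java
/** Best-fitting-ellipse parameters from the second-order central
 *  moments (eqs. 1.15-1.17): semi-axes a, b and orientation theta. */
public class ImageEllipse {
    public final double a, b, theta;

    public ImageEllipse(double mu20, double mu11, double mu02) {
        // Discriminant term shared by both eigenvalues of C (eq. 1.15).
        double common = Math.sqrt((mu20 - mu02) * (mu20 - mu02) + 4 * mu11 * mu11);
        double i1 = ((mu20 + mu02) + common) / 2;   // larger eigenvalue
        double i2 = ((mu20 + mu02) - common) / 2;   // smaller eigenvalue
        a = Math.sqrt(3 * i1);                      // semi-major axis (eq. 1.16)
        b = Math.sqrt(3 * i2);                      // semi-minor axis
        // Orientation (eq. 1.17); atan2 also covers mu20 == mu02.
        theta = 0.5 * Math.atan2(2 * mu11, mu20 - mu02);
    }
}
```

For an axis-aligned shape with µ20 = 3, µ11 = 0, µ02 = 1, for example, the eigenvalues are simply 3 and 1, so a = 3, b = √3 and θ = 0.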
The orientation angle θ between one of the principal axes of inertia and the x-axis is given by
θ = (1/2)·tan⁻¹(2µ11 / (µ20 − µ02)). (1.17)
Image Moments for Feature Tracking
The work of Rocha et al. [2002] introduces a moment-based object tracking method<br />
where the object in the binary image is approximated by best-fitting ellipses. Binary<br />
Space Partitioning (BSP), a method for recursively subdividing a space into convex sets<br />
by hyperplanes, is used for the approximation. Each node of the BSP tree represents<br />
a part of the image object, described by its best-fitting ellipse.<br />
Figure 1.11.: Object fitting by 2^k ellipses at each level. Construction of the BSP tree at level 0 (a), level 1 (b), and the result of level 3 (c) [Rocha et al., 2002].
As illustrated in Figure 1.11, the algorithm starts by calculating the ellipse of the<br />
root node (level 0). Then, the image region is divided along the minor axis, and the<br />
child nodes are created, each incorporating the pixels on one side of the splitting axis.<br />
This subdivision is repeated until a certain predefined tree depth is reached where the<br />
ellipses sufficiently approximate the image shape (see (c) in Figure 1.11).<br />
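One level of this subdivision can be sketched as follows: compute the region's moments and orientation as in Section 1.3.3, then assign each pixel to a child region according to the sign of its projection onto the major-axis direction, which places the splitting plane along the minor axis through the centroid. The names and the pixel representation (int[] pairs) below are our own illustration, not the original implementation of Rocha et al. [2002].

```java
import java.util.ArrayList;
import java.util.List;

/** One BSP split of a pixel region along the minor axis of its
 *  best-fitting ellipse (sketch after Rocha et al. [2002]). */
public class BspSplit {
    /** Splits pixels (given as {x, y} pairs) into two child regions. */
    static List<int[]>[] split(List<int[]> pixels) {
        // Raw moments up to second order, accumulated in one pass.
        double m00 = pixels.size(), m10 = 0, m01 = 0, m20 = 0, m11 = 0, m02 = 0;
        for (int[] p : pixels) {
            m10 += p[0]; m01 += p[1];
            m20 += p[0] * p[0]; m11 += p[0] * p[1]; m02 += p[1] * p[1];
        }
        double xc = m10 / m00, yc = m01 / m00;          // centroid (eq. 1.8)
        double mu20 = m20 / m00 - xc * xc;              // central moments
        double mu11 = m11 / m00 - xc * yc;              // (eqs. 1.11-1.13)
        double mu02 = m02 / m00 - yc * yc;
        // Major-axis orientation (eq. 1.17); the minor axis is perpendicular.
        double theta = 0.5 * Math.atan2(2 * mu11, mu20 - mu02);
        double ux = Math.cos(theta), uy = Math.sin(theta);
        @SuppressWarnings("unchecked")
        List<int[]>[] children =
            new List[] { new ArrayList<int[]>(), new ArrayList<int[]>() };
        for (int[] p : pixels) {
            // Signed distance along the major axis decides the side of
            // the splitting plane (the minor axis through the centroid).
            double s = (p[0] - xc) * ux + (p[1] - yc) * uy;
            children[s < 0 ? 0 : 1].add(p);
        }
        return children;
    }
}
```

Recursing on each non-empty child until the predefined tree depth is reached, and fitting an ellipse at every node, yields the tree of Figure 1.11.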
The approach by Rocha et al. [2002] was designed for basic shape tracking purposes, with only one simple object in the image region. It has not yet been used or evaluated for more complex tasks, such as facial feature tracking. As stated by the authors, “problems that we did not address in this paper are occlusion, tracking of multiple objects and motion discontinuities. Future work will go in these directions” [Rocha et al., 2002]. We did not find any further papers that base their work on this moment tracking algorithm.
Evaluation<br />
Despite its simple approach, the ellipse approximation method proposed by Rocha et al. [2002] surprises with the quality of the achieved results. The paper is written in a very readable way, and the results are illustrated graphically. The work therefore suggests a straightforward implementation. The algorithm has not yet been tested on multiple objects, but with an appropriate region selection on the preprocessed face images, we can simplify the object structures in order to make them applicable to the tracking procedure. The paper does not state processing times, but the design of the algorithm leads us to expect short processing times.
1.4. Commercial Implementations<br />
Overview<br />
The number of facial movement tracking products on the market is still very small. During our inquiry, we found two: X-IST FaceTracker by the German company noDNA (http://www.nodna.com/FaceTracker.26.0.html) and VeeAnimator by Vidiator Technology (USA, http://www.vidiator.com/facestation.php). Both keep their technical specifications short and do not provide information on which tracking methods and algorithms are used.
The basic operating sequence is the same for the two systems, even though they differ in some key factors. Both take video streams as input data, are able to process and transfer in realtime, and provide data for proprietary 3D animation software. However, only VeeAnimator can operate without physical markers on the tracked person. They also differ in the scope of supply, hardware requirements and integration with proprietary 3D animation software, where the German product is ahead. It is still surprising that VeeAnimator, which gets by without any physical markers, costs about a quarter of the price of the X-IST FaceTracker.
X-IST FaceTracker<br />
The X-IST FaceTracker is characterized by a head-mounted video camera, required facial markers and lighting conditions, and support for a range of different 3D animation formats. In contrast to the VeeAnimator, X-IST FaceTracker uses its own proprietary headset for video recording (see Figure 1.12). The camera on this headset is near-infrared sensitive, with PAL or NTSC video output and adjustable camera focus. It has a dimmable near-infrared light source built in. Currently, X-IST works on Microsoft Windows 2000; it will be available for Windows XP Professional in the future.
It works with infrared-reflective markers on the face of the tracked person, which are then recognized by the tracking software. To detect these markers correctly, the studio environment has to be kept in fluorescent (cold) light, without daylight or other warm light sources such as halogen or incandescent bulbs. It provides drivers for 3D animation programs (Alias Mocap, Famous3D, 3ds Max, FBX), a Portable Control Unit (PCU) and a Software Development Kit (SDK) for 3rd party integration. The package with the headset system and the provided software costs €6,999, without the required additional hardware and drivers.
Figure 1.12.: The X-IST FaceTracker. With the provided headset (on the left) it is<br />
possible to create facial animations.<br />
VeeAnimator (formerly FaceStation)<br />
VeeAnimator stands out with its ability to track in realtime, without the use of physical markers and with standard hardware components, which makes the tracking process simple to carry out.
It is “a suite of software applications that allow you to animate heads and faces<br />
in Discreet’s 3ds max or Alias|Wavefront Maya” [vidiator, 2004] that uses a normal<br />
video camera. The camera does not have to be head mounted and, in contrast to<br />
the X-IST FaceTracker, whole head movements are recorded. The software places 22<br />
virtual markers at key positions on the face. The movement of these markers is then<br />
‘tracked’ from each video frame to the next to generate facial animation data. This<br />
data is used to animate a model in the 3D animation package.<br />
At any given video frame, the face is analyzed into a mixture of 16 different facial<br />
expression elements (including smile, frown, lip pucker, vowel sounds, raised eyebrows,<br />
closed eyelids). These facial expression elements can then be used for animation, for<br />
example to drive a set of morph targets with the defined expressions. The software<br />
additionally provides audio (speech) analysis tools that can be used to refine lip move-<br />
ments. The big advantage of VeeAnimator is that it does not need any additional hardware or special lighting. Soft diffused illumination on the actor's face, from whatever light source, is sufficient for the program to work satisfactorily. Figure 1.13 shows a tracking example with this software, taken from the VeeAnimator demonstration video².
Figure 1.13.: VeeAnimator in action. The tracked feature points (a), the real-life per-<br />
son (right) and its virtual reality equivalent during realtime tracking (b).<br />
VeeAnimator contains 4 parts: FaceLifter tracks prerecorded computer video files,<br />
FaceTracker does realtime tracking on video streams, FaceDriver is the 3ds Max or<br />
Maya plug-in component, and the Avatar Editor creates fully textured head models.<br />
Comparison<br />
Table 1.1 gives a summary of the two programs. They differ in many points, especially in prerequisites and required hardware. The comparison of the number of supported feature points of the two applications is particularly interesting.
² http://www.vidiator.com/demos/facestation/FSDemoFinal_small.wmv
                         X-IST FaceTracker V 4.5            VeeAnimator
Components
  Included               Software package, headset          Software package
                         system, cables and marker tape
  Required               PCI Framegrabber Card              Ordinary digital video camera,
                                                            ‘Alias Maya 3D’ or
                                                            ‘Autodesk 3ds Max’
  Optional               Drivers/Converters for 3rd         —
                         party animation software;
                         PCU; SDK
Requirements
  Software               Windows 2000                       Windows 2000/XP;
                         (XP in progress)                   Maya 4.5 or 5.0 /
                                                            3ds Max 4.26, 4.3, 5.0, 6.0
  Clock rate             ≥800 MHz                           prerecorded: ≥700 MHz,
                                                            realtime: ≥2.0 GHz
  Hardware               20 GB HD, 128 MB RAM,              200 MB HD,
  Specification          2D Graphics Card XGA,              Maya 3D / 3ds Max
                         1 PCI Slot                         requirements
Feature Points           up to 36 (typically 15)            22
Physical Markers         Yes                                No
Environment              no daylight/warm light;            soft diffused illumination
                         fluorescent (cold) light only
Tracking Rates           25/50 fps (PAL),                   30/60 fps (NTSC)
                         30/60 fps (NTSC)
Price                    €6,999.00                          $1,995.00
Table 1.1.: Comparison of commercial products
1.5. Summary<br />
Many researchers have already developed facial feature tracking algorithms, describing their work at different levels of detail. The examined approaches based on optical flow use the FACS, which suffers from major drawbacks because of its lack of temporal and detailed spatial information. Other methods that work with snakes combine various tracking and location techniques and are hence more complex. Moreover, snakes can suffer from the weak-edge leakage problem, which worsens the result dramatically. The investigated image moment technique is straightforward, but may not be applicable to complex tracking tasks. No paper states the programming language used, and sources are not freely available on the Internet. We therefore assume that all works are implemented in a platform-dependent language like C++, which may have advantages in the required processing time, but imposes constraints on portability and ease of use.
2. Algorithms in Consideration<br />
2.1. Overview<br />
In Chapter 1 we summarized different approaches to facial feature tracking. We showed that FACS-based methods have difficulties, as they were not originally developed for machine recognition. The moment-based approach turned out, surprisingly, to be straightforward and comprehensible. Other algorithms are computationally expensive or do not seem straightforward to implement. Based on their description in papers and their predicted practicability, we selected two tracking procedures for closer examination: active contour models (snakes) and moment-based tracking. Both approaches need manual or automated initialization in the first video frame. Snakes need an initial contour and therefore exact feature points for processing; the moment-based solution works on the complete picture, but needs initialization of feature regions because it can only recognize single objects. The required preprocessing steps for the two methods are also similar. Both work on binary edge images, but will presumably produce better results on grayscale edge images, where the edge intensity varies and therefore also weaker edges can be handled. The two algorithms mainly differ in their implementation complexity and processing time. This factor is investigated in this chapter.
2.2. Testing Method<br />
2.2.1. Testing Tool<br />
To test the practicability and performance of the algorithms in consideration, we have used and extended Java code examples which already implement the required functionality. These examples are programmed as plugins for the Java-based image processing tool ImageJ, a public domain program available at http://rsb.info.nih.gov/ij/.
On the homepage, the program is described as follows:
“ImageJ is [...] inspired by NIH Image for the Macintosh. It runs, either as<br />
an online applet or as a downloadable application, on any computer with a<br />
Java 1.1 or later virtual machine. Downloadable distributions are available<br />
for Windows, Mac OS, Mac OS X and Linux. [...]<br />
ImageJ was designed with an open architecture that provides extensibil-<br />
ity via Java plugins. Custom acquisition, analysis and processing plugins<br />
can be developed using ImageJ’s built in editor and Java compiler. User-<br />
written plugins make it possible to solve almost any image processing or<br />
analysis problem.”<br />
At the time of inquiry, ImageJ was available in version 1.33, which had errors in working with Java 1.5 on Linux¹ and was therefore used with Java 1.4.2. The recent ImageJ version 1.34 works fine with Java 1.5.
2.2.2. Input Data<br />
For the following tests we used a binary edge image of a human face. We extracted a video frame that shows a face in neutral position in the middle of the image. This gives us clearly identifiable facial features, represented by edges, which eases the selection of the feature regions and therefore the correct comparison of the output. Moreover, it approximates the test situation to the conditions of the final Java tracking program. To transform the video frame into the correct format, we converted the color image into a grayscale image and processed it with a Canny edge detector.
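The actual preprocessing code is not reproduced here; the sketch below merely illustrates the two steps on plain arrays, with a Sobel gradient magnitude standing in for the Canny detector (our own illustrative names, not the code actually used).

```java
/** Grayscale conversion and Sobel gradient magnitude -- a simplified
 *  stand-in for the Canny preprocessing described in the text. */
public class EdgePreprocess {
    /** Luminance-weighted grayscale value of a packed 0xRRGGBB pixel. */
    static int toGray(int rgb) {
        int r = (rgb >> 16) & 0xff, g = (rgb >> 8) & 0xff, b = rgb & 0xff;
        return (int) Math.round(0.299 * r + 0.587 * g + 0.114 * b);
    }

    /** Sobel gradient magnitude; border pixels are left at 0. */
    static double[][] sobel(int[][] gray) {
        int h = gray.length, w = gray[0].length;
        double[][] mag = new double[h][w];
        for (int y = 1; y < h - 1; y++)
            for (int x = 1; x < w - 1; x++) {
                // Horizontal and vertical Sobel responses.
                double gx = gray[y-1][x+1] + 2*gray[y][x+1] + gray[y+1][x+1]
                          - gray[y-1][x-1] - 2*gray[y][x-1] - gray[y+1][x-1];
                double gy = gray[y+1][x-1] + 2*gray[y+1][x] + gray[y+1][x+1]
                          - gray[y-1][x-1] - 2*gray[y-1][x] - gray[y-1][x+1];
                mag[y][x] = Math.hypot(gx, gy);
            }
        return mag;
    }
}
```

A full Canny detector would additionally smooth the image, thin the edges by non-maximum suppression, and apply hysteresis thresholding to obtain the binary edge map.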
In the following sections we describe the results of the ImageJ feature tracking<br />
plugins.<br />
¹ java.lang.NullPointerException is thrown during image window initialization.
2.3. Testing Snake Algorithms<br />
We have found two ImageJ plugins that implement snake algorithms, both of which work on grayscale images: Jacob's SplineSnake implementation, and the snake plugin by Boudier.
SplineSnake The SplineSnake implementation of Jacob et al. [2004] allows the user to select any required image region by drawing a path onto the source image. Points on this path, which have a preset distance between each other, are called knots and are the initialization for the snake algorithm. Additionally, the user can specify constraint knots that have to be passed by the final snake. All adjustable parameters are described at http://ip.beckman.uiuc.edu/Software/SplineSnake/usage.html; the values in Table 2.1 are directly used by the snake algorithm:
Parameter                                                          Default
Image energy: proper linear combination of gradient and            “100% Region”
region energies can result in better convergence. The right
combination depends on the image.
Maximum number of iterations.                                      2000
Size of one step during optimization.                              2.0
Accuracy to which the snake is optimized.                          0.01
Smoothing radius of the image smoothing procedure that is          1.5
computed before running the snake algorithm.
Spring weight: specifies how the constraint knots are weighted.    0.75
Table 2.1.: SplineSnake parameters.
For testing, we drew a nearly rectangular path around the mouth, with a knot distance of 5 pixels. During the testing process, we varied the step size and the number of iterations. Satisfying results were achieved with a step size of 10 and 200 iterations. With 50 iterations (as in (b) and (c)), we were not able to approximate the mouth contour closely enough. A step size of more than 10 did not improve the process. Results are illustrated in Figure 2.1.
Figure 2.1.: Overview of SplineSnake results. The initial selection (a), SplineSnake with step size 1.0 and 50 iterations (b), step size 10.0 and 50 iterations (c), and SplineSnake with step size 10.0 and 200 iterations (d).
SplineSnake cannot ignore small sources of interference in its processing. A tracking
example with distracting pixels is illustrated in Figure 2.2.
Figure 2.2.: SplineSnake interference. The final snake (the inner red line) is not able to<br />
ignore single interfering pixels on the right side of the upper lip contour.<br />
The plugin delivers information about the processing time and the resulting snake
knots. Table 2.2 shows the results of 20 SplineSnake test cycles, run with different
manually drawn mouth selections, 200 snake iterations, and a step size of 10.0; all
other values were left at their defaults. The average processing time over the 20
cycles was 2.42 seconds, with 26.3 resulting knots and 4.35 curve-describing samples
per knot on average.
For a tracking program that is required to work close to realtime, these processing
times are not acceptable. However, note that the tracking times for subsequent
video frames could be reduced by initializing the snake with the parameters of the
preceding frame: the initial snake would then be close to the final snake, and
therefore fewer cycles (presumably < 10) would have to be processed.
no.    knots   samples/knot   time (s)
1      29      4              2.280
2      27      4              2.587
3      29      4              3.846
4      25      4              2.674
5      32      3              4.133
6      27      5              2.844
7      24      5              3.684
8      17      6              0.616
9      23      5              1.353
10     27      4              2.207
11     25      4              1.579
12     25      5              0.849
13     25      3              2.027
14     27      5              2.051
15     25      4              1.543
16     25      4              2.662
17     22      4              1.233
18     35      5              4.131
19     29      4              3.263
20     28      5              2.833
avg.   26.3    4.35           2.420

Table 2.2.: SplineSnake: Results
Boudier Snake Plugin The second ImageJ plugin for snakes is written by Thomas
Boudier. It is available at http://www.snv.jussieu.fr/~wboudier/softs/snake.html.
For testing, we used the default parameters listed in Table 2.3.
Parameter Value<br />
Gradient threshold 20<br />
Regularization 0.10<br />
Number of iterations 200<br />
Step result show 5<br />
Alpha-Canny-Deriche 1.00<br />
Table 2.3.: Boudier snake parameters
For a comparison to the SplineSnake plugin, we chose to use a rectangular initial<br />
selection. As illustrated in Figure 2.3, the success of the snake procedure greatly<br />
depends on this initial selection. During testing, a change of the selection by one pixel<br />
resulted in extreme outgrowths of the resulting snake.<br />
Figure 2.3.: Overview of snake results. 1: a selection of (231, 208, 59, 16) (a) delivers
good results (b); 2: an enlargement of the region width by 1 pixel (a) has
significant negative effects (b).
Table 2.4 shows testing results obtained with this snake plugin. The values
specified represent the rectangular selection on the edge image around the mouth region
(position on the x/y-axes, width, and height). The last column indicates whether the
result is satisfying, i.e. the snake bounds the mouth region (✓), or whether the snake
leaked out over a large part of the displayed face (✗).
As the results show, the plugin delivers a successful result in only about one third of
the test cases. In the last row, we took the average of all selections as snake initialization
values, which also led to a negative outcome. If the region selection is closer to the
mouth contour, for example with an elliptical selection, the algorithm works more
reliably.
no. x y w h result<br />
1 236 215 54 9 ✗<br />
2 236 210 54 16 ✗<br />
3 235 210 55 15 ✗<br />
4 234 207 56 20 ✗<br />
5 233 211 57 17 ✗<br />
6 233 211 54 18 ✗<br />
7 233 208 60 18 ✗<br />
8 233 208 58 17 ✗<br />
9 232 209 61 21 ✗<br />
10 232 209 61 19 ✗<br />
11 232 209 60 17 ✓<br />
12 231 208 64 21 ✗<br />
13 231 208 62 19 ✓<br />
14 231 208 60 16 ✗<br />
15 231 208 59 16 ✓<br />
avg. 233 209 58 17 ✗
Table 2.4.: Snake: Results<br />
2.4. Testing Image Moments<br />
An ImageJ moment calculation implementation was found at
http://rsb.info.nih.gov/ij/plugins/moments.html, which was apparently integrated
into ImageJ version 1.34 2 . The plugin calculates image moments from rectangular image selections
up to the 4th order, and calculates the elongation and orientation of objects. The<br />
implementation allows a mapping of image intensity values before the moments are<br />
calculated. For that purpose, it uses the equation<br />
p_{i,j} = f · (p_{i,j} − c)    (2.1)

where p_{i,j} is the intensity value of the pixel at position (i, j). Factor f and cutoff c can be specified
manually in the user interface. This mapping allows the user to specify a background
color other than black (by setting the cutoff accordingly), and to process images with
a different color range (by changing the factor). The plugin provides tabular output
of the moment calculations; the results are not illustrated in the image. The
calculations and the provided source code still give a good overview of how the moment
calculations work. The implementation is straightforward and very comprehensible.
It executes the following steps:<br />
Step 1: Compute moments of order 0 and 1.<br />
Step 2: Compute coordinates of the centroid.<br />
Step 3: Compute moments of orders 2, 3, and 4.<br />
Step 4: Normalize 2nd moments and compute the variance around the centroid.<br />
Step 5: Normalize 3rd and 4th order moments and compute the skewness (symmetry)<br />
and kurtosis (peakedness) around the centroid.<br />
Step 6: Compute orientation and eccentricity.<br />
Source: Awcock [Awcock, 1995, pp. 162–165]<br />
In the case of a moment-based facial feature tracker, moment calculations above the
2nd order are not necessary, so Steps 5 and 6 can be left out. Note that the
image pixels have to be traversed twice (in Step 1 and Step 3), which doubles the
constant factor of the O(n) complexity for a region with n foreground pixels.
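As a sketch, Steps 1–4 can be written in plain Java; the orientation angle is included as well, since it follows directly from the normalized 2nd-order moments. The class and member names are our own, not taken from the plugin source; the input is assumed to be an 8-bit grayscale region given as int[height][width], and the intensity mapping of Equation 2.1 is applied as a first step.

```java
// Sketch of Steps 1-4 in plain Java, plus the orientation angle, which
// follows directly from the normalized 2nd-order moments. Class and
// member names are our own, not the plugin's.
class RegionMoments {
    final double m00, xc, yc, mu20, mu02, mu11, theta;

    RegionMoments(int[][] pixels, double f, double c) {
        double s00 = 0, s10 = 0, s01 = 0;
        // Step 1: moments of order 0 and 1 (first traversal of the pixels)
        for (int y = 0; y < pixels.length; y++) {
            for (int x = 0; x < pixels[y].length; x++) {
                double p = f * (pixels[y][x] - c);  // Equation 2.1
                if (p <= 0) continue;               // assumption: cutoff maps background to 0
                s00 += p;
                s10 += x * p;
                s01 += y * p;
            }
        }
        m00 = s00;
        // Step 2: coordinates of the centroid
        xc = s10 / s00;
        yc = s01 / s00;
        // Step 3: central moments of order 2 (second traversal of the pixels)
        double c20 = 0, c02 = 0, c11 = 0;
        for (int y = 0; y < pixels.length; y++) {
            for (int x = 0; x < pixels[y].length; x++) {
                double p = f * (pixels[y][x] - c);
                if (p <= 0) continue;
                c20 += (x - xc) * (x - xc) * p;
                c02 += (y - yc) * (y - yc) * p;
                c11 += (x - xc) * (y - yc) * p;
            }
        }
        // Step 4: normalize the 2nd-order moments by the total mass m00
        mu20 = c20 / s00;
        mu02 = c02 / s00;
        mu11 = c11 / s00;
        // Orientation of the principal axis, from the normalized 2nd moments
        theta = 0.5 * Math.atan2(2 * mu11, mu20 - mu02);
    }
}
```

For a horizontal bar of foreground pixels, for example, the centroid lies at the bar's center and the orientation θ is 0.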
2 Measurements in ImageJ (‘Analyze→Set Measurements...’ and ‘Analyze→Measure’)<br />
For testing purposes, we changed the plugin code so that it displays processing-time
information, which shows calculation times between 10 ms and less than 1 ms for a
60x20 pixel selection. The more often the plugin is executed, the less calculation time
is needed; just-in-time compilation in the JVM is presumably responsible for this
behavior. These timings cannot be compared one-to-one with the data produced by the
snake code: the plugin calculates moments of higher orders, which are not necessary
for a feature tracker. Still, this does not influence the complexity of the algorithm, as
no additional traversal of the image pixels is necessary. The complexity will change
with the implementation of the BSP tree structure, as the moment information has
to be calculated for every tree node; it then becomes O(d · n) for a tree depth
of d. Still, the processing times are far shorter than those of the snake plugin, and we
assume that this will also be the case if the Java tracker is based on moments.
2.5. Summary<br />
We have shown that the performance and reliability of snake algorithms strongly
depend on the adjustment of their parameters and the initialization of the snake knots.
The accurate selection of the image area and the snake parameters proved problematic
and challenging during the test phase: small changes of the region selection caused
disproportionately large differences in the processing results. Calculation times of about
2.5 seconds per execution are too high for an application that aims to work
close to realtime. The time could be reduced in subsequent frames by initializing the
snake knots with the knots of the previous frame; the major execution time would
then be incurred only for the first video frame. In contrast, moment calculations proved
to be straightforward, fast, and comprehensible. In addition to the results of our tests,
the paper on tracking with moment calculations [Rocha et al., 2002] describes an
exact course of action, which gives a clear path on how to proceed and therefore eases
future work. For these reasons we decided to implement a moment-based Java feature
tracker.
The next chapter describes the necessary prerequisites and preparations that have
to be made so that a moment-based tracking algorithm can work satisfactorily.
3. Input Data and Its Preparation<br />
3.1. Overview<br />
In order to be applicable to the selected tracking algorithm, the input video has to
be read and transformed into a proper format and quality. In this chapter, we divide
this procedure into two steps. First, we read the input data and extract the video
frames; to this end, we specify prerequisites for the reading technique and the
presumed video quality, and select sample input videos to be used in the
development process. In the second step, we process the data in order to enhance
the image features: we discuss the requirements of the tracking algorithm and state
how we meet them with edge detection algorithms. Well-established
algorithms are explained, as well as the Java libraries that can facilitate this step.
3.2. Prerequisites<br />
3.2.1. Data Format Prerequisites<br />
Media API Selection<br />
In order to specify the video format for the program, we have to know which Application
Programming Interface (API) is used for the media handling. The API should be able
to incorporate time-based media into the implemented Java application. It should be
a platform-independent, pure Java library, so that the Java Native Interface (JNI) is
not needed to call into native media APIs. We investigated the following possibilities:

• JMF, developed by Sun Microsystems and IBM
• QuickTime for Java, developed by Apple
• MPEG-4 Toolkit, developed by IBM

Implementing a media import from scratch is not an option, as the complexity
of the task and the necessary implementation time do not meet the time constraints
of the project.
Java Media Framework The JMF 2.0 API is Sun Microsystems' freely available
API that enables the presentation of time-based media. It provides support for
capturing and storing media data, controlling the type of processing, and performing
custom processing on media data streams. In addition, JMF 2.0 defines a plug-in
API that enables the programmer to customize and extend JMF functionality. The
current JMF 2.1.1e Reference Implementation supports the media types and formats
listed in Table 3.1 1 . The list of formats is limited and, due to the latest JMF release
date in March 2001, does not contain currently well-established formats like MPEG-4;
MPEG-1 is only supported in the platform-specific performance packs. Therefore the
pure Java version of JMF is not able to decode MPEG-1 videos. Moreover, different
authors, for example Davison [2005], describe the framework as buggy, and searching
for JMF on Sourceforge 2 returns only a handful of sparsely active projects dealing
with JMF's video functionality. However, sticking with JMF despite its
limited collection of supported media formats and codecs is, according to Adamson
(http://www.oreillynet.com/pub/wlg/2933), still the most practical all-Java
option.
The JMF API Guide [JMF] describes the basic working model as follows:
“A data source encapsulates the media stream much like a video tape and<br />
a player provides processing and control mechanisms similar to a VCR.<br />
Playing and capturing audio and video with JMF requires the appropriate<br />
input and output devices such as microphones, cameras, speakers, and<br />
monitors.”<br />
Many implementation examples are freely available for JMF, for example in Sun's
JMF Forum (http://forum.java.sun.com/forum.jspa?forumID=28). An ImageJ
plugin for JMF is available at http://rsb.info.nih.gov/ij/plugins/jmf-player.html.
1 found at http://java.sun.com/products/java-media/jmf/2.1.1/formats.html<br />
2 One of the largest collections of Open Source software: http://www.sourceforge.net<br />
Media Type              Cross Platform    Solaris/Linux       Windows
                        Version           Performance Pack    Performance Pack
AVI (.avi)              read/write        read/write          read/write
  Cinepak               D                 D,E                 D
  MJPEG (422)           D                 D,E                 D,E
  RGB                   D,E               D,E                 D,E
  YUV                   D,E               D,E                 D,E
  VCM 3                 -                 -                   D,E
HotMedia (.mvr)         read only         read only           read only
  IBM HotMedia          D                 D                   D
MPEG-1 Video (.mpg)     -                 read only           read only
  Multiplexed
  System stream         -                 D                   D
  Video-only stream     -                 D                   D
QuickTime (.mov)        read only         read only           read only
  Cinepak               D                 D,E                 D
  H.261                 -                 D                   D
  H.263                 D                 D,E                 D,E
  JPEG (420, 422, 444)  D                 D,E                 D,E
  RGB                   D,E               D,E                 D,E

D: format can be decoded and presented
E: media stream can be encoded in the format
read: media type can be used as input (read from a file)
write: media type can be generated as output (written to a file)

Table 3.1.: JMF 2.1.1 - Supported Video Formats
3 VCM - Windows' Video Compression Manager support. Tested for these formats: IV41, IV51,
VGPX, WINX, YV12, I263, CRAM, MPG4.
Quicktime for Java QuickTime for Java (QTJava) brings together the QuickTime
movie player and the Java programming language. As a result, Java applications can
play QuickTime movies, edit and create them, capture audio and video, and perform
2D and 3D animations. QTJava provides a basic set of functionality across all
platforms that support Java and QuickTime. It is currently in version 6.4, which works
with Java 1.4.1; “the previous version of QTJava supported J2SE 1.4.1, but only on
Windows” 4 . QTJava excels in the supported media types, as it can play all types
supported by the current QuickTime version. These formats include MPEG-4, Flash 5,
H.261, H.263, H.264, DV and DVC Pro NTSC, DV PAL and DVC Pro PAL. Iverson
describes media playback handling with QTJava in his book “Mac OS X for Java
Geeks” [Iverson, 2003, p. 154]. He praises the “rich range of supported media types”,
but finds the API “still relatively C-like”.
QTJava consists of two layers 5 :

• A core layer which provides the ability to access the complete QuickTime API
• An application framework for easy integration into Java applications. It includes:
  1. Integration of QuickTime with the Java Runtime. This includes sharing
     display space between Java and QuickTime and sharing events from Java
     with QuickTime.
  2. A set of classes that simplifies the effort required to perform common tasks
     while providing an extensible framework that application developers can
     customize to meet their specific requirements.
The Java method calls are claimed to add very little overhead to the native calls;
they do parameter marshalling and check the result of the native call for error
conditions. The major limitation of QTJava is that QuickTime is only supported on
Windows and Mac platforms. As this project aims to develop platform-independent
software that runs on all platforms providing a Java Virtual Machine (JVM) 1.4.1
or higher, this library cannot be used. Moreover, we want to avoid requiring additional
programs to be installed in order to make the Java tracker work correctly.
4 see http://developer.apple.com/quicktime/qtjava/<br />
5 as described in an Apple developer article at http://developer.apple.com/quicktime/qtjava/overview.html<br />
IBM Toolkit for MPEG-4 The IBM Toolkit for MPEG-4 is currently in version 1.2.4,
which is usable with Java 1.1 up to 1.5. It consists of a set of Java classes and APIs
with five sample applications: three cross-platform playback applications, and two
tools for generating MPEG-4 content for use with MPEG-4-compliant devices. These
applications are the following:

• AVgen: a simple, easy-to-use GUI tool for creating audio/video-only content for
  ISMA- or 3GPP-compliant devices
• XMTBatch: a tool for creating rich MPEG-4 content beyond simple audio and
  video
• M4Play: an MPEG-4 client playback application
• M4Applet for ISMA: a Java player applet for ISMA-compliant content
• M4Applet for HTTP: a Java applet for MPEG-4 content played back over HTTP

Since the toolkit is Java based, the client applications and the content creation
applications are cross-platform and will run on any Java-supporting platform. Its minimum
requirement is a Java SDK with Swing; for higher performance and more capabilities,
SDK version 1.4 or above is recommended. More details can be found at the project
homepage 6 . The major disadvantage of the IBM Toolkit is that it is not freely
available: only a 90-day trial license can be downloaded free of charge, and commercial
licenses cost from 500 upwards. Furthermore, it is limited to the playback of MPEG-4
movies, which decreases the range of possible input data.
MPEG-4 Video for JMF The MPEG-4 Video for JMF is a freely available plug-in<br />
that enables decoding of MPEG-4 videos in Java, independent of the IBM Toolkit for<br />
MPEG-4. This plug-in allows for the decoding of MPEG-4 video streams, which are<br />
created with any encoder that supports the MPEG-4 Simple Profile. The decoder of<br />
MPEG-4 Video for JMF can be used on any JMF-enabled platform. In order to
function, it needs JMF 2.1.1 and all the JMF requirements. “The implementation is 100%
pure Java and has undergone special optimizations to ensure adequate performance”<br />
(http://www.alphaworks.ibm.com/tech/mpeg-4).<br />
6 see http://www.alphaworks.ibm.com/tech/tk4mpeg4. Implementation demos are available at<br />
http://www.research.ibm.com/mpeg4/Demos/index.htm<br />
Selection The JMF was selected for the Java motion tracker, as it is the only freely
available solution that works on a Java-enabled machine without further requirements.
We are aware that data format and implementation problems could arise due to the
development status of the library. If MPEG-4 support is needed at a later point, the
IBM plug-in can be used to enhance the JMF functionality.
3.2.2. Video Quality<br />
Due to the selected tracking algorithm and the missing preliminary tracking stages, the
Java tracker places certain requirements on the input videos. For the desired quality of
the tracking algorithm, it is necessary that the frame sequence is continuous, which
should be the case if the video is directly recorded at about 25 frames per second.
The lighting should be diffuse, soft, and frontal on the tracked face. As
discussed in Section 1.2, the Java tracker leaves out the face tracking step; therefore
the face has to be in the middle of the image, and its size and orientation have to stay
almost constant.
3.2.3. Video Samples<br />
For testing purposes, we used video material from the University of Tübingen
(http://vdb.kyb.tuebingen.mpg.de). On their homepage, they describe the setup for video
recording:
“The video cameras were arranged in a semi-circle around the subject at<br />
a distance of roughly 1.3m [as shown in Figure 3.1]. Each camera was<br />
centered on the subject and leveled. The cameras recorded 25 frames/sec<br />
in full PAL video resolution (786*576, non-interlaced). In order to facilitate<br />
the recovery of rigid head motion, the subject wore a headplate with 6 green<br />
foam markers attached to it.<br />
Figure 3.1.: Top view of camera layout used for recordings (taken from
http://vdb.kyb.tuebingen.mpg.de).
Each recording contains one isolated action unit, repeated three times, with<br />
a (close to) neutral expression in between. For each action unit, there are<br />
six video files (one for each camera). Each video file has identical length<br />
and starts at exactly the same time. The videos were converted from raw<br />
single chip CCD data to RGB using a Bayer filter, then encoded as MPEG1<br />
using mpeg2enc.”<br />
The chosen MPEG-1 format has a well-defined specification with little or no
unsupported variations and few incompatibilities between encoders and decoders. It
is freely distributable, and players are available on all platforms. In accordance with our
constraints, we used camera positions C and D, as they provide faces in an almost
frontal view.
3.3. Preparation<br />
3.3.1. Overview<br />
After deciding how to read videos and extract frames, we now have to
prepare the data so that the picture information is applicable to the tracking
algorithm and robust against small perturbations in the input data. The feature
extraction process should be stable against small changes in illumination, viewing
direction, and deformations of the objects in the environment. Otherwise, if small
changes in any of these quantities led to large changes in the positions of facial feature
points, the interpretation of these points would be difficult.
As described in Section 1.3.3, the moment-based tracking approach works by
determining the figure's position and orientation. Therefore the algorithm needs

• a binary or grayscale image and
• well-defined and at most sparsely ragged image areas that are silhouetted
  against a defined background.

It has to be ensured that the input data satisfies these needs. We identified two
types of applicable preprocessed images: filled image regions and unfilled edge images.
Figure 3.2 shows an example of these two types on a mouth region.
Figure 3.2.: Two types of binary image regions applicable for the tracking algorithm.<br />
A binarized mouth region, displayed as a filled figure (a) and with detected<br />
edges (b).<br />
Tests on these examples show that the moment-based calculation of the best fitting<br />
ellipse has similar results for both image types: The orientation angle θ differs by 0.1%<br />
between the filled region and the edge image. The ellipse axes a and b vary by 3.75<br />
and 0.77 pixels in a total area of 60x20 pixels.<br />
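The best-fitting ellipse used in this comparison is commonly derived from the normalized second-order central moments; a standard formulation (our notation, not necessarily the plugin's exact equations) is:

```latex
\theta = \tfrac{1}{2}\,\mathrm{atan2}\!\left(2\mu_{11},\, \mu_{20}-\mu_{02}\right), \qquad
\lambda_{1,2} = \frac{(\mu_{20}+\mu_{02}) \pm \sqrt{(\mu_{20}-\mu_{02})^{2} + 4\mu_{11}^{2}}}{2}, \qquad
a = 2\sqrt{\lambda_{1}}, \quad b = 2\sqrt{\lambda_{2}}
```

where μ20, μ02, and μ11 are the second-order central moments normalized by the area m00, θ is the orientation of the major axis, and a and b are the semi-axes of the ellipse that has the same normalized second moments as the region.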
A disadvantage of filled regions is that the number of foreground pixels that have<br />
to be processed by a tracking program is considerably larger than in edge images.<br />
Moreover, the tracked centroids will be located in the middle of the lips, which makes<br />
it impossible to track the facial feature contours. In order to create the region images,<br />
thresholding could be used as a preparation technique. However, it has to be done<br />
differently for each image region, as, for example, the mouth region has a different<br />
color and hue distribution than eye regions. In contrast, an edge detection mechanism<br />
significantly reduces the number of foreground pixels, and the subsequent tracking<br />
algorithm can presumably locate points directly on the contour. The amount of data<br />
present in the edge map is reduced compared to the original image, which leads to
better performance of the overall system. Edge detection is the most common method
for feature extraction in machine vision, and the number of edge detection algorithms
is enormous. We therefore decided to use an edge detection mechanism for the
preprocessing of the input frames.
3.3.2. Edge Detection Algorithms<br />
Edge detectors are “used to locate changes in the intensity function; edges are pixels
where this function (brightness) changes abruptly” [Sonka et al., 1999, p. 77]. The
purpose is to convert the large array of brightness values that comprise an image
into a compact, symbolic code; the goal is to determine the locations of brightness
discontinuities in the image. In order to detect such brightness changes, edge
detection algorithms mostly approximate the first or second derivative of the image
function (see Figure 3.3).
Figure 3.3.: Function f(x) with an intensity change, its first derivative f′(x), and its
second derivative f″(x).
We selectively inspected four commonly used edge detection algorithms that
promise satisfying results: the Prewitt and Sobel operators, the Laplacian of Gaussian,
and the Canny edge detector. In the next sections, we briefly describe these
algorithms.
Prewitt and Sobel Edge Detectors The Prewitt and Sobel operators use convolution
masks to estimate local gradients, approximating the first derivative. The gradient
is estimated in 8 possible directions (for a 3x3 convolution mask).
The first three masks for the Prewitt operator are

    h1 = [  1  1  1 ]    h2 = [  0  1  1 ]    h3 = [ -1  0  1 ]
         [  0  0  0 ]         [ -1  0  1 ]         [ -1  0  1 ]    (3.1)
         [ -1 -1 -1 ]         [ -1 -1  0 ]         [ -1  0  1 ]

Accordingly, the Sobel operators are defined as

    h1 = [  1  2  1 ]    h2 = [  0  1  2 ]    h3 = [ -1  0  1 ]
         [  0  0  0 ]         [ -1  0  1 ]         [ -2  0  2 ]    (3.2)
         [ -1 -2 -1 ]         [ -2 -1  0 ]         [ -1  0  1 ]
The other masks can be determined by rotating the matrices of Equations 3.1 and 3.2.
The Sobel and Prewitt filters are very similar; the Sobel operator puts more weight
on the central row and column. Its simplicity and good results make the Sobel
operator a popular edge detection mechanism. The main disadvantage of the
first-derivative operators is “their dependence on the size of the object and sensitivity
to noise” [Sonka et al., 1999, p. 83].
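As an illustration, a minimal gradient-magnitude computation with the Sobel h1 and h3 masks of Equation 3.2 might look as follows; the class name is our own, only the two axis-aligned masks are used (not all eight compass directions), and border pixels are left at zero.

```java
// Minimal Sobel gradient-magnitude sketch on an 8-bit grayscale image
// stored as int[height][width].
class SobelSketch {
    static final int[][] HX = {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}}; // responds to vertical edges
    static final int[][] HY = {{1, 2, 1}, {0, 0, 0}, {-1, -2, -1}}; // responds to horizontal edges

    static int[][] magnitude(int[][] img) {
        int h = img.length, w = img[0].length;
        int[][] out = new int[h][w];
        for (int y = 1; y < h - 1; y++) {
            for (int x = 1; x < w - 1; x++) {
                int gx = 0, gy = 0;
                // correlate the 3x3 neighborhood with both masks
                for (int j = -1; j <= 1; j++) {
                    for (int i = -1; i <= 1; i++) {
                        gx += HX[j + 1][i + 1] * img[y + j][x + i];
                        gy += HY[j + 1][i + 1] * img[y + j][x + i];
                    }
                }
                out[y][x] = (int) Math.round(Math.sqrt((double) gx * gx + (double) gy * gy));
            }
        }
        return out;
    }
}
```

On a vertical step edge, such a filter responds strongly at the edge columns and with zero in flat regions.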
Laplacian of Gaussian The Laplacian of Gaussian (LoG) combines the Laplace
convolution operator with Gaussian smoothing. The Laplace operator approximates
the second derivative, which only returns the gradient magnitude, not its
direction. For 4-neighborhoods and 8-neighborhoods, the 3x3 masks are defined as

    h1 = [ 0  1  0 ]    h2 = [ 1  1  1 ]
         [ 1 -4  1 ]         [ 1 -8  1 ]    (3.3)
         [ 0  1  0 ]         [ 1  1  1 ]
If the Laplace operator is used separately, it responds doubly to some edges in the
image. Combined with Gaussian smoothing, it is able to achieve good results. The
advantage of this approach over classical first-derivative edge operators is that
a larger area surrounding the current pixel is taken into account.
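A sketch of applying the 4-neighborhood Laplacian h1 of Equation 3.3 directly (the Gaussian pre-smoothing of the full LoG is omitted for brevity; the class name is our own):

```java
// Apply the 4-neighborhood Laplacian mask h1 of Equation 3.3.
// Border pixels are left at 0.
class LaplacianSketch {
    static int[][] laplacian(int[][] img) {
        int h = img.length, w = img[0].length;
        int[][] out = new int[h][w];
        for (int y = 1; y < h - 1; y++)
            for (int x = 1; x < w - 1; x++)
                out[y][x] = img[y - 1][x] + img[y + 1][x]
                          + img[y][x - 1] + img[y][x + 1]
                          - 4 * img[y][x];
        return out;
    }
}
```

The double response mentioned above shows up at a step edge as a pair of values with opposite signs on the two sides of the step.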
Canny Edge Detector Canny’s aim was to discover the optimal edge detection al-<br />
gorithm. His parameter definition for an optimal algorithm consists of 3 criteria:<br />
• good detection
  A low error rate; occurring image edges are not dismissed by the algorithm.
• good localization
  Well-localized edges, lying at the same positions as the occurring edges.
• minimal response
  A given edge is marked only once, and image noise does not create false edges.
The Canny operator works in a multi-stage process. First, the image is smoothed
by Gaussian convolution. Then a simple 2D first-derivative operator is applied to
the smoothed image to create edges in regions of the image with high first spatial
derivatives. In this step, the gradient magnitude is calculated in both the x and y
direction and thereafter combined into one edge image. The algorithm then tracks
along the edges, a process known as non-maximal suppression. The tracking process
is controlled by two thresholds T1 and T2, with T1 > T2. Tracking only begins if the
starting point has a value higher than T1, and then continues in both directions until
the intensity value falls below T2. This method helps to ensure that noisy edges are
not broken up into multiple edge fragments. The final step uses heuristic thresholding
to keep only edge information and eliminate data that was wrongly identified.
Figure 3.4 shows the multi-stage edge detection process.
Figure 3.4.: Multi-stage Canny edge detection process.
According to this process, the effect of the Canny operator is influenced by three
parameters: the width of the Gaussian convolution mask and the thresholds T1 and T2.
Increasing the width of the Gaussian mask “reduces the detector's sensitivity to noise,
at the expense of losing some of the finer detail in the image. The localization error in
the detected edges also increases slightly as the Gaussian width is increased” [Fisher
et al., 1994]. Example illustrations of Canny edge detection results with different
convolution masks, in comparison to other edge detection mechanisms, can be found
in the work of Burger and Burge [2005, pp. 111–125].
3.3.3. Edge Detector Realization<br />
In order to realize the edge detection mechanisms in Java, we need backing frameworks
that provide image information and ease the image processing. We therefore used
Java2D, which is part of the Java 2 Platform Standard Edition, and Java Advanced
Imaging (JAI), an additional library freely available at the Java Sun homepage. In
the following paragraphs, we briefly describe how these libraries can be used for edge
detection.
Java2D Edge Detectors Java2D does not provide a predefined edge detection
mechanism. In order to implement an edge detector, the actual pixel values of an AWT
image have to be processed, and there are two ways to access these individual pixel
values. The image manipulation features in AWT are primarily aimed at modifying
individual pixels as they pass through a 'filter': a stream of pixel data is sent out by
an ImageProducer, passes through an ImageFilter, and on to an ImageConsumer. The
pre-defined ImageFilter subclass for processing individual pixels is the RGBImageFilter.
As the data is pushed out by the producer, this model is known as the push
model. An alternative approach is to use the PixelGrabber class to collect all the pixel
data of an image into an array, where it can then be conveniently processed. In
this case, a MemoryImageSource must be used to funnel the changed array's
data as a stream to a specified ImageConsumer. Figure 3.5 shows an overview of the
two pixel acquisition methods.
Figure 3.5.: Workflow of fetching individual pixels with Java2D.<br />
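The PixelGrabber approach described above can be sketched in a few lines. The example below grabs the pixels of a small in-memory image into an array and funnels the array back through a MemoryImageSource; the image content is only illustrative.<br />

```java
import java.awt.Image;
import java.awt.Toolkit;
import java.awt.image.BufferedImage;
import java.awt.image.MemoryImageSource;
import java.awt.image.PixelGrabber;

public class GrabExample {

    /** Grabs all pixels of an image into an int array (default ARGB packing). */
    public static int[] grabPixels(Image img, int w, int h) {
        int[] pixels = new int[w * h];
        PixelGrabber grabber = new PixelGrabber(img, 0, 0, w, h, pixels, 0, w);
        try {
            grabber.grabPixels();      // blocks until all pixels are delivered
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return pixels;
    }

    public static void main(String[] args) {
        int w = 2, h = 2;
        BufferedImage src = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        src.setRGB(0, 0, 0xFF0000);    // one red pixel
        int[] pixels = grabPixels(src, w, h);
        // ... process the array in place here ...
        // Funnel the (possibly modified) array back into an Image:
        Image result = Toolkit.getDefaultToolkit().createImage(
                new MemoryImageSource(w, h, pixels, 0, w));
        System.out.println(Integer.toHexString(pixels[0])); // ffff0000 (ARGB)
    }
}
```
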
47
JAI Edge Detectors JAI provides the GradientMagnitude operation, an edge de-<br />
tector that computes the magnitude of the image gradient vector in two orthogonal<br />
directions. It performs two convolution operations on the source image by detecting<br />
edges in horizontal and vertical direction. The algorithm then calculates the gradient<br />
norm of the two intermediate images.<br />
The result of the GradientMagnitude operation may be defined as<br />
dst[x][y][b] = sqrt( (SH(x, y, b))² + (SV(x, y, b))² ) (3.4)<br />
where SH(x, y, b) and SV (x, y, b) are the horizontal and vertical gradient images gen-<br />
erated from band b of the source image by correlating it with the supplied orthogonal<br />
gradient masks. The default masks for the GradientMagnitude perform a Sobel edge<br />
enhancement.<br />
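Equation 3.4 with the default Sobel masks can be reproduced in plain Java, without JAI. The following sketch correlates one band of an image with the two orthogonal masks and combines the results; border pixels are skipped for brevity.<br />

```java
public class GradientMagnitudeSketch {

    static final double[][] SOBEL_H = { { -1, 0, 1 }, { -2, 0, 2 }, { -1, 0, 1 } };
    static final double[][] SOBEL_V = { { -1, -2, -1 }, { 0, 0, 0 }, { 1, 2, 1 } };

    /** dst[y][x] = sqrt(SH(x,y)^2 + SV(x,y)^2) for one image band. */
    public static double[][] apply(double[][] src) {
        int h = src.length, w = src[0].length;
        double[][] dst = new double[h][w];
        for (int y = 1; y < h - 1; y++) {
            for (int x = 1; x < w - 1; x++) {
                double sh = 0, sv = 0;
                for (int j = -1; j <= 1; j++) {
                    for (int i = -1; i <= 1; i++) {
                        sh += SOBEL_H[j + 1][i + 1] * src[y + j][x + i];
                        sv += SOBEL_V[j + 1][i + 1] * src[y + j][x + i];
                    }
                }
                dst[y][x] = Math.sqrt(sh * sh + sv * sv);
            }
        }
        return dst;
    }

    public static void main(String[] args) {
        // A vertical step edge: left half 0, right half 10.
        double[][] img = new double[5][6];
        for (double[] row : img)
            for (int x = 3; x < 6; x++) row[x] = 10;
        System.out.println(apply(img)[2][2]); // strong response at the edge
    }
}
```

In JAI itself, the same result is produced by the GradientMagnitude operation with the two Sobel gradient masks as parameters.<br />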
3.3.4. Further Improvements<br />
The Canny edge detection mechanism gives good results for the preprocessing. Still,<br />
this method could be improved by face specific preprocessing. Different image features<br />
have different edge intensities. The eye region, for example, provides a wide range<br />
of hard gradient transitions. The mouth region does not have these clear boundaries,<br />
and especially the lower edge of the lower lip is often lost. An improved preprocessing<br />
algorithm could estimate the image regions of facial features and treat these regions<br />
with a different intensity of the Gaussian smoothing. With this technique, the mouth<br />
region edges could be improved, and interference in less important regions, like the<br />
cheeks, could be avoided. Another possibility could be to add feature specific checks<br />
after the edge detection process. These checks could then determine if an important<br />
facial edge is missing and could close this gap by reanalyzing the input data. A<br />
straightforward way to improve the results for the mouth region could be to weight<br />
the red channel higher during the grayscale image production, as the most recognizable<br />
difference between the mouth and the adjacent skin parts is in the intensity of the<br />
red channel.<br />
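The proposed red-channel weighting could be sketched as follows. The weight values are assumptions to be tuned experimentally; 0.587 and 0.114 are the standard luminance weights for green and blue.<br />

```java
public class RedWeightedGray {

    /** Converts a packed RGB pixel to gray, giving the red channel extra weight. */
    public static int toGray(int rgb, double redWeight) {
        int r = (rgb >> 16) & 0xFF;
        int g = (rgb >> 8) & 0xFF;
        int b = rgb & 0xFF;
        // Re-normalize so the weights still sum to 1.
        double sum = redWeight + 0.587 + 0.114;
        double gray = (redWeight * r + 0.587 * g + 0.114 * b) / sum;
        return Math.min(255, (int) Math.round(gray));
    }

    public static void main(String[] args) {
        int lip = 0xB05048;   // hypothetical reddish lip pixel
        int skin = 0xC09078;  // hypothetical nearby skin pixel
        // Compare gray levels under the standard and a boosted red weight:
        System.out.println(toGray(lip, 0.299) + " vs " + toGray(skin, 0.299));
        System.out.println(toGray(lip, 0.8) + " vs " + toGray(skin, 0.8));
    }
}
```
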
48
3.4. Summary<br />
In order to fulfill the requirements of the tracking algorithm, the input data has to be<br />
read and manipulated in a proper way. We therefore examined different APIs to grab<br />
frames from video input data, and decided to work with JMF, as it is freely available<br />
on different platforms and promises to be practicable. We also defined<br />
basic preconditions to the video data, and selected MPEG-1 sample data for testing<br />
purposes. In order to be applicable for the subsequent tracking algorithm, the video<br />
frames are processed with an edge detection mechanism, where we decided to look at<br />
both Java2D and JAI based functionality, preferably using a Canny edge detection<br />
method, as it shows the best tracking results. In the next section we go into detail<br />
with the code development of the Java tracker.<br />
49
4. Programming<br />
4.1. Overview<br />
After selecting the necessary libraries and methodologies, we can now illustrate the<br />
development of the Java tracker. We therefore describe the architecture, and state<br />
implementation details. We chose a modular program structure to ease the exchange<br />
of components and to separate the performed tasks. The graphical representation<br />
of tracking information is strictly separated from the data representation. All these<br />
design decisions are described in Section 4.2. Additionally, we explain the basic appli-<br />
cation flow and the implementation of the tracking algorithm. Section 4.3 then gives<br />
an insight into the implementation process. It describes the working environment and<br />
states problems that arose during the coding phase.<br />
4.2. Architecture<br />
4.2.1. Structure<br />
The Java feature tracker is split up into 5 packages, grouped by the tasks of the<br />
contained classes. These packages are the GUI, the graphical data representation layer,<br />
the domain layer, the data storage layer and the controlling and connecting classes.<br />
Figure 4.1 illustrates the implemented classes and their packages; the following<br />
paragraphs describe the functionality of each group.<br />
Controlling and connecting classes<br />
The controlling and connecting classes are responsible for establishing and managing<br />
communication between other packages, and therefore also manage the instantiation<br />
of important facade objects. For that purpose, the package has to provide constant<br />
values for the exchange of states between packages.<br />
50
The Main class, the startup object for the program, is part of this package. It creates<br />
the main GUI object and the TrackerController, and connects the two instances by ex-<br />
changing a StateChangedListener. The TrackerController is an interface that receives<br />
commands and propagates them to underlying domain classes. The TrackerFacade is<br />
the implementing class that accomplishes this task. The StateChangedListener is an<br />
interface that manages the communication from lower layers to the user interface and<br />
logfiles.<br />
Graphical User Interface<br />
The GUI holds the Swing interface that is presented to the user. It is implemented<br />
in the TrackerWindow class, which is created and linked to the remaining applica-<br />
tion. The TrackerWindow additionally encapsulates an implementation of the State-<br />
ChangedListener interface. This subclass is responsible for logging application messages<br />
or printing them to the user interface.<br />
Graphical Data Representation<br />
The graphical data representation provides functionality to generate a drawable rep-<br />
resentation of the BSP tracking tree, and presents all video and tracking information<br />
to the user. When the main GUI class requests the video visualization and tracking<br />
component from the TrackerController, a TrackerComponent interface is returned.<br />
It is implemented by the TrackerPanel, which manages the current image selections,<br />
the correct display of the current video frame, and the corresponding tracking data. For<br />
each selection, the TrackerComponent holds a BSPTree2D, the visual equivalent to a<br />
BSPTree, and the Extrema2D, that is, the leaf nodes with minimum or maximum x-<br />
or y-values. Both classes implement the interface Drawable, which eases the presen-<br />
tation of the shapes onto a Graphics context. They are compositions of Figure2D,<br />
an interface which is implemented by Points2D in the case of Extrema2D, and by<br />
BSPFigure2D in the case of BSPTree2D. The latter holds drawing information of the<br />
ellipses, centroids, major and minor axes, represented by the subclasses of BSPFig-<br />
ure2D, separately for leaves and inner nodes of the BSPTree.<br />
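The described composition of Drawables can be sketched as follows. The interface names follow the text, while the draw signature and the container structure are assumptions.<br />

```java
import java.awt.Graphics2D;
import java.util.ArrayList;
import java.util.List;

public class DrawableSketch {

    /** Eases the presentation of shapes onto a Graphics context. */
    interface Drawable {
        void draw(Graphics2D g);
    }

    /** A composite Drawable, as used by BSPTree2D and Extrema2D. */
    static class Figure2D implements Drawable {
        final List<Drawable> children = new ArrayList<>();

        public void draw(Graphics2D g) {
            for (Drawable child : children) {
                child.draw(g);   // delegate to ellipses, centroids, axes, points, ...
            }
        }
    }

    public static void main(String[] args) {
        Figure2D tree = new Figure2D();
        tree.children.add(g2 -> System.out.println("leaf drawn"));
        tree.draw(null);   // prints "leaf drawn"
    }
}
```
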
51
Domain Layer<br />
The domain layer encapsulates the core functionality of the Java feature tracker. It<br />
consists of 3 main parts: the classes responsible for video frame extraction, for pre-<br />
processing of frames, and for the tracking process itself. In each of these parts, the<br />
responsible class is created by a factory which returns an instance of the specific<br />
interface or abstract class. The TrackerFacade then communicates with these inter-<br />
faces. For the extraction of the video frames, the FrameExtractorFactory creates an<br />
instance of the FrameExtractor interface. For preprocessing, the PreprocessorFactory<br />
creates the Preprocessor. The core element of the program, the feature tracking im-<br />
plementation, is created by the RegionTrackerFactory, which returns a subclass of the<br />
RegionTracker. This class builds the tree of BSPNodes and returns the root node.<br />
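The factory-based construction can be sketched as follows. The class names follow the text, while the method signatures and the selection flag are assumptions.<br />

```java
// Hypothetical sketch of the factory scheme described above.
public class FactorySketch {

    /** Common interface of all tracker implementations (signature assumed). */
    interface RegionTracker {
        String name();
    }

    static class BinaryInvRegionTracker implements RegionTracker {
        public String name() { return "binary-inverted"; }
    }

    static class RGBRegionTracker implements RegionTracker {
        public String name() { return "rgb"; }
    }

    /** The facade only sees the interface, so implementations stay exchangeable. */
    static RegionTracker createRegionTracker(boolean binaryInput) {
        return binaryInput ? new BinaryInvRegionTracker() : new RGBRegionTracker();
    }

    public static void main(String[] args) {
        System.out.println(createRegionTracker(true).name()); // binary-inverted
    }
}
```
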
Storage Layer<br />
The storage layer is responsible for storing important tracked data for further usage, like<br />
for evaluation or higher-level tasks. The data is therefore saved to a TrackedRegion<br />
and returned to the TrackerFacade. The facade then forwards this information to the<br />
implementation of the TrackedDataController that manages the received data and files<br />
it to a specified location.<br />
52
Figure 4.1.: Class Diagram (the diagram shows the packages gui, domain, bsp, and<br />
data together with the classes and interfaces described in the text)<br />
53<br />
4.2.2. Basic Application Flow<br />
After describing the involved classes of the Java feature tracker, we now look at the<br />
way the important classes communicate during runtime. The basic workflow is<br />
mainly managed by the TrackerController. As illustrated in Figure 4.2, the class re-<br />
ceives all calls from the GUI, such as the openVideo, playVideo or process commands.<br />
The TrackerController then propagates the command to the responsible class.<br />
During the openVideo command, the FrameExtractor is called, which extracts frames<br />
from the video input data. If a new image frame is available, the StateChangedListener<br />
is notified, and this object then updates the TrackerComponent, the visual component<br />
that displays video frames in the user interface. The tracker component is also re-<br />
sponsible for calling the Preprocessor and requesting the preprocessed image. Other video<br />
playback functions trigger a similar process.<br />
process is called to execute the actual feature tracking. For that purpose, the method is<br />
redirected to the RegionTracker, which then creates the BSP tree for the current video<br />
frame. The TrackedDataController is responsible for saving the tracked information<br />
to a file.<br />
Figure 4.2.: Overview of the basic application workflow<br />
54
4.2.3. Tracking Algorithm<br />
The basic application workflow, as described in the previous section, has one core<br />
element, the RegionTracker. It is responsible for producing feature points out of a pre-<br />
processed image. It therefore creates a hierarchical BSP tree, where each node holds<br />
shape information for a certain part of the image. This process is based on the work<br />
of Rocha et al. [2002], as described in Section 1.3.3. Most of the tracking procedure is<br />
implemented in the class BSPNode that is responsible for the creation of a BSP tree.<br />
The basic procedure of the BSPNode works in three steps:<br />
1. Add image foreground pixels to the BSPNode (see procedure 4.2).<br />
2. Calculate orientation values of the added points (see procedure 4.3).<br />
After this step, the flag isCalculated is set to true, so that subsequent procedures<br />
can verify that the calculation step has not been left out.<br />
3. Create a BSP tree by subdividing the current node (see procedure 4.4). Return<br />
the current node as root.<br />
In order to save image information, the BSPNode holds three arrays: X and Y<br />
for the position of point P (x, y), and array V for the image intensity value of the<br />
point. If the algorithm works with binary images only, the intensity values in V<br />
are set to 1. The number of pixels that were already added to the node is held in<br />
ipix. The image moments are named mpq with p and q set to 0, 1, 2 (see the moment<br />
calculations in procedure 4.2). Additionally, every BSPNode holds a reference to<br />
the StateChangedListener (called listener) to propagate information or errors to the<br />
user, and to a TrackedRegion (called trackedRegion) to permanently save tracking<br />
information. The tracking procedure is started by the class RegionTracker, which<br />
currently traverses all pixels of a rectangular image raster and adds all foreground<br />
pixels (that is pixels with a non-zero intensity value) to the root node (see procedure<br />
4.1).<br />
55
Functions of Class RegionTracker<br />
The abstract class RegionTracker is responsible for triggering the feature tracking pro-<br />
cess. In its method createBSPTree, the class creates the root node of the BSP tree,<br />
and carries out all steps that are necessary for this node. The class is extended by<br />
the BinaryInvRegionTracker, which works with binary images. It therefore takes the<br />
intensity value of the image pixel at band 0 of the raster and inverts it. (In the pre-<br />
processed image, all image pixels that belong to the image area are black (=0). After<br />
inverting, the value of image area pixels is 1.)<br />
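The inversion step could be sketched as follows, with a plain array standing in for the band-0 samples of the raster; mapping any non-zero background value to 0 is an assumption for robustness.<br />

```java
import java.util.Arrays;

public class BinaryInvert {

    /** Inverts band-0 samples of a binary image: black foreground (0) becomes 1. */
    public static int[] invert(int[] band0) {
        int[] out = new int[band0.length];
        for (int i = 0; i < band0.length; i++) {
            out[i] = (band0[i] == 0) ? 1 : 0;  // 0 -> 1, anything else -> 0
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(invert(new int[] { 0, 255, 0 })));
        // prints [1, 0, 1]
    }
}
```
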
Procedure 4.1: createBSPTree(levels)<br />
1. Create a new BSPNode N with the following parameters:<br />
(a) The image raster r. It contains the pixels of a certain region of the image.<br />
(b) The maximum number of non-zero pixels nmax, in this case the number of<br />
pixels in the raster (raster width ∗ raster height).<br />
(c) A listener and a trackedRegion for feedback and data storage purposes.<br />
2. For each position (i, j) in the raster, do the following:<br />
(a) Let (xmin, ymin) be the position of the upper left raster point in the image.<br />
Fetch r(xmin+i, ymin+j), the pixel value at position (xmin + i, ymin + j) in the<br />
raster.<br />
(b) If r(i,j) ≠ 0, add point (i, j) to the node N.<br />
3. If at least one point was added to the root node, do:<br />
(a) Call function calculateValues() of node N.<br />
(b) Call function subdivide(levels-1) of node N.<br />
4. Return N<br />
56
Functions of Class BSPNode<br />
After having described the initial function calls to the BSPNode, we now look at the<br />
inside of these functions. The 3 substantial functions of the BSPNode are addPoint,<br />
calculateValues and subdivide. By splitting up the tracking process into these methods,<br />
it is possible to add region points independently to the current node. During the<br />
initialization of a new child node, the points that will be added to this node are<br />
not yet known. Moreover, the independent calculation of localization values allows<br />
for a method execution only if it is really needed, and for additional checks between<br />
calculation and subdivision.<br />
Procedure 4.2: addPoint(x, y, val)<br />
1. Add the point to the arrays: X[ipix] ← x, Y[ipix] ← y and V[ipix] ← val<br />
2. Increase the number of pixels: ipix ← ipix + 1<br />
3. Add up to the moments:<br />
(a) m00 ← m00 + val<br />
(b) m10 ← m10 + x ∗ val<br />
(c) m01 ← m01 + y ∗ val<br />
(d) m11 ← m11 + x ∗ y ∗ val<br />
(e) m20 ← m20 + x² ∗ val<br />
(f) m02 ← m02 + y² ∗ val<br />
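A direct Java transcription of procedure 4.2 might look like this; the array handling and the surrounding class are simplified.<br />

```java
public class MomentNode {
    final int[] xs, ys, vs;   // point coordinates and intensity values
    int ipix = 0;             // number of points added so far
    double m00, m10, m01, m11, m20, m02;

    public MomentNode(int maxPixels) {
        xs = new int[maxPixels];
        ys = new int[maxPixels];
        vs = new int[maxPixels];
    }

    /** Procedure 4.2: store the point and accumulate the image moments. */
    public void addPoint(int x, int y, int val) {
        xs[ipix] = x;
        ys[ipix] = y;
        vs[ipix] = val;
        ipix++;
        m00 += val;
        m10 += x * val;
        m01 += y * val;
        m11 += x * y * val;
        m20 += (double) x * x * val;
        m02 += (double) y * y * val;
    }

    public static void main(String[] args) {
        MomentNode n = new MomentNode(4);
        n.addPoint(1, 2, 1);
        n.addPoint(3, 2, 1);
        // centroid = (m10/m00, m01/m00) = (2.0, 2.0)
        System.out.println(n.m10 / n.m00 + ", " + n.m01 / n.m00);
    }
}
```
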
57
Procedure 4.3: calculateValues()<br />
1. Check if m00 is 0. If so, no pixels are set in this node and all further calculations<br />
are skipped. Return false in this case.<br />
2. Calculate the image centroid c(xc, yc):<br />
(a) xc ← m10 / m00<br />
(b) yc ← m01 / m00<br />
3. Calculate the second order central moments µ20, µ11 and µ02:<br />
(a) µ20 ← m20 / m00 − xc²<br />
(b) µ11 ← m11 / m00 − xc yc<br />
(c) µ02 ← m02 / m00 − yc²<br />
4. Calculate θ. Two special cases have to be treated separately: µ11 = 0 and<br />
µ20 = µ02. The treatment of these cases was determined by program test runs.<br />
(a) If µ11 is 0, set θ as follows:<br />
– If µ02 < µ20: θ ← 0<br />
– Else: θ ← π/2<br />
(b) If µ20 = µ02 do:<br />
– If µ11 < 0: θ ← π/4<br />
– Else: θ ← 3π/4<br />
(c) Else, do the default calculation:<br />
θ ← tan⁻¹( 2µ11 / (µ20 − µ02) ) / 2<br />
Note that Math.atan2(y, x) should be used instead of Math.atan(y/x)<br />
in Java. Otherwise, a sign error could lead to an angle rotated by 90°.<br />
5. Set the flag isCalculated to true.<br />
6. Return true.<br />
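The angle computation of procedure 4.3 can be sketched like this; only the θ part is shown, with the special-case values taken from the procedure.<br />

```java
public class OrientationSketch {

    /** Computes the orientation angle theta from the second order central moments. */
    public static double theta(double mu20, double mu11, double mu02) {
        if (mu11 == 0) {
            return (mu02 < mu20) ? 0 : Math.PI / 2;
        }
        if (mu20 == mu02) {
            return (mu11 < 0) ? Math.PI / 4 : 3 * Math.PI / 4;
        }
        // Math.atan2 keeps the signs of both arguments, avoiding the
        // 90 degree error that Math.atan(2*mu11 / (mu20 - mu02)) can produce.
        return Math.atan2(2 * mu11, mu20 - mu02) / 2;
    }

    public static void main(String[] args) {
        // An elongated horizontal point set: mu20 > mu02, mu11 = 0 -> theta = 0.
        System.out.println(theta(5.0, 0.0, 1.0));
    }
}
```
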
58
Procedure 4.4: subdivide(levels)<br />
1. Check if flag isCalculated is true. Otherwise, stop the subdivision.<br />
2. If the current node is a leaf node, that is if no levels to compute are left<br />
(levels = 0), add the point to the trackedRegion and exit.<br />
3. Create the child nodes C1 and C2.<br />
The constructor parameter nmax (the maximum number of non-zero-pixels) is<br />
set to the number of non-zero pixels of the current node (ipix).<br />
4. Calculate the orthogonal angle to θ (the angle of the x-axis with the minor axis):<br />
θ⊥ = (θ + π/2) mod π<br />
5. Iterate over all points. For every point P = (Xi, Yi) with the intensity value Vi,<br />
proceed as described:<br />
(a) If the point is the current centroid (Xi = xc and Yi = yc), add P to both<br />
child nodes.<br />
(b) Otherwise, divide the image area along the minor axis of the best fitting<br />
ellipse. For this, the reference system is shifted to have the centroid as the<br />
origin. Then the angle θP′, the angle of the shifted point P′ with the x-axis,<br />
is computed. The difference between θP′ and θ⊥ is then taken to decide if<br />
the point is added to C1 or C2 (see Figure 4.3):<br />
i. Calculate θP′. The y-value of P′ is mirrored along the x-axis to correspond<br />
to the standard Cartesian coordinate system:<br />
θP′ ← π/2 − atan2(xp − xc, −(yp − yc))<br />
ii. Calculate the difference angle β: β ← θP′ − θ⊥<br />
iii. Verify that β is between −π and +π:<br />
If β ≤ −π, then β ← 2π + β<br />
iv. If β ≤ 0 or β = π, then add point P to C1.<br />
v. If β ≥ 0, then add point P to C2.<br />
(c) Check that the current point was added in at least one child node.<br />
6. For both child nodes C1 and C2, call the function calculateValues(). If it returns<br />
true, call the function subdivide(levels-1). Otherwise, set the child node to null.<br />
(As it is an empty node, it is not used any more).<br />
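Steps 4 and 5(b) of procedure 4.4 can be sketched as a small decision function, with coordinates in image space (y-axis pointing down, as in the procedure).<br />

```java
public class SplitSketch {

    /** Returns 1, 2 or 3 (both) to indicate which child node(s) receive the point. */
    public static int side(double theta, double xc, double yc, double xp, double yp) {
        double thetaOrth = (theta + Math.PI / 2) % Math.PI;   // angle of the minor axis
        // Angle of the shifted point P' with the x-axis; the y-offset is negated
        // to mirror the image coordinates into a standard Cartesian system.
        double thetaP = Math.PI / 2 - Math.atan2(xp - xc, -(yp - yc));
        double beta = thetaP - thetaOrth;
        if (beta <= -Math.PI) beta += 2 * Math.PI;            // wrap into (-pi, pi]
        boolean c1 = beta <= 0 || beta == Math.PI;
        boolean c2 = beta >= 0;
        return (c1 && c2) ? 3 : (c1 ? 1 : 2);
    }

    public static void main(String[] args) {
        // Horizontal major axis (theta = 0), centroid at the origin:
        // points left and right of the vertical minor axis land in different children.
        System.out.println(side(0, 0, 0, 1, 0) + " / " + side(0, 0, 0, -1, 0)); // 1 / 2
    }
}
```
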
59
Figure 4.3.: Angle calculation for raster subdivision. The best-fitting ellipse and the<br />
corresponding splitting axis are defined for an image area (a). To decide which<br />
side of the splitting axis point P belongs to, the coordinate system is<br />
shifted to have the centroid C as origin (b).<br />
4.3. Implementation Process<br />
After showing the architecture, the basic workflow, and details of the tracking algo-<br />
rithm, we now describe the process of implementation. We therefore state the working<br />
environment for the development as well as the problems that arose during the im-<br />
plementation of the previously described architecture.<br />
4.3.1. Working Environment<br />
The programming was done on a SuSE Linux platform, with Eclipse SDK in version<br />
3.1.0 for the Java development. NetBeans 4.1 was used for building the GUI. Code<br />
was written compatible with Java 1.4, but it was tested with both Java 1.5.0_01 and<br />
1.4.2_08 on Linux, and 1.4.2 on Windows. The JMF was used in the 2.1.1e Linux<br />
performance pack version (and the Windows performance pack for Windows testing),<br />
as the reading of the MPEG-1 files does not work with the pure Java cross-platform version<br />
of JMF. Poseidon for UML Community Edition 3.0.1 was used for the initial class<br />
design and code generation.<br />
60
4.3.2. Difficulties<br />
During the development process of the Java feature tracker, we had to deal with<br />
some difficulties. The major drawbacks and delays originated in three main parts of the<br />
architecture: the implementation of the tracker algorithm, the video frame extraction,<br />
and the preprocessing methods.<br />
Algorithm<br />
The main problem during implementation of the tracking algorithm was the correct<br />
calculation of the orientation angle, and finding a straightforward way to subdivide the<br />
current image area into two child areas along the minor axis. The calculation of the<br />
angle θ required special treatment because the method sometimes delivers the sought<br />
angle rotated by 90°. We found a hint that this problem can arise if the inverse tangent<br />
is calculated with Math.atan instead of Math.atan2. Even though we implemented<br />
this change, the problem is still not solved in all cases. Furthermore, there are two<br />
special cases where the standard angle calculation formula does not work: if µ11 = 0,<br />
or if µ20 = µ02. We solved this problem by manually testing various cases with<br />
different angles for these exceptions. Hence, we came up with values for θ that deliver<br />
satisfactory results. However, we did not prove these values formally. For the area splitting, we<br />
first worked with linear equations in the form y = kx+d, and in the point-vector form.<br />
Both methods required additional conversions using the (invert) tangent. After some<br />
test runs we came up with the solution described in procedure 4.4. It is based on a shift<br />
of the coordinate system to have the centroid as its origin. Then, the difference angle<br />
between θ⊥ and the angle of the shifted point P′ with the x-axis is taken for decision<br />
making (see Figure 4.3 for an illustration).<br />
Frame Extraction<br />
For the video frame extraction, we currently use a modified version of the sample class<br />
FrameAccess.java that is provided in the JMF guide [JMF, p. 54]. However, this code<br />
completely traverses the video, and start/stop functionality is only possible with rough<br />
workarounds. Other possibilities, like buffering all images in memory, are not feasible<br />
due to limited memory.<br />
61
Caching images as files is too slow and generates too much file IO. We then tried a so-<br />
lution based on the class Seek.java¹. It uses the FramePositioningControl helper class<br />
to access single video frames. However, this code did not work with our input videos<br />
(with both Linux and Windows JMF performance pack versions), as it returns 0 as the to-<br />
tal number of video frames. A ray of hope is the JMFSnapper implementation provided<br />
by Davison (http://fivedots.coe.psu.ac.th/~ad/jg/ch283/index.html). It is<br />
described in a draft chapter of the book “Killer Game Programming in Java”[Davison,<br />
2005] and explains a solution without using the FramePositioningControl. It works<br />
fine and is fast, but it is not yet completely integrated in the framework of the Java<br />
feature tracker. This would be a possibility for further enhancements.<br />
Edge detection<br />
The first aim was to implement the Canny edge detection algorithm using JAI. De-<br />
scriptions on how to proceed were vague and did not give enough help for the cod-<br />
ing. We found a project called Beeegle that uses a JAI Canny implementation,<br />
but the downloadable source code is incomplete and defective (http://beeegle.nl/<br />
modules/sections/index.php?op=listarticles&secid=2). Consultation with the<br />
authors showed that the code is not used any more and will therefore not be updated<br />
or corrected. Hence, we reverted to an implementation that is provided in a Java forum<br />
(http://forums.java.sun.com/thread.jspa?threadID=546211&start=45&tstart=<br />
1) and adapted it to fit into the program architecture. Later on we tried to imple-<br />
ment a second edge detection mechanism, a simple Sobel operator proposed by the<br />
JAI framework. If the method processes an image that is fetched with JAI's fileload<br />
method, the operator is very fast and delivers good results. Integrated into the Java<br />
feature tracker, however, the function did not work correctly. The process was very slow and<br />
delivered a binary image, even though a grayscale image was expected. After in-<br />
quiry, we found out that the image type of the BufferedImage differs in the two cases.<br />
In the latter case, the image is fetched from an AWTImage, which returns the type<br />
TYPE_3BYTE_BGR. Then the method getAsBufferedImage of the PlanarImage re-<br />
quires 1 to 3 seconds for processing. This problem could not be solved for the time<br />
being.<br />
¹ An official JMF solution provided on the Java Sun homepage (http://java.sun.com/products/<br />
java-media/jmf/2.1.1/solutions/Seek.java).<br />
62
4.4. Summary<br />
The architecture of the Java feature tracker is split up into 5 packages, showing a<br />
modular structure with 3 exchangeable parts, responsible for frame extraction, pre-<br />
processing and feature tracking. Because of a factory-based construction of these<br />
parts, each of them can be replaced and therefore allows for a comparison of differ-<br />
ent approaches. This is not only important for future enhancements, but also for the<br />
development process when problematic code could easily be exchanged. Difficulties<br />
mainly arose in 3 major parts. Angle calculations and subdivision needed special<br />
treatment and increased testing; JMF was not as convenient as it promises to<br />
be; and JAI is not yet used for preprocessing, even though the basic JAI edge detection<br />
mechanism would be faster.<br />
63
5. Evaluation<br />
5.1. Overview<br />
To evaluate the quality of the developed Java feature tracker, we first look at the<br />
basic program functionalities and then focus on two main aspects: the quality of the<br />
extracted feature points, and the time consumption. In a first step, we evaluate the<br />
abilities that users have via the user interface. Then we rate extracted feature points of<br />
a mouth region, and compare the elapsed time for the preprocessing and the tracking<br />
methods in different circumstances.<br />
5.2. Program Abilities<br />
The developed Java feature tracker is able to find facial feature points of manually<br />
preselected feature areas in a sequence of video frames. In a series of steps, the users<br />
first open a video, and set selections for the areas to process on either the original or<br />
the preprocessed video frame. By selecting the process-button, the program starts the<br />
creation of the BSP tree, stores the result and presents it frame-by-frame in the<br />
GUI. After running through the video frames, the users can save the feature points to<br />
a Comma Separated Values (CSV) file.<br />
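Saving the feature points to a CSV file, as described above, could look like this; the column layout is an assumption and may differ from the actual tracker output.<br />

```java
import java.util.List;

public class CsvExport {

    /** Builds CSV lines "frame,x,y" for a list of feature points per frame. */
    public static String toCsv(List<int[]> points) {
        StringBuilder sb = new StringBuilder("frame,x,y\n");
        for (int[] p : points) {
            sb.append(p[0]).append(',').append(p[1]).append(',').append(p[2]).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String csv = toCsv(List.of(new int[] { 0, 233, 219 }, new int[] { 1, 234, 219 }));
        System.out.print(csv);   // the string can be written to a file as-is
    }
}
```
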
The users can select or deselect various tracking information for visualization. They<br />
can look at algorithm-specific data like the 16 tracked centroid points, and the cor-<br />
responding ellipses and ellipse axes. 4 (or more) of these 16 points are then called<br />
features, which are the points with the largest or smallest x/y-values. These feature<br />
points can be viewed separately in the GUI. The user interface allows for basic flow<br />
control of the video playback, but has some limitations. Play, stop and the display<br />
of the next frame work fine. However, pausing a video is faulty, and the display of<br />
the previous frame is not implemented. Moreover, the program architecture provides<br />
customizing options, like the color and stroke size of the feature display, or the se-<br />
lection of the preprocessor and the frame extractor. A GUI for these features is not<br />
implemented yet, but can be added with little additional effort.<br />
64
5.3. Tracking Quality<br />
First observations of the tracked feature points show that preselections on single areas<br />
deliver acceptable results. The most accurate output was achieved for an eyebrow<br />
selection. Mouth selections have good results except for the lowest point of the lower<br />
lip, where the preprocessing is not able to find a continuous edge. Since there is no link<br />
between feature calculations of two subsequent frames, the location of this point may<br />
flip horizontally from the left side of the mouth in frame n to the right side in frame<br />
(n + 1). Eye and nose selections require exact area preselection, as nearby edges may<br />
disturb the tracking process. However, in contrast to the snake algorithm, a selection<br />
of a larger area without disturbing pixels does not change the result, as background<br />
pixels (0-value pixels in binary images) do not have an influence on the calculation.<br />
Calculated points of the implemented Java feature tracker do not necessarily match<br />
with standardized feature points, as the algorithm has no knowledge about the under-<br />
lying image section. It therefore processes every image region selection in an equal way,<br />
without knowing if the produced feature point is, for example, the corner of the mouth.<br />
Figures 5.1 and 5.2 show feature points that were produced by the Java feature<br />
tracker. All test runs illustrated in Figure 5.1 produced satisfactory outcome. Image<br />
(a) and (b) have only minor deviations; the left corner of the mouth in image (a), for<br />
example, is slightly too low (that is, a too high y-value). Images (c), (d), (e), and (f)<br />
show the problem with the lowest point of the lower lip. The point is either on the<br />
left or on the right side of the desired centered position. Results on other face regions<br />
are illustrated in (g) and (h). Figure 5.2 shows a whole-face feature tracking process,<br />
which works with 6 image selections. As illustrated in (b), the 16 tracked points per<br />
selected region correctly approximate the contour of the underlying facial feature. (d)<br />
shows the Canny preprocessed image, with the discontinuity of the lower line of the<br />
lower lip.<br />
65
Figure 5.1.: Tracking results for selective regions.<br />
66
Figure 5.2.: Tracking result for 6 area selections: 2x eyebrow, 2x eye, nose, mouth.<br />
The produced features are in (a) and all centroids of the leaf-nodes in the<br />
BSP tree in (b). The preselected image areas are visible in (c), (d) shows<br />
the outcome in the preprocessed Canny edge image.<br />
67
5.3.1. Test Data<br />
For the statistical evaluation, we have selected a video that shows a mouth movement.<br />
From our test data (described in Section 3.2.3) we have chosen the recording of AU<br />
23, described as Lip Tightener, performed by the facial muscle Orbicularis oris¹.<br />
5.3.2. Technique<br />
For the mouth region testing, we perform a mouth region selection of 60x20 pixels,<br />
starting at point (233, 219). We examine the corners of the mouth, as these two<br />
points are best comparable and straightforward to examine. The left point (from<br />
the viewers perspective) is called point1, its coordinates are (x1, y1); the right point<br />
is called point2, with the coordinates (x2, y2). Three different parties have collected<br />
data: Two human testers manually examined the two features. The first tester is the<br />
developer of the Java tracker, female, 21 years old (called man1 from now on). Tester 2<br />
is an unbiased male, 15 years old (called man2). The third input comes from the data<br />
extracted by the algorithm (called algo). Figure 5.3 shows two resulting frames, frame<br />
number 106 with almost congruent results (0 or 1 pixel difference), and frame number<br />
88 with the most dissimilar tracking points (up to 5 pixels difference) in the described<br />
test case.<br />
Figure 5.3.: Good and bad results: a frame with almost identical tracking points (a),<br />
and the frame with the biggest differences (b). White marks the selection<br />
by man1, blue the selection by man2, and green the point calculated by algo.<br />
For the statistical calculations and diagram generation, we used Gnumeric version<br />
1.2.13, OpenOffice.org 2.0 beta, and SPSS 11.<br />
¹ An overview of the action unit descriptions can be found at http://www.cs.cmu.edu/afs/cs/project/face/www/facs.htm; the current AU manual is available at http://face-and-emotion.com/dataface/facs/new_version.jsp<br />
68
5.3.3. Statistical Evaluation<br />
According to the test case description in Section 5.3.2, we performed a test run with the<br />
implemented Java feature tracker and collected the manual measurements of the two<br />
human testers. This yields three data sources for both the x- and y-value of each corner<br />
of the mouth. Figure 5.4 illustrates the result of the data collection (see Appendix A.1<br />
for all values).<br />
Figure 5.4.: Positions of the corners of the mouth (point1 and point2).<br />
In order to examine the correctness of the tracked feature points, we focus on four<br />
aspects: First, we look at the absolute values and compare coordinate positions as well<br />
as curve progressions. Then we examine the quality of the program output relative to<br />
the manual tracking data, where we look at the curve progression and the relationship<br />
between the curves.<br />
69
Coordinate Position Looking at the means over all x/y values, we see that especially<br />
the algorithm-calculated coordinates of point2 differ from the manual selections<br />
(see Table 5.1). The x-value is too high (too far to the right in the image region), and the<br />
y-value too low (too far up in the image region).<br />
        x1      y1      x2      y2<br />
algo    243.03  220.03  284.07  217.59<br />
man1    243.2   219.43  282.75  219.51<br />
man2    243.2   219.43  282.75  219.51<br />
Table 5.1.: Mean position of x/y coordinates.<br />
The source of this inaccuracy is most likely to be found in the preprocessing. As<br />
illustrated in Figure 5.5, the mouth contours of the preprocessed image are ragged<br />
and discontinuous. This image also shows why the value of y2 is too low: a false edge<br />
outside the corner of the mouth is visible in the preprocessed image.<br />
(a) (b)<br />
Figure 5.5.: Preprocessing of mouth region. The original image selection (a), and the<br />
preprocessed version that produced ragged edges (b).<br />
Looking at the minima and maxima over all tracked frames, we see that the algo-<br />
rithm data has more outliers than the manually determined data. For example, the<br />
y-coordinate of point2 has a minimum of 215, where the manual testers reach values<br />
of 218 and 219. The maximum of this coordinate is not correspondingly higher, as the<br />
algorithm generally produces too low y-values (see Table 5.2).<br />
70
        Minima                 Maxima<br />
        x1   y1   x2   y2      x1   y1   x2   y2<br />
algo    210  218  277  215     251  224  289  219<br />
man1    236  218  278  218     249  221  288  221<br />
man2    237  219  278  219     251  222  288  222<br />
Table 5.2.: Minima and maxima of x/y coordinates.<br />
We assume that both the average and the min/max values would improve with a cleaner<br />
and more continuous edge detection.<br />
Curve Progression As a next step, we want to examine whether the curve progression is<br />
continuous, or whether the coordinate values change abruptly from one frame to the next.<br />
We therefore determine the squared difference d² between two subsequent video frames (see<br />
Figure 5.6) and calculate the sum over all 136 frames, as well as the maximum and<br />
mean value (see Table 5.3). The curve progression of algo tends to be more erratic and<br />
oscillatory than that of man1 and man2. The maximum oscillation is in every case produced<br />
by algo. Taking the sum over all values, algo has significantly higher values. For the<br />
x-coordinates, man2 delivers significantly better results; man1 is closer to the values<br />
of the algorithm. In terms of the y-coordinates, algo is significantly worse than both<br />
manual testers (see Table 5.3 for the numbers).<br />
        x1               y1               x2               y2<br />
        man1 man2 algo   man1 man2 algo   man1 man2 algo   man1 man2 algo<br />
sum     197  116  220    29   16   124    108  78   211    27   16   54<br />
max     25   9    25     1    1    16     9    16   16     1    1    9<br />
avg     1.46 0.86 1.63   0.21 0.12 0.92   0.8  0.58 1.56   0.2  0.12 0.4<br />
Table 5.3.: Sum, maximum, and average of d² on subsequent frames.<br />
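These d² statistics were produced with Gnumeric and SPSS; the following sketch shows the same computation in Java. It is illustrative code, not part of the thesis implementation, and the class name FrameDiff is ours:

```java
// Squared frame-to-frame differences of a coordinate series, plus the
// summary statistics (sum, maximum, mean) reported in Table 5.3.
public class FrameDiff {

    /** d² between each pair of subsequent values in a coordinate series. */
    public static int[] squaredDiffs(int[] series) {
        int[] d2 = new int[series.length - 1];
        for (int i = 1; i < series.length; i++) {
            int d = series[i] - series[i - 1];
            d2[i - 1] = d * d;
        }
        return d2;
    }

    public static int sum(int[] v) {
        int s = 0;
        for (int x : v) s += x;
        return s;
    }

    public static int max(int[] v) {
        int m = v[0];
        for (int x : v) m = Math.max(m, x);
        return m;
    }

    public static double avg(int[] v) {
        return sum(v) / (double) v.length;
    }
}
```

Applied to, say, the 137 x1 values of algo, this yields one column of the table above.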
71
Figure 5.6.: d² of subsequent video frames for both the x and y coordinates of point1<br />
and point2.<br />
Curve Relationship In order to determine how the data of algo, man1 and man2 are<br />
related to each other, we first calculate the correlation of the curves in Figure 5.4.<br />
Correlations          x1        y1        x2        y2<br />
man1 – man2           0.97148   0.84148   0.97423   0.84805<br />
man1 – algo           0.97919   0.7235    0.96173   -0.073<br />
algo – man2           0.98083   0.67658   0.95887   -0.17251<br />
standard deviation    0.00499   0.08496   0.00816   0.56269<br />
Table 5.4.: Correlation results.<br />
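The correlations in Table 5.4 are plain Pearson coefficients. A minimal Java implementation, again illustrative rather than the thesis code (the values above came from the spreadsheet and SPSS tools), looks like this:

```java
// Pearson correlation coefficient of two equally long series:
// r = cov(a, b) / (sd(a) * sd(b)), computed from centered sums.
public class Pearson {

    public static double correlation(double[] a, double[] b) {
        int n = a.length;
        double meanA = 0, meanB = 0;
        for (int i = 0; i < n; i++) {
            meanA += a[i];
            meanB += b[i];
        }
        meanA /= n;
        meanB /= n;
        double cov = 0, varA = 0, varB = 0;
        for (int i = 0; i < n; i++) {
            double da = a[i] - meanA;
            double db = b[i] - meanB;
            cov += da * db;
            varA += da * da;
            varB += db * db;
        }
        return cov / Math.sqrt(varA * varB);
    }
}
```

Feeding it two testers' per-frame x1 series reproduces one cell of the table.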
72
These correlations show good results for the algorithm with respect to the manual<br />
testers. The x-value of point1 shows correlations around 97–98% and a standard<br />
deviation lower than 0.5%. Only the y-value of point2 does not yield satisfactory<br />
results: the manual testing values do not correlate with the output of the algorithm,<br />
and the standard deviation is over 56%. Table 5.4 shows all results. In addition to the<br />
correlation, we look at the similarity by calculating the square of the frame-by-frame<br />
difference between the testers and performing a t-test on this data. Table 5.5 shows<br />
the result of this approach.<br />
Table 5.5.: Results of the t-test on the squared frame-by-frame differences.<br />
The output of the t-test has to be treated with caution, as the precondition of having<br />
normally distributed values is not fulfilled [Smith, 2002] (both the Kolmogorov-Smirnov<br />
and the Shapiro-Wilk test return significance values below 0.001). The reason why we still<br />
calculated the t-test is that the curves are predominantly bell-shaped, and that we<br />
expect a normal distribution with a greater sample size. However, the t-test inspects<br />
the x/y-coordinates individually and does not address the overall quality of the<br />
two feature points. The Analysis of Variance (ANOVA) calculation in the following<br />
paragraph tries to fill this gap.<br />
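Since the exact SPSS setup of the t-test is not reproduced here, the following sketch shows one plausible form of the statistic: a one-sample t on the per-frame d² values, t = mean / (sd / √n). This is our reading, not the thesis's verified procedure, and the class name TTest is ours:

```java
// One-sample t statistic on a data series, using the sample standard
// deviation (n - 1 denominator): t = mean / (sd / sqrt(n)).
public class TTest {

    public static double tStatistic(double[] x) {
        int n = x.length;
        double mean = 0;
        for (double v : x) mean += v;
        mean /= n;
        double ss = 0;
        for (double v : x) ss += (v - mean) * (v - mean);
        double sd = Math.sqrt(ss / (n - 1));
        return mean / (sd / Math.sqrt(n));
    }
}
```

The significance P(T ≤ t) would then be read off a t distribution with n − 1 degrees of freedom, as SPSS does.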
Relative Point Position In order to compare the position of the automatically detected<br />
feature points with the points determined by the manual testers, we take a closer<br />
look at the (squared) distance between the curves of different testers. Table 5.6 shows the<br />
sum of all d² values, as well as the minima and maxima over all tracked frames. The<br />
distances between the curve of one manual tester and the algorithm are significantly<br />
higher than those between the manual testers. Still, man1 is closer to algo than man2.<br />
man1 – man2   x1     y1     x2     y2<br />
sum           189    251    114    224<br />
maximum       9      9      4      4<br />
average       1.39   1.85   0.84   1.65<br />
man1 – algo   x1     y1     x2     y2<br />
sum           223    166    453    779<br />
maximum       9      9      16     25<br />
average       1.64   1.22   3.33   5.73<br />
algo – man2   x1     y1     x2     y2<br />
sum           192    183    323    1593<br />
maximum       16     9      9      36<br />
average       1.41   1.35   2.38   11.71<br />
Table 5.6.: Sum, maximum, and average of d² of the different testers' values.<br />
74
For examining the overall performance of the two algorithm-tracked feature points<br />
with respect to the manual testers, we calculate an ANOVA for the d² values described<br />
above. In order to clean up the data and make it more likely to be normally<br />
distributed, we calculate the extrema and remove them from the data set. Table<br />
5.7 shows the output of this extrema calculation. Note that point2 has fewer extrema<br />
(except for the x-coordinate of d²(man1 − algo)), but with higher values (up to d² ≥ 36).<br />
We therefore assume that point1 has more, but smaller, outliers than point2.<br />
                   x1        y1        x2        y2<br />
d²(man1 − algo)    –         16 ≥ 4    24 ≥ 9    2 ≥ 25<br />
d²(man2 − algo)    22 ≥ 4    22 ≥ 4    13 ≥ 9    5 ≥ 36<br />
Table 5.7.: Extrema of d² between the algorithm and each of the manual testers<br />
(number of values at or above the stated threshold).<br />
After removing the extrema from the data list, we recalculate the ANOVA on the<br />
cleaned-up data set. According to a test of homogeneity of variances, the variances of<br />
the new data set are not homogeneous (with a significance of 0.000). In order to evaluate the<br />
overall performance of the feature points, we calculate a contrast test where we divide<br />
the d²-results into two groups and compare their mean values. We perform three groupings:<br />
all values of man1 (d² of x1, y1, x2, and y2) compared to all values of man2; all d² of the<br />
x-coordinates (of both testers) compared to the d² of the y-coordinates; and the distances<br />
of point1 compared to the distances of point2. The results are illustrated in Table<br />
5.8. It shows that man2 has a bigger spread than man1; its difference to the algorithm<br />
is larger. The difference between the means of all y-values and those of all x-values<br />
is 11.417, so it is likely that there is an overestimation in the vertical direction. The<br />
spread between all point1-values and all point2-values is 16.529, so point2 is more<br />
likely to be overestimated.<br />
Contrast                               Spread   Std. Error   t<br />
avg(valman2) – avg(valman1)            3.997    0.9399       4.253<br />
avg(valys) – avg(valxs)                11.417   0.9399       12.147<br />
avg(valpoint2s) – avg(valpoint1s)      16.529   0.9399       17.586<br />
Table 5.8.: Contrast tests of d² between the algorithm and each of the manual testers.<br />
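Each t value in Table 5.8 is simply the contrast spread divided by its standard error. The following check, written by us for illustration (SPSS computed the spreads and the pooled standard error), reproduces the tabulated t values from the first two columns:

```java
// Contrast test statistic: t = (difference of group means) / standard error.
public class Contrast {

    public static double tValue(double spread, double stdError) {
        return spread / stdError;
    }
}
```

For example, 11.417 / 0.9399 ≈ 12.147, matching the second row of the table.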
75
After looking at the overall success of the feature points, we finally create the<br />
overestimation table (Table 5.9), in which we can compare the d² of one coordinate to the<br />
value of each other coordinate. The table shows that d²(y2) of man2 – algo has the biggest<br />
difference to all other values; it has the largest spread and therefore varies most during<br />
the feature tracking process. x1 and y1 have the same spread, so their error rates are likely<br />
to be similar.<br />
man1 – algo                                    man2 – algo<br />
d²(x1) d²(y1) d²(x2) d²(y2)                    d²(x1) d²(y1) d²(x2) d²(y2)<br />
man1– d²(x1) – 1.081 -3.801 1.062 1.062 -9.147<br />
algo d²(y1) -1.081 – -1.495 -4.882 -1.116 -10.228<br />
d²(x2) 1.495 – -3.387 1.440 1.440 -8.733<br />
d²(y2) 3.801 4.882 3.387 – 4.826 4.826 3.766 -5.346<br />
man2– d²(x1) -1.026 -1.440 -4.826 – -1.061 -10.172<br />
algo d²(y1) -1.026 -1.440 -4.826 – -1.061 -10.172<br />
d²(x2) 1.116 -3.766 1.061 1.061 – -9.111<br />
d²(y2) 9.147 10.228 8.733 5.346 10.172 10.172 9.111 –<br />
Table 5.9.: Tamhane post-hoc test on d². Mean differences (row − column) where<br />
significant at the 0.05 level.<br />
76
5.4. Time Usage<br />
5.4.1. Technique<br />
For testing, we use the same input data as in Section 5.3.2. We investigate the time<br />
consumption of the two methods that are mainly responsible for the tracking process<br />
and may consume the most time. The first procedure returns the feature<br />
points (see Procedure 4.1 on page 56); the second is responsible for the edge detection.<br />
For testing the tracking method, we use the standard program, whereas for<br />
the preprocessing we use a test class, called PreprocessingTimeTest, as this minimizes<br />
additional overhead. The time information is then extracted from the logfile.<br />
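Per-call timings of this kind can be collected with System.nanoTime(). The following harness is an illustrative stand-in for the PreprocessingTimeTest class, not its actual code; the class and method names here are ours:

```java
// Minimal timing harness: run a task once and return the elapsed wall-clock
// time in milliseconds, measured with the monotonic nanosecond clock.
public class TimeProbe {

    public static double timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000.0;
    }
}
```

In practice one would log this value per frame, as the test class writes its timings to the logfile; JIT warm-up and garbage collection pauses (see the JVM tuning below) make single measurements noisy.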
5.4.2. Statistical Evaluation<br />
Feature Tracking<br />
For the comparison of the implemented feature tracking algorithm, we examine two test<br />
cases: a complete image region selection and a 60x20 pixel mouth region selection.<br />
During the first test run, we found that the results are disturbed by garbage collection<br />
latencies. Hence, we tune the JVM parameters to allow for parallel processing during<br />
garbage collection. The JVM parameters used are: -verbose:gc -Xms64m -Xmx256m<br />
-XX:NewRatio=2 -XX:+UseConcMarkSweepGC. Figure 5.7 shows the tracking time<br />
output for both region selections (all tracking time information can be found in<br />
Appendices A.2 and A.3).<br />
Figure 5.7.: Tracking time consumption for the complete region selection (a) and a<br />
60x20 mouth selection (b).<br />
77
Remarkably, the JVM tuning considerably improves the tracking of the complete<br />
region, but has the opposite effect on the smaller selection. The tracking times lie<br />
between 3.94 ms (5.2 ms with JVM tuning) for the mouth region and 51.76 ms (21.4 ms<br />
with JVM tuning) for the whole-image selection.<br />
Preprocessing<br />
To test the preparation of images, we compare two preprocessing implementations: the<br />
currently used Canny edge detector, and a standard JAI edge detection mechanism,<br />
which is based on the Sobel operator. As in the testing of the tracking method,<br />
we perform the tests with both standard and tuned JVM options. The results are<br />
illustrated in Figure 5.8 (all tracking values can be found in appendix A.4 and A.5).<br />
Figure 5.8.: Preprocessing time consumption for the Canny (a) and the JAI Sobel (b)<br />
edge detection.<br />
The figures show that the currently used edge detection mechanism is considerably<br />
slower than the JAI operator. The average processing time for Canny is 212.91 ms<br />
(156.42 ms with JVM tuning), in contrast to 1.27 ms (0.86 ms with JVM tuning) for Sobel.<br />
Unfortunately, the excellent values of the JAI Sobel operator cannot be transferred to<br />
the Java feature tracker. In both the test class and the final program, the algorithm uses<br />
a BufferedImage. In the test class, the type of this image is TYPE_INT_RGB;<br />
in the final program it is TYPE_3BYTE_BGR, as the image is fetched directly<br />
from the video and not read from a saved .png image. For the latter type, the<br />
necessary image conversion requires processing times of more than 1 second.<br />
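One standard Java2D way to perform such a type conversion is to redraw the frame into a BufferedImage of the target type. This sketch (the class name is ours, and it is not claimed to be the thesis's workaround) shows the approach; as the measurements above indicate, conversions of this kind can be too slow for video-rate use:

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

// Convert any BufferedImage (e.g. a TYPE_3BYTE_BGR video frame) into the
// TYPE_INT_RGB layout by redrawing it; drawImage performs the pixel
// format conversion internally.
public class ImageConvert {

    public static BufferedImage toIntRgb(BufferedImage src) {
        BufferedImage dst = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = dst.createGraphics();
        g.drawImage(src, 0, 0, null); // pixel format conversion happens here
        g.dispose();
        return dst;
    }
}
```

Avoiding the conversion entirely, by making the tracker read the BGR raster directly, would sidestep this cost.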
78
5.5. Summary<br />
The evaluation showed that the developed Java feature tracker is able to deliver<br />
reasonable results. Important feature points can be located; differences in mean coordinate<br />
positions are below 2 pixels, and correlations of the produced feature points reach values<br />
of up to 98%. The results differ per inspected feature point. For example, the left<br />
corner of the mouth showed a more accurate position than the right corner. The most<br />
significant improvements can be achieved by improving the preprocessing. This can be done<br />
by producing continuous, non-ragged feature contours that do not respond to lighting<br />
changes, and by omitting intensity changes in the image that do not belong to facial<br />
features. This step would also reduce the oscillating behavior that can be noticed<br />
in the current version of the Java tracker. Another step would be to add geometrical<br />
transformations to link subsequent feature points, or to remove outliers by calculating<br />
more accurate values from the neighboring frames. Performance-wise, the bottleneck<br />
of the tracking process is the preprocessing. It currently needs about 4 times as much<br />
time as the actual feature tracking. A JAI-based edge detector would be faster for the<br />
image preparation, but further inquiry is necessary to find a workaround for the extremely<br />
time-consuming image type conversions.<br />
79
Conclusions<br />
Overview<br />
The feature tracking solution described in this work is completely based on Java and<br />
finds facial feature points in preselected image regions of input videos. In this work,<br />
we explained the selection of algorithms and libraries as well as implementation and<br />
evaluation details. The main advantages of this solution are its (theoretical) platform<br />
independence, low hardware requirements, and low tracking effort.<br />
Java turns out to be, with some constraints, very practicable as a programming<br />
language for feature tracking problems. The time consumption is reasonably small and<br />
the implementation convenient, as Java libraries are available for video processing and<br />
imaging tasks. Still, JMF, which was used for video frame extraction, does not keep all its<br />
promises, as jumping between frames proves to be challenging or does not even work<br />
in the proposed way. JAI for image processing seems to be fast, but conversions be-<br />
tween image types are incomprehensibly time-consuming and could not be bypassed for<br />
the time being. For this reason, we alternatively used a Java2D-based implementation.<br />
The applied algorithm, proposed by Rocha et al. [2002], shows that it is possible to<br />
solve a complex task by splitting it up into smaller and therefore easier problems. The<br />
creation of a BSP tree, with nodes that hold object position, size, and orientation, is<br />
straightforward and understandable. Nevertheless, we had to study different sources<br />
to verify the equation definitions. The implementation was largely trouble-free, as the ap-<br />
proach is clearly specified in the underlying paper. However, problems arose during<br />
the calculation of the orientation angle, where the method sometimes returns an angle<br />
rotated by 90°. We have not yet found a solution for this difficulty. Still, the algorithm<br />
delivers good results for the feature tracking task. The accuracy of the feature points<br />
often lies around 90%. Errors in the feature determination mostly have their origin in<br />
preprocessing discrepancies.<br />
80
Future Work<br />
According to the listed problems, the major improvements to the current solution<br />
would be revised JMF code, an enhanced preprocessing method, and resolved track-<br />
ing algorithm difficulties. Frame-by-frame traversal of the video could be enhanced<br />
by adding the previousFrame() functionality. Improved edge detection should make<br />
all important edges visible, such as the lower line of the lower lip. Improved processing<br />
times could be reached by exclusively preprocessing selected image regions. Moreover,<br />
individual tracking cases with wrong object orientations have to be revised.<br />
In comparison to the commercial VeeAnimator, the Java feature tracker is inferior<br />
in a number of aspects: image regions need manual initialization, it currently does<br />
not work with streaming media, and it is not able to track in real time. The feature<br />
points still flutter when observed over a series of frames, they are not standardized, and<br />
do not deliver 3D information. The program could be changed to that effect by using<br />
geometric transformations between the shape information of subsequent frames (as<br />
described in the paper of Rocha et al. [2002]). JMF could be used to open video<br />
streams or could be replaced by alternative libraries. Automatic preselection of image<br />
areas could be introduced, for example by using facial shape models. These models<br />
could then be used to map the tracked points to a standardized image feature model<br />
by selecting feature points according to their proximity to model points. For simple<br />
3D information, the z-axis could be set to standard values.<br />
Summary<br />
The current program shows a straightforward and comprehensible feature tracking<br />
solution that provides basic tracking procedures. In case of further development, for<br />
example by the Open Source community, the project could become a free and independent<br />
alternative in the field of feature tracking, facilitating high-level face processing tasks.<br />
Having Java as a basis, it would be suitable for professional 3D animations or future<br />
user interfaces on different platforms. Moreover, it could be used for teaching and<br />
further investigations in the field of computer vision. With this work we have outlined<br />
the possibilities for future developments.<br />
81
Bibliography<br />
G. Antunes Abrantes and F. Pereira. MPEG-4 Facial Animation Technology: Survey,<br />
Implementation and Results. IEEE Transactions on Circuits and Systems for Video<br />
Technology, 9(2):290–305, March 1999.<br />
T. Awcock. Applied Image Processing. McGraw-Hill Companies, August 1995.<br />
S. S. Beauchemin and J. L. Barron. The Computation of Optical Flow. ACM Comput.<br />
Surv., 27(3):433–466, 1995. ISSN 0360-0300.<br />
D. L. Bimler, J. Kirkland, and K. A. Jameson. Quantifying variations in personal color<br />
spaces: Are there sex differences in color vision? Color Research & Application, 2:<br />
128–134, 2004.<br />
W. Burger and M. J. Burge. Digitale Bildverarbeitung. eXamen.press. Springer, 2005.<br />
C. Cédras and M. A. Shah. Motion Based Recognition: A Survey. Image and Vi-<br />
sion Computing, 13(2):129–155, March 1995. URL http://www.cc.gatech.edu/<br />
~jimmyd/summaries/cedras1995.html.<br />
J. Cohn, A. Zlochower, J.-J. J. Lien, and T. Kanade. Feature-Point Tracking by Op-<br />
tical Flow Discriminates Subtle Differences in Facial Expression. In Proceedings of<br />
the 3rd IEEE International Conference on Automatic Face and Gesture Recognition<br />
(FG ’98), pages 396 – 401, April 1998. URL http://www.ri.cmu.edu/pubs/pub_<br />
2075.html.<br />
R. Cutler and M. Turky. View-Based Interpretation of Real-Time Optical Flow for<br />
Gesture Recognition. http://citeseer.ist.psu.edu/cutler98viewbased.html,<br />
1998.<br />
A. Davison. Killer Game Programming in Java. O’Reilly, 1 edition, 2005. ISBN<br />
0-596-00730-2.<br />
82
F. Dellaert and R. Collins. Fast Image-Based Tracking by Selective Pixel Integration.<br />
In ICCV 99 Workshop on Frame-Rate Vision, September 1999. URL http://www.<br />
ri.cmu.edu/pubs/pub_3195_text.html.<br />
P. Ekman and W. Friesen. Facial Action Coding System: A Technique for the Mea-<br />
surement of Facial Movement. Consulting Psychologists Press, Palo Alto, 1978.<br />
I. Essa and A. Pentland. Motion-Based Recognition, volume 9 of Computational Imag-<br />
ing and Vision, chapter 12: Facial Expression Recognition Using Image Motion.<br />
Kluwer Academic Publishers, 1997. ISBN 0-7923-4618-1.<br />
B. Fisher, S. Perkins, A. Walker, and E. Wolfart. Hypermedia Image Processing Ref-<br />
erence. http://www.cee.hw.ac.uk/hipr/html/canny.html, Department of Arti-<br />
ficial Intelligence University of Edinburgh/UK, 1994.<br />
D. Gorodnichy. Facial Recognition in Video. In Proceedings of International As-<br />
sociation for Pattern Recognition (IAPR) International Conference on Audio- and<br />
Video-Based Biometric Person Authentication (AVBPA’03), LNCS 2688, pages 505–<br />
514, Guildford, United Kingdom, June 2003. NRC 47150. URL http://iit-iti.<br />
nrc-cnrc.gc.ca/publications/nrc-47150_e.html.<br />
T. Goto, M. Escher, C. Zanardi, and N. Magnenat-Thalmann. MPEG-4 Based<br />
Animation With Face Feature Tracking. In CAS ’99 (Eurographics workshop),<br />
pages 89–98, Milano, Italy, September 1999. MIRALab, Springer. URL http:<br />
//www.miralab.unige.ch/papers/15.pdf.<br />
W. Iverson. Mac OS X for Java Geeks. O’Reilly, April 2003. URL<br />
http://www.oreilly.com/catalog/macxjvgks/http://www.oreilly.com/<br />
catalog/macxjvgks/chapter/ch10.pdf.<br />
J. Ivins and J. Porrill. Everything You Always Wanted To Know About Snakes (But<br />
Were Afraid To Ask). Technical report, Artificial Intelligence Vision Research Unit<br />
University Of Sheffield, England S10 2TP, July 1993. URL http://www.computing.<br />
edu.au/~jim/psfiles/aivru86c.ps. AIVRU Technical Memo #86 (Revised June<br />
1995; March 2000).<br />
M. Jacob, T. Blu, and M. Unser. Efficient Energies and Algorithms for Parametric<br />
83
Snakes. IEEE Transactions on Image Processing, 13(9):1231–1244, September 2004.<br />
URL http://ip.beckman.uiuc.edu/publications.html.<br />
JMF. Java Media Framework API Guide, JMF 2.0 FCS edition, November 1999.<br />
M. Kass, A. Witkin, and D. Terzopoulos. Snakes: Active Contour Models. Interna-<br />
tional Journal of Computer Vision, 1:321–331, 1988.<br />
V. Krüger, A. Happe, and G. Sommer. Affine Real-Time Face Tracking Using Gabor<br />
Wavelet Networks. In ICPR00: Proceedings of the International Conference on<br />
Pattern Recognition (ICPR00), volume 1, page 1127, Washington, DC, USA, 2000.<br />
IEEE Computer Society.<br />
B. Mackiewich. Intracranial Boundary Detection and Radio Frequency Correction in<br />
Magnetic Resonance Images. Master’s thesis, Simon Fraser University, August 1995.<br />
URL http://www.cs.sfu.ca/~stella/papers/blairthesis/main/main.html.<br />
B. S. Morse. Lecture 11: Shape Representation: Regions (Moments). http://bryan.<br />
cs.byu.edu/650/home/index.php, January 2004. Course material for ‘Computer<br />
Vision’ at Brigham Young University.<br />
R. Mukundan and K. R. Ramakrishnan. Moment Functions in Image Analysis. World<br />
Scientific, 1998.<br />
L. Rocha, L. Velho, and P. C. P. Carvalho. Image Moments-Based Structuring and<br />
Tracking of Objects. sibgrapi, 00:99, 2002.<br />
H. Sahbi and N. Boujemaa. Coarse to Fine Face Detection Based on Skin Color<br />
Adaption. In ECCV ’02: Proceedings of the International ECCV 2002 Work-<br />
shop Copenhagen on Biometric Authentication, pages 112–120, London, UK, 2002.<br />
Springer-Verlag. ISBN 3-540-43723-1. URL http://www-rocq.inria.fr/imedia/<br />
Articles/23590112.pdf.<br />
T. Smith. When to use and not to use the two-sample t-test. http://<br />
www.ubht.nhs.uk/R\&D/RDSU/Statistical%20Tutorials/t-tests.pdf, Novem-<br />
ber 2002. URL http://www.ubht.nhs.uk/R&D/RDSU/Statistical%20Tutorials/<br />
statistical_tutorials.htm. Research and Effectiveness Department (United<br />
Bristol Healthcare NHS Trust).<br />
84
M. Sonka, V. Hlavac, and R. Boyle. Image Processing, Analysis, and Machine Vision.<br />
PWS Publishing, second edition, 1999.<br />
D. Terzopoulos and K. Waters. Analysis and Synthesis of Facial Image Sequences<br />
Using Physical and Anatomical Models. IEEE Trans. Pattern Anal. Mach. Intell.,<br />
15(6):569–579, 1993. ISSN 0162-8828.<br />
vidiator. Using FaceStation 2. http://www.vidiator.com/support/<br />
facestationdocs/index.html, 2004.<br />
H. Wu, Q. Chen, and M. Yachida. Face Detection From Color Images Using a Fuzzy<br />
Pattern Matching Method. IEEE Trans. Pattern Anal. Mach. Intell., 21(6):557–563,<br />
1999. ISSN 0162-8828.<br />
X. Xie and M. Mirmehdi. Geodesic Colour Active Contour Resistent to Weak<br />
Edges and Noise. In Proceedings of the 14th British Machine Vision Conference,<br />
pages 399–408. BMVA Press, September 2003. URL http://www.cs.bris.ac.uk/<br />
Publications/Papers/2000034.pdf.<br />
J. Zobel. Writing for Computer Science. Springer, 2 edition, 2004.<br />
85
Glossary<br />
ANOVA Analysis of Variance. A series of statistical procedures for examining<br />
differences in means and for partitioning variance, 74, 75<br />
API Application Programming Interface. A defined set of calling con-<br />
ventions allowing a software application to access a particular set of<br />
services, 36, 37, 39, 49<br />
AU Action Unit. A basic unit of observable facial muscle movement, defined<br />
in the Facial Action Coding System, 9, 10, 68<br />
BSP Binary Space Partitioning. A technique for the recursive division of geomet-<br />
rical objects. It is mainly known from the engines of computer games,<br />
20, 35, 51, 54–56, 64, 67, 80<br />
CSV Comma Separated Values. A file format used as a portable represen-<br />
tation of a database, 64<br />
FACS Facial Action Coding System. A system by Ekman and Friesen for describing<br />
facial movements in terms of action units, 9–11, 26, 27<br />
FAP Facial Animation Parameters. Feature points used for facial anima-<br />
tion that are standardized in the MPEG-4 standard, 2<br />
FBX A platform-independent 3D authoring and interchange format., 23<br />
GUI Graphical User Interface. The front-end interface and navigation<br />
design of an application, 3, 50, 51, 54, 60, 64<br />
86
JAI The Java Advanced Imaging API. An optional package extending the<br />
Java 2 Platform, providing additional capabilities for running image<br />
processing applications and imaging applets in Java, 47–49, 62, 63,<br />
78–80<br />
JMF The Java Media Framework API. An optional package extending the<br />
Java 2 Platform that enables audio, video and other time-based media<br />
to be added to applications and applets built on Java technology, 3,<br />
36, 37, 40, 41, 49, 60–63, 80, 81<br />
JNI Java Native Interface. A programming framework that allows Java<br />
code running in the Java VM to call and be called by native appli-<br />
cations and libraries written in other languages, 36<br />
JVM Java Virtual Machine. A piece of software that converts Java byte-<br />
code into machine language and executes it, 39, 77, 78<br />
LoG Laplacian of Gaussian. A convolution operator using a Gaussian<br />
image smoothing and a second derivative Laplace operator, 45<br />
PCU Portable Control Unit. Desktop control of lightsource; external power<br />
supply replaces internal PC backpanel power supply, 23, 25<br />
QTJava QuickTime for Java. A Java binding to Apple's QuickTime API,<br />
39<br />
SDK Software Development Kit. A programming package that enables<br />
a programmer to develop applications for a specific platform. Java<br />
SDK versions below 1.2 and version 1.5 are called JDK, 23, 25, 40,<br />
60<br />
87
List of Figures<br />
0.1. Not standardized output of facial feature localization. . . . . . . . . . 2<br />
1.1. Basic facial feature tracking workflow. . . . . . . . . . . . . . . . . . . 7<br />
1.2. Optical flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8<br />
1.3. Standard face model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9<br />
1.4. Feature point displacements. . . . . . . . . . . . . . . . . . . . . . . . 10<br />
1.5. Control-theoretic mapping of optical flow. . . . . . . . . . . . . . . . . 10<br />
1.6. A closed snake. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12<br />
1.7. An example of the movement of a point in a snake. . . . . . . . . . . . 13<br />
1.8. Snakes and fiducial points used for muscle contraction estimation. . . 14<br />
1.9. Weak-edge leakage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15<br />
1.10. Example for moment calculations and shape representation. . . . . . . 16<br />
1.11. Object fitting by 2 k ellipses at each level. . . . . . . . . . . . . . . . . 20<br />
1.12. X-IST FaceTracker . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23<br />
1.13. VeeAnimator in action. . . . . . . . . . . . . . . . . . . . . . . . . . . . 24<br />
2.1. Overview of SplineSnake results. . . . . . . . . . . . . . . . . . . . . . 30<br />
2.2. SplineSnake interference. . . . . . . . . . . . . . . . . . . . . . . . . . . 30<br />
2.3. Overview of snake results. . . . . . . . . . . . . . . . . . . . . . . . . . 32<br />
3.1. Top view of camera layout used for recordings. . . . . . . . . . . . . . 41<br />
3.2. Two types of binary image regions applicable to the tracking algorithm. 43<br />
3.3. Function with intensity change, its first and second derivatives. . . . . 44<br />
3.4. Multi-stage Canny edge detection process. . . . . . . . . . . . . . . 46<br />
3.5. Workflow of fetching individual pixels with Java2D. . . . . . . . . . . 47<br />
4.1. Class Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53<br />
4.2. Overview of the basic application workflow . . . . . . . . . . . . . . . 54<br />
4.3. Angle calculation for raster subdivision. . . . . . . . . . . . . . . . . . 60<br />
5.1. Tracking results for selective regions. . . . . . . . . . . . . . . . . . . . 66<br />
5.2. Tracking result for 6 area selections. . . . . . . . . . . . . . . . . . . . 67<br />
5.3. Good and bad results. . . . . . . . . . . . . . . . . . . . . . . . . . . . 68<br />
5.4. Positions of the corners of the mouth. . . . . . . . . . . . . . . . . . . 69<br />
5.5. Preprocessing of mouth region. . . . . . . . . . . . . . . . . . . . . . . 70<br />
5.6. d² of subsequent video frames. . . . . . . . . . . . . . . . . . . . . 72<br />
5.7. Tracking time consumption. . . . . . . . . . . . . . . . . . . . . . . . . 77<br />
5.8. Preprocessing time consumption. . . . . . . . . . . . . . . . . . . . . . 78<br />
List of Tables<br />
1.1. Comparison of commercial products . . . . . . . . . . . . . . . . . . . 25<br />
2.1. SplineSnake parameters. . . . . . . . . . . . . . . . . . . . . . . . . . . 29<br />
2.2. SplineSnake: Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31<br />
2.3. Bodier snake parameters . . . . . . . . . . . . . . . . . . . . . . . . . . 32<br />
2.4. Snake: Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33<br />
3.1. JMF 2.1.1 - Supported Video Formats . . . . . . . . . . . . . . . . . . 38<br />
5.1. Mean position of x/y coordinates. . . . . . . . . . . . . . . . . . . . . . 70<br />
5.2. Minima and maxima of x/y coordinates. . . . . . . . . . . . . . . . . . 71<br />
5.3. Sum, maximum and average of d² on subsequent frames. . . . . . . . 71<br />
5.4. Correlation results. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72<br />
5.5. Two-tailed paired samples t-test on d². . . . . . . . . . . . . . . . . . 73<br />
5.6. Sum, maximum and average of d² of the different testers' values. . . . 74<br />
5.7. Extrema of d² between the algorithm and each of the manual testers. . 75<br />
5.8. Contrast tests of d² between the algorithm and each of the manual testers. 75<br />
5.9. Tamhane post-hoc test on d². . . . . . . . . . . . . . . . . . . . . . . 76<br />
List of Procedures<br />
4.1. createBSPTree(levels) . . . . . . . . . . . . . . . . . . . . . . . . . . 56<br />
4.2. addPoint(x, y, val) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57<br />
4.3. calculateValues() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58<br />
4.4. subdivide(levels) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59<br />
A. Appendix<br />
A.1. Evaluation Data: Coordinates of Corners of the Mouth<br />
The columns x1/y1 and x2/y2 give the pixel coordinates of the left and right corner of<br />
the mouth per frame, as annotated by two manual testers (man1, man2) and by the<br />
tracking algorithm (algo).<br />
man1 man2 algo<br />
frame x1 y1 x2 y2 x1 y1 x2 y2 x1 y1 x2 y2<br />
1 238 220 287 220 238 221 287 221 238 220 288 219<br />
2 239 220 287 220 238 221 287 221 237 220 288 219<br />
3 238 220 287 220 238 221 287 221 237 220 288 219<br />
4 238 220 287 220 238 221 287 221 237 220 288 219<br />
5 238 219 287 220 238 221 288 221 237 220 287 219<br />
6 239 219 285 220 240 221 287 221 239 219 287 219<br />
7 241 219 284 219 243 220 283 220 244 221 285 218<br />
8 246 220 282 220 246 220 282 220 245 221 285 219<br />
9 247 220 282 220 248 221 281 221 246 221 284 218<br />
10 247 221 282 221 249 221 281 221 246 220 282 219<br />
11 249 221 281 221 250 221 281 221 249 220 282 219<br />
12 249 221 281 221 250 222 280 222 250 221 282 219<br />
13 249 221 281 221 250 222 280 222 250 223 283 219<br />
14 248 221 281 221 250 222 280 222 250 223 282 219<br />
15 248 221 280 221 250 222 280 222 250 223 282 218<br />
16 249 221 280 221 250 222 280 222 251 224 282 218<br />
17 248 220 280 221 250 222 280 222 250 220 283 218<br />
18 248 221 280 221 250 222 280 222 250 221 283 218<br />
19 248 221 279 221 250 222 280 222 250 221 283 218<br />
20 249 221 281 221 250 222 280 222 250 224 283 218<br />
21 248 221 280 221 250 222 280 222 250 224 283 218<br />
22 248 221 280 221 250 222 280 222 250 221 283 218<br />
23 248 221 280 221 249 222 280 222 249 223 283 218<br />
24 249 221 280 221 249 222 280 222 248 222 283 218<br />
25 248 221 280 221 250 222 280 222 248 221 280 217<br />
26 248 221 280 221 250 222 279 222 249 221 281 218<br />
27 248 221 280 220 251 222 279 221 249 221 280 217<br />
28 249 221 280 221 250 222 280 221 249 220 280 217<br />
29 248 221 280 221 250 222 280 221 250 222 278 216<br />
30 248 221 280 221 250 222 280 221 250 222 280 217<br />
31 249 221 280 221 250 222 280 222 248 221 282 218<br />
32 249 220 280 221 250 222 280 222 251 223 279 217<br />
33 249 220 281 220 249 222 280 222 249 221 280 217<br />
34 247 220 281 220 248 221 281 222 246 221 281 216<br />
35 243 220 284 220 245 221 283 221 244 220 284 218<br />
36 241 219 286 219 242 220 284 220 244 220 287 218<br />
37 240 219 286 219 240 220 286 220 240 219 287 218<br />
38 239 218 286 218 239 219 287 219 238 219 288 218<br />
39 238 218 286 218 239 219 287 219 237 218 289 218<br />
40 239 218 286 219 238 219 287 219 237 218 288 218<br />
41 238 219 286 218 238 220 287 219 238 219 289 218<br />
42 239 219 286 219 238 220 287 219 237 219 289 218<br />
43 238 219 287 219 239 220 287 220 238 219 288 218<br />
44 238 219 286 219 239 220 287 220 237 219 288 218<br />
45 238 219 286 219 239 220 287 220 238 219 289 218<br />
46 239 219 286 219 240 220 287 220 238 219 288 219<br />
47 238 219 286 219 240 220 287 220 238 219 288 218<br />
48 239 219 286 219 240 220 287 220 238 220 288 218<br />
49 238 220 287 219 240 220 287 220 237 220 288 219<br />
50 238 220 287 219 240 220 287 220 238 220 287 219<br />
51 238 219 286 219 239 220 286 221 237 220 288 219<br />
52 238 219 286 219 238 221 286 221 238 220 288 219<br />
53 239 219 286 219 238 221 286 221 238 220 288 219<br />
54 239 219 287 219 238 221 286 221 238 220 288 219<br />
55 239 219 286 219 238 221 286 221 238 220 287 218<br />
56 239 219 286 220 238 221 286 221 237 220 287 219<br />
57 239 219 286 220 238 221 286 221 237 220 287 219<br />
58 239 219 286 220 238 221 286 221 238 220 287 219<br />
59 240 219 287 219 239 220 285 221 238 220 287 219<br />
60 240 219 285 219 240 220 285 221 240 220 286 218<br />
61 243 219 283 219 243 220 283 221 243 220 283 217<br />
62 246 220 280 219 246 220 282 221 245 220 282 218<br />
63 246 220 280 220 247 221 281 221 246 221 281 218<br />
64 247 220 280 220 248 221 280 221 249 221 281 218<br />
65 247 220 279 220 248 222 280 222 249 223 279 217<br />
66 248 220 280 220 248 222 280 222 248 222 279 217<br />
67 248 220 279 220 248 222 280 222 248 221 279 217<br />
68 247 220 279 220 248 222 280 222 247 220 279 217<br />
69 247 220 279 220 248 222 280 222 249 220 280 217<br />
70 247 220 279 220 248 222 280 222 249 222 279 217<br />
71 247 219 279 220 248 222 280 222 248 222 281 217<br />
72 247 220 279 221 248 222 279 222 248 222 278 216<br />
73 246 220 279 220 248 221 279 221 248 219 278 219<br />
74 247 220 280 220 248 221 279 221 248 221 280 217<br />
75 247 219 279 220 248 221 279 221 247 220 280 217<br />
76 249 220 280 220 248 221 280 221 247 220 282 217<br />
77 248 220 279 220 247 221 280 221 247 220 281 217<br />
78 248 219 280 219 247 221 280 221 247 220 281 217<br />
79 248 220 280 220 247 221 280 221 247 220 282 217<br />
80 247 219 279 220 247 221 280 221 247 220 280 216<br />
81 247 219 279 220 247 221 280 221 247 220 281 217<br />
82 247 220 279 220 247 221 280 221 247 220 282 217<br />
83 247 220 280 220 247 221 280 221 247 220 282 217<br />
84 248 219 280 220 247 221 280 221 247 220 282 217<br />
85 247 219 280 220 247 221 280 221 248 219 281 217<br />
86 249 220 279 220 246 221 281 221 248 219 280 216<br />
87 249 220 280 220 246 221 281 221 250 221 280 216<br />
88 247 219 280 220 246 221 282 221 250 222 280 216<br />
89 245 219 282 220 244 220 283 220 245 220 283 217<br />
90 243 219 284 219 242 220 284 220 243 219 285 217<br />
91 240 219 286 219 241 220 286 220 241 219 286 217<br />
92 239 219 286 218 240 220 287 220 239 219 288 217<br />
93 238 219 286 218 238 220 287 220 237 219 289 217<br />
94 238 218 286 218 238 220 287 220 237 219 289 218<br />
95 237 218 287 218 237 220 287 220 236 218 289 218<br />
96 238 218 286 218 237 219 288 219 236 218 289 217<br />
97 238 218 287 218 237 219 288 219 236 219 289 217<br />
98 236 218 287 218 237 219 288 219 236 219 289 217<br />
99 237 218 287 218 237 219 288 219 236 219 289 218<br />
100 237 218 287 218 237 219 288 219 236 219 289 218<br />
101 237 218 287 218 237 219 288 219 236 219 289 218<br />
102 237 218 287 218 237 219 288 219 235 220 289 218<br />
103 237 218 287 219 237 219 288 219 236 219 289 218<br />
104 237 218 287 218 237 219 288 219 236 219 289 218<br />
105 237 218 287 218 237 219 288 219 236 219 289 218<br />
106 236 218 288 218 237 219 288 219 236 219 288 218<br />
107 236 218 287 219 237 219 288 219 236 219 289 218<br />
108 238 218 287 218 237 219 288 219 236 219 289 218<br />
109 238 218 286 218 237 219 288 219 236 219 289 217<br />
110 238 218 287 218 237 219 288 219 236 219 289 218<br />
111 237 218 287 218 237 219 288 219 236 219 289 218<br />
112 237 218 286 218 237 219 288 219 236 219 289 218<br />
113 237 219 287 218 237 219 288 219 236 219 289 218<br />
114 238 218 286 218 237 219 288 219 236 219 289 217<br />
115 237 218 286 218 237 220 288 220 236 219 289 218<br />
116 238 219 286 218 237 220 288 220 236 219 289 218<br />
117 238 219 286 218 238 220 288 220 236 219 289 218<br />
118 240 219 284 219 240 220 285 220 239 219 286 218<br />
119 242 219 282 219 242 220 283 220 241 220 285 218<br />
120 245 220 280 219 244 221 281 221 244 220 282 218<br />
121 245 220 279 220 245 221 279 221 245 221 278 218<br />
122 246 220 278 220 245 221 278 221 245 221 278 217<br />
123 246 220 279 220 245 221 278 221 246 219 279 217<br />
124 246 220 278 220 245 221 278 221 246 220 277 216<br />
125 247 220 278 220 245 221 278 221 246 220 277 216<br />
126 247 220 278 220 245 221 279 221 246 220 278 216<br />
127 247 220 278 220 245 221 279 221 246 219 277 216<br />
128 246 219 279 220 245 221 279 221 246 219 280 216<br />
129 246 219 279 220 245 221 279 221 247 219 280 216<br />
130 247 219 279 220 245 221 279 221 247 218 282 216<br />
131 247 219 279 219 246 221 279 221 247 218 281 216<br />
132 245 219 280 219 246 221 279 221 247 218 281 216<br />
133 248 219 279 219 246 221 279 221 247 218 281 216<br />
134 246 219 279 219 246 221 279 221 247 218 280 215<br />
135 247 219 279 219 246 221 279 221 246 219 277 215<br />
136 247 219 279 219 246 221 279 221 246 220 277 215<br />
A.2. Evaluation Data: Tracking Mouth Area Selection (60x20)<br />
We performed 20 test runs with the implemented Java feature tracker, selecting a<br />
mouth region of 60x20 pixels. Ten runs used standard JVM options; ten used the<br />
following tuning options: -verbose:gc -Xms64m -Xmx256m -XX:NewRatio=2<br />
-XX:+UseConcMarkSweepGC.<br />
The measurement unit is milliseconds.<br />
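The per-frame values in the tables of this appendix are wall-clock times in milliseconds. The following is a minimal sketch of such a per-frame timing loop; trackFrame() is a hypothetical placeholder workload standing in for the actual tracking step, not the thesis implementation:

```java
// Minimal sketch of per-frame wall-clock timing in milliseconds.
public class FrameTiming {

    // Hypothetical placeholder for one tracking iteration;
    // the real tracker would process a video frame here.
    static void trackFrame(int frame) {
        double sum = 0;
        for (int i = 0; i < 100000; i++) {
            sum += Math.sqrt(i);
        }
    }

    // Measures the elapsed wall-clock time of each frame's
    // tracking step, in milliseconds.
    static long[] timeFrames(int frames) {
        long[] elapsed = new long[frames];
        for (int f = 0; f < frames; f++) {
            long start = System.currentTimeMillis();
            trackFrame(f + 1);
            elapsed[f] = System.currentTimeMillis() - start;
        }
        return elapsed;
    }

    public static void main(String[] args) {
        long[] times = timeFrames(5);
        for (int f = 0; f < times.length; f++) {
            // One line per frame: frame number and elapsed time,
            // matching the layout of the tables in this appendix.
            System.out.println((f + 1) + " " + times[f]);
        }
    }
}
```

A tuned run would simply add the JVM options listed above to the command line, e.g. java -Xms64m -Xmx256m -XX:NewRatio=2 -XX:+UseConcMarkSweepGC FrameTiming.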
frame JVM tuned JVM standard options<br />
1 27 48 26 26 52 69 70 69 28 69 46 31 50 32 30 24 45 46 48 47<br />
2 19 13 25 33 13 13 15 33 13 33 29 31 29 29 28 30 33 17 20 29<br />
3 22 24 39 38 22 24 22 36 21 45 32 31 32 33 33 33 34 27 23 32<br />
4 8 8 11 12 8 8 7 12 8 12 10 10 11 10 9 10 10 8 8 12<br />
5 34 29 22 22 30 26 19 22 29 22 25 17 24 24 18 24 25 18 17 22<br />
6 7 7 10 8 12 10 6 8 11 11 7 7 8 7 8 7 7 18 8 7<br />
7 5 6 6 6 5 6 7 5 5 6 6 6 5 6 4 5 5 6 7 5<br />
8 20 18 6 6 36 7 7 6 7 8 6 6 17 6 6 17 5 6 6 6<br />
9 6 6 20 7 6 8 6 8 5 10 7 6 8 6 7 8 7 8 8 6<br />
10 10 6 9 11 10 12 9 10 10 11 10 9 9 19 9 10 9 9 9 9<br />
11 54 12 9 7 16 9 7 7 7 40 9 7 9 9 10 9 9 8 8 9<br />
12 6 6 7 5 4 8 6 4 5 7 5 6 5 5 6 4 5 7 5 5<br />
13 10 9 5 8 10 15 9 23 9 9 9 9 9 10 9 9 8 9 9 10<br />
14 28 6 6 6 7 7 4 8 5 5 8 5 10 6 6 8 6 7 7 6<br />
15 3 3 9 3 4 4 4 4 2 6 3 4 4 4 4 4 3 4 4 4<br />
16 5 16 4 3 3 4 3 4 18 4 4 3 4 3 4 4 4 3 3 4<br />
17 8 4 4 4 17 4 4 4 4 3 4 5 4 4 5 4 4 4 4 4<br />
18 3 3 19 2 2 2 2 3 3 5 3 2 3 2 3 3 2 3 2 3<br />
19 3 3 3 3 2 3 3 3 3 5 3 3 3 2 3 3 3 3 3 3<br />
20 5 3 4 3 17 2 2 2 4 4 4 4 6 4 4 3 4 4 4 4<br />
21 5 3 6 17 3 3 4 3 3 6 3 4 3 3 3 3 3 3 3 2<br />
22 4 3 4 4 4 4 18 3 4 5 4 4 4 3 5 4 4 4 4 4<br />
23 8 9 4 4 4 18 5 18 5 5 6 6 5 5 5 5 7 7 6 5<br />
24 5 6 6 5 5 4 6 5 4 19 5 6 6 6 6 6 5 4 5 6<br />
25 5 22 21 7 20 21 5 21 4 4 8 8 8 8 8 9 8 8 8 8<br />
26 5 4 2 4 2 3 4 2 3 2 2 2 2 2 2 3 3 2 2 3<br />
27 3 2 5 3 2 3 3 2 3 4 2 2 3 3 3 2 3 3 2 2<br />
28 3 2 3 2 2 2 3 7 3 2 3 2 2 3 2 3 3 3 3 2<br />
29 5 2 2 2 7 2 3 2 2 3 3 3 2 3 2 2 2 3 3 2<br />
30 3 3 56 3 15 5 16 3 16 25 3 3 3 3 3 3 3 3 3 3<br />
31 6 5 2 3 2 3 3 3 6 2 3 3 3 3 2 2 3 2 2 2<br />
32 6 3 2 3 3 4 9 4 3 4 3 4 3 3 3 3 3 3 3 3<br />
33 5 6 8 6 6 19 2 7 4 6 6 16 6 7 6 7 7 6 7 6<br />
34 3 3 3 3 2 3 3 3 16 3 3 6 3 3 3 3 3 3 3 3<br />
35 6 4 3 4 17 3 20 17 3 4 4 4 3 4 4 3 3 4 4 4<br />
36 3 3 6 5 3 3 2 3 17 6 4 3 4 4 3 3 3 4 4 4<br />
37 3 4 4 3 6 3 3 2 3 3 3 4 3 3 3 3 4 3 3 3<br />
38 5 3 3 3 3 3 3 3 3 18 3 2 3 3 3 3 3 4 3 3<br />
39 4 3 6 3 3 3 3 4 6 6 4 3 3 2 3 3 3 3 3 2<br />
40 6 3 19 3 4 3 3 4 3 3 3 3 3 4 3 4 2 3 3 3<br />
41 5 3 2 3 3 2 6 3 2 3 14 3 3 2 3 4 2 3 3 3<br />
42 3 3 5 3 3 4 2 3 3 5 3 3 2 3 3 3 3 3 3 3<br />
43 2 3 2 2 2 2 2 3 2 2 2 3 2 2 2 2 2 3 2 2<br />
44 17 16 2 3 3 3 3 3 3 3 3 2 3 2 2 3 3 3 3 2<br />
45 3 3 6 3 3 3 3 3 3 5 3 2 3 2 2 3 3 2 3 3<br />
46 3 3 4 2 3 3 4 2 3 2 3 3 2 3 3 3 2 2 2 2<br />
47 4 2 2 2 2 2 2 2 2 2 2 4 2 2 2 3 2 2 2 2<br />
48 2 2 4 15 2 2 3 2 2 4 2 3 2 3 2 2 2 2 2 2<br />
49 2 3 8 8 7 8 14 2 7 7 7 8 7 8 7 8 7 8 8 8<br />
50 4 2 2 2 2 3 2 2 2 2 2 2 2 2 2 2 2 2 2 3<br />
51 3 2 17 2 2 2 2 2 2 18 2 3 2 2 2 2 2 2 2 2<br />
52 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 1 2 2 2<br />
53 4 2 2 3 2 2 3 2 2 20 3 2 2 2 1 1 2 3 2 1<br />
54 6 3 5 3 3 2 2 3 3 5 3 3 3 3 2 3 3 3 2 3<br />
55 2 2 2 2 2 2 17 2 2 2 2 3 2 2 3 2 2 2 2 2<br />
56 5 2 1 2 2 2 3 2 3 2 2 3 1 2 2 3 2 2 2 2<br />
57 3 2 4 1 2 2 3 2 14 3 2 3 2 1 3 2 2 3 2 2<br />
58 2 15 16 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2<br />
59 4 2 2 2 2 2 2 2 1 4 1 2 2 2 2 2 2 2 2 2<br />
60 2 2 4 2 2 2 2 2 2 16 3 2 2 2 2 2 2 2 2 2<br />
61 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2<br />
62 6 4 18 2 3 3 2 4 3 3 3 3 2 3 2 3 2 3 3 2<br />
63 4 4 6 3 3 4 3 4 3 19 3 3 3 2 4 3 3 3 4 4<br />
64 3 2 3 3 16 15 2 3 15 2 2 2 3 3 3 2 2 3 3 3<br />
65 21 2 14 2 5 2 2 2 2 2 2 2 2 2 1 2 2 2 1 2<br />
66 2 2 4 2 2 16 2 2 2 4 2 2 2 2 1 2 2 1 2 2<br />
67 2 2 1 2 2 2 3 2 2 2 2 3 1 2 2 1 2 2 2 1<br />
68 4 2 2 2 2 2 1 2 2 2 3 2 2 1 1 2 2 1 2 2<br />
69 2 15 7 2 3 15 2 2 1 17 2 3 2 2 2 2 1 2 2 2<br />
70 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 1 1<br />
71 4 8 1 2 2 2 2 8 2 2 2 2 2 2 2 2 3 1 2 2<br />
72 2 3 5 3 3 2 2 2 3 17 3 1 3 3 3 2 3 3 2 3<br />
73 15 1 2 2 2 2 2 2 2 2 2 3 2 3 1 2 2 2 2 2<br />
74 4 1 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2<br />
75 16 16 5 4 3 3 2 3 3 6 3 7 3 3 3 3 3 3 3 3<br />
76 2 2 4 2 17 3 16 3 4 5 2 5 2 2 1 2 2 2 2 1<br />
77 4 5 2 2 2 2 2 2 2 1 4 2 4 1 4 4 8 4 5 4<br />
78 2 3 4 3 3 2 1 2 3 17 3 2 2 2 2 3 3 3 3 3<br />
79 2 2 1 2 1 2 2 1 2 3 2 1 2 2 2 3 2 2 2 2<br />
80 4 2 1 2 1 1 2 1 3 1 1 2 2 2 2 1 1 3 2 2<br />
81 3 3 5 6 2 3 2 3 2 5 3 2 3 3 2 3 3 3 3 2<br />
82 3 2 3 2 18 3 2 2 2 3 3 3 2 2 3 3 3 2 2 2<br />
83 4 2 2 3 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2<br />
84 14 2 4 1 2 2 2 1 2 4 2 2 2 2 2 2 2 2 2 2<br />
85 6 3 3 3 3 3 4 3 3 3 3 4 3 4 3 3 3 3 3 3<br />
86 4 2 2 2 2 15 8 2 1 2 2 2 2 1 2 2 2 2 2 2<br />
87 3 3 5 2 2 3 2 4 2 5 3 3 3 3 3 2 2 2 2 3<br />
88 2 2 2 1 1 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2<br />
89 19 3 3 4 3 4 3 4 3 5 3 3 3 3 4 3 3 4 3 5<br />
90 2 2 4 2 2 2 2 2 2 4 2 2 2 2 2 1 2 3 2 1<br />
91 3 3 5 3 3 3 3 3 3 3 4 3 3 4 3 3 3 3 4 3<br />
92 17 2 2 3 2 3 2 2 3 2 4 4 3 2 2 2 2 4 3 4<br />
93 1 1 3 2 2 7 2 2 2 4 2 2 2 2 2 2 2 2 1 2<br />
94 4 3 2 3 3 3 6 4 3 15 3 7 3 6 5 3 3 3 4 3<br />
95 4 2 2 2 2 2 2 2 3 2 2 2 2 3 2 2 2 2 2 2<br />
96 2 2 4 3 2 2 17 2 2 5 3 3 2 2 3 7 3 2 3 3<br />
97 4 3 3 5 4 3 3 3 3 3 4 3 3 3 3 3 3 3 4 3<br />
98 17 4 4 5 3 16 2 3 4 4 3 2 3 3 3 6 2 2 3 3<br />
99 2 15 4 2 2 2 1 2 2 4 2 2 2 2 2 2 2 2 2 2<br />
100 2 2 15 3 2 3 2 2 3 6 3 2 2 2 4 2 3 3 3 3<br />
101 6 3 3 3 2 4 3 2 3 3 5 3 3 5 4 5 4 4 4 4<br />
102 2 2 4 2 2 2 1 2 2 4 2 2 2 2 2 2 3 3 3 2<br />
103 3 3 3 3 3 3 3 2 3 3 5 2 2 2 2 3 3 6 5 3<br />
104 4 1 2 2 2 2 2 2 2 3 2 2 2 1 2 2 1 2 2 2<br />
105 2 2 4 1 2 2 2 2 2 4 2 2 2 2 2 2 1 2 2 2<br />
106 14 1 2 2 2 2 2 2 15 2 2 2 2 2 2 2 2 2 2 1<br />
107 4 2 2 2 2 2 2 1 2 1 2 2 2 2 2 2 2 2 2 1<br />
108 2 2 3 2 2 2 2 2 2 17 2 2 2 2 1 1 2 1 2 2<br />
109 2 2 2 2 2 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2<br />
110 4 2 2 2 3 1 2 2 2 2 2 2 1 2 2 2 2 2 3 1<br />
111 3 3 4 3 15 3 3 2 2 4 2 3 3 3 3 2 2 2 3 4<br />
112 2 1 1 2 1 2 16 2 2 2 2 2 1 1 2 2 2 2 2 1<br />
113 4 2 3 2 3 2 2 3 2 2 1 4 3 1 4 2 4 2 2 2<br />
114 2 2 4 1 1 2 2 2 2 4 1 2 2 2 2 3 2 2 2 1<br />
115 3 3 3 2 4 20 5 4 3 3 3 3 3 2 3 3 3 3 3 3<br />
116 5 3 3 4 3 3 18 3 4 3 4 4 4 5 5 3 3 4 3 3<br />
117 1 2 4 2 1 1 1 2 1 4 2 2 2 3 2 2 2 2 2 2<br />
118 2 2 6 2 2 2 1 2 2 2 2 2 1 1 2 2 2 2 2 1<br />
119 3 2 2 2 2 2 1 2 1 2 2 2 3 2 2 2 2 3 2 2<br />
120 16 3 5 3 3 3 15 3 3 5 3 2 3 3 3 3 4 4 4 3<br />
121 2 19 2 2 3 3 2 3 3 4 4 3 3 2 2 3 2 3 3 2<br />
122 4 2 2 2 1 1 3 2 2 2 2 3 2 2 2 1 2 2 2 2<br />
123 3 3 19 3 3 3 3 3 3 5 3 2 3 3 3 5 3 4 3 3<br />
124 3 16 3 3 3 3 2 3 3 5 3 3 3 3 3 4 3 4 3 3<br />
125 18 15 3 3 3 2 3 15 3 2 3 2 2 4 2 3 2 2 2 3<br />
126 2 2 4 2 2 1 3 2 1 4 2 3 2 2 2 1 2 1 2 1<br />
127 3 3 3 3 3 3 3 3 3 3 3 3 2 4 4 7 3 3 7 3<br />
128 47 2 3 3 3 3 2 3 3 2 2 2 3 2 3 3 2 3 3 3<br />
129 2 2 4 2 2 2 2 2 2 17 2 2 2 2 2 2 2 2 2 2<br />
130 4 3 4 3 3 4 2 3 3 4 3 2 3 4 3 3 5 4 4 4<br />
131 7 2 2 3 3 15 2 3 3 3 2 3 2 2 3 3 2 2 3 2<br />
132 2 1 15 2 2 2 2 2 1 3 2 3 2 2 2 2 2 2 3 2<br />
133 3 7 3 2 3 3 3 2 4 3 3 3 3 3 3 14 3 3 5 3<br />
134 6 4 3 4 17 16 2 16 17 3 3 2 4 3 3 5 4 5 4 3<br />
135 2 1 4 2 2 2 1 2 2 4 2 2 2 2 2 2 1 2 2 2<br />
136 2 2 3 2 2 15 2 1 14 2 2 2 2 2 2 1 2 1 2 2<br />
A.3. Evaluation Data: Tracking of Whole Area Selection (384x288)<br />
We performed 20 test runs with the implemented Java feature tracker, selecting the<br />
complete image region of 384x288 pixels. Ten runs used standard JVM options;<br />
ten used the following tuning options: -verbose:gc -Xms64m -Xmx256m<br />
-XX:NewRatio=2 -XX:+UseConcMarkSweepGC.<br />
The measurement unit is milliseconds.<br />
frame JVM tuned JVM standard options<br />
1 55 61 135 112 57 126 147 133 120 108 71 117 128 67 67 69 71 124 133 70<br />
2 35 75 66 60 28 75 33 53 31 69 59 59 33 37 38 58 34 58 37 33<br />
3 32 49 53 53 30 49 38 58 31 56 47 49 42 43 43 49 44 47 42 42<br />
4 31 44 57 48 39 52 29 49 37 53 43 44 44 43 43 38 44 38 44 43<br />
5 24 34 31 30 23 29 28 35 27 28 39 35 29 29 40 35 29 37 29 35<br />
6 34 48 27 28 24 27 25 29 38 27 33 33 31 29 36 33 32 33 28 32<br />
7 25 25 29 24 23 27 27 24 25 31 31 29 32 32 34 28 28 32 32 27<br />
8 26 33 42 29 26 26 27 27 26 23 26 30 30 36 30 29 29 29 29 29<br />
9 27 21 24 21 32 20 21 21 22 21 32 28 28 37 37 28 29 28 28 27<br />
10 30 25 30 28 24 29 27 32 28 23 127 126 127 142 128 124 133 124 125 124<br />
11 21 25 24 23 26 21 24 20 23 20 31 30 24 30 34 36 30 25 29 30<br />
12 25 24 25 24 23 26 25 23 24 24 30 25 24 31 30 32 31 28 24 33<br />
13 30 26 26 23 39 24 25 31 26 26 36 36 37 37 36 36 38 39 36 35<br />
14 25 31 26 43 17 24 23 22 18 22 123 134 122 126 121 127 123 128 126 123<br />
15 23 20 20 20 26 20 21 20 21 19 31 30 29 28 28 29 30 29 29 30<br />
16 23 23 24 28 25 22 23 22 24 22 29 30 31 30 22 31 30 29 35 30<br />
17 19 23 23 19 39 19 21 23 19 18 25 26 34 30 31 30 25 25 25 25<br />
18 24 23 18 27 18 17 23 18 23 18 123 123 138 123 140 122 123 123 127 130<br />
19 23 20 24 24 23 20 21 21 21 22 29 31 35 30 29 31 30 33 30 29<br />
20 29 25 24 22 23 24 27 23 27 24 56 31 22 30 29 30 30 30 24 34<br />
21 26 18 20 19 27 18 19 22 19 18 23 24 25 25 26 24 27 24 23 23<br />
22 19 23 18 22 19 17 18 18 17 28 123 123 122 123 130 123 122 123 123 124<br />
23 26 24 24 24 25 21 22 25 22 37 36 31 30 32 30 37 25 29 34 30<br />
24 25 24 26 23 26 23 26 24 25 25 31 31 30 34 31 23 25 30 29 31<br />
25 31 22 23 19 32 22 22 26 22 27 39 38 39 38 39 38 39 39 37 38<br />
26 21 22 19 18 17 24 20 30 18 17 123 123 133 126 122 125 125 133 123 124<br />
27 22 20 22 22 21 18 19 34 20 20 28 29 23 29 28 28 28 27 29 28<br />
28 20 17 29 26 29 21 25 22 17 25 21 29 21 21 21 22 23 21 21 20<br />
29 17 18 19 20 16 16 19 16 16 16 25 24 25 23 26 24 24 22 24 22<br />
30 21 19 23 17 16 16 18 17 18 16 142 125 135 129 123 138 134 133 145 136<br />
31 22 26 23 21 19 20 27 20 22 20 28 28 28 28 29 28 29 29 28 28<br />
32 25 20 22 29 21 25 16 22 17 25 20 29 20 22 20 22 21 22 22 32<br />
33 18 27 16 18 23 16 21 16 17 16 22 24 27 23 22 23 23 27 24 24<br />
34 17 24 23 33 29 18 20 22 20 17 137 135 122 126 123 124 135 124 124 134<br />
35 23 23 23 27 18 21 22 21 22 20 28 29 28 29 30 28 28 28 28 27<br />
36 24 19 22 28 33 21 16 28 16 21 20 21 22 20 22 21 23 21 20 20<br />
37 19 16 17 17 16 16 16 17 17 16 22 23 23 22 22 23 23 23 28 23<br />
38 17 22 18 18 20 16 20 16 20 17 135 123 164 134 125 135 142 133 121 134<br />
39 22 24 19 20 18 19 17 19 17 20 28 28 28 28 28 27 28 27 27 28<br />
40 22 23 17 21 22 17 17 16 17 16 20 21 21 28 27 21 28 29 21 21<br />
41 17 19 16 20 16 16 18 16 23 16 23 23 23 23 23 22 23 23 22 22<br />
42 16 22 18 18 16 18 23 22 24 23 124 130 134 121 133 123 134 133 124 140<br />
43 22 17 23 26 23 25 16 25 16 22 22 28 28 28 27 22 28 27 28 27<br />
44 17 19 20 17 20 16 21 17 21 16 20 22 27 21 20 21 21 27 28 20<br />
45 17 19 31 25 16 17 33 16 34 17 22 28 23 23 23 23 28 22 23 22<br />
46 19 17 20 37 19 19 16 19 17 19 123 129 134 133 134 134 128 122 134 136<br />
47 23 17 22 18 26 17 17 20 17 21 28 29 28 27 29 27 28 27 27 28<br />
48 27 20 18 18 20 17 19 17 19 17 20 22 21 33 20 21 21 27 28 21<br />
49 17 27 27 28 27 23 29 23 27 23 30 24 23 22 22 23 23 22 23 22<br />
50 23 20 23 22 22 22 17 22 16 22 124 123 123 122 126 122 122 123 134 140<br />
51 25 21 17 17 26 16 16 16 17 16 28 31 28 27 27 28 28 32 28 27<br />
52 17 19 20 18 17 16 19 17 19 16 28 29 27 28 29 28 29 27 21 33<br />
53 22 26 19 22 17 19 22 18 31 18 28 28 23 23 23 23 23 23 23 23<br />
54 20 16 21 21 19 20 17 21 17 21 123 123 126 123 129 122 126 136 134 123<br />
55 22 16 18 18 25 17 16 21 17 17 27 28 28 35 29 28 29 32 52 27<br />
56 20 22 20 22 16 21 23 17 20 21 27 28 28 28 28 21 28 34 22 28<br />
57 18 23 19 19 21 18 22 19 22 18 24 23 23 23 23 32 24 22 29 34<br />
58 20 25 21 21 23 21 17 20 16 20 136 129 128 130 123 129 138 128 130 124<br />
59 23 26 17 17 22 16 17 20 17 16 29 27 28 27 33 28 29 27 27 27<br />
60 17 17 21 20 16 16 20 21 20 16 28 28 33 28 21 28 34 28 27 30<br />
61 21 18 19 18 24 18 22 18 22 19 22 23 23 23 28 22 23 22 22 22<br />
62 26 20 24 20 20 20 17 23 16 20 134 123 122 129 123 130 135 123 138 134<br />
63 23 20 18 18 22 16 17 17 16 17 27 28 26 27 27 28 28 28 28 28<br />
64 17 17 17 20 16 17 19 16 19 17 21 28 29 28 27 27 27 28 21 28<br />
65 19 25 19 19 25 18 21 18 22 17 22 22 23 22 25 22 24 23 23 23<br />
66 20 23 25 21 19 20 16 21 17 21 137 135 139 124 134 122 141 129 123 138<br />
67 25 22 17 17 22 17 17 20 16 17 28 28 28 28 31 28 28 27 28 28<br />
68 20 17 17 21 20 17 23 16 23 16 29 29 30 29 34 29 30 30 29 29<br />
69 17 18 19 20 17 19 21 19 22 19 23 23 26 23 23 22 23 23 23 22<br />
70 21 21 24 21 19 20 17 37 16 20 124 126 122 123 122 122 123 122 124 125<br />
71 27 17 17 21 26 16 17 17 16 16 34 28 27 27 27 28 28 27 27 31<br />
72 20 21 17 17 20 16 19 17 19 17 29 30 30 29 29 29 30 30 28 29<br />
73 20 18 18 18 21 18 22 22 22 22 22 22 22 22 22 22 23 22 22 21<br />
74 20 22 22 22 24 22 17 21 17 21 135 141 134 134 135 134 125 133 139 136<br />
75 22 16 33 21 22 15 25 16 17 16 27 27 32 28 29 27 27 28 28 27<br />
76 17 21 17 22 17 16 19 24 20 17 27 28 27 32 21 32 28 22 27 28<br />
77 21 20 19 21 25 19 21 19 22 19 22 24 23 22 23 23 23 22 23 23<br />
78 20 25 26 30 19 26 17 21 16 21 124 124 135 123 131 138 134 123 124 130<br />
79 25 16 17 17 23 17 17 17 16 16 27 28 27 27 27 28 29 30 28 27<br />
80 17 22 22 21 16 17 20 25 20 16 30 30 30 29 29 29 30 29 29 29<br />
81 17 20 24 20 17 18 21 20 22 20 22 28 23 23 23 23 23 22 22 23<br />
82 20 22 22 23 19 25 16 22 16 25 124 125 129 123 129 124 134 136 123 123<br />
83 26 17 17 17 22 16 17 20 18 18 27 27 28 29 28 28 29 28 28 27<br />
84 16 27 21 23 16 17 19 22 20 21 28 30 29 30 29 29 30 29 30 29<br />
85 16 19 22 19 21 19 22 18 21 18 21 22 22 22 22 22 22 22 23 22<br />
86 19 24 17 16 19 15 16 16 16 16 142 138 135 136 136 134 133 140 136 135<br />
87 17 17 16 22 18 15 16 21 24 16 28 27 27 31 30 27 28 31 27 27<br />
88 17 18 28 18 20 17 19 21 18 17 21 28 21 20 21 20 21 21 21 20<br />
89 18 23 25 22 21 21 16 21 16 22 22 23 22 22 22 22 22 22 29 26<br />
90 22 16 20 19 21 15 17 16 16 16 134 135 137 141 135 145 138 133 129 123<br />
91 16 17 18 17 17 16 18 16 19 16 26 27 28 27 28 21 27 27 27 22<br />
92 17 17 17 20 16 18 20 18 20 18 19 21 21 20 21 20 20 20 20 20<br />
93 27 25 21 21 19 21 19 21 19 21 21 23 22 22 23 23 22 22 22 21<br />
94 17 22 17 16 16 16 16 15 16 21 135 135 122 139 135 140 135 135 133 134<br />
95 17 21 16 25 16 17 18 16 18 17 27 29 27 22 28 28 31 27 28 27<br />
96 18 22 23 19 18 18 23 19 20 18 20 21 20 20 20 20 21 20 20 28<br />
97 20 16 20 17 23 15 16 15 17 21 21 23 21 22 27 22 23 23 21 27<br />
98 17 16 16 16 20 15 16 19 16 15 136 135 135 122 122 139 141 133 128 134<br />
99 17 20 19 22 17 18 19 18 18 22 27 27 31 28 28 28 27 28 27 29<br />
100 18 27 20 20 19 20 21 20 21 19 27 29 32 27 28 28 28 28 27 27<br />
101 21 17 18 17 25 16 17 17 17 22 22 23 23 23 23 23 23 22 22 25<br />
102 21 16 17 17 19 16 17 20 17 16 135 133 123 138 134 122 123 124 137 124<br />
103 21 19 18 23 17 19 18 18 19 17 28 28 27 22 28 27 29 28 28 27<br />
104 19 24 21 36 18 20 21 21 20 19 29 30 30 30 30 29 29 29 30 29<br />
105 22 20 18 17 20 16 23 16 17 22 22 23 23 34 23 22 23 32 23 22<br />
106 17 16 16 17 17 17 16 20 16 16 135 129 135 127 138 122 123 123 123 141<br />
107 17 22 22 19 17 18 18 19 18 16 27 28 28 27 27 28 28 28 32 34<br />
108 20 24 21 21 18 20 20 20 21 20 29 30 29 29 29 34 30 29 29 29<br />
109 32 19 17 38 24 16 16 17 17 22 22 29 23 23 23 23 23 23 23 22<br />
110 19 16 17 17 17 16 16 21 16 17 124 126 136 135 125 123 135 140 135 127<br />
111 20 18 19 19 17 18 18 18 19 16 27 28 28 27 27 28 28 27 28 28<br />
112 19 20 20 21 19 20 20 22 20 18 29 30 29 29 29 29 29 29 29 29<br />
113 21 21 18 17 24 16 17 16 17 20 22 23 22 22 23 22 22 22 22 23<br />
114 17 16 18 17 17 17 16 20 17 16 136 136 129 135 129 135 123 140 124 127<br />
115 18 18 23 19 17 19 18 18 19 16 27 28 28 27 28 35 28 27 27 27<br />
116 22 31 21 20 19 20 21 21 21 18 29 34 30 29 29 29 30 29 29 29<br />
117 24 16 16 17 21 16 17 17 17 20 22 23 23 23 23 22 22 23 22 44<br />
118 17 16 17 17 16 16 16 21 16 29 135 138 135 142 138 135 134 138 140 126<br />
119 17 19 23 19 16 18 19 19 19 22 27 27 28 27 27 30 27 27 27 27<br />
120 19 24 21 21 18 20 20 20 21 18 29 29 28 29 29 29 29 29 29 33<br />
121 21 16 18 21 21 16 16 16 17 20 22 23 23 22 23 23 23 27 22 23<br />
122 16 17 16 18 17 16 16 20 17 17 134 134 123 124 135 139 136 123 124 126<br />
123 17 17 18 19 16 18 18 18 19 16 22 27 26 29 22 22 23 27 27 26<br />
124 18 21 27 22 17 21 22 21 22 18 20 21 20 27 20 20 21 27 20 19<br />
125 21 17 20 17 21 15 16 17 16 21 21 22 22 23 23 22 23 22 23 23<br />
126 17 20 21 17 19 16 16 24 16 16 136 134 134 135 133 133 141 133 134 133<br />
127 20 19 20 22 23 18 19 35 18 16 22 27 22 22 23 27 23 29 22 26<br />
128 19 15 17 18 18 16 17 16 16 18 20 20 20 20 21 20 27 20 20 20<br />
129 16 16 19 16 16 15 16 16 16 21 21 22 23 22 22 26 23 22 22 22<br />
130 20 18 21 18 17 18 18 18 17 16 136 136 133 134 135 135 133 122 134 136<br />
131 22 27 25 22 18 22 21 26 21 15 22 22 22 22 22 27 28 27 23 26<br />
132 23 22 16 24 21 15 16 15 16 19 20 24 27 20 20 20 21 21 20 23<br />
133 17 16 16 21 16 16 20 16 16 15 21 22 22 22 22 23 23 22 26 21<br />
134 17 18 23 19 16 18 18 18 19 15 135 141 133 140 134 134 135 140 135 134<br />
135 19 16 17 17 18 17 16 17 17 18 27 27 27 28 28 27 27 27 27 27<br />
136 20 17 17 17 23 17 17 17 16 21 27 27 27 32 30 27 29 31 31 27<br />
A.4. Evaluation Data: Canny Preprocessing<br />
We performed 20 test runs with the class PreprocessingTimeTest (which accounts for<br />
the out-of-order frame numbering in the table below). Ten runs used standard JVM<br />
options; ten used the following tuning options: -verbose:gc -Xms64m -Xmx256m -XX:NewRatio=2<br />
-XX:+UseConcMarkSweepGC.<br />
The measurement unit is milliseconds.<br />
frame JVM tuned JVM standard options<br />
3 202 205 229 217 225 203 218 222 214 202 343 412 345 352 345 348 342 346 358 357<br />
4 157 164 173 156 156 157 161 156 153 160 242 237 248 231 251 245 258 250 244 247<br />
5 147 152 147 148 148 152 148 152 149 148 167 212 170 170 176 170 170 165 164 185<br />
6 151 144 144 152 145 145 145 145 155 144 227 228 236 229 235 236 226 224 230 232<br />
7 152 151 149 153 159 148 152 148 149 148 226 221 220 222 220 222 230 223 222 217<br />
8 155 154 155 153 150 156 161 151 155 152 234 218 210 207 210 209 215 213 210 207<br />
9 144 558 145 148 146 146 144 150 144 148 214 210 213 208 194 218 210 207 209 207<br />
10 147 163 147 152 148 155 147 148 148 152 224 212 230 217 197 219 211 206 216 221<br />
1 152 152 146 151 151 148 146 146 146 148 211 212 213 208 216 216 214 210 211 209<br />
11 230 367 226 231 230 229 227 231 245 230 209 208 214 211 211 211 234 213 210 214<br />
12 151 273 150 147 147 148 152 148 148 147 210 210 216 213 211 208 214 208 210 209<br />
13 145 229 145 150 145 146 145 152 151 149 214 212 212 213 212 227 215 215 213 221<br />
14 145 203 144 144 143 145 144 145 143 148 211 208 210 212 192 211 216 209 213 223<br />
15 146 166 147 149 152 147 146 153 145 148 208 210 207 206 185 206 216 216 206 207<br />
16 152 146 146 151 149 147 153 147 147 149 206 212 214 206 213 207 210 207 213 206<br />
17 146 211 146 143 148 143 145 142 143 143 214 212 212 213 214 214 188 206 222 217<br />
18 150 160 150 151 144 144 148 145 149 145 208 207 214 208 213 207 191 208 206 208<br />
19 146 172 144 145 144 151 146 152 145 146 207 206 207 212 212 211 221 211 206 207<br />
20 147 148 148 177 148 149 146 153 147 151 208 207 209 209 207 208 211 209 210 210<br />
21 141 154 141 146 147 142 141 141 144 142 217 212 218 214 218 213 223 207 242 213<br />
22 146 171 148 151 146 146 146 145 168 145 214 211 208 207 208 213 216 209 206 207<br />
23 148 159 147 149 148 143 145 143 146 143 208 187 186 213 207 206 212 212 221 207<br />
24 155 376 157 148 147 147 151 148 157 148 208 187 189 207 207 207 208 206 208 213<br />
25 151 365 147 152 149 154 148 153 149 149 230 205 226 213 214 214 210 210 211 215<br />
26 143 532 138 144 143 149 147 144 231 145 209 208 211 208 207 186 210 209 185 208<br />
27 209 372 283 223 219 210 210 209 435 210 220 208 208 215 209 185 216 217 194 194<br />
28 177 345 173 149 151 146 147 146 156 146 210 208 211 213 208 218 208 209 221 194<br />
29 157 331 165 149 149 148 154 149 167 148 214 218 216 214 216 217 213 209 213 192<br />
30 143 316 142 142 143 144 142 148 142 145 209 208 208 208 214 207 214 209 212 224<br />
31 145 291 146 146 147 147 147 150 147 152 211 209 210 208 208 208 212 225 214 209<br />
32 145 259 146 147 173 150 145 144 147 144 209 207 209 215 209 210 209 207 208 214<br />
33 149 255 152 153 153 148 147 151 171 148 231 220 218 216 216 215 209 218 214 216<br />
34 152 239 149 156 154 148 150 149 148 148 213 209 209 209 207 207 212 209 208 209<br />
35 148 234 144 143 144 143 149 144 143 144 209 209 214 208 209 208 215 214 211 209<br />
36 150 191 189 146 145 146 153 156 146 145 211 212 210 209 208 212 208 211 210 209<br />
37 145 163 207 145 145 152 145 150 146 149 220 216 216 217 215 212 211 209 213 217<br />
38 147 171 199 151 149 148 147 148 148 150 210 209 213 207 208 212 212 208 207 208<br />
39 142 246 143 146 142 147 142 141 148 145 208 208 208 207 209 208 213 217 207 208<br />
40 153 150 148 146 154 147 149 163 147 146 209 207 207 207 209 208 207 207 189 212<br />
41 144 144 143 149 144 146 151 144 142 150 214 216 218 215 215 219 217 207 193 214<br />
42 148 147 298 147 148 150 148 176 148 144 208 208 207 212 210 193 210 212 223 212<br />
43 212 148 155 217 220 219 213 220 213 213 207 208 208 207 212 186 218 213 208 208<br />
44 154 173 193 152 144 148 147 147 152 149 209 211 213 209 208 185 212 227 207 206<br />
45 147 146 144 145 150 145 146 146 145 145 217 216 192 215 217 225 208 208 221 214<br />
46 150 151 145 146 145 145 150 147 145 145 209 209 186 207 211 214 211 210 208 208<br />
47 148 148 146 152 152 152 148 153 174 148 207 209 209 212 208 228 214 212 207 212<br />
48 142 141 144 143 142 147 144 142 143 146 213 208 237 209 208 186 212 208 207 208<br />
49 145 145 151 148 146 146 145 144 145 145 222 215 222 213 194 188 207 216 215 213<br />
50 144 158 146 148 143 143 144 148 145 143 209 207 207 211 184 212 214 210 211 212<br />
51 152 194 146 148 152 147 150 148 149 149 209 215 210 208 199 214 213 215 208 209<br />
52 157 267 149 149 149 149 152 149 149 148 212 208 208 209 207 217 209 208 212 186<br />
53 148 266 167 142 143 142 143 147 143 144 214 217 216 219 214 209 208 208 216 189<br />
54 147 157 151 145 144 150 144 145 146 154 210 214 214 209 210 212 210 210 208 227<br />
55 144 151 146 152 155 148 145 139 152 147 212 209 209 210 208 212 213 230 208 208<br />
56 153 149 150 155 151 148 150 148 148 149 212 207 208 210 207 211 208 194 208 186<br />
57 147 139 208 143 143 142 143 147 142 143 215 217 218 215 217 209 214 185 216 187<br />
58 212 147 146 214 214 213 222 218 241 217 210 209 217 210 209 212 211 186 210 207<br />
59 145 145 142 144 144 146 146 144 146 149 209 216 209 218 209 210 220 245 209 209<br />
60 153 149 153 151 148 154 148 149 154 148 213 210 213 209 207 212 209 194 208 186<br />
61 150 149 171 153 150 149 148 148 150 148 218 217 217 216 211 209 208 186 224 187<br />
62 148 144 142 149 149 142 144 145 144 145 209 211 209 209 218 213 215 186 208 222<br />
63 150 146 144 145 152 150 150 148 146 145 212 209 209 208 210 210 191 237 208 186<br />
64 145 146 146 145 145 146 146 153 146 146 212 208 207 209 208 208 186 208 207 187<br />
65 148 149 153 149 148 154 148 151 149 152 214 217 217 215 212 209 210 208 216 217<br />
66 142 214 135 145 142 141 141 141 147 142 209 209 208 208 208 214 207 186 209 211<br />
67 147 149 143 151 146 150 146 145 146 145 208 214 214 213 209 212 230 188 213 211<br />
68 147 144 144 144 149 143 147 143 143 172 213 209 208 209 210 186 210 224 207 209<br />
69 153 149 148 147 148 148 153 149 147 151 215 216 213 215 211 187 222 209 220 210<br />
70 149 149 151 149 150 150 149 153 149 153 210 208 186 209 213 208 209 215 208 210<br />
71 143 146 182 151 143 149 144 143 146 144 227 214 187 209 211 208 215 211 210 214<br />
72 144 146 212 149 146 147 144 144 150 146 212 208 221 212 208 209 208 214 209 214<br />
73 149 145 145 150 146 144 146 150 145 145 220 216 212 219 189 208 213 210 215 208<br />
74 219 147 148 214 213 213 218 214 212 212 207 209 212 208 193 217 209 207 211 211<br />
75 143 143 145 141 142 143 142 148 143 144 209 210 215 208 219 209 214 210 208 211<br />
76 146 148 150 146 146 150 146 147 146 150 213 208 208 209 208 208 208 212 207 209<br />
77 149 154 149 144 148 149 148 148 149 149 220 220 216 221 246 219 217 217 225 216<br />
78 153 152 152 164 153 152 151 152 152 151 213 217 223 214 220 220 215 216 213 228<br />
79 162 183 152 158 156 152 156 153 153 152 218 214 213 216 217 213 228 215 212 213<br />
80 149 255 149 153 157 148 152 150 149 151 218 212 215 212 217 218 213 217 216 213<br />
81 150 157 152 150 149 150 153 158 151 151 219 234 217 220 217 213 214 214 220 214<br />
82 150 159 155 150 150 162 152 151 152 151 209 212 212 217 218 218 216 213 212 218<br />
83 151 152 153 156 153 152 151 152 166 153 191 214 214 214 199 214 219 216 213 215<br />
84 150 146 145 151 148 147 147 146 153 147 195 217 213 214 190 216 217 217 217 213<br />
85 151 146 146 147 146 145 147 148 146 147 226 216 215 217 192 210 208 209 217 219<br />
86 144 145 146 144 146 144 149 147 143 145 209 210 208 215 232 217 211 213 209 214<br />
87 147 148 153 147 149 149 147 147 148 153 213 210 209 208 219 212 214 211 211 210<br />
88 148 153 213 155 148 154 149 148 150 148 212 208 208 209 207 209 208 208 208 209<br />
89 210 143 146 215 216 208 208 214 214 209 214 218 218 218 211 209 208 209 215 248<br />
90 148 145 145 146 149 145 146 149 144 144 210 211 204 209 215 218 212 209 212 213<br />
91 150 166 146 145 146 149 155 148 145 144 209 209 186 208 208 209 220 215 209 187<br />
92 149 151 152 147 146 148 147 152 146 148 212 213 189 210 207 208 208 210 210 186<br />
93 141 148 148 142 141 148 142 142 143 146 194 220 218 215 216 209 209 209 219 229<br />
94 145 217 146 149 146 148 146 146 156 146 187 209 213 212 220 217 210 212 214 210<br />
95 145 143 143 148 143 143 143 144 144 143 213 210 209 213 208 207 213 213 208 209<br />
96 151 148 147 147 153 147 149 149 146 148 207 208 208 209 208 209 208 208 208 209<br />
97 148 149 148 154 149 148 154 149 148 150 228 215 216 216 212 213 246 214 221 208<br />
98 143 144 147 141 143 144 144 148 143 146 208 212 215 209 214 214 210 208 207 211<br />
99 150 145 149 149 145 150 145 146 146 146 211 188 213 211 209 211 224 212 209 214<br />
100 145 145 146 149 146 145 145 147 151 148 209 186 208 210 208 204 209 208 208 218<br />
101 156 148 148 148 149 148 149 148 148 149 219 228 211 194 217 192 208 209 217 209<br />
102 148 143 141 144 143 147 144 144 142 144 210 209 215 191 213 192 211 212 215 216<br />
103 146 148 216 146 145 146 150 148 145 148 209 217 215 224 209 216 214 215 213 209<br />
104 145 150 149 143 143 145 144 145 144 149 210 211 209 209 208 187 220 209 237 210<br />
105 216 149 149 220 216 221 216 216 217 216 215 217 212 215 212 186 208 210 214 209<br />
106 151 150 148 152 152 148 157 150 155 148 210 213 213 208 218 202 187 209 209 219<br />
107 147 143 143 152 148 142 144 144 143 143 212 210 215 209 212 211 264 211 209 212<br />
108 146 217 146 148 145 145 150 146 144 144 209 187 209 212 209 209 222 209 209 214<br />
109 150 149 151 148 146 147 146 150 147 146 228 190 216 218 214 212 240 213 219 209<br />
110 150 152 154 148 155 154 149 149 150 153 208 210 210 224 215 225 248 210 208 225<br />
111 146 146 147 150 147 146 145 145 156 146 213 213 214 211 213 216 231 221 213 213<br />
112 154 151 149 151 151 155 151 151 152 150 216 219 212 212 211 213 223 214 214 217<br />
113 153 149 181 150 149 149 149 150 148 149 224 216 216 240 215 212 220 213 218 213<br />
114 152 156 154 152 153 152 156 154 151 154 214 214 216 213 217 218 234 213 212 218<br />
115 150 156 156 150 151 152 151 151 151 156 215 213 225 216 215 212 271 225 218 215<br />
116 152 151 147 155 152 158 154 152 152 149 218 213 213 215 213 215 333 213 215 219<br />
117 150 149 148 160 152 150 150 149 153 148 220 221 222 232 195 220 225 215 219 218<br />
118 153 150 150 150 156 151 151 150 150 150 219 217 213 217 197 218 214 213 212 219<br />
119 158 153 221 153 153 152 165 154 152 153 214 214 214 213 219 213 405 221 212 213<br />
120 149 147 152 145 149 148 146 158 212 148 213 213 217 214 211 214 213 222 219 213<br />
121 215 149 151 217 215 222 215 216 150 221 202 220 222 221 215 213 220 214 219 215<br />
122 150 148 149 153 149 147 147 147 152 148 191 211 213 214 213 224 212 212 213 219<br />
123 150 218 147 141 152 153 149 147 147 147 218 188 210 209 186 210 209 217 209 214<br />
124 153 149 147 150 148 149 149 148 148 148 214 186 211 210 186 208 211 209 213 214<br />
125 143 144 150 142 143 144 146 146 142 146 215 224 232 215 225 209 217 253 215 209<br />
126 138 149 150 146 143 146 145 146 145 150 210 212 209 213 214 214 213 209 215 216<br />
127 151 147 146 153 146 150 146 145 146 145 210 210 219 214 210 213 208 223 208 209<br />
128 149 147 147 154 149 149 148 147 176 148 213 209 208 195 208 209 209 208 210 212<br />
129 145 142 142 143 149 142 145 143 149 143 215 217 218 195 211 210 214 214 214 212<br />
130 150 146 145 146 145 145 151 153 147 145 214 208 208 187 218 215 208 204 208 216<br />
131 149 145 147 142 146 145 143 154 143 145 211 209 217 226 208 208 207 191 213 209<br />
132 147 154 147 147 147 147 146 148 148 153 214 209 210 215 210 213 215 188 212 210<br />
133 147 148 149 151 150 153 153 148 150 148 226 237 235 222 218 221 224 229 219 225<br />
134 142 142 209 148 144 145 142 142 147 142 208 214 208 208 213 214 206 208 207 214<br />
135 150 145 145 146 151 151 146 146 145 146 213 209 220 209 209 209 208 217 213 216<br />
136 151 147 148 147 146 146 152 147 214 149 213 187 213 216 209 213 210 208 214 209<br />
2 214 154 153 214 214 214 214 219 151 215 228 190 216 194 187 214 217 213 227 211<br />
A. Appendix<br />
A.5. Evaluation Data: Sobel Preprocessing<br />
We performed 20 test runs with the class PreprocessingTimeTest (this class is also<br />
the cause of the non-sequential frame order in the tables). Ten runs were done with<br />
standard JVM options; ten were done with the following tuning flags: -verbose:gc<br />
-Xms64m -Xmx256m -XX:NewRatio=2 -XX:+UseConcMarkSweepGC.<br />
The measurement unit is milliseconds.<br />
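As a rough illustration of how per-frame timings such as those in the table can be collected, the following sketch times one Sobel pass over a dummy grayscale frame using System.currentTimeMillis(), which matches the millisecond resolution of the data. The class and method names (TimingSketch, sobelMagnitude) and the frame size are assumptions for illustration only, not the actual PreprocessingTimeTest code of the thesis.

```java
// Illustrative sketch, NOT the thesis code: times one Sobel pass per frame.
public class TimingSketch {

    // 3x3 Sobel gradient magnitude on a grayscale image (values 0..255).
    // Border pixels are left at 0 for simplicity; the magnitude is the
    // sum of absolute gradients, clamped to 255.
    static int[][] sobelMagnitude(int[][] img) {
        int h = img.length, w = img[0].length;
        int[][] out = new int[h][w];
        for (int y = 1; y < h - 1; y++) {
            for (int x = 1; x < w - 1; x++) {
                int gx = -img[y-1][x-1] + img[y-1][x+1]
                       - 2 * img[y][x-1] + 2 * img[y][x+1]
                       - img[y+1][x-1] + img[y+1][x+1];
                int gy = -img[y-1][x-1] - 2 * img[y-1][x] - img[y-1][x+1]
                       +  img[y+1][x-1] + 2 * img[y+1][x] + img[y+1][x+1];
                out[y][x] = Math.min(255, Math.abs(gx) + Math.abs(gy));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Dummy frame at an assumed QVGA size (320x240), filled with a gradient.
        int[][] frame = new int[240][320];
        for (int y = 0; y < 240; y++)
            for (int x = 0; x < 320; x++)
                frame[y][x] = (x + y) % 256;

        long start = System.currentTimeMillis();
        sobelMagnitude(frame);
        long elapsed = System.currentTimeMillis() - start;
        System.out.println("frame preprocessed in " + elapsed + " ms");
    }
}
```

The tuned-JVM configuration quoted above would then correspond to launching the test with those flags, e.g. java -verbose:gc -Xms64m -Xmx256m -XX:NewRatio=2 -XX:+UseConcMarkSweepGC TimingSketch.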
frame | JVM tuned (10 runs) | JVM standard options (10 runs)<br />
3 2 2 2 2 2 3 2 2 2 2 2 2 2 2 1 2 1 2 2 5<br />
4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1<br />
5 1 1 1 1 1 1 1 0 1 1 1 1 0 1 1 1 0 1 1 1<br />
6 2 1 2 1 1 2 1 2 1 1 1 12 1 1 1 1 1 1 1 11<br />
7 6 5 5 5 4 5 5 5 5 4 4 5 15 4 5 5 4 5 5 4<br />
8 2 2 2 2 2 2 1 2 2 2 2 2 13 13 2 2 2 2 3 2<br />
9 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1<br />
10 1 1 0 1 1 1 0 1 1 1 1 1 0 1 1 11 1 1 1 1<br />
1 1 1 1 0 0 1 1 0 0 1 1 1 1 1 1 1 1 0 1 1<br />
11 1 1 0 1 0 1 1 1 0 1 1 0 0 1 1 1 1 1 0 1<br />
12 1 0 1 1 1 1 1 0 1 0 1 0 1 0 1 1 1 1 1 1<br />
13 1 1 1 1 1 1 1 1 1 0 1 2 1 1 1 1 1 2 1 1<br />
14 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1<br />
15 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1<br />
16 1 3 1 1 2 2 2 1 2 6 2 1 13 2 2 1 2 2 2 2<br />
17 1 1 1 0 1 1 1 1 1 1 1 5 4 1 1 1 0 0 1 1<br />
18 2 2 1 1 2 2 2 2 1 2 3 3 2 2 3 3 3 13 3 2<br />
19 1 2 1 1 2 1 2 1 1 3 1 1 1 1 1 1 1 1 1 1<br />
20 1 0 1 1 1 1 1 1 1 0 1 1 1 1 1 12 1 1 1 1<br />
21 1 0 1 1 1 0 1 0 1 1 1 0 1 1 1 1 1 1 1 0<br />
22 1 0 1 1 1 1 1 1 0 1 1 1 0 1 1 1 1 0 12 1<br />
23 2 1 1 2 2 2 1 2 1 1 2 1 1 2 13 2 2 1 1 1<br />
24 1 1 0 0 1 1 0 1 0 1 1 1 1 1 0 1 1 1 1 1<br />
25 1 1 1 1 1 1 1 1 1 2 1 1 1 5 1 1 1 1 1 1<br />
26 1 1 0 1 1 1 1 1 1 0 1 0 1 1 0 1 1 1 1 1<br />
27 1 0 1 1 1 1 1 1 1 1 1 1 0 1 1 1 0 1 0 1<br />
28 1 0 1 0 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 0<br />
29 1 1 1 1 1 1 1 0 0 1 1 1 0 0 1 0 0 1 1 1<br />
30 0 1 1 1 1 1 1 0 0 1 1 1 1 0 0 1 1 1 1 1<br />
31 1 0 0 1 1 1 1 1 0 0 1 0 0 1 0 1 1 1 1 1<br />
32 1 1 1 1 1 1 1 1 1 1 2 1 2 1 1 11 12 1 2 1<br />
33 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 0<br />
34 1 0 1 1 1 1 0 1 1 1 1 1 0 0 1 1 0 0 0 0<br />
35 1 1 1 0 0 1 1 0 1 1 1 1 0 0 1 1 1 0 1 1<br />
36 0 1 0 0 1 0 0 1 1 1 1 1 0 1 1 0 1 12 0 1<br />
37 4 1 1 1 0 1 0 0 1 1 0 0 1 0 1 1 1 0 0 0<br />
38 1 1 0 0 1 0 0 1 1 1 1 1 1 1 1 0 1 1 1 1<br />
39 1 1 1 0 1 0 1 1 1 0 1 1 1 1 1 0 1 1 1 1<br />
40 0 1 1 0 0 0 1 1 1 1 1 1 0 1 1 0 1 1 1 1<br />
41 1 1 1 1 0 0 1 1 1 0 1 0 5 0 1 1 0 1 1 0<br />
42 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 12 1 1 1 1<br />
43 1 1 1 0 0 0 0 0 0 1 0 1 1 1 11 1 1 1 1 1<br />
44 1 1 1 1 0 1 1 0 0 1 1 12 1 1 1 1 1 0 0 1<br />
45 1 1 1 1 1 0 1 1 1 0 1 0 0 1 1 1 1 0 1 1<br />
46 1 1 1 1 1 1 0 1 1 1 1 2 1 1 1 1 0 1 1 1<br />
47 1 0 0 1 1 1 1 1 1 0 1 2 1 1 0 1 1 1 1 1<br />
48 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1<br />
49 1 1 0 1 1 1 1 0 1 1 1 0 1 1 1 1 12 1 0 0<br />
50 1 1 1 1 1 0 1 0 1 0 1 1 1 1 1 0 1 1 1 1<br />
51 1 1 1 3 1 0 1 3 3 3 1 1 1 0 1 0 0 1 0 1<br />
52 1 3 2 1 3 3 3 1 1 8 0 1 1 1 1 1 0 1 0 0<br />
53 1 0 0 1 1 1 1 0 1 1 1 1 1 5 1 0 0 0 5 1<br />
54 1 1 0 0 0 1 0 1 1 0 0 1 0 0 1 1 1 1 0 1<br />
55 0 1 0 1 1 1 1 1 1 1 1 0 1 1 1 0 1 0 1 1<br />
56 1 1 1 0 0 1 1 0 0 1 0 1 1 0 1 1 1 1 1 1<br />
57 1 1 0 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1<br />
58 1 1 1 0 1 0 1 1 1 1 1 1 12 1 1 0 1 1 1 0<br />
59 1 0 1 0 0 0 1 1 1 1 1 1 1 1 1 0 1 0 1 0<br />
60 1 1 1 1 2 1 1 1 1 1 1 0 0 1 1 1 1 1 1 2<br />
61 1 0 1 1 1 1 1 1 1 0 0 1 0 1 2 0 12 1 1 0<br />
62 1 1 0 1 1 1 1 1 1 0 1 1 0 0 13 0 1 1 0 11<br />
63 1 1 1 1 1 0 0 1 1 1 0 0 1 1 0 0 1 0 1 1<br />
64 9 10 6 6 10 6 6 6 5 6 3 3 3 9 4 14 4 3 3 3<br />
65 0 1 0 1 0 0 1 1 1 1 1 1 1 0 1 0 1 1 0 0<br />
66 1 1 1 1 1 1 2 1 2 2 1 1 1 0 1 1 1 1 1 0<br />
67 1 1 1 1 1 1 0 1 0 1 1 0 1 0 0 1 0 1 1 1<br />
68 1 1 1 1 0 0 1 1 0 0 1 1 1 0 1 1 0 1 0 0<br />
69 3 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2<br />
70 1 1 1 1 1 1 1 1 1 0 5 1 1 1 1 1 1 1 1 0<br />
71 1 1 0 0 0 0 0 1 1 0 1 0 0 1 1 1 0 1 1 1<br />
72 1 1 1 1 1 2 1 1 2 1 1 1 1 1 1 4 1 1 0 0<br />
73 0 1 0 0 1 1 1 0 1 1 0 4 1 5 1 1 1 1 1 0<br />
74 1 1 0 0 1 0 0 0 1 1 4 0 1 1 1 1 1 1 16 4<br />
75 1 1 1 1 0 1 0 1 1 0 0 1 1 0 15 1 4 1 4 0<br />
76 1 0 1 0 0 0 1 1 1 0 1 1 4 1 1 1 1 1 1 0<br />
77 1 0 0 0 0 1 0 0 0 0 0 1 1 1 1 1 1 1 1 1<br />
78 1 1 1 0 1 0 1 0 1 1 1 1 2 1 1 1 0 1 1 1<br />
79 0 0 0 0 1 1 0 0 0 1 0 1 1 1 0 0 1 1 0 1<br />
80 6 1 1 1 1 1 0 0 1 0 1 1 1 1 0 1 1 0 0 1<br />
81 0 1 0 0 1 0 1 0 0 1 0 1 1 0 0 1 1 1 1 1<br />
82 1 0 1 0 1 1 1 0 0 1 0 1 1 1 0 0 0 0 1 0<br />
83 1 1 1 1 1 1 1 0 1 0 1 1 11 1 1 1 1 0 0 1<br />
84 1 0 1 1 1 1 0 0 0 0 1 1 1 1 1 0 0 1 0 1<br />
85 1 1 1 0 1 1 0 1 0 0 0 0 1 1 1 0 1 0 0 0<br />
86 0 0 0 0 1 1 0 1 0 0 0 0 1 1 1 1 1 0 1 1<br />
87 1 0 1 1 1 0 1 0 0 0 1 0 1 0 0 0 0 0 0 1<br />
88 1 1 1 1 1 0 1 1 0 1 1 1 0 1 1 0 1 1 0 1<br />
89 1 0 1 1 0 1 1 1 1 1 1 14 1 1 1 0 1 0 1 1<br />
90 0 1 0 0 0 1 1 1 1 1 0 12 1 1 1 1 0 1 1 1<br />
91 0 0 1 1 0 1 1 0 1 1 1 1 1 1 0 1 1 1 0 1<br />
92 1 1 0 0 1 1 1 0 1 1 1 1 1 1 0 0 0 0 0 1<br />
93 1 0 0 0 1 0 1 0 0 0 0 1 0 1 0 1 0 1 1 0<br />
94 1 1 1 0 1 0 1 1 1 1 1 0 0 1 0 1 1 0 1 1<br />
95 2 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 0 1<br />
96 0 1 1 1 1 1 1 0 1 1 1 11 0 1 1 1 0 1 1 0<br />
97 0 0 0 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0<br />
98 1 0 1 2 0 0 0 1 1 1 1 0 0 0 2 0 1 1 1 0<br />
99 1 1 0 0 1 1 0 0 0 1 1 0 1 1 1 1 1 1 11 1<br />
100 1 0 0 0 1 1 0 1 0 0 0 1 1 1 1 1 1 0 12 1<br />
101 0 1 0 1 0 1 1 0 0 1 1 0 1 0 1 1 0 1 1 1<br />
102 1 1 1 0 0 0 1 0 0 0 1 0 1 1 0 4 0 11 1 1<br />
103 0 0 1 1 1 1 0 1 1 0 1 3 1 14 0 0 0 3 0 1<br />
104 1 1 1 0 1 0 1 1 0 0 4 1 0 1 0 1 1 1 3 0<br />
105 0 1 1 1 0 0 1 1 1 1 0 0 3 0 7 1 4 0 0 1<br />
106 1 0 1 1 0 0 0 1 1 0 0 0 0 1 0 0 1 1 0 1<br />
107 1 2 1 1 1 2 1 1 1 2 1 2 2 2 2 1 1 1 1 1<br />
108 0 0 1 1 1 1 1 1 0 0 1 0 17 0 1 1 1 1 0 1<br />
109 1 1 0 0 1 1 1 0 1 1 1 1 1 0 1 1 1 0 1 1<br />
110 1 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 1 0 1<br />
111 0 2 2 1 2 1 1 1 0 1 1 1 1 0 0 0 0 1 1 1<br />
112 0 1 0 0 1 1 0 1 0 0 1 1 0 0 1 0 0 1 1 1<br />
113 1 1 1 0 0 1 0 1 1 0 1 0 1 0 3 0 2 1 1 1<br />
114 0 1 0 0 0 1 1 0 1 0 1 0 1 1 1 0 0 1 1 2<br />
115 1 1 1 1 1 1 0 1 0 1 1 0 1 1 1 0 0 1 1 0<br />
116 1 0 1 1 1 1 1 1 1 1 0 3 1 1 1 1 0 1 1 1<br />
117 1 1 0 0 0 0 1 1 1 0 2 0 0 1 0 1 0 2 13 1<br />
118 0 1 1 1 1 0 1 1 1 0 1 1 1 1 1 0 2 1 0 0<br />
119 1 0 1 1 1 1 0 1 0 0 1 0 1 0 1 1 1 1 1 0<br />
120 0 0 0 1 0 1 1 0 0 1 1 0 1 1 0 1 1 1 0 2<br />
121 1 1 1 1 1 9 1 0 0 1 1 0 0 0 3 0 3 1 1 1<br />
122 1 1 1 0 0 1 1 1 1 1 1 1 1 0 1 0 0 1 1 0<br />
123 0 1 0 1 0 0 1 1 0 0 1 0 1 1 1 1 2 0 1 1<br />
124 0 1 1 0 0 1 1 1 1 1 1 1 1 0 1 0 0 1 1 1<br />
125 1 0 0 1 0 0 1 1 0 1 1 1 2 0 1 1 1 1 0 0<br />
126 1 1 0 1 0 0 1 0 0 1 1 1 1 1 0 1 1 1 1 1<br />
127 3 1 0 1 0 1 0 0 1 1 0 0 1 1 1 2 1 1 0 1<br />
128 1 0 0 1 1 1 0 0 1 1 0 4 1 2 0 1 1 2 1 11<br />
129 0 0 0 1 0 1 0 0 1 0 2 3 1 1 1 1 1 0 14 0<br />
130 0 1 0 1 1 1 0 0 1 0 1 0 3 0 3 0 2 1 0 1<br />
131 1 0 1 0 0 1 0 1 1 1 0 1 12 1 1 1 1 1 1 1<br />
132 1 0 0 1 1 0 0 1 0 0 0 12 0 1 1 0 0 0 1 1<br />
133 0 0 1 1 0 0 1 1 1 0 0 1 1 0 1 1 1 1 1 1<br />
134 1 1 0 1 1 1 1 0 1 0 0 0 1 0 1 1 1 1 1 1<br />
135 1 1 0 1 1 0 0 1 1 0 1 0 0 1 0 0 11 0 1 1<br />
136 3 0 1 1 1 1 1 1 1 0 1 1 0 0 1 1 0 11 0 1<br />
2 1 1 1 1 1 0 0 0 1 1 0 0 1 0 1 1 0 1 0 0<br />