
Visual Analytics of Patterns in High-Dimensional Data

Dissertation submitted for the academic degree of Doctor of Natural Sciences (Dr. rer. nat.)

presented by

Andrada Tatu

at the

Section of Mathematics and Natural Sciences

Department of Computer and Information Science

Date of the oral examination: 12 July 2013

Referees:

Prof. Dr. Daniel A. Keim, Universität Konstanz

Prof. Dr. Oliver Deussen, Universität Konstanz

Prof. Dr. Giuseppe Santucci, Sapienza Università di Roma


For my loving parents.


Acknowledgements

This dissertation is the most important milestone in my academic career. One of the joys of completion is to look back and remember all the mentors, friends, collaborators, colleagues and family who have guided, supported, and inspired me along this fulfilling journey.

First and foremost, I would like to express my deep appreciation to my advisor, Professor Dr. Daniel Keim, who stirred my interest in Visual Analytics early on in my studies. He has not only been a strong supporter of my work, but has also allowed me great freedom to develop my thesis. Without his guidance and persistent help, this dissertation would not have been possible. As part of his group, I was able to perfect my research skills and to draw appropriate conclusions.

In addition, I would like to thank my committee members, Professor Dr. Oliver Deussen and Professor Dr. Giuseppe Santucci, for their encouraging and insightful comments and for their analytic questions, which prompted me to shape my ideas comprehensively.

I am especially grateful to Dr. Enrico Bertini and Dr. Tobias Schreck, who closely accompanied my research during these years and motivated me to seek perfect solutions. Many of the results reported here are the outcome of joint efforts. Their recommendations and instructions enabled me to assemble and finish the dissertation effectively.

I would also like to express my gratitude to my collaborators for their guidance and inspiration over the past years, especially Ines Färber, Professor Dr. Thomas Seidl, Professor Dr. Tamara Munzner, Dr. Michael Sedlmair, Professor Dr. Melanie Tory, Georgia Albuquerque, Dr. Martin Eisemann, Dr. Jörn Schneidewind and Dr. Peter Bak.

I am grateful to my colleagues for creating a pleasant working atmosphere. A special thank you goes to Svenja Simon (for her friendship and tricky R programming sessions), Miloš Krstajić (for supporting all my moods and encouraging me throughout these years), Dr. Florian Mansmann (for getting me into the group and becoming a lovely friend), David Spretke (for accompanying me from the first day of my Bachelor studies to the last of my doctoral work as a friend and hardworking colleague), Dr. Andreas Stoffel (for always keeping his door open and for the helpful debugging sessions), Christian Rohrdantz (for helpful suggestions and mental support during the writing phase and the preparation of my defense talk), Dr. Leishi Zhang (for the great collaboration during the ClustNails project), Dr. Daniela Oelke (for initial paper-writing suggestions and for providing me with the thesis template), and Sabine Kuhr (for her support with administrative work). I am very happy that, in many cases, my friendship with all of you has enriched my life beyond our shared time in the office.

Special thanks go to my student assistant Fabian Maaß, who implemented parts of the subspace visualization system and whose creativity shaped the research outcome.



This acknowledgement would not be complete without extending my sincere thanks to our DBVIS support team, which made my life much easier by providing fast, round-the-clock technical support, computational power, and storage for my projects. I would like to mention Florian Stoffel and Juri Buchmüller in particular.

Special thanks also go to Mrs. Anna Dowden-Williams from Academic Staff Development for proofreading most of my research papers and this thesis, which profoundly improved their overall composition.

My deepest appreciation and gratitude, however, go to my family, who encouraged my studies from the start and provided me with the moral and emotional support needed throughout the entire process. They believed in my dream and helped me to fulfill it. I will be forever grateful for your unconditional love and support.

I also gratefully acknowledge the financial support received from the German Research Foundation (DFG) under the research grant DFG-611 within the DFG Priority Program “Scalable Visual Analytics: Interactive Visual Analysis Systems of Complex Information Spaces” (SPP 1335). I was also an associated PhD student of the GK-1042 (PhD Graduate Program) “Explorative Analysis and Visualization of Large Information Spaces”.


Abstract

Due to the technological progress over the last decades, today’s scientific and commercial applications are capable of generating, storing, and processing massive amounts of data. This also influences the type of data generated: with each data entry, many different aspects are combined and stored in one common database. Often the describing attributes are numeric; we call data with more than a handful of attributes (dimensions) high-dimensional. Making use of such data archives poses new challenges to analysis techniques.

The work of this thesis centers on the question of finding interesting patterns (meaningful information) in high-dimensional data sets. This task is highly challenging because of the so-called curse of dimensionality, which expresses that the data becomes sparse as the dimensionality increases. This phenomenon disturbs standard analysis techniques. For automatic techniques, the data complexity not only increases their runtime, but also vitiates their computation functions, such as distance functions. Moreover, exploring these data sets visually is hindered by the high number of dimensions that have to be displayed on the two-dimensional screen space.
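The distance-concentration effect behind this statement can be observed numerically. The following is a minimal sketch, not taken from the thesis, assuming points drawn uniformly from the unit hypercube and Euclidean distance. The function name `relative_contrast` and all parameter values are illustrative. It computes the relative contrast (d_max − d_min) / d_min between one random query point and the data points. This ratio shrinks as the dimensionality grows, so the nearest and farthest neighbors become hard to distinguish.

```python
import numpy as np


def relative_contrast(n_points: int, dim: int, seed: int = 0) -> float:
    """Relative contrast (d_max - d_min) / d_min of the Euclidean distances
    from one random query point to n_points points drawn uniformly from
    the unit hypercube [0, 1]^dim."""
    rng = np.random.default_rng(seed)
    data = rng.random((n_points, dim))
    query = rng.random(dim)
    dists = np.linalg.norm(data - query, axis=1)
    return float((dists.max() - dists.min()) / dists.min())


for dim in (2, 10, 100, 1000):
    # As dim grows, all distances concentrate around a common value,
    # so the contrast between nearest and farthest point shrinks.
    print(f"dim={dim:5d}  relative contrast={relative_contrast(1000, dim):.3f}")
```

With these settings, the printed contrast should drop from well above 1 in two dimensions to a small fraction of 1 at a thousand dimensions. The exact values depend on the random seed.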

This thesis is motivated by the idea that searching for interesting patterns in this kind of data can be done through a mixed approach of automation, visualization, and interaction. The amount of patterns a visualization contains can be measured by so-called quality metrics. These automated functions can then filter the high number of high-dimensional visualizations and present a pre-filtered subset of good views to the user for further investigation. We propose quality metrics for scatterplots and parallel coordinates focusing on different user tasks, such as identifying clusters and correlations. We also evaluate these measures with regard to (1) their ability to identify clusters in a variety of real and synthetic data sets and (2) their correlation with human perception of clusters in scatterplots. A thorough discussion of the results follows, reflecting on their impact on directions for future research.
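To make the filtering idea concrete, here is a minimal sketch that scores every axis-parallel scatterplot of a data set and keeps only the best-ranked views. It assumes a correlation task and uses the absolute Pearson correlation coefficient as a stand-in quality metric. This is a generic illustration, not one of the measures proposed in this thesis, and the function `rank_scatterplots` is hypothetical.

```python
import itertools

import numpy as np


def rank_scatterplots(data: np.ndarray, top_k: int = 3):
    """Score every axis pair (i, j) with |Pearson r| (a stand-in quality
    metric for the correlation task) and return the top_k best views."""
    n_dims = data.shape[1]
    corr = np.corrcoef(data, rowvar=False)  # columns are the dimensions
    scored = [
        ((i, j), abs(float(corr[i, j])))
        for i, j in itertools.combinations(range(n_dims), 2)
    ]
    # Highest score first: these are the views shown to the user.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


rng = np.random.default_rng(42)
X = rng.normal(size=(500, 6))
# Plant one strongly correlated pair of dimensions (1 and 4).
X[:, 4] = 2.0 * X[:, 1] + rng.normal(scale=0.1, size=500)
print(rank_scatterplots(X, top_k=3))
```

Only the scoring function is task-specific. Swapping in a different measure, for example one aimed at cluster separation, would leave the enumeration and ranking loop unchanged.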

As quality metrics have been developed for a large number of different high-dimensional visualization techniques, we present our reflections on how these methods are related to each other and how the approach can be developed further. For this purpose, we provide an overview of approaches that use quality metrics in high-dimensional data visualization and propose a systematization based on a comprehensive literature review.

In high-dimensional data, patterns often exist only in a subset of the dimensions. Subspace clustering techniques aim at finding the subspaces in which clusters exist that might otherwise stay hidden if a traditional clustering algorithm is applied. While subspace clustering approaches tackle the sparsity problem in high-dimensional data well, designing effective visualizations to help analyze the clustering result is not trivial. In addition to the cluster membership information, the relevant sets of dimensions and the overlaps of memberships and dimensions also need to be considered. Although a number of techniques (for example, scatterplots, heat maps, dendrograms, and hierarchical parallel coordinates) exist for visualizing traditional clustering results, little research has been done on visualizing subspace clustering results. Moreover, while extensive research has been carried out on designing subspace clustering algorithms, surprisingly little attention has been paid to the development of effective visualization tools for analyzing the clustering result. Appropriate visualization techniques will not only help in monitoring the clustering process but, combined with special mining techniques, could also enable the domain expert to guide and even steer the subspace clustering process to reveal the patterns of interest. To this end, we envision a concept that combines subspace clustering algorithms and interactive, scalable visual exploration techniques. This work includes the task of comparative visualization and the feedback-guided computation of alternative clusterings.
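The core observation behind subspace clustering is that a cluster may be dense in a few dimensions while looking like noise in the full space. A simplified grid-density check can illustrate this. The sketch below is loosely in the spirit of grid-based algorithms such as CLIQUE, but it is not any algorithm used in this thesis. It exhaustively tests all subspaces of up to `max_dim` dimensions, and it omits CLIQUE's bottom-up pruning and the merging of dense cells into clusters. Its name, bin count, and threshold are illustrative assumptions.

```python
import itertools

import numpy as np


def dense_subspaces(data, n_bins=5, density_threshold=0.2, max_dim=2):
    """Return the axis-parallel subspaces (tuples of dimension indices)
    whose densest grid cell holds more than density_threshold of all objects."""
    n, d = data.shape
    # Discretize every dimension into n_bins equal-width intervals.
    mins, maxs = data.min(axis=0), data.max(axis=0)
    span = np.where(maxs > mins, maxs - mins, 1.0)
    cells = np.minimum(((data - mins) / span * n_bins).astype(int), n_bins - 1)
    found = []
    for k in range(1, max_dim + 1):
        for dims in itertools.combinations(range(d), k):
            # Count objects per occupied grid cell in this subspace.
            _, counts = np.unique(cells[:, list(dims)], axis=0, return_counts=True)
            if counts.max() / n > density_threshold:
                found.append(dims)
    return found


rng = np.random.default_rng(7)
noise = rng.random((300, 4))  # uniform background in all four dimensions
cluster = np.column_stack([
    rng.normal(0.5, 0.02, 200),  # tight in dimension 0
    rng.normal(0.5, 0.02, 200),  # tight in dimension 1
    rng.random(200),             # uniform in dimension 2
    rng.random(200),             # uniform in dimension 3
])
X = np.vstack([noise, cluster])
print(dense_subspaces(X, n_bins=5, density_threshold=0.3))
```

Only the two dimensions that carry the planted cluster, and their combination, should be reported. The noise dimensions should never pass the threshold. Because there are 2^d − 1 candidate subspaces, this exhaustive search is only practical for small d, which is why real subspace clustering algorithms prune the search space.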


Summary

Due to the technological progress of the last decades, today’s commercial applications are able to generate, store, and process huge amounts of data. This development also influences the nature of the generated data, i.e., for each data entry, different aspects are stored in the same database. The describing attributes are often numeric. I call data sets that contain more than five such attributes (dimensions) high-dimensional. Making valuable use of such data archives poses new challenges for analysis techniques.

This dissertation addresses the question of how interesting patterns (meaningful information) can be found in high-dimensional spaces. This task is extremely challenging due to the problem of the curse of dimensionality, which states that data in high-dimensional space is sparse. Conventional analysis techniques are impaired by this. Automatic methods have to take the data complexity into account with respect to both their runtime and their computation functions (e.g., distance functions). In addition, the visual exploration of these data is hampered by the two-dimensionality of the displays.

This dissertation is based on the concept that the search for interesting patterns in high-dimensional data sets can be carried out with a combined approach of automatic, visual, and interactive methods. The expressiveness of the patterns in a visualization can be measured by so-called quality measures. With these automatic functions, the large number of high-dimensional visualizations can be narrowed down, and a selected set can be provided to the user for further investigation. I propose quality measures for scatterplots and parallel coordinates that focus on different tasks, such as the identification of groups or correlations. In addition, these techniques are examined with regard to (1) their ability to identify clusters in different real and synthetic data sets and (2) their correlation with human perception. The detailed discussion of these results is followed by considerations for future research.

Since many different quality measures have been developed for a range of further high-dimensional visualizations, I will present proposals for connecting them and developing them further. For this purpose, an overview of the different approaches is compiled, based on a systematization that resulted from a comprehensive literature review.

In high-dimensional space, some patterns exist only in different subspaces of the data space. Subspace clustering algorithms were developed to find subspaces containing clusters that traditional clustering algorithms would not find. Although these algorithms can explore sparsely populated high-dimensional spaces well, developing effective visualization techniques to analyze these clustering results is not trivial. In addition to the cluster membership of elements, the relevant attribute sets of a cluster and the object and dimension overlaps of subspace clusters must be represented. Even though a number of techniques exist for visualizing traditional clustering results (e.g., scatterplots, heatmaps, dendrograms, hierarchical parallel coordinates), there are only a few approaches for visualizing the results of subspace clustering algorithms. Moreover, surprisingly few approaches that support a visual analysis of subspace clustering results have been presented so far, although much research has been conducted on subspace clustering algorithms. Appropriate visualization techniques, supported by special methods for extracting information, would not only make it possible to track the clustering results, but would also help domain experts steer the subspace clustering process so that relevant patterns emerge. With this goal in mind, I present a concept that combines subspace clustering algorithms with interactive, scalable visualizations. My approaches are therefore devoted to the task of visually comparing alternative clusterings that are steered by user feedback.


Contents

1 Introduction

1.1 Need for Visual Interactive Data Exploration

1.2 Contributions of the Thesis

1.3 Thesis Structure

2 High-Dimensional Data Analysis

2.1 Basic Techniques for High-Dimensional Data Analysis

2.1.1 Common Challenges with High-Dimensional Data

2.1.2 Feature Selection and Feature Extraction

2.2 Information Visualization Techniques for High-Dimensional Data

2.2.1 Information Visualization Techniques

2.2.2 Limitations while Visualizing High-Dimensional Data

2.3 Automated Techniques for High-Dimensional Data

2.3.1 Data Mining Techniques for High-Dimensional Data

2.3.2 Quality Measures for High-Dimensional Data Visualizations

2.4 Visual Analytics for High-Dimensional Data

2.4.1 Visual Interactive Systems for High-Dimensional Data Analysis

2.4.2 Subspace Cluster Analysis and Visualization

3 Quality Measures based Visual Analysis of High-Dimensional Data

3.1 Quality Measures for Scatterplots and Parallel Coordinates

3.1.1 Overview and Problem Description

3.1.2 Quality Measures for Scatterplots with Unclassified Data

3.1.3 Quality Measures for Scatterplots with Classified Data

3.1.4 Quality Measures for Parallel Coordinates with Unclassified Data

3.1.5 Quality Measures for Parallel Coordinates with Classified Data

3.1.6 Application on Real Data Sets

3.1.7 Evaluation of the Measures’ Performance Using Synthetic Data

3.1.8 Conclusion and Future Work

3.2 Quality Measures and Human Perception – An Empirical Study

3.2.1 Measures

3.2.2 Empirical Evaluation

3.2.3 Results

3.2.4 Discussion

3.2.5 Guidelines

3.2.6 Conclusion and Future Work

4 A Systematization of Quality Metrics in High-Dimensional Data Visualization

4.1 Quality Metrics in High-Dimensional Data Visualization

4.1.1 Background

4.1.2 Methodology



4.1.3 Quality Metrics Pipeline

4.1.4 Systematic Analysis

4.1.5 Examples

4.1.6 Findings

4.1.7 Directions for Further Research

4.1.8 Limitations

4.1.9 Conclusion and Future Work

4.2 Visual Cluster Separation Factors: Sketching a Taxonomy

4.2.1 Introduction

4.2.2 Method

4.2.3 Visual Cluster Separation Taxonomy

4.2.4 Discussion and Further Research

5 Visual Subspace Analysis of High-Dimensional Data

5.1 Visual Exploration for Subspace Clustering

5.1.1 Motivation

5.1.2 Subspace Clustering Algorithms

5.1.3 Task Definition and Design Space for Visual Subspace Cluster Analysis

5.1.4 The ClustNails System

5.1.5 Use Case and System Comparison

5.1.6 Conclusions and Future Work

5.2 Visual Analytics of Subspace Search

5.2.1 Introduction

5.2.2 Subspace Analysis

5.2.3 Proposed Analytical Workflow

5.2.4 Application

5.2.5 Discussion and Possible Extensions

5.2.6 Conclusions

6 Conclusion and Future Work

6.1 Summary of Contributions and Future Work

List of Figures

List of Tables

A Appendix

A.1 Original Data Dimensions for Used Data Sets

A.2 Empirical Study

A.2.1 General Questions Form

A.2.2 Experiment Form

A.2.3 Additional Experiment Results

A.3 Quality Metrics Pipelines for the Literature Review

A.4 Hierarchical Grouping of Interesting Subspaces

Bibliography


1 Introduction

“Everybody gets so much information all day long that they lose their common sense.”

Gertrude Stein

Contents

1.1 Need for Visual Interactive Data Exploration

1.2 Contributions of the Thesis

1.3 Thesis Structure

1.1 Need for Visual Interactive Data Exploration

Today, data is produced everywhere: everything is recorded, from production processes in industry to employees’ working behavior and their personal data. Even animals are equipped with sensors that record all their movements over long periods of time, the click behavior of internet users is traced, and supermarket purchases are stored for later analysis. Since today’s technology allows for inexpensive and abundant storage space, even more data will be stored in the near future. At the same time, these advantages reveal the problem of how to handle the data most effectively. The gap between the generated data and our understanding of it is increasing [154]. This also poses a challenge for analysis techniques: for example, it is difficult to filter and extract relevant information, since not only the volume of the data increases, but also its complexity.

Visualization has long been used as an effective tool to explore and make sense of data, especially when analysts need to generate hypotheses about the information hidden in the data. While some techniques and commercial products have proven useful in providing effective solutions, modern databases can still store data whose complexity goes well beyond the limits of human understanding.

The goal of this thesis is pattern finding in high-dimensional, or multidimensional, data. The methods presented here work with numerical data sets that have a large number of objects and a large number of dimensions, also called attributes. Depending on the application area, a large number of objects can start at a few hundred and go up to thousands. The same is true for the describing attributes, or features, of the objects. In this work, we call all data sets with more than one hundred objects and more than ten dimensions high-dimensional. An example of analysis tasks based on a customer database will be described later in this section.

Classical data exploration requires the user to find interesting phenomena in the data interactively, starting from an initial visual representation. In [36], the authors suggest that “the purpose of visualization is insight, not pictures”. Techniques for high-dimensional data visualization can also incorporate automated analysis components to reduce the data complexity and to guide the user effectively during the interactive exploration process. This process is called visual analytics: “Visual analytics strives to facilitate the analytical reasoning process by creating software that maximizes human capacity to perceive, understand, and reason about complex and dynamic data and situations” [137].

Patterns are not a new concept in data analysis either. Witten and Frank expressed this aptly in [154]: “There is nothing new about this” (that is, about patterns). “People have been seeking patterns in data since human life began. Hunters seek patterns in animal migration behavior, farmers seek patterns in crop growth, politicians seek patterns in voter opinion, and lovers seek patterns in their partners’ responses. A scientist’s job (like a baby’s) is to make sense of data, to discover the patterns that govern how the physical world works and encapsulate them in theories that can be used for predicting what will happen in new situations.”

In large-scale multivariate data sets, purely interactive exploration becomes ineffective or even infeasible, since the number of possible representations grows rapidly with the number of dimensions. Methods are needed that help the user automatically find effective and expressive visualizations. Effective and efficient analysis methods for large multidimensional data are necessary to understand the complex information hidden in these databases. Data dimensionality is often the major limiting factor.
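To make this growth concrete, the following minimal Python sketch counts three families of candidate views for a data set with d dimensions: axis-parallel 2D scatterplots, axis orderings of a parallel coordinates plot, and non-empty subspaces. The choice of view families, and counting an axis ordering and its mirror image as a single view, are illustrative assumptions rather than definitions taken from this thesis.

```python
from math import comb, factorial


def count_candidate_views(d: int) -> dict:
    """Count candidate low-dimensional views of a data set with d dimensions."""
    if d < 2:
        raise ValueError("need at least two dimensions")
    return {
        # one scatterplot per unordered pair of dimensions (half of a SPLOM)
        "scatterplots": comb(d, 2),
        # axis orderings of parallel coordinates; an ordering and its mirror
        # image show the same adjacent axis pairs, so they are counted once
        "pc_orderings": factorial(d) // 2,
        # non-empty axis-parallel subspaces, i.e. subsets of the dimensions
        "subspaces": 2 ** d - 1,
    }


for d in (5, 10, 20):
    print(d, count_candidate_views(d))
```

For d = 10 this already gives 45 scatterplots, 1,814,400 axis orderings and 1,023 subspaces, far more than an analyst can inspect one by one.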

For automatic pattern detection, a typically employed paradigm is clustering: identifying groups of objects based on their mutual similarity. For the aforementioned high-dimensional data, however, traditional clustering methods that consider all features simultaneously are no longer effective due to the so-called curse of dimensionality [28]. As dimensionality increases, the distances between any two objects become less discriminative.
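This loss of distance contrast can be observed directly. The sketch below assumes uniformly distributed random data (not one of the data sets used in this thesis) and measures the relative contrast (d_max - d_min) / d_min between the farthest and the nearest point as seen from a random query point; the ratio shrinks as the number of dimensions grows.

```python
import numpy as np


def relative_contrast(n_points: int, d: int, seed: int = 0) -> float:
    """(d_max - d_min) / d_min for the Euclidean distances from one random query
    point to n_points points drawn uniformly from the unit hypercube [0, 1]^d."""
    rng = np.random.default_rng(seed)
    data = rng.random((n_points, d))
    query = rng.random(d)
    dists = np.linalg.norm(data - query, axis=1)
    return float((dists.max() - dists.min()) / dists.min())


for d in (2, 10, 100, 1000):
    print(f"d = {d:4d}   relative contrast = {relative_contrast(1000, d):.3f}")
```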

Beyond this loss of contrast, high dimensionality also increases the probability that many dimensions are irrelevant for the underlying cluster structure. In such data sets, each object may participate in different groupings, meaning that objects may play different roles. In classical clustering, by contrast, each object belongs to exactly one cluster, and the data set is partitioned into a number of clusters. “For example, in customer segmentation, we observe for each customer multiple possible behaviors which should be detected as clusters. In other domains, such as sensor networks each sensor node can be assigned to multiple clusters according to different environmental events. In gene expression analysis, objects should be detected in multiple clusters due to the various functions of each gene. In general, multiple groupings are desired as they characterize different views of the data” [103].

Consider, for example, a customer database with a large number of customers (rows in the table) described by a large number of attributes (columns in the table). We may ask how these customers relate to each other and what kinds of patterns, in this case groups, can be identified in this database. Figure 1.1 shows a toy example of such multiple valid groupings for one database. We can have groups like “rich oldies”, “healthy sporties”, “unhealthy gamers”, “unemployed people”, “average people” and “sport professionals”¹. To facilitate data analysis in this direction, we present in Chapter 5 visual interactive systems and new analysis methods that support the understanding and comparison of different groupings in high-dimensional data.

1 This image appeared in the tutorial slides of Müller et al. [104]; the accompanying story is my own.

1.1. Need for Visual Interactive Data Exploration 3

Figure 1.1: Multiple valid and interesting groupings of a high-dimensional data set [104].

As already mentioned, this thesis is about visual analytics of patterns in high-dimensional data. To assist the analysis of such data sets, effective information visualization techniques that map data properties to the screen have been developed and are needed to make sense of the complex data at hand. The visualization of large, complex information spaces typically involves mapping high-dimensional data to lower-dimensional visual representations. The challenge for the analyst is to find an insightful mapping while the dimensionality of the data, and consequently the number of possible mappings, increases.

As we will see in Chapter 2, numerous expressive and effective low-dimensional visualizations for high-dimensional data sets have been proposed in the past, such as scatterplots and scatterplot matrices (SPLOM) [37], parallel coordinates [78], glyph-based techniques [147], pixel-based displays [145] and geometrically transformed displays [86, 145]. However, automatically finding information-bearing and user-interpretable visual representations remains a difficult task, since the number of possible representations can be very large. In addition, it can be difficult to explain their relevance to the user.

Finding relations, patterns, and trends over numerous dimensions is also difficult because projecting n-dimensional objects onto a 2D space necessarily entails some information loss. Projection techniques like multidimensional scaling (MDS) and principal component analysis (PCA) offer traditional solutions by creating data embeddings that preserve the distances of the original multidimensional space in the 2D projection as well as possible. These techniques, however, have severe problems in terms of interpretation, as the observed patterns can no longer be interpreted in terms of the dimensions of the original data space.
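The information loss of such a linear projection can be quantified. The following sketch is a textbook PCA computed with NumPy's singular value decomposition, not the specific implementation used in this thesis. It projects the data onto the first two principal components and reports the fraction of variance the 2D view retains; the loadings show that each new axis is a weighted mix of all original dimensions, which is what makes the resulting patterns hard to interpret.

```python
import numpy as np


def pca_2d(data: np.ndarray):
    """Project the rows of `data` (n objects x d dimensions) onto the first two
    principal components; return the 2D embedding, the retained variance
    fraction, and the loadings of the two new axes on the original dimensions."""
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal axes; squared singular values are
    # proportional to the variance captured along each axis.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    embedding = centered @ vt[:2].T
    retained = float((s[:2] ** 2).sum() / (s ** 2).sum())
    return embedding, retained, vt[:2]


rng = np.random.default_rng(42)
X = rng.normal(size=(200, 12))  # 200 objects, 12 unstructured dimensions
Y, kept, loadings = pca_2d(X)
print(Y.shape, f"variance retained: {kept:.1%}")
print("loadings of PC1:", np.round(loadings[0], 2))  # in general all dimensions contribute
```

For data that truly lie in a 2D plane the retained fraction is 1; for the unstructured 12-dimensional data above, most of the variance is lost in the 2D view.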

Mechanisms to measure the quality of visualizations are therefore needed. In the past, quality measures have been developed for different areas, such as measures for data quality (outliers, missing values, sampling rate, level of detail), clustering quality (purity, F-measure (combining precision and recall), Rand index [114], silhouette coefficient [85], etc.), association rule quality (support and confidence [7], information gain [40], etc.), or the distance distribution measure in SURFING [16], a subspace search algorithm described and used in Chapter 5 to filter data spaces and find interesting subspaces. For visualizations, a number of authors have started introducing quality measures to quantify the importance of individual views. The rationale behind this approach is that quality measures can help users reduce the search space by filtering out views with low information content. In the ideal system, users can select one or more measures, and the system optimizes the visualization to reflect the user's choice. This thesis also contributes to the field of quality measures: Chapter 3 presents new measures for scatterplot matrices and parallel coordinates plots.
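Two of the classical measures named above are simple enough to state in a few lines. The sketch below computes the Rand index between two clusterings (the fraction of object pairs on which they agree) and the support and confidence of an association rule. The function names and the toy data are hypothetical and serve only as an illustration.

```python
from itertools import combinations


def rand_index(labels_a, labels_b) -> float:
    """Rand index: fraction of object pairs on which two clusterings agree
    (both place the pair in the same cluster, or both separate it)."""
    if len(labels_a) != len(labels_b) or len(labels_a) < 2:
        raise ValueError("need two labelings of the same objects (at least two)")
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def support_confidence(transactions, antecedent, consequent):
    """Support and confidence of the association rule antecedent -> consequent.

    transactions: list of sets of items.
    """
    a, c = set(antecedent), set(consequent)
    n_a = sum(a <= t for t in transactions)         # transactions containing the antecedent
    n_ac = sum((a | c) <= t for t in transactions)  # ... containing antecedent and consequent
    support = n_ac / len(transactions)
    confidence = n_ac / n_a if n_a else 0.0
    return support, confidence


print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0: same partition, different label names
print(rand_index([0, 0, 1, 1], [0, 1, 0, 1]))  # 2 of 6 pairs agree

baskets = [{"bread", "milk"}, {"bread"}, {"milk", "eggs"}, {"bread", "milk", "eggs"}]
print(support_confidence(baskets, {"bread"}, {"milk"}))  # support 0.5, confidence 2/3
```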

However, these measures share one problem: the lack of empirical validation based on user studies. Such studies are needed to test the underlying assumption that the patterns captured by the measures correspond to the patterns perceived by the human eye. Since many different patterns can be analyzed, in this thesis we started with clusters in visualizations and compared some of the most promising quality measures for filtering visualizations that present clusters against human judgement obtained by looking at the visualizations.
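One common way to relate a quality measure to human judgement is a rank correlation between the measure's scores and the participants' ratings of the same plots. The sketch below implements Spearman's rank correlation with averaged ranks for ties. It is an assumed example of such a comparison, not the statistical procedure of this thesis (Chapter 3 describes the analysis actually performed), and the scores and ratings are invented.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y) -> float:
    """Spearman's rank correlation: Pearson correlation of the (average) ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences with at least two entries")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


measure_scores = [0.91, 0.40, 0.75, 0.10, 0.55]  # hypothetical measure values for five plots
human_ratings = [5, 2, 4, 1, 3]                  # hypothetical ratings of the same plots
print(round(spearman(measure_scores, human_ratings), 6))  # 1.0: identical ordering
```

A value near 1 means the measure orders the plots as the participants do; values near 0 mean the measure says little about what humans see.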

The analysis of high-dimensional data is a ubiquitously relevant, yet notoriously difficult problem. Problems exist both in automatic data analysis and in the visualization of this kind of data. On the visual-interactive side, the limited number of available visual variables and the limited short-term memory of human analysts make it difficult to visualize data with many dimensions effectively. In Chapter 5 we tackle this problem from the visual-interactive side: we present a visual-interactive tool to make sense of clusters in different subspaces, as well as an approach to identify subspaces that might show complementary clusterings.

In summary, this thesis aims to contribute to both sides of pattern finding in high-dimensional data: the automatic and the visual-interactive part. We believe that both parts are needed to solve the problem. We therefore present automatic mechanisms, namely quality measures, that reduce the number of alternative visualizations of high-dimensional data, and, on the other side, we visualize the relations between results to support the user in an interactive pattern-finding process.

1.2 Contributions of the Thesis

This dissertation provides visual analytics mechanisms for pattern finding in high-dimensional data. To achieve this goal and substantiate the results, we make the following contributions:

• Quality measures for scatterplots and parallel coordinates plots are developed. Visual quality metrics have recently been devised to automatically extract interesting visual projections out of a large number of available candidates in the exploration of high-dimensional databases. The metrics permit, for instance, searching within a large set of scatterplots (e.g., in a scatterplot matrix) and selecting the views that contain the best separation among clusters. The rationale behind these techniques is that automatic selection of the “best” views is not only useful but also necessary when the number of potential projections exceeds the limit of human interpretation (Chapter 3) [132, 133].

• Validating the measures through a perceptual study. We present a perceptual study investigating the relationship between human interpretation of clusters in 2D scatterplots and the measures automatically extracted from these plots. Specifically, we compare a series of selected metrics and analyze how well they predict human detection of clusters. A thorough discussion of the results follows, with reflections on their impact and directions for future research (Chapter 3) [134].

1.3. Thesis Structure 5

• A systematization of techniques that use quality metrics to help in the visual exploration of meaningful patterns in high-dimensional data. We reflect on how different quality measure methods are related to each other and how the approach can be developed further. For this purpose, we provide an overview of approaches that use quality metrics in high-dimensional data visualization and propose a systematization based on a thorough literature review. We carefully analyze the papers and derive a set of factors for discriminating the quality metrics, the visualization techniques, and the process itself. A quality metrics pipeline is proposed to model all the encountered varieties of metrics (Chapter 4) [27].

• A visual subspace cluster analysis system (ClustNails) to understand the results of subspace clustering. In subspace clustering, in addition to the grouping information (clusters), the relevance of dimensions for particular groups and the overlaps between groups, both in terms of dimensions and records, need to be analyzed. ClustNails integrates several novel visualization techniques with various user interaction facilities to support navigating and interpreting the results of subspace clustering algorithms (Chapter 5) [136].

• A novel method for the visual analysis of high-dimensional data that supports understanding the data from different perspectives and investigating alternative clusterings. We employ an interestingness-guided subspace search algorithm to detect a candidate set of interesting subspaces that may contain important patterns for further analysis. Based on appropriately defined subspace similarity functions, we visualize the subspaces and provide navigation facilities to interactively explore large sets of subspaces. Our approach allows users to effectively compare and relate subspaces, identifying complementary or contradicting relations among them and thus identifying alternative clusterings (Chapter 5) [135].
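The idea behind the first contribution, scoring every candidate view and presenting the best ones first, can be sketched as follows. The separation score used here (mean distance between class centroids divided by the mean spread around them) is a hypothetical stand-in, not one of the measures proposed in Chapter 3, and the data are synthetic with a class difference planted in dimensions 2 and 4.

```python
from itertools import combinations

import numpy as np


def separation_score(points: np.ndarray, labels: np.ndarray) -> float:
    """Hypothetical separation measure for one 2D view: mean distance between
    class centroids divided by the mean distance of points to their own centroid."""
    classes = np.unique(labels)
    if len(classes) < 2:
        raise ValueError("need at least two classes")
    centroids = np.array([points[labels == c].mean(axis=0) for c in classes])
    within = np.mean([
        np.linalg.norm(points[labels == c] - centroids[k], axis=1).mean()
        for k, c in enumerate(classes)
    ])
    between = np.mean([
        np.linalg.norm(centroids[a] - centroids[b])
        for a, b in combinations(range(len(classes)), 2)
    ])
    return float(between / within)


def rank_scatterplots(data: np.ndarray, labels: np.ndarray, top: int = 3):
    """Score every axis-parallel scatterplot (pair of dimensions) and return the
    `top` highest-scoring pairs, best first."""
    scores = [
        ((i, j), separation_score(data[:, [i, j]], labels))
        for i, j in combinations(range(data.shape[1]), 2)
    ]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:top]


# Synthetic data: 200 objects, 6 dimensions, two classes that differ only in dims 2 and 4.
rng = np.random.default_rng(1)
labels = np.repeat([0, 1], 100)
data = rng.normal(size=(200, 6))
data[labels == 1, 2] += 4.0
data[labels == 1, 4] += 4.0

for pair, score in rank_scatterplots(data, labels):
    print(pair, round(score, 2))
```

With this data, the pair (2, 4) is ranked first, since the two classes are shifted apart only along those dimensions; the analyst would then inspect the top-ranked plots instead of all 15.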

1.3 Thesis Structure

Having illustrated the problem and enumerated the contributions of this thesis in the previous sections, we structure the remainder of the thesis as follows.

Chapter 2 provides a brief overview of important related work in the field of high-dimensional data analysis, covering three main areas. Section 2.1 introduces the common challenges of analyzing high-dimensional data and presents dimension reduction techniques that reduce data complexity. Section 2.2 describes important visualization techniques for high-dimensional data. Section 2.3 introduces standard automatic techniques from the data mining community and presents quality measures, i.e., automated ranking functions that judge the quality of a visualization with respect to a given task. Section 2.4 presents examples where the interplay between visualization, automation, and interaction is far more beneficial than any of these techniques alone.

Chapter 3 proposes eight new quality metrics for different tasks and two visualization types: scatterplot matrices and parallel coordinates. The metrics are tested on a set of synthetic and real data sets to demonstrate their effectiveness. To ensure that the metrics reflect the user's perception, a selected subset of the measures for scatterplot matrices is evaluated and compared with user perception. We found that the measures and the users perform similarly. Based on this study, we formulated guidelines for further evaluation of existing metrics.

Based on a literature review, Chapter 4 introduces a systematization of different quality measures for high-dimensional data visualization. Their relations are described through characteristic factors, such as the visualization technique or the purpose, in order to arrive at a coherent and unified picture of these techniques. By putting the existing methods into a common framework, we hope to ease the generation of new research in the field and to spot relevant gaps to be bridged by future research. Following this, Section 4.2 briefly presents the results of a qualitative data analysis that led to a visual cluster separability taxonomy. These results are the basis for the subsequent discussion of relevant aspects that arise when analyzing clusters visually and of what future work needs to focus on.

Chapter 5 presents two interactive systems that help make sense of high-dimensional data sets with respect to different clusterings. Searching in subspaces is needed because automatic pattern search is done through clustering algorithms, and searching for clusters in the full space of high-dimensional data is not feasible. Section 5.1 introduces a visual tool, ClustNails, to investigate the results of different state-of-the-art subspace clustering algorithms. This tool is intended to support the interpretation of the results with respect to the relations between subspace clusters. With this visual tool, questions such as how many objects and how many dimensions the clusters contain, which dimensions overlap between clusters, or which objects are shared by several clusters can be answered.
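These questions reduce to set operations once each subspace cluster is represented as a pair of an object set and a dimension set. The following sketch assumes that representation; the class and function names, as well as the two example clusters, are hypothetical and do not reflect the internals of ClustNails.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class SubspaceCluster:
    """A subspace cluster: objects that form a cluster in a given set of dimensions."""
    name: str
    objects: frozenset
    dimensions: frozenset


def cluster_sizes(clusters):
    """Number of objects and number of dimensions of each cluster."""
    return {c.name: (len(c.objects), len(c.dimensions)) for c in clusters}


def pairwise_overlaps(clusters):
    """Shared dimensions and shared objects for every pair of clusters."""
    return {
        (a.name, b.name): {
            "shared_dimensions": sorted(a.dimensions & b.dimensions),
            "shared_objects": sorted(a.objects & b.objects),
        }
        for a, b in combinations(clusters, 2)
    }


def objects_in_several_clusters(clusters):
    """Objects that belong to more than one cluster."""
    counts = Counter(o for c in clusters for o in c.objects)
    return {o for o, n in counts.items() if n > 1}


# Hypothetical example: two clusters found in different subspaces of a customer table.
c1 = SubspaceCluster("C1", frozenset({1, 2, 3, 4}), frozenset({"age", "income"}))
c2 = SubspaceCluster("C2", frozenset({3, 4, 5}), frozenset({"income", "sport"}))
clusters = [c1, c2]
print(cluster_sizes(clusters))                # {'C1': (4, 2), 'C2': (3, 2)}
print(pairwise_overlaps(clusters))
print(objects_in_several_clusters(clusters))  # {3, 4}
```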

Section 5.2 goes one step further and presents an analytical approach to support the identification of alternative clusterings in these spaces. High dimensionality exposes different facets of the data. In a data set about people, for example, we might have clusters from the perspective of music taste (rock music, classical music, jazz, etc.), but at the same time we might also have different groupings of the same people describing their level of sports activity. Both views on the data are valid but provide different insights. To discover such alternative clusterings in high-dimensional data, this section proposes an analytical workflow that starts by searching the set of possible subspaces to identify interesting ones. We then group these subspaces according to their data similarity, providing filtering mechanisms for further interactive investigation. Supported by interaction, different clusterings of the data can be identified.
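Grouping subspaces by data similarity requires a similarity function between subspaces. As an assumed example (Section 5.2 defines the functions actually used), the sketch below calls two subspaces similar when the objects have the same nearest neighbours in both, measured as the mean Jaccard overlap of their k-nearest-neighbour sets; k = 5 and the synthetic data are arbitrary choices for illustration.

```python
import numpy as np


def knn_sets(data: np.ndarray, dims, k: int):
    """For every object, the set of its k nearest neighbours (itself excluded),
    using Euclidean distance on the dimensions in `dims` only."""
    sub = data[:, list(dims)]
    diff = sub[:, None, :] - sub[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # an object is not its own neighbour
    nearest = np.argsort(dist, axis=1)[:, :k]
    return [set(row.tolist()) for row in nearest]


def subspace_similarity(data: np.ndarray, dims_a, dims_b, k: int = 5) -> float:
    """Mean Jaccard overlap of the objects' k-NN sets in two subspaces
    (1.0 means every object has the same neighbours in both)."""
    a, b = knn_sets(data, dims_a, k), knn_sets(data, dims_b, k)
    return float(np.mean([len(x & y) / len(x | y) for x, y in zip(a, b)]))


def similarity_matrix(data: np.ndarray, subspaces, k: int = 5) -> np.ndarray:
    """Symmetric matrix of pairwise subspace similarities."""
    m = len(subspaces)
    sim = np.eye(m)
    for i in range(m):
        for j in range(i + 1, m):
            sim[i, j] = sim[j, i] = subspace_similarity(data, subspaces[i], subspaces[j], k)
    return sim


# Synthetic data: dims 2-3 duplicate dims 0-1, dims 4-5 are independent noise.
rng = np.random.default_rng(7)
base = rng.normal(size=(100, 2))
data = np.hstack([base, base.copy(), rng.normal(size=(100, 2))])
subspaces = [(0, 1), (2, 3), (4, 5)]
print(np.round(similarity_matrix(data, subspaces), 2))
```

A matrix like this can then be passed to any clustering or layout method so that subspaces showing the same grouping of the data end up close together, while dissimilar ones point to alternative clusterings.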

Chapter 6 concludes the thesis and gives an overview of further research questions that we consider interesting for future investigation.

A schematic overview of the chapter interrelations is shown in Figure 1.2.



[Figure 1.2 diagram: Chapter 1: Introduction; Chapter 2: High-Dimensional Data Analysis; Chapter 3: QM-based Visual Analysis of HD Data; Chapter 4: A Model of HD Data Visualization; Chapter 5: Visual Subspace Analysis of HD Data; Chapter 6: Conclusion and Future Work.]

Figure 1.2: Schematic overview of the interrelation of chapters in this thesis.

Parts of this thesis were published in:

1. A. Tatu, G. Albuquerque, M. Eisemann, J. Schneidewind, H. Theisel, M. Magnor, and D. Keim. Combining automated analysis and visualization techniques for effective exploration of high dimensional data. Proceedings of the IEEE Symposium on Visual Analytics Science and Technology (VAST), pages 59-66, 2009.

The contributions: for this publication, I took the lead on the computer science research part of the paper, implementing the data space measures, and also led the writing of the paper itself. G. Albuquerque and M. Eisemann implemented the image quality metrics and provided their description in the paper, as well as some parts of the evaluation section with these metrics. The Histogram Density measures were programmed by myself. J. Schneidewind gave advice on structuring the paper and presenting the results. D. Keim accompanied the project with suggestions for improvements to the application and text. H. Theisel and M. Magnor gave advice on the project. All parts of the paper were revised several times by me; thus, in this thesis I use the paper text without citation marks. G. Albuquerque's thesis (title unknown at the time of my submission) might also contain some text passages of this paper for the parts of the project she took part in.

2. A. Tatu, G. Albuquerque, M. Eisemann, P. Bak, H. Theisel, M. Magnor, and D. A. Keim. Automated Visual Analysis Methods for an Effective Exploration of High-Dimensional Data. IEEE Transactions on Visualization and Computer Graphics (TVCG), 17(5):584-597, May 2011.

The contributions: publication 1 was selected as one of the best papers of the VAST’09 conference, and this publication is an invited extension of publication 1. As primary author, I was responsible for writing the paper, generating new use cases, testing our measures, and describing further research directions in this area. G. Albuquerque implemented, described, and tested the new CSM measure. P. Bak gave advice on structuring the experiments and presenting the results. D. Keim accompanied the paper with suggestions for improvements to the application and text. M. Eisemann, H. Theisel, and M. Magnor gave advice on the paper. All parts of the paper were revised several times by me; thus, in this thesis I use the paper text without citation marks. G. Albuquerque's thesis (title unknown at the time of my submission) might also contain some text passages of this paper for the parts of the project she took part in.

3. A. Tatu, P. Bak, E. Bertini, D. A. Keim, and J. Schneidewind. Visual quality metrics and human perception: an initial study on 2D projections of large multidimensional data. In Proceedings of the Working Conference on Advanced Visual Interfaces (AVI), pages 49-56. ACM, 2010.

The contributions: I took primary responsibility for this publication and, additionally, took the lead on the automatic evaluation. P. Bak took the lead on the human experiment. Together we compared the results and evaluated them statistically. E. Bertini, D. Keim, and J. Schneidewind accompanied the paper with suggestions for improvements to the experimental design and text. All parts of the paper were revised several times by me; thus, in this thesis I use the paper text without citation marks.

4. D. J. Lehmann, G. Albuquerque, M. Eisemann, A. Tatu, D. A. Keim, H. Schumann, M. Magnor, and H. Theisel. Visualisierung und Analyse multidimensionaler Datensätze [Visualization and analysis of multidimensional data sets]. Informatik-Spektrum, Springer Berlin/Heidelberg, 33(6):589-600, 2010.

The contributions: this publication was authored by D. Lehmann. My contribution was to describe the use of quality metrics for high-dimensional data. This thesis was inspired by the discussions around this paper.

5. E. Bertini, A. Tatu, and D. A. Keim. Quality Metrics in High-Dimensional Data Visualization: An Overview and Systematization. Proceedings of the IEEE Symposium on Information Visualization (InfoVis), 17(12):2203-2212, Dec. 2011.

The contributions: this publication was authored equally by E. Bertini and myself. We decided to show this by listing our names alphabetically in the author list. E. Bertini and I conducted the literature review, came up with the systematization and description model of quality metrics, and described this process in this paper. D. Keim played devil's advocate to test our model and gave advice for improvement. All parts of the paper were written and revised several times by both leading authors. Thus, in this thesis I use the paper text without citation marks.

6. M. Sedlmair, A. Tatu, T. Munzner, and M. Tory. A taxonomy of visual cluster separation factors. Computer Graphics Forum (EuroVis), 31(3pt4):1335-1344, June 2012.

The contributions: M. Sedlmair took the lead in writing this publication. M. Sedlmair and I conducted the qualitative analysis of over 800 plots and labeled all the cases with different keywords. Based on these labels, M. Sedlmair and T. Munzner came up with the taxonomy and described it in the paper. I tested special cases, such as the influence of grid size, during the writing process of the paper. M. Tory accompanied the paper with suggestions for improvements to the analysis and taxonomy and revised the text. In this thesis, I describe the results presented in that paper without using its text, and I provide further ideas for research in this area.

7. A. Tatu, F. Maaß, I. Färber, E. Bertini, T. Schreck, T. Seidl, and D. Keim. Subspace Search and Visualization to Make Sense of Alternative Clusterings in High-Dimensional Data. IEEE Symposium on Visual Analytics Science and Technology (VAST), pages 63-72, 2012.

The contributions: for this publication I took the lead on the project and on writing the paper. F. Maaß implemented the subspace tool under the supervision of E. Bertini, T. Schreck, and myself. T. Schreck gave advice on structuring the paper and presenting the results by providing initial sections of the paper. I. Färber provided an initial section on subspace clustering. T. Seidl and D. Keim gave advice on the project. Major parts of the paper were written by myself, and I revised all other parts several times. Thus, in this thesis I use the paper text without quotation marks.

8. A. Tatu, L. Zhang, E. Bertini, T. Schreck, D. A. Keim, S. Bremm, and T. von Landesberger. ClustNails: Visual Analysis of Subspace Clusters. Tsinghua Science and Technology, Special Issue on Visualization and Computer Graphics, 17(4):419-428, Aug. 2012.

The contributions: for this publication I took the lead on the project and on writing the paper. I implemented the subspace tool, with L. Zhang supporting some of its components. E. Bertini and T. Schreck gave advice on structuring the paper and presenting the results, and they provided initial sections that I shaped for the final submission. D. A. Keim, S. Bremm, and T. von Landesberger gave advice on the project. Major parts of the paper were written by myself, and I revised the parts written by my co-authors several times to shape the final version. Thus, in this thesis I use the paper text without quotation marks.

Other publications to which I contributed but which are not included in this thesis:

1. M. Schaefer, L. Zhang, T. Schreck, A. Tatu, J. A. Lee, M. Verleysen, and D. A. Keim. Improving projection-based data analysis by feature space transformations. In Proceedings of SPIE 8654, Visualization and Data Analysis, 2013.

2. B. Bustos, D. A. Keim, D. Saupe, T. Schreck, and A. Tatu. Methods and User Interfaces for Effective Retrieval in 3D Databases (in German). Datenbank-Spektrum - Zeitschrift fuer Datenbank Technologie und Information Retrieval, dpunkt.verlag, 7(20):23-32, 2007.




2

High-Dimensional Data Analysis

Contents

“You can observe a lot by watching.”

Yogi Berra

2.1 Basic Techniques for High-Dimensional Data Analysis

2.1.1 Common Challenges with High-Dimensional Data

2.1.2 Feature Selection and Feature Extraction

2.2 Information Visualization Techniques for High-Dimensional Data

2.2.1 Information Visualization Techniques

2.2.2 Limitations while Visualizing High-Dimensional Data

2.3 Automated Techniques for High-Dimensional Data

2.3.1 Data Mining Techniques for High-Dimensional Data

2.3.2 Quality Measures for High-Dimensional Data Visualizations

2.4 Visual Analytics for High-Dimensional Data

2.4.1 Visual Interactive Systems for High-Dimensional Data Analysis

2.4.2 Subspace Cluster Analysis and Visualization

High-dimensional data contains complex patterns. Over the past years, various data analysis approaches have been developed to uncover the patterns that may be hidden in such data. As outlined below, this thesis relates to several broader areas in the analysis and visualization of high-dimensional data.

In this chapter, Section 2.1 describes the main challenges of dealing with high-dimensional data and some basic techniques for reducing its dimensionality. Section 2.2 gives an overview of existing visualization techniques for high-dimensional data and identifies the visualization challenges that arise from the data complexity. Section 2.3 first presents a series of automated techniques from Data Mining for pattern analysis in high-dimensional data, focusing on clustering. Its second part presents mechanisms that quantify the quality of visualizations, called quality metrics. Purely visual-interactive solutions and purely automatic approaches each have limitations. Section 2.4 therefore presents works from related fields in which the interplay of visualization, automation, and interaction can provide better solutions to the tasks at hand. All examples in these sections concern finding and understanding patterns in high-dimensional data.

Parts of this chapter appeared in [27, 132, 133, 134, 135, 136].



2.1 Basic Techniques for High-Dimensional Data Analysis

2.1.1 Common Challenges with High-Dimensional Data

Before presenting different techniques for analyzing high-dimensional data sets, we discuss two common challenges in this area.

The first issue is the so-called curse of dimensionality, which makes analysis problems in high-dimensional spaces notoriously difficult. The term was formulated by R. Bellman [20] in the context of dynamic programming. It describes the fact that data becomes sparse as dimensionality increases. In other words, in high-dimensional data all objects tend to be almost equidistant, which makes it hard to distinguish between them. Additionally, many existing Data Mining algorithms have a complexity that is exponential in the number of data dimensions. As dimensionality increases, these algorithms become computationally intractable and therefore inapplicable in many real applications.
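The sparsity argument can be made concrete with a small simulation. The minimal sketch below is our own illustration and is not taken from [20]. It scatters a fixed number of uniformly distributed points into a grid with `bins` cells per axis and reports which fraction of the `bins**dims` cells is occupied. The function name and its parameters are hypothetical, and the uniform data is an assumption.

```python
import random


def occupied_cell_fraction(n_points: int, dims: int, bins: int = 10, seed: int = 0) -> float:
    """Fraction of the bins**dims grid cells holding at least one of n_points uniform samples.

    Hypothetical helper for illustration; the data is assumed to be uniform in [0, 1)^dims.
    """
    rng = random.Random(seed)
    occupied = set()
    for _ in range(n_points):
        # Each coordinate in [0, 1) falls into one of `bins` intervals; the tuple identifies the cell.
        cell = tuple(int(rng.random() * bins) for _ in range(dims))
        occupied.add(cell)
    return len(occupied) / bins ** dims


if __name__ == "__main__":
    for d in (1, 2, 3, 6):
        # The number of cells grows as bins**d, while the number of points stays at 1000.
        print(f"d={d}: {occupied_cell_fraction(1000, d):.6f} of the cells are occupied")
```

For d = 6 there are 10^6 cells, of which at most 1000 can be filled, so almost the entire space is empty. This is the sparsity the curse of dimensionality refers to.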

The second issue concerns the meaning of similarity, which diminishes in high-dimensional spaces. It was shown in [28] that, as dimensionality increases, the distance to the nearest data point approaches the distance to the farthest data point. This problem influences the design of similarity functions for objects in high-dimensional spaces.
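The effect described in [28] can be observed with a small simulation. The sketch below does not reproduce the analysis in [28]. Under the assumption of uniformly distributed data, it draws a random query point and computes the relative contrast (D_max − D_min) / D_min of its Euclidean distances to all data points. The function name is hypothetical.

```python
import math
import random


def relative_contrast(n_points: int, dims: int, seed: int = 0) -> float:
    """(D_max - D_min) / D_min for distances from a random query to n uniform points.

    Illustrative simulation with uniform data in [0, 1)^dims (an assumption).
    """
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(dims)] for _ in range(n_points)]
    query = [rng.random() for _ in range(dims)]
    distances = [math.dist(query, p) for p in points]  # Euclidean distances (Python 3.8+)
    d_min, d_max = min(distances), max(distances)
    return (d_max - d_min) / d_min


if __name__ == "__main__":
    for d in (2, 10, 100, 1000):
        print(f"d={d}: relative contrast {relative_contrast(500, d):.3f}")
```

A contrast close to zero means that the nearest and the farthest neighbour are almost equally far away. In that situation a distance-based notion of similarity carries little information.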

2.1.2 Feature Selection and Feature Extraction

A simple, but sometimes very effective, way to deal with high-dimensional data is to reduce the number of dimensions by eliminating those that seem to be irrelevant. Dimension reduction can be achieved by either feature selection [61] or feature extraction [44]. Feature selection is the problem of selecting, from a large space of input features (or dimensions), a smaller number of features that optimize a measurable criterion, e.g., the accuracy of a classifier [97].
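Feature selection can be illustrated with a common wrapper strategy, greedy forward selection. It repeatedly adds the feature that most improves the criterion. The minimal sketch below assumes the training accuracy of a simple nearest-centroid classifier as the measurable criterion. Both the search strategy and the classifier are our choice for illustration and are not prescribed by [97]. All names and the toy data are hypothetical.

```python
from typing import Callable, List, Sequence


def forward_selection(n_features: int, k: int, score: Callable[[List[int]], float]) -> List[int]:
    """Greedily add the feature that maximizes `score` until k features are selected."""
    selected: List[int] = []
    remaining = list(range(n_features))
    while len(selected) < k and remaining:
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


def nearest_centroid_accuracy(X: Sequence[Sequence[float]], y: Sequence[int], features: List[int]) -> float:
    """Training accuracy of a nearest-centroid classifier restricted to `features` (hypothetical criterion)."""
    classes = sorted(set(y))
    centroids = {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        centroids[c] = [sum(r[f] for r in rows) / len(rows) for f in features]
    correct = 0
    for x, label in zip(X, y):
        point = [x[f] for f in features]
        # Ties go to the smallest class label, because min keeps the first minimum.
        predicted = min(classes, key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centroids[c])))
        correct += predicted == label
    return correct / len(y)


# Toy data: feature 0 separates the classes, feature 1 is noise, feature 2 is constant.
X = [[0.0, 0.3, 1.0], [0.1, 0.9, 1.0], [0.2, 0.1, 1.0],
     [1.0, 0.2, 1.0], [0.9, 0.8, 1.0], [1.1, 0.5, 1.0]]
y = [0, 0, 0, 1, 1, 1]

if __name__ == "__main__":
    print(forward_selection(3, 1, lambda fs: nearest_centroid_accuracy(X, y, fs)))
```

The criterion is pluggable. Replacing `nearest_centroid_accuracy` with any other score turns the same loop into a different selection method. Note that this criterion needs class labels, which is exactly the limitation discussed in the next paragraph.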

Feature extraction methods reduce the dimensionality of the data by forming a new set of dimensions as linear or nonlinear combinations of the original dimensions. These synthetic dimensions represent most (or all) of the structure of the original data set while using fewer attributes. Depending on the training data, the methods can be supervised or unsupervised. “Supervised methods rely on class labels and optimize the performance of a supervised learning algorithm, typically a classifier. Unsupervised methods rely on quality criteria measured from the output of an unsupervised learning method, typically a clustering algorithm. However, many algorithms have variations for both supervised and unsupervised learning” [119]. Most automatic feature selection methods rely on supervised information (e.g., class-labeled data) to perform the selection. Consequently, they are not directly applicable to exploratory analysis problems.

To illustrate the fundamental principle of feature extraction, the next paragraphs describe two traditional dimension reduction methods: principal component analysis (PCA) [83] and multidimensional scaling (MDS) [41].

PCA tries to preserve the variance in the data. It transforms the set of possibly correlated dimensions into a new set of linearly uncorrelated dimensions, called principal components, each of which is a linear combination of the original dimensions. The first component captures the largest variance of the original dimension set. The second component is linearly uncorrelated with the first and captures the largest remaining variance, and so on. The data set can be reduced by keeping only a smaller set of principal coordinates as transformed dimensions.
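The PCA steps just described can be sketched via an eigendecomposition of the covariance matrix. This is a minimal sketch using NumPy; it is not the implementation used in this thesis. The function name and the synthetic data are hypothetical.

```python
import numpy as np


def pca(X: np.ndarray, k: int):
    """Project X (n_samples x n_dims) onto its first k principal components.

    Returns the projected data and all eigenvalues (component variances) in descending order.
    """
    Xc = X - X.mean(axis=0)                 # PCA operates on mean-centred data
    cov = np.cov(Xc, rowvar=False)          # n_dims x n_dims covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues for symmetric matrices
    order = np.argsort(eigvals)[::-1]       # sort components by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return Xc @ eigvecs[:, :k], eigvals


rng = np.random.default_rng(0)
z = rng.normal(size=(200, 1))
# Dimensions 0 and 1 are strongly correlated; dimension 2 is independent noise.
X = np.hstack([z, 2 * z + 0.1 * rng.normal(size=(200, 1)), rng.normal(size=(200, 1))])
Y, variances = pca(X, 2)
```

The projected coordinates in `Y` are mutually uncorrelated. The variance of the first column equals the largest eigenvalue, which matches the ordering of components described above.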

MDS tries to preserve the pairwise distances between the data points. There are many variants of MDS, depending on the distance function used [31]. The simplest version is linear MDS, also called classical scaling. When Euclidean distances are used, its solution is very closely related to PCA.
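Classical scaling can be sketched in a few lines. The squared distance matrix is double-centred to obtain a Gram matrix, and its leading eigenvectors give the low-dimensional coordinates. This is a minimal NumPy sketch; the function names and the test points are hypothetical.

```python
import numpy as np


def classical_mds(D: np.ndarray, k: int) -> np.ndarray:
    """Classical (Torgerson) MDS: embed points in k dimensions from a Euclidean distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n               # centring matrix
    B = -0.5 * J @ (D ** 2) @ J                       # double-centred squared distances = Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:k]             # keep the k largest eigenvalues
    scale = np.sqrt(np.maximum(eigvals[order], 0.0))  # clip tiny negative values caused by round-off
    return eigvecs[:, order] * scale


def pairwise_distances(P: np.ndarray) -> np.ndarray:
    """Euclidean distance matrix of the rows of P."""
    return np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)


rng = np.random.default_rng(1)
points = rng.normal(size=(30, 2))
embedding = classical_mds(pairwise_distances(points), 2)
```

For Euclidean input distances, the embedding reproduces the original distances. Up to reflections of the axes, it coincides with the PCA projection of the centred points, which is the close relation to PCA mentioned above.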

Both techniques rely on the idea that the variation of the data can be explained by a smaller number of transformed features. Their main difference from feature selection methods is that, instead of choosing a subset of dimensions from the data, they create new dimensions defined as functions over all dimensions. They also do not consider class labels; their computation relies only on the data points.

These techniques share several general problems. First, the mapping is often not unique. Second, several parameters influence the result. Third, the resulting dimensions are sometimes difficult to interpret: the original dimensions come from a specific domain and have a clear interpretation (such as age or income), but their linear combinations can hardly be interpreted.

Koren and Carmel propose a series of new methods for creating projections of high-dimensional data sets using linear transformations [89]. For unlabeled data, they propose a generalization of PCA, called normalized PCA. It normalizes the squared pairwise distances to reduce the dominance of the large distances that otherwise dominate the standard PCA transformation. For labeled data, their methods integrate the class labels into the computation, resulting in projections with a clearer separation between the classes. Compared to traditional PCA or MDS, these methods have the advantage that they also capture intra-cluster shapes.
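One way to see how a distance normalization fits into PCA is to write PCA as the maximization of a weighted sum of squared pairwise distances after projection. The sketch below does not reproduce Koren and Carmel's formulation. It assumes that each pair of points receives a weight w_ij and that the orthonormal projection P maximizes Σ w_ij ||Pᵀ(x_i − x_j)||². Uniform weights recover ordinary PCA. The weight 1/d_ij is used here only as an illustrative normalization that damps large distances; whether it matches the exact normalization in [89] should be checked against the paper. The function name is hypothetical.

```python
import numpy as np


def weighted_projection_axes(X: np.ndarray, k: int, weight_fn=None) -> np.ndarray:
    """Top-k axes maximizing sum_{i<j} w_ij * ||P^T (x_i - x_j)||^2 over orthonormal P.

    weight_fn maps an array of pairwise distances to weights. None means uniform weights,
    which reduces to ordinary PCA. Remove duplicate points (distance 0) before using 1/d.
    """
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    off_diag = ~np.eye(len(X), dtype=bool)
    W = np.zeros_like(D)
    W[off_diag] = 1.0 if weight_fn is None else weight_fn(D[off_diag])
    L = np.diag(W.sum(axis=1)) - W   # graph Laplacian of the pair weights
    M = X.T @ L @ X                  # equals 1/2 * sum_ij w_ij (x_i - x_j)(x_i - x_j)^T
    eigvals, eigvecs = np.linalg.eigh(M)
    order = np.argsort(eigvals)[::-1][:k]
    return eigvecs[:, order]


rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3)) * np.array([3.0, 1.0, 0.3])
pca_like = weighted_projection_axes(X, 1)                                 # uniform weights: PCA axis
normalized = weighted_projection_axes(X, 2, weight_fn=lambda d: 1.0 / d)  # assumed 1/d weighting
```

Because the objective is a trace maximization, the optimal axes are the leading eigenvectors of M, just as in PCA. Only the matrix being decomposed changes with the weighting.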

In addition to PCA and MDS, further techniques have been developed that obtain a reduced set of synthetic dimensions through linear or non-linear transformations of the original features. Detailed surveys can be found in [111, 153]. We briefly recall another prominent group of dimension reduction techniques at this point. These rely on signal processing: applied to a data vector, they transform it into a numerically different vector [64]. Examples include the Discrete Fourier Transform, the Discrete Cosine Transform, and the Wavelet Transform. Since the input and transformed data vectors have the same length, the data is reduced by truncating the transformed vector (e.g., its wavelet coefficients) according to a user-specified threshold.
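The signal-processing route can be illustrated with the simplest wavelet, the Haar transform. This minimal sketch rests on several assumptions. It uses the orthonormal Haar variant, it requires a vector length that is a power of two, and it interprets the user-specified threshold as hard thresholding on coefficient magnitudes. The function names are hypothetical.

```python
import math
from typing import List, Sequence

SQRT2 = math.sqrt(2.0)


def haar_forward(x: Sequence[float]) -> List[float]:
    """Orthonormal Haar wavelet transform; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    coeffs = list(x)
    length = n
    while length > 1:
        half = length // 2
        averages = [(coeffs[2 * i] + coeffs[2 * i + 1]) / SQRT2 for i in range(half)]
        details = [(coeffs[2 * i] - coeffs[2 * i + 1]) / SQRT2 for i in range(half)]
        coeffs[:length] = averages + details  # coarse part first, detail coefficients after it
        length = half
    return coeffs


def haar_inverse(coeffs: Sequence[float]) -> List[float]:
    """Invert haar_forward level by level, from the coarsest to the finest level."""
    out = list(coeffs)
    length = 1
    while length < len(out):
        merged: List[float] = []
        for a, d in zip(out[:length], out[length:2 * length]):
            merged += [(a + d) / SQRT2, (a - d) / SQRT2]
        out[:2 * length] = merged
        length *= 2
    return out


def truncate(coeffs: Sequence[float], threshold: float) -> List[float]:
    """Hard thresholding: zero every coefficient whose magnitude is below the threshold."""
    return [c if abs(c) >= threshold else 0.0 for c in coeffs]


signal = [4.0, 4.0, 5.0, 5.0, 10.0, 10.0, 2.0, 2.0]
reduced = truncate(haar_forward(signal), threshold=1e-9)
```

For this piecewise-constant signal, only four of the eight coefficients are non-zero, and they suffice to reconstruct it. A larger threshold trades reconstruction accuracy for a shorter representation.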

2.2 Information Visualization Techniques for High-Dimensional Data

2.2.1 Information Visualization Techniques

The representation of high-dimensional data is one of the main research challenges in visualization. Several techniques have been developed in recent years to represent relations among many dimensions on a computer display, which is inherently two-dimensional. Visual variables such as color and shape let visualizations go a bit beyond 2D, but they still face various issues when representing high-dimensional data sets. Classic approaches include parallel coordinates, scatterplot matrices, glyph-based techniques, and pixel-oriented techniques [145]. Figure 2.1 shows examples of these techniques taken from [145].


Figure 2.1: High-dimensional visualization techniques taken from [145]. A: Scatterplot matrix with a histogram of each dimension on the diagonal. Selected points are marked in red in all plots. B: Parallel coordinates plot of a seven-dimensional data set. One polyline representing one data point is highlighted in red. C: Star glyphs in an MDS layout. D: Dense pixel displays representing a 14-dimensional data set.

Scatterplots and Scatterplot Matrices [37]

2D scatterplots are one of the most commonly used visualization techniques in data analysis. The data is represented by points in a rectangular box. The value of one variable (dimension) determines a point's position on the horizontal axis, and the value of the other variable determines its position on the vertical axis. To represent a data set of higher dimensionality, a common approach is to build a scatterplot matrix (SPLOM) [37]. Figure 2.1A shows such a matrix for a four-dimensional data set, where every pair of dimensions is represented in one scatterplot. The matrix is symmetric with respect to the diagonal, so it shows every plot twice. Additionally, histograms on the diagonal show the value distribution of each dimension. Selected points are highlighted in red, and a purple rectangle indicates their region.
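The redundancy of the matrix can be made explicit. The minimal sketch below uses hypothetical helper names. It enumerates the cells of a SPLOM and the distinct dimension pairs behind them. For d dimensions there are d² cells but only d(d − 1)/2 distinct scatterplots. The remaining off-diagonal cells are mirror images, and the d diagonal cells hold the histograms.

```python
from itertools import combinations
from typing import List, Tuple


def splom_cells(n_dims: int) -> List[Tuple[int, int]]:
    """All (row, column) cells of an n_dims x n_dims scatterplot matrix, diagonal included."""
    return [(r, c) for r in range(n_dims) for c in range(n_dims)]


def distinct_scatterplots(n_dims: int) -> List[Tuple[int, int]]:
    """Dimension pairs that need their own plot: cell (r, c) mirrors (c, r); the diagonal holds histograms."""
    return list(combinations(range(n_dims), 2))


if __name__ == "__main__":
    dims = ["d1", "d2", "d3", "d4"]  # hypothetical names for a four-dimensional data set
    for i, j in distinct_scatterplots(len(dims)):
        print(f"scatterplot: {dims[i]} vs {dims[j]}")
```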



Parallel Coordinates [78]

Another important visualization method for multivariate data sets is parallel coordinates. Parallel coordinates were first introduced by Inselberg [77] and are used for visualizing multivariate data in several tools, e.g., XmdvTool [146] and VIS-STAMP [60]. The basic idea is that each dimension¹ of the data is represented by a vertical line, so the axes of the plot form a collection of parallel lines. Each data point is a polyline that intersects each dimension axis at the point's value in that dimension. Figure 2.1B shows parallel coordinates for a seven-dimensional data set, with one data point's polyline highlighted in red. Compared to scatterplots, parallel coordinates can show data sets of higher dimensionality in a single display. In a SPLOM, by contrast, a higher-dimensional data set is visualized by plotting every two-dimensional combination in its own scatterplot. For both parallel coordinates and SPLOMs, the ordering matters: for parallel coordinates it is the order of the axes (dimensions), and for the SPLOM it is the order of rows and columns. Different orderings make different relations in the data visible, so it is important to decide in which order the dimensions are presented to the user. The effectiveness of both techniques, however, depends strongly on the dimensionality of the data under inspection. The available resolution decreases as the number of data dimensions increases. In addition, the number of possible orderings grows rapidly, so exploring all of them manually becomes very difficult, if not impossible. In Section 2.3.2, we describe the notion of quality metrics, which are mechanisms that automatically quantify the quality of a display. In Section 3.1.4, we introduce new quality metrics to determine the best ordering in parallel coordinates with respect to a given task.
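Manual exploration breaks down quickly, as a brute-force search over axis orders shows. The score below, the sum of absolute Pearson correlations between neighbouring axes, is a hypothetical stand-in chosen only for illustration. It is not one of the quality metrics introduced in Section 3.1.4. Because d dimensions admit d! orders (d!/2 if an order and its reverse count as one), exhaustive search is feasible only for small d. All names and data are hypothetical.

```python
from itertools import permutations
from math import sqrt
from typing import Sequence, Tuple


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def adjacent_correlation_score(columns, order) -> float:
    """Hypothetical quality score: sum of |r| between neighbouring axes in the given order."""
    return sum(abs(pearson(columns[i], columns[j])) for i, j in zip(order, order[1:]))


def best_axis_order(columns) -> Tuple[int, ...]:
    """Exhaustive search over all d! axis orders; only feasible for small d."""
    return max(permutations(range(len(columns))),
               key=lambda o: adjacent_correlation_score(columns, o))


columns = [
    [1, 2, 3, 4, 5],
    [5, 3, 1, 4, 2],
    [2, 4, 6, 8, 10],  # perfectly correlated with column 0
]
```

Here the search places the two perfectly correlated dimensions next to each other. With ten dimensions it would already have to score 3,628,800 orders, which illustrates why automated quality measures are needed.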

Glyph-based techniques [147]

“Glyphs are graphical entities that convey one or more data values via attributes such as shape, size, color, and position” [147]. A variety of glyphs have been proposed in the literature, among them star glyphs, face glyphs, profile glyphs, and box glyphs. An overview of multivariate glyphs can be found in [147]. All of them represent each object by one graphical entity, but they use different encodings for the object's attributes (e.g., length, area, color). Figure 2.1C shows star glyphs. As the name suggests, each object is represented by a star-shaped glyph, in which the value of each dimension is encoded by the length of one of several evenly spaced rays. The ray ends are connected by a polyline.
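The geometry of a star glyph follows directly from this description. The minimal sketch below assumes that the values are already scaled to [0, 1] per dimension and that the first ray points to the right. These are our choices, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def star_glyph_vertices(values: Sequence[float], center: Point = (0.0, 0.0),
                        radius: float = 1.0) -> List[Point]:
    """Ray end points of a star glyph; values are assumed to be pre-scaled to [0, 1]."""
    cx, cy = center
    d = len(values)
    vertices = []
    for i, v in enumerate(values):
        angle = 2.0 * math.pi * i / d  # evenly spaced rays, the first one pointing right
        length = radius * v            # ray length encodes the dimension value
        vertices.append((cx + length * math.cos(angle), cy + length * math.sin(angle)))
    return vertices


def star_glyph_outline(values: Sequence[float], center: Point = (0.0, 0.0),
                       radius: float = 1.0) -> List[Point]:
    """Closed polyline connecting the ray ends, as drawn around the glyph."""
    pts = star_glyph_vertices(values, center, radius)
    return pts + pts[:1]
```

Placing one such outline per object at positions computed by a projection, for example an MDS layout as in Figure 2.1C, yields the combined display.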

Pixel-oriented techniques [145]

Pixel-oriented techniques “map each value to individual pixels and create a filled polygon to represent each dimension” [145]. In Figure 2.1D, a 14-dimensional data set is represented by dense pixel displays that show each dimension in a separate rectangle and each data value as a colored pixel in that rectangle. The values are sorted according to the tenth dimension, which is marked with a black border. This example reveals several challenges for these techniques. One is the already mentioned ordering of data values, which helps to spot correlated dimensions. Another is the ordering of dimensions, so that similar dimensions are positioned close to each other on the screen. Different colormaps can also reveal different patterns in the data, so choosing a suitable colormap for each data set and task is yet another challenge. Additionally, positioning the dimensions on the screen is not trivial, since layouts other than the grid layout are possible.
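The value ordering used in Figure 2.1D can be sketched as follows. This minimal illustration assumes records given as lists of numbers, min-max normalization per dimension, and a simple row-by-row (row-major) pixel arrangement. Actual pixel-oriented systems may use other arrangements, and the function name is hypothetical.

```python
from typing import List, Sequence


def dense_pixel_grids(records: Sequence[Sequence[float]], sort_dim: int,
                      width: int) -> List[List[List[float]]]:
    """One grid per dimension; records are sorted by `sort_dim` and laid out row by row.

    Values are min-max normalised per dimension to [0, 1] so that a colormap can be applied.
    """
    order = sorted(range(len(records)), key=lambda i: records[i][sort_dim])
    n_dims = len(records[0])
    grids = []
    for d in range(n_dims):
        column = [records[i][d] for i in order]  # same record order in every dimension
        lo, hi = min(column), max(column)
        span = (hi - lo) or 1.0                  # constant dimension: avoid division by zero
        scaled = [(v - lo) / span for v in column]
        # Row-major placement: pixel k goes to row k // width, column k % width.
        grids.append([scaled[r:r + width] for r in range(0, len(scaled), width)])
    return grids
```

Because every grid uses the same record order, a dimension whose pixels also show a smooth colour gradient is likely correlated with the sorting dimension. This is the effect exploited when sorting by the tenth dimension.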

¹ We use the terms dimension and attribute (as well as feature, variable, column, and axis) interchangeably in this thesis. We choose among them based on the context of the discussion, while attempting to be consistent with their use in the literature.



2.2.2 Limitations while Visualizing High-Dimensional Data

As demonstrated above, there are different ways to represent high-dimensional data on the screen, and each of them brings its own challenges. As already identified, these challenges include the scalability of the display, the ordering of the displayed objects or dimensions, the positioning of objects on the screen, and the high number of possible visual mappings. Solving some of these problems would ease the exploration of high-dimensional data. By appropriately sorting the dimensions and mapping them to visual variables, clutter can be reduced, and these visualization methods could allow users to overview and relate high-dimensional data sets [49]. The data dimensionality causes problems in the visual mapping stage: it is unclear which mapping is best, i.e., which data dimension should be mapped to which visual variable. Because a high-dimensional data set admits a high number of possible mappings, automated methods are needed to restrict this number. One option is to judge the quality of these mappings by computing quality measures for the displayed data (see Chapter 3 for details). Another is to reduce the number of dimensions with dimensionality reduction techniques (see Section 2.1.2).
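Simple combinatorics quantify how fast the space of possible displays grows. The sketch below makes two simplifying assumptions. First, a mapping assigns distinct data dimensions to a fixed list of visual variables (e.g., x, y, color, size). Second, a parallel-coordinates axis order and its reverse are considered equivalent. The function names are hypothetical.

```python
from math import factorial, perm


def visual_mappings(n_dims: int, n_visual_vars: int) -> int:
    """Ways to assign distinct data dimensions to n_visual_vars visual variables (order matters)."""
    return perm(n_dims, n_visual_vars)  # n_dims! / (n_dims - n_visual_vars)!


def axis_orders(n_dims: int) -> int:
    """Parallel-coordinates axis orders, counting an order and its reverse once (n_dims >= 2)."""
    return factorial(n_dims) // 2


if __name__ == "__main__":
    for d in (5, 10, 20, 50):
        print(f"d={d}: {visual_mappings(d, 4)} mappings to 4 visual variables, "
              f"{axis_orders(d)} axis orders")
```

Already for ten dimensions there are 5040 ways to fill four visual variables and 1,814,400 distinct axis orders. This is far more than a user can inspect manually.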

Enriching Visualizations

Static visualization techniques are not flexible enough to reveal complex high-dimensional patterns, so interaction is needed at this point. Various solutions have been proposed to make visualizations interactive and support the dynamic exploration of high-dimensional data. These include brushing and linking [46], panning and zooming [19], focus-plus-context [92], and magic lenses [29].

“Brushing and linking refers to the connecting of two or more views of the same data, such that a change to the representation in one view affects the representation in the other views as well. ... Panning and zooming refers to the actions of a movie camera that can scan sideways across a scene (panning) or move in for a closeup or back away to get a wider view (zooming). ... When zooming is used, the more detail is visible about a particular item, the less can be seen about the surrounding items. Focus-plus-context is used to partly alleviate this effect. The idea is to make one portion of the view – the focus of attention – larger, while simultaneously shrinking the surrounding objects. The farther an object is from the focus of attention, the smaller it is made to appear. ... Magic lenses are directly manipulable transparent windows that, when overlapped on some other data type, cause a transformation to be applied to the underlying data, thus changing its appearance” [15]. A full illustration of these techniques is beyond the scope of this work; more details can be found in [15]².
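The linking part of brushing and linking can be captured in a small shared-selection object. A rectangular brush in one scatterplot selects records, and every registered view is notified with the same selection. This is a minimal sketch under our own assumptions: views are modelled as plain callbacks, and the brush is an axis-aligned rectangle. The class and method names are hypothetical and do not refer to any existing toolkit.

```python
from typing import Callable, List, Sequence, Set, Tuple

View = Callable[[Set[int]], None]  # a view is notified with the indices of the selected records


class LinkedSelection:
    """Shared selection state: brushing in one view updates every registered (linked) view."""

    def __init__(self) -> None:
        self.selected: Set[int] = set()
        self._views: List[View] = []

    def register(self, view: View) -> None:
        self._views.append(view)

    def brush(self, data: Sequence[Sequence[float]], dim_x: int, dim_y: int,
              x_range: Tuple[float, float], y_range: Tuple[float, float]) -> Set[int]:
        """Select records inside a rectangle drawn in the scatterplot of (dim_x, dim_y)."""
        (x0, x1), (y0, y1) = x_range, y_range
        self.selected = {i for i, row in enumerate(data)
                         if x0 <= row[dim_x] <= x1 and y0 <= row[dim_y] <= y1}
        for view in self._views:
            view(self.selected)  # linking: every view receives the same selection
        return self.selected


data = [[0.1, 0.2, 5.0], [0.4, 0.5, 1.0], [0.9, 0.8, 3.0]]
highlighted_in_splom: List[Set[int]] = []
highlighted_in_parcoords: List[Set[int]] = []
link = LinkedSelection()
link.register(lambda sel: highlighted_in_splom.append(set(sel)))
link.register(lambda sel: highlighted_in_parcoords.append(set(sel)))
link.brush(data, 0, 1, (0.0, 0.5), (0.0, 0.6))
```

In a real system, each callback would re-render its view, for example by colouring the selected points red in all scatterplots as in Figure 2.1A.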

Patterns that are visible only in subspaces of the original data space also require specialized visualizations. These must disclose the relations between the subspaces in which the patterns originate, as well as the possible overlap of objects between them. In Chapter 5, we present a visual-interactive tool for this purpose.

² The cited descriptions of the techniques are from Chapter 10: User Interfaces and Visualization, by Marti Hearst. This chapter can also be found online at http://people.ischool.berkeley.edu/~hearst/irbook/10/node3.html#SECTION00122000000000000000f (last accessed on 03/13).



2.3 Automated Techniques for High-Dimensional Data

In this section, we present automated methods for analyzing high-dimensional data. Section 2.3.1 discusses different data mining approaches for extracting patterns from data, with a focus on clustering. We present general approaches and enumerate approaches developed specifically for high-dimensional data. We also explain the difference between clustering a dimension-reduced data set and subspace clustering. Besides automated pattern extraction, Section 2.3.2 introduces automated means of judging the quality of visualizations, namely quality metrics. Given the huge number of possible visual representations for high-dimensional data, these metrics assist users in finding the right visual mapping or projection for their data. Our contributions to this area, namely new measures, a quality measures pipeline, and a systematization of existing measures, are outlined in Chapters 3 and 4.

2.3.1 Data Mining Techniques for High-Dimensional Data

Data mining refers to extracting, or mining, knowledge (interesting patterns) from large amounts of data [64]. Various intelligent methods have been developed to extract such patterns. One important method, and the one closest to this thesis, is clustering. Clustering takes a data set as input and groups the objects into clusters according to their similarity. The goal is to maximize the similarity between objects of the same group and to minimize the similarity between objects of different groups. Objects of one group are thus very similar to each other and dissimilar to objects of other groups. The similarity is calculated on the full attribute space using a distance function such as the Euclidean, Minkowski, or city-block distance.
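The distance functions named above are closely related. The Minkowski distance of order p is L_p(x, y) = (Σ_i |x_i − y_i|^p)^(1/p). Setting p = 1 yields the city-block distance, and p = 2 yields the Euclidean distance. The following minimal Python sketch makes this relationship explicit. The function name `minkowski` is an illustrative choice of our own.

```python
from typing import Sequence


def minkowski(x: Sequence[float], y: Sequence[float], p: float = 2.0) -> float:
    """Minkowski distance of order p between two points of equal dimensionality.

    p = 1 gives the city-block (Manhattan) distance, p = 2 the Euclidean distance.
    """
    if len(x) != len(y):
        raise ValueError("points must have the same dimensionality")
    if p < 1:
        raise ValueError("p must be >= 1 for the result to be a metric")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


if __name__ == "__main__":
    a, b = (0.0, 0.0), (3.0, 4.0)
    print(minkowski(a, b, p=1))  # city-block: 7.0
    print(minkowski(a, b, p=2))  # Euclidean: 5.0
```

Because all three distances share one parameterized formula, a clustering algorithm can swap distance functions by changing only p.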

State-of-the-Art Clustering

Existing clustering algorithms can be classified according to different criteria. We roughly differentiate between hierarchical and partitioning clustering algorithms and enumerate some of the best-known representatives of each. For further details, please refer to the surveys [21, 155] or to the original papers of the algorithms.

Hierarchical clustering organizes objects into groups, which are in turn grouped into larger groups. The result is a hierarchy of clusters built up step by step. Representatives of this category, which we also use later in Section 5.2, are hierarchical clusterings with different linkage methods, such as single linkage, complete linkage, average linkage, or minimum variance [144]. In recent years, new hierarchical algorithms have been developed to handle large-scale data and improve clustering performance. One example is BIRCH [162], which stores summaries of the original data in a height-balanced tree and can achieve linear computational complexity.
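The linkage methods differ only in how they measure the distance between two clusters. Single linkage uses the minimum pairwise point distance, complete linkage the maximum, and average linkage the mean. The following naive agglomerative sketch illustrates this. It runs in cubic time and is suitable only for small toy data. Minimum-variance linkage is omitted for brevity, and all names are our own.

```python
import math
from itertools import combinations
from typing import List, Sequence

Point = Sequence[float]


def _linkage(a: List[Point], b: List[Point], method: str) -> float:
    """Distance between two clusters under the given linkage criterion."""
    dists = [math.dist(p, q) for p in a for q in b]
    if method == "single":
        return min(dists)
    if method == "complete":
        return max(dists)
    if method == "average":
        return sum(dists) / len(dists)
    raise ValueError(f"unknown linkage: {method}")


def agglomerate(points: List[Point], k: int, method: str = "single") -> List[List[int]]:
    """Repeatedly merge the two closest clusters until k clusters remain.

    Returns the clusters as sorted lists of point indices.
    """
    clusters: List[List[int]] = [[i] for i in range(len(points))]
    while len(clusters) > k:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: _linkage(
                [points[t] for t in clusters[ij[0]]],
                [points[t] for t in clusters[ij[1]]],
                method,
            ),
        )
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]  # j > i, so index i remains valid
    return sorted(clusters)
```

Recording the sequence of merges instead of stopping at k would yield the full cluster hierarchy (dendrogram).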

Partitioning methods divide the data objects into a flat set of groups without any hierarchical structure. For relocation methods, the number of groups is fixed in advance. Major representatives of this category are density-based algorithms such as DBSCAN [50] and OPTICS [10], and relocation methods such as k-medoids and k-means [56].
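The following sketch shows a relocation method. It implements the standard Lloyd iteration for k-means. Each point is assigned to its nearest centroid, and each centroid then moves to the mean of its assigned points. The loop repeats until the assignment no longer changes. For determinism, the sketch seeds the centroids with the first k points. This is a simplification: practical implementations use random or k-means++ seeding.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Lloyd's algorithm for k-means. Returns (labels, centroids)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids: List[Point] = [tuple(p) for p in points[:k]]  # naive deterministic seeding
    labels: List[int] = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: nearest centroid under the Euclidean distance.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: the assignment is stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return labels, centroids
```

k-medoids follows the same assign-and-update scheme. The difference is that each cluster representative must be one of the data objects instead of a mean.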



Clustering in High Dimensions

For high-dimensional data sets, the challenge is to design effective and efficient clustering algorithms. Such algorithms must cope with the large number of objects and dimensions and with the high noise level typical of this kind of data. A number of algorithms have therefore been proposed for clustering this type of data.

CURE [57] is a hierarchical clustering algorithm that can discover clusters of arbitrary shape and uses random sampling to reduce computational complexity.

DENCLUE (density-based clustering) [70] is a well-known approach for density-based clustering of high-dimensional data. To make the computations feasible, the data is indexed using a B+-tree. The algorithm is built on the idea that the influence of each data point on its neighborhood can be modeled by a so-called influence function. The overall density of the data space can then be modeled analytically as the sum of the influence functions of all data points. Clusters are determined by identifying local maxima of this overall density function.
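The density model can be sketched with a Gaussian influence function f(x, y) = exp(−d(x, y)² / (2σ²)). The density at x is then the sum of the influences of all data points. The sketch below moves each point uphill to a local density maximum and groups points that reach the same maximum. For the hill climbing it uses a mean-shift style fixed-point update (a weighted mean of the data) instead of the original gradient steps. It also omits the B+-tree index and the noise threshold of the original algorithm. The values of σ and the tolerances, as well as all names, are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def influence(x: Sequence[float], y: Sequence[float], sigma: float) -> float:
    """Gaussian influence of data point y at location x."""
    return math.exp(-math.dist(x, y) ** 2 / (2 * sigma ** 2))


def density(x: Sequence[float], data: Sequence[Sequence[float]], sigma: float) -> float:
    """Overall density at x: the sum of the influence functions of all data points."""
    return sum(influence(x, y, sigma) for y in data)


def climb(x: Sequence[float], data: Sequence[Sequence[float]], sigma: float,
          tol: float = 1e-6, max_iter: int = 500) -> Point:
    """Move x uphill towards a local density maximum (mean-shift style update)."""
    cur = tuple(x)
    for _ in range(max_iter):
        w = [influence(cur, y, sigma) for y in data]
        total = sum(w)
        nxt = tuple(sum(wi * y[d] for wi, y in zip(w, data)) / total for d in range(len(cur)))
        if math.dist(cur, nxt) < tol:
            return nxt
        cur = nxt
    return cur


def denclue_like(data: Sequence[Sequence[float]], sigma: float, merge_eps: float = 1e-2) -> List[int]:
    """Label each point by the local maximum (attractor) its hill climb ends at."""
    attractors: List[Point] = []
    labels: List[int] = []
    for x in data:
        a = climb(x, data, sigma)
        for idx, b in enumerate(attractors):
            if math.dist(a, b) < merge_eps:  # the same maximum was reached before
                labels.append(idx)
                break
        else:
            attractors.append(a)
            labels.append(len(attractors) - 1)
    return labels
```

The parameter σ controls how smooth the density is. A large σ merges nearby maxima into fewer clusters, while a small σ splits the data into many small ones.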

Although these algorithms can deal with large-scale data, they are sometimes insufficient for analyzing high-dimensional data. Because of the curse of dimensionality described previously, algorithms that rely on distance functions no longer perform well in high-dimensional spaces. To overcome this problem, dimension reduction (see Section 2.1.2) is used in cluster analysis to reduce the dimensionality of the data sets. However, dimensionality reduction methods cause some loss of information. They may destroy the interpretability of the results or even distort the real clusters. Moreover, such techniques do not actually remove any of the original attributes from the analysis, because the transformed dimensions are still computed from all of them. This is problematic when there are many irrelevant attributes, since the irrelevant information may mask the real clusters even after transformation. Another way to tackle this problem is to use subspace clustering algorithms, which search for clusters in different subsets of the dimensions of the same data set. Different subspaces may contain different meaningful clusters. The problem is how to identify such subspace clusters efficiently.
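A small experiment shows why distance-based methods degrade. For uniformly distributed random points, consider the relative contrast (d_max − d_min) / d_min between a query point's farthest and nearest neighbors. This contrast shrinks as the dimensionality grows, so near and far points become hard to tell apart. The sketch below uses only Python's standard `random` module. The sample size, the seed, and the function name are arbitrary choices.

```python
import math
import random


def relative_contrast(dim: int, n_points: int = 200, seed: int = 0) -> float:
    """(d_max - d_min) / d_min from a random query to n uniform points in [0, 1]^dim."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [math.dist(query, [rng.random() for _ in range(dim)]) for _ in range(n_points)]
    return (max(dists) - min(dists)) / min(dists)


if __name__ == "__main__":
    for dim in (2, 10, 100, 1000):
        print(f"dim={dim:5d}  relative contrast={relative_contrast(dim):.3f}")
```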

A large number of subspace clustering algorithms have been developed, and we briefly describe some representatives next. CLIQUE (CLustering In QUEst) [6] employs a bottom-up approach that searches all subspaces for dense rectangular cells, i.e., cells containing a high density of points. The clusters are then generated by merging these rectangles. OptiGrid [71] is designed to obtain an optimal grid partitioning using cutting hyperplanes. It uses a set of linear projections and density estimations similar to those of DENCLUE. From these, it finds cutting planes that separate two significantly dense half-spaces and pass through a point of minimal density. In Section 5.1 we use the k-medoid-based algorithm PROCLUS (PROjected CLUstering) [4], one of the most robust algorithms for subspace clustering. It defines a cluster as a densely distributed subset of data objects in a subspace. ORCLUS (arbitrarily ORiented projected CLUster generation) [5] follows a similar approach but searches for clusters in non-axis-parallel subspaces. The problem of subspace clustering is discussed further in Section 2.4.2 and Section 5.1.2.
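The bottom-up idea behind CLIQUE can be sketched as follows. Each dimension is split into xi equal-width intervals, and a grid cell (unit) counts as dense if it contains more than a fraction tau of the points. Candidate units in k dimensions are built only from dense units in k − 1 dimensions. This pruning is safe because a unit can only be dense if all of its lower-dimensional projections are dense. The sketch stops once the dense units are found. It omits both the merging of adjacent dense units into clusters and CLIQUE's subspace pruning. The parameter names follow common descriptions of CLIQUE, but the code is our own simplification and assumes that all values lie in [0, 1).

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Tuple

# A unit is a frozenset of (dimension, interval index) pairs, one pair per dimension.
Unit = FrozenSet[Tuple[int, int]]


def dense_units(data: Sequence[Sequence[float]], xi: int = 10, tau: float = 0.1,
                max_dim: int = 3) -> Dict[int, List[Unit]]:
    """Bottom-up search for dense grid units in all subspaces up to max_dim dimensions."""
    n, d = len(data), len(data[0])
    # Discretize: cell[i][j] is the interval index of point i in dimension j.
    cell = [[min(int(v * xi), xi - 1) for v in row] for row in data]

    def support(unit: Unit) -> int:
        return sum(all(cell[i][dim] == iv for dim, iv in unit) for i in range(n))

    # 1-dimensional dense units.
    counts = Counter((j, cell[i][j]) for i in range(n) for j in range(d))
    result = {1: [frozenset([u]) for u, c in counts.items() if c > tau * n]}

    for k in range(2, max_dim + 1):
        prev = set(result[k - 1])
        candidates = set()
        for a, b in combinations(prev, 2):
            joined = a | b
            dims = {dim for dim, _ in joined}
            # Valid candidate: k distinct dimensions and every (k-1)-projection dense.
            if len(joined) == k and len(dims) == k and all(
                    joined - {u} in prev for u in joined):
                candidates.add(joined)
        result[k] = [u for u in candidates if support(u) > tau * n]
        if not result[k]:
            break
    return result
```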

Other Data Mining Techniques

In addition to clustering, many other data mining techniques have been developed. They mainly target frequent patterns, associations, correlations, or outliers in data. A frequent pattern is a set of items that occur frequently together in a data set. The term was first proposed in [7] in the context of frequent itemsets and association rule mining. Mining frequent patterns aims to identify regularities in the data, such as products that are often purchased together in market basket analysis. Frequent patterns form the foundation for many essential data mining tasks. These include association analysis, correlation analysis, classification (associative classification), and cluster analysis (frequent pattern-based clustering). “Association analysis is the discovery of association rules showing attribute-value conditions that occur frequently together in a dataset” [63]. As mentioned in Section 1.1, support and confidence can characterize the quality of association rules. The rules are generated from the frequent itemsets identified in the data. One problem, however, is that low support and confidence thresholds produce a very large set of association rules. Higher thresholds, in turn, can remove useful rules, so a mechanism is needed to determine the right thresholds. Visualization can help overcome this issue and supports the user in identifying the right rules. In Section 3.1 we present image-based quality measures that identify correlations among data attributes, as well as attributes forming strong groups (clusters) in the data.
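Consider an association rule X ⇒ Y over a set of transactions. Its support is the fraction of transactions that contain both X and Y. Its confidence is the fraction of transactions containing X that also contain Y. The following minimal sketch computes both on a made-up basket data set. The function names are our own.

```python
from typing import Iterable, List, Set


def support(itemset: Iterable[str], transactions: List[Set[str]]) -> float:
    """Fraction of transactions that contain every item of the itemset."""
    items = set(itemset)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(lhs: Iterable[str], rhs: Iterable[str], transactions: List[Set[str]]) -> float:
    """Confidence of the rule lhs => rhs: support(lhs | rhs) / support(lhs)."""
    lhs, rhs = set(lhs), set(rhs)
    s_lhs = support(lhs, transactions)
    return support(lhs | rhs, transactions) / s_lhs if s_lhs else 0.0


if __name__ == "__main__":
    baskets = [
        {"bread", "milk"},
        {"bread", "butter", "milk"},
        {"bread", "butter"},
        {"milk", "eggs"},
    ]
    print(support({"bread", "milk"}, baskets))       # 0.5
    print(confidence({"bread"}, {"milk"}, baskets))  # 2 of 3 bread baskets contain milk
```

Lowering the minimum support or confidence thresholds admits more rules, which is the trade-off described in the paragraph above.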

In classification analysis, the data is labeled with classes, and a model is derived to distinguish these classes. The model is trained on a subset of the data, called the training set, and validated on another subset, the so-called test set. The model can be represented by classification rules, decision trees, neural networks, or mathematical formulas, and it is used to classify new data. Often, however, users need to predict missing numerical values rather than class labels. This process is called prediction. Our work on quality metrics for labeled data (see Sections 3.1.3 and 3.1.5) complements classification: it identifies the attributes that best distinguish the classes, which are the attributes relevant for building the classification model. Classification is also referred to as supervised learning, because the labeled training set is used to teach the model how to classify new data. Clustering is referred to as unsupervised learning, since no class labels are available for training. Instead, the groups are derived from the data elements themselves.
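The training/test protocol described above can be illustrated with a deliberately simple classifier. The sketch below uses a nearest-centroid model, in which each class is represented by the mean of its training points. It stands in for the rule-, tree-, or network-based models mentioned in the text. The data, the fixed split, and the names are made up for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[Tuple[float, ...], str]  # (feature vector, class label)


def train_nearest_centroid(train: Sequence[Sample]) -> Dict[str, Tuple[float, ...]]:
    """The model is one centroid per class, computed from the training set only."""
    by_class: Dict[str, List[Tuple[float, ...]]] = {}
    for x, y in train:
        by_class.setdefault(y, []).append(x)
    return {y: tuple(sum(c) / len(xs) for c in zip(*xs)) for y, xs in by_class.items()}


def predict(model: Dict[str, Tuple[float, ...]], x: Sequence[float]) -> str:
    """Assign x to the class with the nearest centroid."""
    return min(model, key=lambda y: math.dist(x, model[y]))


def accuracy(model: Dict[str, Tuple[float, ...]], test: Sequence[Sample]) -> float:
    """Fraction of test samples whose predicted label matches the true label."""
    return sum(predict(model, x) == y for x, y in test) / len(test)


if __name__ == "__main__":
    data = [((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
            ((4.0, 4.0), "b"), ((4.2, 3.9), "b"), ((3.8, 4.1), "b")]
    train, test = data[0:2] + data[3:5], [data[2], data[5]]  # fixed split for the example
    model = train_nearest_centroid(train)
    print(accuracy(model, test))
```

Only the training set is used to build the model, and the held-out test set measures how well the model generalizes to unseen data.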

In some applications, such as fraud detection, rare events are of interest. The analysis of such outlying data is referred to as outlier mining. Outliers can be detected with statistical tests, for example, but also with certain quality metrics. Quality metrics for outliers are marked in Table 4.2 in Chapter 4.
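The sketch below shows one example of a statistical outlier test: the modified z-score of Iglewicz and Hoaglin. It replaces the mean and standard deviation by the median and the median absolute deviation (MAD). As a result, it is less distorted by the outliers themselves. The cut-off of 3.5 is their commonly cited recommendation. This is one test among many and not the method used in this thesis. The function name is our own.

```python
import statistics
from typing import List, Sequence


def modified_z_outliers(values: Sequence[float], threshold: float = 3.5) -> List[int]:
    """Indices of values whose modified z-score exceeds the threshold in absolute value."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        raise ValueError("MAD is zero; modified z-scores are undefined")
    # 0.6745 is approximately the 0.75 quantile of the standard normal distribution,
    # which makes the MAD comparable to a standard deviation for normal data.
    return [i for i, v in enumerate(values) if abs(0.6745 * (v - med) / mad) > threshold]
```

For example, in a series of daily transaction amounts clustered around 10, a single amount of 95 would be the only value flagged.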

2.3.2 Quality Measures for High-Dimensional Data Visualizations

General Measures

Quality metrics (or measures) in visualization have a long history. Our work focuses only on their specific use in high-dimensional data analysis, but they have a broader scope than we can describe here. Early attempts to calculate quality metrics can be traced back to the work of Tufte [139]. He proposed metrics such as the data-ink ratio and the lie factor, which respectively optimize the use of the visualization space and reduce the distortions that a visualization may introduce. Later, in 1997, Richard Brath proposed a rich set of metrics to characterize the quality of business visualizations [32]. Around the same period, Miller et al. advocated the use of visualization metrics as a way to compare visualizations [100]. The graph drawing community developed its own set of metrics, most notably aesthetic metrics such as those found in the foundational work of Ware et al. on cognitive measurements of graph aesthetics [149]. Later, the term quality metrics assumed a more specific meaning. In particular, it appeared in a number of papers related to clutter reduction and scalability [24, 26, 80, 82, 112].
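Tufte's two metrics mentioned above reduce to simple ratios. The data-ink ratio is the share of a graphic's total ink that represents data. The lie factor divides the size of an effect shown in the graphic by the size of the same effect in the data. Here, the size of an effect is the relative change |second − first| / first. A lie factor of 1 indicates an undistorted representation. The sketch leaves out how ink is measured in practice (for instance, as non-background pixels). The inputs below are hypothetical numbers.

```python
def data_ink_ratio(data_ink: float, total_ink: float) -> float:
    """Share of the total ink (e.g., drawn pixels) that encodes data."""
    if total_ink <= 0:
        raise ValueError("total ink must be positive")
    return data_ink / total_ink


def effect_size(first: float, second: float) -> float:
    """Relative change between two values, |second - first| / first."""
    return abs(second - first) / first


def lie_factor(data_first: float, data_second: float,
               graphic_first: float, graphic_second: float) -> float:
    """Size of the effect shown in the graphic divided by the size of the effect in the data."""
    return effect_size(graphic_first, graphic_second) / effect_size(data_first, data_second)


if __name__ == "__main__":
    # Data grows from 10 to 15 (+50 %), but the drawn bar grows from 1 cm to 2 cm (+100 %).
    print(lie_factor(10, 15, 1.0, 2.0))  # 2.0
    print(data_ink_ratio(300, 1200))     # 0.25
```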

For the sake of completeness, the term metric is also used in information visualization user studies, where it indicates how the elements of interest are measured (e.g., [108, 113]).

Scatterplot Measures

Foundational works such as Projection Pursuit [54, 74] and the Grand Tour [13] already proposed using measures, calculated over the data or over the visualization space, to select interesting projections. Projection Pursuit searches for low-dimensional (one- or two-dimensional) projections that expose interesting structures. It uses a “Projection Pursuit Index” that considers inter-point distances and their variation. The Grand Tour adopts a more interactive approach. It allows the user to easily navigate through many viewing directions, creating a movie-like presentation of the whole original space.
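The sketch below makes the idea of a projection pursuit index concrete. It scores one-dimensional projections with a simplified variant of the Friedman-Tukey index: the spread of the projected points multiplied by a local density term summed over close point pairs. It then searches random unit directions for the highest score. The original index uses a trimmed standard deviation and a dedicated optimizer. The untrimmed standard deviation, the radius r, the random search, and all names used here are simplifying assumptions.

```python
import math
import random
import statistics
from typing import List, Sequence, Tuple


def project(data: Sequence[Sequence[float]], direction: Sequence[float]) -> List[float]:
    """Project every point onto a unit direction vector."""
    return [sum(a * b for a, b in zip(x, direction)) for x in data]


def ft_index(values: Sequence[float], r: float = 1.0) -> float:
    """Simplified Friedman-Tukey index: spread times local density of the projection."""
    spread = statistics.pstdev(values)
    local = sum(r - abs(a - b) for i, a in enumerate(values)
                for b in values[i + 1:] if abs(a - b) < r)
    return spread * local


def random_unit(dim: int, rng: random.Random) -> List[float]:
    """Uniformly random direction on the unit sphere."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def pursue(data: Sequence[Sequence[float]], n_trials: int = 500, seed: int = 0,
           r: float = 1.0) -> Tuple[List[float], float]:
    """Random search for the direction whose projection maximizes the index."""
    rng = random.Random(seed)
    best_dir: List[float] = []
    best_score = -math.inf
    for _ in range(n_trials):
        d = random_unit(len(data[0]), rng)
        score = ft_index(project(data, d), r)
        if score > best_score:
            best_dir, best_score = d, score
    return best_dir, best_score
```

The index rewards projections that are both spread out and locally clumped, which is typical of data that separates into clusters along the projection direction.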

More recently, several works in the visualization community have proposed different forms of quality measures. Examples are graph-theoretic measures for scatterplot matrices [151], measures over pixel-based visualizations [120], and measures based on clutter reduction [25, 112]. Others are composite measures for finding several data structures, such as outliers, correlations, and sub-clusters [82]. In Chapter 4 we present a systematization of works on quality measures and propose a quality measures pipeline that describes the process underlying these measures. Additionally, we derive several factors to characterize the measures in a common language and discuss implications for further research. At this point, we give a short description of the first two categories and postpone the details of the others to Chapter 4.

First, the scagnostics measures [140] play an important role, since they are a major source of inspiration for our work. The scagnostics method [140] was proposed as an alternative to Projection Pursuit for analyzing structures in scatterplots. Since its originators never published the specifics of the method, Wilkinson et al. [151] took the opportunity to present these scagnostics ideas and apply them to high-dimensional data. They describe detailed graph-theoretic measures for scatterplots. That is, graphs and their properties (such as the convex hull, the alpha hull, and the Minimum Spanning Tree (MST)) serve as the basis for computing scagnostics measures. Their scagnostics indices assess five aspects of the point distribution: outliers, shape, trend, density, and coherence. To do so, they propose nine characteristic indices for the distribution of points in scatterplots: outlying, skewed, clumpy, convex, skinny, striated, stringy, straight, and monotonic. Originally, these indices were used to form a scagnostics SPLOM, in which each axis is a scagnostics measure. In this SPLOM, each data scatterplot is represented by a point according to its measures. The scagnostics SPLOM was used to spot scatterplots with unusual data distributions (see Figure 2.2A). These indices were also used as ranking functions in data SPLOMs supporting different analysis tasks [152], as shown in Figure 2.2B.
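Many scagnostics indices are computed on the minimum spanning tree (MST) of the points in a scatterplot. The sketch below illustrates this. It builds the Euclidean MST with Prim's algorithm and computes a simplified Outlying index: the share of the total MST length that comes from unusually long edges. An edge counts as unusually long if it exceeds q75 + 1.5 (q75 − q25) of the MST edge-length distribution. The published definition involves further steps, such as binning the data and identifying outlying vertices rather than single long edges. Those steps are omitted here, and the function names are our own.

```python
import math
import statistics
from typing import List, Sequence, Tuple

Edge = Tuple[int, int, float]


def mst_edges(points: Sequence[Tuple[float, float]]) -> List[Edge]:
    """Euclidean minimum spanning tree via Prim's algorithm, O(n^2)."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known connection of each vertex to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges: List[Edge] = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def outlying(points: Sequence[Tuple[float, float]]) -> float:
    """Simplified scagnostics Outlying: share of MST length in unusually long edges."""
    if len(points) < 3:
        raise ValueError("need at least three points")
    lengths = [w for _, _, w in mst_edges(points)]
    q25, _, q75 = statistics.quantiles(lengths, n=4)
    omega = q75 + 1.5 * (q75 - q25)
    return sum(w for w in lengths if w > omega) / sum(lengths)
```

Computing such an index for every pair of dimensions yields one value per scatterplot, which can then be used to rank the scatterplots of a SPLOM.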

Second, the approach most similar to ours (presented in Chapter 3) is Pixnostics, proposed by Schneidewind et al. [120]. They also use image-analysis techniques to rank the different lower-dimensional views of the data set and present only the best-ranked views to the user. The method provides the user not only with valuable lower-dimensional projections but also with optimized parameter settings for pixel-level visualizations. However, while their approach concentrates on pixel-level visualizations, we focus on scatterplots and parallel coordinates.

Figure 2.2: (A) Scagnostics SPLOM with scagnostics measures as axes, showing each data scatterplot as a point in the measure scatterplots [152]. (B) Scagnostics indices used as quality measures to rank data scatterplots [152].

In Section 3.1, we contribute to the field of quality metrics by proposing image-based and data-based measures for classified and non-classified data in scatterplots and parallel coordinates. In Section 3.1.2, we present an image-based measure for non-classified scatterplots that quantifies the structures and correlations between the respective dimensions. Our measure could, for example, be used as an additional index in a scagnostics matrix.

In parallel to our work from Section 3.1, published in [133], Sips et al. [129] developed a class consistency visualization algorithm. Similar to our Histogram Density measures, the class consistency method proposes measures to rank 2D scatterplots. It selects the highest-ranked scatterplots and presents them in an ordinary scatterplot matrix.
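The principle of ranking labeled scatterplots can be sketched with a simple centroid-based score. Each view is scored by the fraction of points whose nearest class centroid in that 2D view is the centroid of their own class. Views with well-separated classes then score close to 1. This illustrates only the general principle. It is not claimed to reproduce the exact measures of [129] or our Histogram Density measures, and the names are our own.

```python
import math
from typing import Dict, List, Sequence, Tuple


def centroid_consistency(xs: Sequence[float], ys: Sequence[float], labels: Sequence[str]) -> float:
    """Fraction of points that are closer to their own class centroid than to any other."""
    groups: Dict[str, List[Tuple[float, float]]] = {}
    for x, y, c in zip(xs, ys, labels):
        groups.setdefault(c, []).append((x, y))
    centroids = {c: (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))
                 for c, pts in groups.items()}
    hits = sum(
        min(centroids, key=lambda k: math.dist((x, y), centroids[k])) == c
        for x, y, c in zip(xs, ys, labels))
    return hits / len(labels)


def rank_views(columns: Dict[str, Sequence[float]],
               labels: Sequence[str]) -> List[Tuple[float, str, str]]:
    """Score every pair of dimensions as a scatterplot and sort the views best-first."""
    names = sorted(columns)
    scores = [(centroid_consistency(columns[a], columns[b], labels), a, b)
              for i, a in enumerate(names) for b in names[i + 1:]]
    return sorted(scores, reverse=True)
```

The highest-ranked pairs returned by `rank_views` would be the candidates shown in the reduced scatterplot matrix.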

Parallel Coordinates Measures

Measures have been used not only to rank a large number of visualizations according to their structures, but also to optimize visualizations of high-dimensional data. One major aspect addressed by these measures is the ordering of elements (such as axes or data points) in the visualization. Aiming at dimension reordering, Ankerst et al. [9] presented a method based on similarity clustering of dimensions that places similar dimensions close to each other. Yang [159] developed a method to generate interesting projections, also based on the similarity between dimensions: similar dimensions are clustered and used to create a lower-dimensional projection of the data.
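The general idea of similarity-based dimension reordering can be sketched in a few lines. Each dimension is min-max normalized, and the dissimilarity of two dimensions is the Euclidean distance between their normalized value vectors. A greedy nearest-neighbor heuristic then builds an axis order in which each axis is followed by its most similar remaining dimension. Ankerst et al. formulate the ordering as an optimization problem. The greedy strategy, the fixed start dimension, and the names used here are our own simplifications.

```python
import math
from typing import Dict, List, Sequence


def normalize(values: Sequence[float]) -> List[float]:
    """Min-max normalize one dimension to [0, 1] (constant dimensions map to 0)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def greedy_order(columns: Dict[str, Sequence[float]], start: str) -> List[str]:
    """Order dimensions so that each axis is followed by its most similar remaining one."""
    norm = {name: normalize(vals) for name, vals in columns.items()}
    order = [start]
    remaining = set(columns) - {start}
    while remaining:
        last = norm[order[-1]]
        # Ties are broken alphabetically to keep the result deterministic.
        nxt = min(sorted(remaining), key=lambda name: math.dist(last, norm[name]))
        order.append(nxt)
        remaining.remove(nxt)
    return order
```

In a parallel coordinates plot, the resulting order places axes with similar value profiles next to each other. This reduces line crossings between neighboring axes.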

As an alternative to these dimension reordering methods for parallel coordinates, we propose a method based on the structures present in low-dimensional embeddings of the data set. Three different kinds of measures for ranking these embeddings, covering both class-based and non-class-based visualizations, are presented in Section 3.1.4.

Evaluating Measures

A common denominator of all these works is the total absence of user studies that examine the relationship between human-detected and machine-detected data patterns. These measures clearly help users deal with large data spaces. However, a number of open issues remain regarding how humans perceive the structures that the suggested algorithms capture automatically. In Section 3.2 we focus on the question of whether there is a correlation between what the human eye perceives and what the machine detects.

Although user studies specifically focused on the issues discussed above are lacking, several user studies on the detection of visual patterns are worth mentioning here. A large body of literature exists on the detection of pre-attentive features, notably the visualization-focused work of Healey [67]. Another body of work concerns the Gestalt laws [148], which are often taken as the basis for detecting patterns in visual representations. More specific works focused on visualization include the following. [25] and [68] study the perception of density in pixel-based scatterplots and in visualizations based on “pexels” (perceptual texture elements), respectively. [81] studies thresholds for the detection of patterns in parallel coordinates. [65] studies the correlation between visualization performance and similarity to natural images. The study presented in [118] is also relevant and very similar to ours (presented in Section 3.2) in terms of experiment design. Users ranked a series of images according to the degree of clutter they perceived in each image. The study then measured the correlation between the users' ranking and the ranking given by the proposed measure, named feature congestion.
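Studies of this kind compare two rankings of the same images: one produced by the users and one produced by the measure. A common statistic for such a comparison is Spearman's rank correlation. For n items without ties, ρ = 1 − 6 Σ d_i² / (n (n² − 1)), where d_i is the difference between the two ranks of item i. The sketch illustrates this general approach and does not reproduce the analysis of [118]. Ties are not handled, and the rankings are hypothetical.

```python
from typing import Sequence


def spearman_rho(rank_a: Sequence[int], rank_b: Sequence[int]) -> float:
    """Spearman's rank correlation for two tie-free rankings of the same n items."""
    n = len(rank_a)
    if n != len(rank_b) or n < 2:
        raise ValueError("need two rankings of equal length >= 2")
    expected = list(range(1, n + 1))
    if sorted(rank_a) != expected or sorted(rank_b) != expected:
        raise ValueError("rankings must be permutations of 1..n (no ties)")
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


if __name__ == "__main__":
    user_rank = [1, 2, 3, 4, 5]     # hypothetical ranking of five images by users
    measure_rank = [2, 1, 3, 4, 5]  # hypothetical ranking by an automatic measure
    print(spearman_rho(user_rank, measure_rank))
```

A value close to 1 indicates that the measure orders the images much as the users do, while values near 0 indicate no agreement.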

2.4 Visual Analytics for High-Dimensional Data

2.4.1 Visual Interactive Systems for High-Dimensional Data Analysis

As presented in the previous chapter, combining data visualization with interactive and automated components speeds up the analysis of high-dimensional data sets. As a consequence, many interactive systems have recently been developed to support users in analyzing high-dimensional data sets. Since the literature contains a large number of such systems, a full summary would overload this section. Hence, in the following paragraphs we identify only the four main domains related to this thesis: visual feature selection, visual clustering, visual classification, and dimension reordering. For each, we enumerate a selection of visual interactive systems.

Visual Feature Selection

Reducing high-dimensional data to a smaller subset of features that captures the data characteristics is a crucial task in high-dimensional data analysis. To this end, features are compared, for example by computing correlations or data variation, to identify how important each feature is in expressing the data characteristics. Fully automated feature selection methods are often infeasible due to the complexity and dimensionality of the data. Visual-interactive systems have therefore been developed to deal with this problem. Figure 2.3 illustrates three examples of such systems with a short description. The following paragraphs point to further literature in this field.

Figure 2.3: Visual interactive feature selection systems. A: Rank-by-Feature Framework presented in [125]. B: Feature selection supported by quality measures [82]. C: DimStiller for feature selection [76].

Among existing works involving visual-interactive selection or comparison of features, the Rank-by-Feature Framework [125] (see Figure 2.3A) provides a sorted visual overview of the correlations among pairs of features. In [82], the selection of input features was supported by a measure of the interestingness of the visual views provided by candidate features (see Figure 2.3B). An interactive dimensionality reduction workflow that relies on visual approaches to guide users in selecting features was presented in [76] (see Figure 2.3C).
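A sorted overview of pairwise feature correlations rests on a simple computation. Every pair of features is scored with a criterion such as the absolute Pearson correlation, and the pairs are then sorted. The sketch below shows only this ranking step. The choice of |r| as the criterion and the names are assumptions, whereas the Rank-by-Feature Framework itself offers several ranking criteria and interactive views. Pearson's r is computed by hand to avoid depending on a particular Python version.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # treat constant features as uncorrelated
    return cov / (sx * sy)


def rank_feature_pairs(features: Dict[str, Sequence[float]]) -> List[Tuple[str, str, float]]:
    """All feature pairs sorted by absolute Pearson correlation, strongest first."""
    pairs = [(a, b, pearson(features[a], features[b]))
             for a, b in combinations(sorted(features), 2)]
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)
```

Keeping the sign of r in the output lets a view distinguish positive from negative correlations while still ranking by strength.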

In [33] and [34], interactive visual comparison was proposed to relate data described in different feature spaces. The comparison is based on 2D mappings and tree structures extracted from the different data spaces. Furthermore, [93] proposed a visual design based on network and heat map visualizations to relate clusterings in different subsets of dimensions. In [159], dimensions are hierarchically clustered based on a simple value-oriented similarity measure. Users can then navigate this structure to identify interesting subspaces. In a recent work [161], the output of this simple search method was visualized by tree- and matrix-based views, in which each dimension combination was represented by a single MDS plot.

In summary, many of these methods can be used to compare data according to different criteria. However, most of them assume that feature selection is performed globally and do not take the subspace search problem directly into account. One focus of this thesis is to show that local selection of features is essential when analyzing patterns in high-dimensional data. The analysis is then performed in different subspaces of the data. Related work on visual analysis tools that deal specifically with subspaces is presented in the next subsection.

Visual Clustering

Identifying and relating groups of data is a key task in exploratory data analysis. Often, user interaction is needed to identify and revise the number and characteristics of the data clusters found by automatic search methods, and visual-interactive approaches are useful to this end. Many methods have been proposed, so we can only highlight a few of them as examples. In [124], interactive exploration of hierarchically clustered data along a dendrogram is proposed to help users find the right level of clusters for their tasks (see Figure 2.4A). In [159], parallel coordinates serve as a basic display of data clustering results, allowing users to compare clusters across the high-dimensional data space. 2D projections, possibly combined with glyph-based representations of clusters, are also widely employed; a recent example is [35] (see Figure 2.4B).

Figure 2.4: Interactive visual analysis systems for clustering in high-dimensional data. A: Interactive exploration of hierarchically clustered data along a dendrogram [124]. B: (a) Grouping icons to form clusters based on visual similarity. (b) User-defined grouping of icons [35].

These approaches to visualization and clustering in high-dimensional data spaces all have in common that they are based on a given full (or reduced) dimensionality of the input data set. Consequently, they show only a single perspective on the usually multi-faceted high-dimensional data, and this perspective might not be the most relevant one. As we will show in this thesis, it is also useful to search for patterns in different subsets of the full high-dimensional input space to increase potential data insight.

Visual Classification

Classification uses a model that distinguishes data classes, built from a labeled training data set, to label new data. The classification model can be represented by a decision tree. With purely automatic approaches, problems such as over-fitting the model or pruning the tree are difficult to tackle [86]. Visualization can help to overcome these problems, for example by incorporating the user in the tree construction process.

Ankerst et al. present in [11] a user-centered approach that combines the domain knowledge of users with the computational strengths of the computer to create rules that satisfy the user's constraints and to generate visualizations of these patterns. Additionally, human pattern recognition, supported by adequate data visualizations, can be used to increase the effectiveness of decision trees. The visual classification in Figure 2.5A shows the decision tree, visualizing each attribute value by a colored pixel arranged in bars. Each attribute bar is sorted, and the attribute with the purest value distribution is selected as the split attribute of the decision tree. This procedure is repeated until all leaves contain pure classes. The split is marked with a black vertical line, and the leaves are underlined with a black line. Compared to standard visualizations of decision trees, additional information is encoded in a compact way, namely the size of the nodes (number of training records for the corresponding node), the quality of the split (visible in the purity of the resulting partitions), and the class distribution (frequency and location of the training instances of all classes).
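The split-selection loop described above can be sketched in fully automated form: sort each attribute, choose the purest split, and repeat until all leaves are pure. This is a minimal sketch under our own assumptions, not the interactive system of [11], where the user chooses the splits visually. Purity is measured with the Gini impurity, splits are binary thresholds between consecutive sorted values, and all function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the set is pure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def best_split(rows, labels):
    """Sort the records by each attribute and return (impurity, attribute,
    threshold) of the binary split with the lowest weighted Gini impurity,
    or None if no attribute can separate the records."""
    best = None
    for attr in range(len(rows[0])):
        order = sorted(range(len(rows)), key=lambda i: rows[i][attr])
        for cut in range(1, len(order)):
            lo, hi = rows[order[cut - 1]][attr], rows[order[cut]][attr]
            if lo == hi:
                continue  # equal values cannot be separated by a threshold
            left = [labels[i] for i in order[:cut]]
            right = [labels[i] for i in order[cut:]]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, attr, (lo + hi) / 2)
    return best


def build_tree(rows, labels):
    """Split recursively until every leaf is pure (or cannot be split)."""
    if len(set(labels)) == 1:
        return labels[0]
    split = best_split(rows, labels)
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # majority-class leaf
    _, attr, thr = split
    goes_left = [r[attr] <= thr for r in rows]
    return {
        "attr": attr,
        "thr": thr,
        "left": build_tree([r for r, g in zip(rows, goes_left) if g],
                           [lab for lab, g in zip(labels, goes_left) if g]),
        "right": build_tree([r for r, g in zip(rows, goes_left) if not g],
                            [lab for lab, g in zip(labels, goes_left) if not g]),
    }


def predict(tree, row):
    """Follow the thresholds from the root down to a leaf label."""
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["attr"]] <= tree["thr"] else tree["right"]
    return tree


if __name__ == "__main__":
    rows = [(1.0, 5.0), (2.0, 4.0), (3.0, 7.0), (6.0, 1.0), (7.0, 2.0), (8.0, 3.0)]
    labels = ["a", "a", "a", "b", "b", "b"]
    print(build_tree(rows, labels))  # -> {'attr': 0, 'thr': 4.5, 'left': 'a', 'right': 'b'}
```

On this toy data both attributes separate the classes perfectly. Because ties keep the first attribute found, the tree splits on attribute 0 at threshold 4.5.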

Figure 2.5: Interactive visual analysis systems for classification in high-dimensional data. A: Visual classification from [11], illustrating the decision tree for DNA training data with 19 attributes; each attribute value is visualized by a colored pixel arranged in bars. B: Decision tree construction system [142], representing the tree as a node-link diagram, with the split points displayed on the links and the split attributes on the nodes.

Figure 2.5B shows a recent example from [142] of an interactive system for decision tree construction. Here the authors pursue the same goal, namely to bring the domain-specific knowledge of the user into the construction of the tree. A tight integration of visualization, interaction, and automation supports domain experts in growing, pruning, optimizing, and analyzing decision trees [142]. Compared to the previous example, the tree representation here is a more classic one, since the tree is drawn as a node-link diagram. Internal and leaf nodes are represented by node glyphs, and each parent-child relationship is represented by a link from the parent to the child node. The advantage of this visual representation is that it makes it easier to count the leaves while at the same time showing which nodes are on the same level [142]. The main view displays split points on the links, using the width of a link to encode the number of items and its color to encode the class membership of the items. The split attribute is shown on the nodes of the tree, which are visualized as rectangles containing relevant information such as the split attribute, class distribution, split points, and class histogram. Additional linked views support the user in constructing and optimizing the decision tree.

Dimension Reordering

As already discussed, dimension ordering is a relevant component of high-dimensional data visualization and exploration, as different orderings can expose different patterns. Ankerst et al. [9] formulated dimension ordering as an optimization problem and showed that it is NP-complete, so that it must be solved with heuristics. Peng et al. [112] apply dimension reordering to a series of n-dimensional visualization techniques to reduce clutter. Matrix-based visualizations, starting from the seminal work of Bertin [22], have also been heavily researched in terms of the patterns they can expose through reordering. In Section 5.1 we use dimension reordering and cluster reordering to make relationships among dimensions and clusters apparent in our ClustNails system.
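Exact dimension ordering is NP-complete [9], so a simple greedy heuristic shows the kind of approximation used in practice. The sketch below assumes the absolute Pearson correlation as the similarity between two dimensions. It grows a linear order from the most similar pair. Both the similarity measure and the function names are our illustrative choices, not the specific methods of [9] or [112].

```python
import math


def abs_corr(x, y):
    """|Pearson r| between two dimensions (0.0 if one of them is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def greedy_dimension_order(columns):
    """Greedy ordering heuristic: start from the most similar pair of
    dimensions, then repeatedly attach the most similar unused dimension to
    whichever end of the current sequence it fits best."""
    k = len(columns)
    if k < 3:
        return list(range(k))
    sim = [[abs_corr(columns[i], columns[j]) for j in range(k)] for i in range(k)]
    first, second = max(((i, j) for i in range(k) for j in range(i + 1, k)),
                        key=lambda pair: sim[pair[0]][pair[1]])
    order = [first, second]
    remaining = sorted(set(range(k)) - {first, second})  # sorted for determinism
    while remaining:
        head, tail = order[0], order[-1]
        best = max(remaining, key=lambda d: max(sim[head][d], sim[tail][d]))
        if sim[head][best] > sim[tail][best]:
            order.insert(0, best)
        else:
            order.append(best)
        remaining.remove(best)
    return order
```

Each step only looks at the two ends of the current sequence, so the result is not guaranteed to be optimal. That is the usual trade-off accepted for an NP-complete problem.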

In [59], Guo also addresses ways to integrate visual and computational measures for picking and ordering variables for display in parallel coordinates. He describes a human-centered exploration environment that incorporates a coordinated suite of computational and visualization methods to explore high-dimensional data and find patterns in these spaces. The main difference between this approach and the one we present in Section 3.1.4 is that Guo searches for locally defined patterns in subspaces, while our work concentrates on finding global patterns in a 2-dimensional projection of the data set.

To summarize, ordering plays an important role in different areas, such as ordering the axes of parallel coordinates, ordering as a way to reduce clutter in scatterplot matrices, and ordering to support similarity search in glyph-based visualizations or pixel-based displays.

2.4.2 Subspace Cluster Analysis and Visualization

Traditional full-space clustering is often not effective at revealing a meaningful clustering structure in high-dimensional data (see Section 2.3.1). For this reason, several approaches in the emerging research field of subspace clustering [90] aim at discovering meaningful clusters in locally relevant subspaces. The problem of finding clusters in high-dimensional data can be divided into two sub-problems: subspace search and cluster search. The first aims at finding the subspaces where clusters exist, the second at finding the actual clusters. The large majority of existing algorithms consider the two problems simultaneously and produce a set of clusters. Each cluster is typically represented by a set of clustered objects (rows of the original data table) and the subset of relevant dimensions (columns of the original data table). The proposed methods differ in their cluster search strategy and in their constraints with respect to the overlap of clusters and dimensions [38, 84, 107]. Kriegel et al. [90] categorize these algorithms into four classes: (1) projected clustering; (2) “soft” projected clustering; (3) subspace clustering; (4) hybrid.

The first two classes generate clusters that do not overlap, that is, every object belongs to only one cluster. Subspace clustering and hybrid algorithms may generate overlapping clusters.
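The cluster representation described above, a set of objects plus a set of relevant dimensions, maps naturally onto a small data structure. The following sketch uses hypothetical names to show that representation. It also shows how object and dimension overlap between clusters can be computed. That overlap is the property that separates the first two classes from the last two.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class SubspaceCluster:
    objects: frozenset  # row indices of the clustered objects
    dims: frozenset     # column indices of the relevant dimensions


def overlaps(clusters):
    """Map each pair of cluster indices (a, b), a < b, to the objects and the
    dimensions the two clusters share."""
    return {
        (a, b): (clusters[a].objects & clusters[b].objects,
                 clusters[a].dims & clusters[b].dims)
        for a, b in combinations(range(len(clusters)), 2)
    }


def objects_disjoint(clusters):
    """True if no object belongs to more than one cluster, the property of the
    (soft) projected clustering classes; subspace and hybrid methods may
    return clusters for which this is False."""
    return all(not shared for shared, _ in overlaps(clusters).values())


if __name__ == "__main__":
    a = SubspaceCluster(frozenset({0, 1, 2}), frozenset({0, 3}))
    b = SubspaceCluster(frozenset({2, 5}), frozenset({1, 3}))
    print(overlaps([a, b]))          # shared object 2, shared dimension 3
    print(objects_disjoint([a, b]))  # False: object 2 is in both clusters
```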

While extensive research has been carried out on designing subspace clustering algorithms, surprisingly little attention has been paid to developing visualization support for subspace clustering. To our knowledge, only a few subspace cluster visualization systems exist.

Figure 2.6: (a) VISA system [14]. Left: MDS projection for the global view of clusters. Right: Matrix of subspace clusters for in-depth view. (b) Heidi Matrix [141] over a subspace.

The VISA system [14] implements both a global view and an in-depth view (see Figure 2.6(a)) to help interpret the subspace clustering result. In the global view, the subspace clusters are projected onto a 2D display using a multidimensional scaling (MDS) projection. The aim is to show the similarity between clusters in terms of the number of records and dimensions in each cluster. Each cluster is represented as a colored circle, where color represents the number of dimensions and size represents the number of instances. The in-depth view uses a matrix representation to show the detailed characteristics of the clustering result, including the data items in each cluster and their values. It uses different color codes to visualize all characteristics of an object: black for unselected dimensions, brightness for areas of interest, and hue for value. The MDS projection in VISA provides a good overview of the clustering results. However, using circles of different sizes in the MDS projection can be problematic. The distance between two clusters can be obscured by the radius of the circles, and the overlap between clusters often causes a cluttered display. The in-depth view shows detailed characteristics of the clustering result. However, as shown in Figure 2.6(a), both hue and brightness are relatively weak at showing differences and variations between numbers and values in unselected dimensions.
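Classical (Torgerson) MDS makes the global view concrete. It places clusters in 2D so that their Euclidean distances approximate a given dissimilarity matrix. The sketch below is a dependency-free illustration, not VISA's implementation. The source does not state which cluster dissimilarity VISA uses, so `cluster_dissimilarity` is purely our assumption: the mean Jaccard distance of the object sets and the dimension sets. The eigen-decomposition uses plain cyclic Jacobi rotations to avoid external libraries.

```python
import math


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|, defined as 0.0 when both sets are empty."""
    union = a | b
    return 0.0 if not union else 1.0 - len(a & b) / len(union)


def cluster_dissimilarity(c1, c2):
    """Hypothetical measure (not from VISA): mean Jaccard distance between the
    object sets and between the dimension sets of two subspace clusters."""
    (objs1, dims1), (objs2, dims2) = c1, c2
    return 0.5 * (jaccard_distance(objs1, objs2) + jaccard_distance(dims1, dims2))


def jacobi_eigen(a, sweeps=50, tol=1e-20):
    """Eigen-decomposition of a symmetric matrix by cyclic Jacobi rotations.
    Returns (eigenvalues, v), where column c of v is the c-th eigenvector."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes the off-diagonal entry a[p][q].
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # a <- a J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # a <- J^T a (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # v <- v J accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def classical_mds(d, k=2):
    """Classical (Torgerson) MDS: double-centre the squared dissimilarities and
    keep the k largest eigenpairs (negative eigenvalues are clipped to 0)."""
    n = len(d)
    sq = [[x * x for x in row] for row in d]
    mean = [sum(row) / n for row in sq]
    grand = sum(mean) / n
    b = [[-0.5 * (sq[i][j] - mean[i] - mean[j] + grand) for j in range(n)]
         for i in range(n)]
    vals, vecs = jacobi_eigen(b)
    top = sorted(range(n), key=lambda col: vals[col], reverse=True)[:k]
    return [[vecs[i][col] * math.sqrt(max(vals[col], 0.0)) for col in top]
            for i in range(n)]


if __name__ == "__main__":
    clusters = [({0, 1, 2}, {0, 1}), ({1, 2, 3}, {0, 1}), ({7, 8}, {2, 3})]
    dissim = [[cluster_dissimilarity(a, b) for b in clusters] for a in clusters]
    for xy in classical_mds(dissim):
        print(xy)  # one 2D position per cluster
```

For points that genuinely lie in a plane, the recovered coordinates reproduce all pairwise distances up to rotation and reflection. A non-Euclidean dissimilarity such as the Jaccard-based one can produce negative eigenvalues. These are clipped, so the layout is only an approximation.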

Heidi Matrix [141] uses a complex arrangement of subspaces in a matrix representation. This matrix is based on the computation of the k-nearest neighbors (kNN) in each subspace (see Figure 2.6(b)). Rows and columns represent the data items, and each entry (i, j) in the matrix represents the number of subspaces in which i and j are neighbors. A categorical coloring scheme is used to color the cells according to the particular combination of subspaces in which two data items are neighbors. In addition, rows and columns are ordered according to the output of a clustering algorithm. The biggest advantage of Heidi Matrix is that it displays the full information of the data and the subspace clustering result. However, the rather abstract visual mapping scheme makes the results difficult to interpret, and, to the best of our knowledge, its effectiveness has not been evaluated yet. The scalability of the visualization is another critical issue, because it requires $n \times n$ display space, where $n$ is the number of data items.
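The cell contents described above can be sketched as follows. This is an illustration under stated assumptions rather than the algorithm of [141]. We take Euclidean distance within each subspace, and we call i and j neighbors in a subspace if either is among the other's k nearest neighbors. Each cell stores a bit mask of subspaces, so one value yields both the count and the subspace combination used for the categorical coloring. The function names are hypothetical, and the quadratic matrix also makes the scalability issue visible.

```python
import math
from itertools import combinations


def knn_sets(points, dims, k):
    """Indices of the k nearest neighbours of every point, measuring
    Euclidean distance only on the given dimensions (one subspace)."""
    result = []
    for i, p in enumerate(points):
        ranked = sorted(
            (math.dist([p[d] for d in dims], [q[d] for d in dims]), j)
            for j, q in enumerate(points)
            if j != i
        )
        result.append({j for _, j in ranked[:k]})
    return result


def heidi_masks(points, subspaces, k):
    """n x n matrix whose entry (i, j) is a bit mask: bit s is set when i and
    j are neighbours in subspace s (assumption: either is in the other's kNN)."""
    n = len(points)
    m = [[0] * n for _ in range(n)]
    for s, dims in enumerate(subspaces):
        nn = knn_sets(points, dims, k)
        for i, j in combinations(range(n), 2):
            if j in nn[i] or i in nn[j]:
                m[i][j] |= 1 << s
                m[j][i] |= 1 << s
    return m


def neighbour_count(mask):
    """Number of subspaces in which a pair of items are neighbours."""
    return bin(mask).count("1")
```

The bit mask doubles as a category key for the coloring. The number of set bits gives the count that the matrix entry is described as holding.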

Figure 2.7: Visualization techniques applied in Ferdosi's work [52]. Left: 1D subspace. Middle: 2D subspace. Right: Subspace with 3 or more dimensions.

Ferdosi et al. [52] proposed an algorithm for finding interesting subspaces in astronomical data as well as a visual system for displaying the results. The algorithm identifies candidate subspaces from the data and ranks them by a quality metric based on density estimation and morphological operators. The resulting subspaces are visualized in different forms: line graphs for 1-dimensional subspaces, 2D scatterplots for 2-dimensional subspaces, and principal component analysis (PCA) projections for subspaces of higher dimensionality (see Figure 2.7). Ferdosi's work provides some interesting insight into subsets of dimensions in astronomical data with a high density of data objects. However, the algorithm does not assign objects to subspaces. Hence, compared to VISA and Heidi Matrix, part of the subspace clustering information is missing from both the data mining and the visualization, meaning there is no direct way of comparing subspaces.

All of the above-mentioned visualization systems lack a visualization of overlapping dimensions and overlapping clusters, so it is difficult to see and compare such overlapping information in their visual representations. In Section 5.1 we propose a visual tool to investigate subspace clustering results that also represents dimension and object overlap among clusters.

We note that if we apply one of these subspace clustering visualizations, we immediately inherit two main challenges of this paradigm that are still considered open research issues. These are the efficiency challenge (relating to the subspace cluster search) and the redundancy challenge (relating to the typical redundancy of the generated outputs). In Section 5.2 the redundancy problem is addressed by our proposed analytical workflow.


3

Quality Measures based Visual Analysis of High-Dimensional Data

“Measure what is measurable, and make measurable what is not so.”

Galileo Galilei

Contents

3.1 Quality Measures for Scatterplots and Parallel Coordinates

3.1.1 Overview and Problem Description

3.1.2 Quality Measures for Scatterplots with Unclassified Data

3.1.3 Quality Measures for Scatterplots with Classified Data

3.1.4 Quality Measures for Parallel Coordinates with Unclassified Data

3.1.5 Quality Measures for Parallel Coordinates with Classified Data

3.1.6 Application on Real Data Sets

3.1.7 Evaluation of the Measures’ Performance Using Synthetic Data

3.1.8 Conclusion and Future Work

3.2 Quality Measures and Human Perception – An Empirical Study

3.2.1 Measures

3.2.2 Empirical Evaluation

3.2.3 Results

3.2.4 Discussion

3.2.5 Guidelines

3.2.6 Conclusion and Future Work

Visual exploration of multivariate data typically requires projection onto lower-dimensional representations. The number of possible representations grows rapidly with the number of dimensions, and manual exploration quickly becomes ineffective or even infeasible. In this chapter, we propose automatic analysis methods to extract potentially relevant visual structures from a set of candidate visualizations. Based on these features, the visualizations are ranked in accordance with a specified user task. The user is provided with a manageable number of potentially useful candidate visualizations that can be used as a starting point for interactive data analysis. This can effectively ease the task of finding truly useful visualizations and potentially speed up the data exploration task. To this end, in Section 3.1 we present quality measures for class-based as well as non-class-based scatterplot and parallel coordinates visualizations. The proposed analysis methods are evaluated on real and synthetic data sets, and the results are presented in Sections 3.1.6 and 3.1.7. Section 3.2 presents an empirical study that compares the ranking of the measures with user perception. The study helped us to derive further factors that we must take into account when designing new measures that fit the users' perception.



Parts of this chapter appeared in the following publications: [132, 133, 134].¹

3.1 Quality Measures for Scatterplots and Parallel Coordinates

In this section, we present an automated approach that supports the user in the exploration of high-dimensional data. The basic idea is to generate different projections of the high-dimensional data set and to automatically identify potentially relevant visual or data structures within this set of candidates. These structures are used to determine the relevance of each projection to common predefined analysis tasks. The user may then use the projection with the highest relevance as the starting point of the visual interactive analysis. We present relevance measures for typical analysis tasks based on scatterplots and parallel coordinates. Experiments on class-labeled and unlabeled data sets demonstrate the potential of our quality measures to find interesting projections and visualizations and thus to speed up the exploration process.
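The core loop of this idea has three steps: enumerate candidate projections, score each with a quality measure, and present the best-ranked ones first. It can be sketched for axis-parallel scatterplots as below. The actual measures are introduced in Sections 3.1.2 to 3.1.5. The `abs_pearson` measure used here is only a hypothetical stand-in that rewards a strong linear trend, and the function names are our own.

```python
import math
from itertools import combinations


def abs_pearson(x, y):
    """Hypothetical stand-in quality measure: |Pearson r| of the two axes,
    which favours plots showing a strong linear trend."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def rank_scatterplots(columns, measure, top=5):
    """Score every axis-parallel scatterplot (i, j) with i < j, i.e. all
    (n^2 - n) / 2 candidates, and return the `top` best as (score, i, j)."""
    scored = [(measure(columns[i], columns[j]), i, j)
              for i, j in combinations(range(len(columns)), 2)]
    scored.sort(key=lambda entry: entry[0], reverse=True)  # best first
    return scored[:top]


if __name__ == "__main__":
    data = [[1, 2, 3, 4, 5], [3, 1, 4, 1, 5], [2, 4, 6, 8, 10]]  # three dimensions
    for score, i, j in rank_scatterplots(data, abs_pearson, top=2):
        print(f"dim {i} vs dim {j}: {score:.2f}")
```

Passing the measure as a parameter keeps the ranking loop independent of the task. A measure tailored to clusters or to class separation can be swapped in without changing the enumeration.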

3.1.1 Overview and Problem Description

Increasing dimensionality and growing volumes of data create the need for effective exploration techniques that reveal the hidden information and structures of high-dimensional data sets. To support visual exploration, high-dimensional data is commonly mapped to low-dimensional views, also called projections. Depending on the technique, exponentially many different low-dimensional views exist, far too many to be analyzed manually.

As already presented in Section 2.2.1, scatterplots and parallel coordinates plots are commonly used visualization techniques for multivariate data sets. These low-dimensional embeddings of the high-dimensional data in a 2D view can be interpreted easily by users. We have also seen that these techniques entail different challenges for high-dimensional data sets. For scatterplots, the high number of possible 2D projections of a high-dimensional data set is challenging. A scatterplot matrix of an $n$-dimensional data set contains $\frac{n^2 - n}{2}$ different plots, so an automatic analysis technique to preselect the important projections is useful and necessary.
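The count follows from the layout of a scatterplot matrix. Of its $n^2$ cells, the $n$ diagonal cells pair a dimension with itself. Each remaining plot appears twice, mirrored across the diagonal. Take $n = 50$ as an illustration (an arbitrary value, not one of the data sets used later):

$$\frac{n^2 - n}{2} = \frac{n(n-1)}{2} = \binom{n}{2}, \qquad n = 50:\quad \frac{50^2 - 50}{2} = \frac{2450}{2} = 1225 \text{ scatterplots}.$$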

For parallel coordinates, one problem is the large number of possible arrangements of the dimension axes. It has been shown in [30] that for an $n$-dimensional data set, $\frac{n+1}{2}$ permutations (rounded down for even $n$) are needed to visualize all relations between dimensions, that is, to place every pair of axes next to each other at least once. By contrast, there are $n!$ possible arrangements. An automated analysis of the visualizations can help to find the best ordering out of all possible arrangements. We analyze the pairwise combinations of dimensions, which are later assembled to find the best visualizations, thereby reducing the visual analysis to $n^2$ visualizations. We propose ranking functions to judge the quality of a visual embedding. These ranking functions are called quality measures, and they automatically select the best visual representation with respect to a given task.

¹ Please note that parts of the publications used here have been slightly changed to adapt them to the dissertation's terminology. For readability, and because I was an author in a leading role for these publications, I decided not to quote these excerpts. The intense collaboration for [133] with G. Albuquerque and M. Eisemann from Braunschweig brought up new image quality measures that they implemented and described for our joint publication. I participated in some of the discussions. Together we ran the experiments for the application section on real data sets and described them in the paper. The evaluation on synthetic data was designed entirely by myself. I decided to include the full description of the metrics in my thesis for a better understanding of the experiments and of the discussion of the outcome. Major parts of Section 3.1.2, Section 3.1.3, Section 3.1.4, and Section 3.1.5 are therefore credited to the aforementioned authors.
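A short sketch shows why roughly half as many axis orderings as dimensions suffice to make every pair of axes adjacent. It uses one classic zig-zag construction; this is our choice of illustration, and we do not claim it is the construction used in [30]. For an even number of axes, it rotates a zig-zag base ordering. For an odd number, it adds a temporary dummy axis and removes it afterwards. The function names are hypothetical.

```python
from itertools import combinations


def zigzag_orderings(n):
    """Axis orderings (permutations of 0..n-1) in which every pair of axes is
    adjacent at least once; returns (n + 1) // 2 orderings for n >= 2."""
    if n < 2:
        return [list(range(n))]
    m = n + n % 2  # pad to an even count with one dummy axis (index n) if n is odd
    base = [0]
    for i in range(1, m // 2):
        base += [i, m - i]  # zig-zag: 0, 1, m-1, 2, m-2, ...
    base.append(m // 2)
    orderings = []
    for shift in range(m // 2):
        rotated = [(axis + shift) % m for axis in base]
        orderings.append([axis for axis in rotated if axis < n])  # drop the dummy
    return orderings


def covers_all_pairs(orderings, n):
    """True if every pair of the n axes is adjacent in some ordering."""
    adjacent = {frozenset(pair) for o in orderings for pair in zip(o, o[1:])}
    return all(frozenset(pair) in adjacent for pair in combinations(range(n), 2))


if __name__ == "__main__":
    for ordering in zigzag_orderings(6):
        print(ordering)
```

For six axes this prints [0, 1, 5, 2, 4, 3], [1, 2, 0, 3, 5, 4] and [2, 3, 1, 4, 0, 5]. These three orderings jointly cover all 15 axis pairs, whereas $6! = 720$ arrangements exist.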

[Figure: Overview of the approach. HD Data → Visual Mapping & Projection → Set of Visualizations (2D projections in scatterplots; sample plot of dim 4 vs. dim 22) → Quality Measures → Ranked Visualizations]

●<br />

●<br />

●<br />

●<br />

●<br />

● ●●●●●●●●●●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●●●●●●●●●●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

● ● ● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ● ● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ● ● ●<br />

●<br />

●<br />

●<br />

●<br />

● ● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ● ●<br />

●<br />

●<br />

● ● ●<br />

●<br />

●<br />

●<br />

●<br />

● ● ●<br />

●<br />

●<br />

● ● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

● ● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●●●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ● ●<br />

● ● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ● ● ●<br />

● ● ● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ● ● ●<br />

●<br />

●<br />

● ● ● ●<br />

●<br />

●<br />

●<br />

●<br />

● ● ●<br />

●<br />

●<br />

●<br />

● ● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ● ●<br />

●<br />

●<br />

●<br />

●<br />

● ● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ● ●<br />

● ●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ● ● ●<br />

● ● ● ●<br />

● ●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ● ●<br />

● ● ● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ● ●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ● ●<br />

● ●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●●<br />

● ● ●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

100 200 300 400 500 600<br />

0 200 400 600 800 1000<br />

dim 5<br />

dim 7<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●●<br />

●<br />

●● ● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ● ● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●●<br />

●<br />

●<br />

●<br />

●<br />

●●<br />

●<br />

●<br />

●<br />

●<br />

●●<br />

●<br />

●<br />

●<br />

●<br />

●●<br />

●<br />

●<br />

●<br />

●<br />

●●<br />

●<br />

●<br />

●<br />

●<br />

●●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●●<br />

●●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

[Figure: scatter plot of dim 5 vs. dim 22]<br />

[Figure: scatter plot of dim 6 vs. dim 22]<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●● ● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

0 2000 4000 6000<br />

−300 −200 −100 0 100 200<br />

Comp.1<br />

Comp.2<br />

● ●<br />

● ●<br />

● ●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ● ●●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

● ●●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ● ● ●●<br />

●●●●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

● ●●<br />

●<br />

● ●<br />

● ●●<br />

●<br />

● ●●<br />

● ●<br />

● ●●<br />

●<br />

● ●●<br />

●<br />

●<br />

● ●●<br />

●<br />

● ●●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

● ●●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

● ●● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

● ● ●<br />

●<br />

● ●<br />

●<br />

● ●●●<br />

●<br />

● ●●●<br />

●<br />

● ●●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

● ●<br />

● ●<br />

● ●<br />

● ● ● ●<br />

● ● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●●<br />

●<br />

●<br />

●<br />

●●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ● ● ● ●●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

● ●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

● ● ●<br />

●<br />

● ●<br />

●<br />

●<br />

●●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●●<br />

●<br />

● ●<br />

● ●<br />

●<br />

● ●●<br />

●●<br />

●<br />

●<br />

●<br />

●●<br />

●<br />

●<br />

●<br />

● ●●<br />

● ●● ●<br />

● ●<br />

●<br />

●●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

● ●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

● ● ●<br />

● ●<br />

● ● ● ●<br />

●<br />

●<br />

●<br />

●<br />

● ● ● ●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

● ● ● ●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

● ●<br />

● ●<br />

● ● ●<br />

● ●<br />

●<br />

● ● ●<br />

● ●<br />

●<br />

●●●<br />

● ●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

● ●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●●<br />

●<br />

●<br />

●<br />

● ●<br />

● ●<br />

●<br />

●<br />

● ●<br />

●<br />

●●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ● ● ● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

● ●<br />

● ●<br />

● ●<br />

● ●<br />

● ●<br />

● ●<br />

● ●<br />

● ●<br />

● ●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

● ●●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

● ●<br />

●<br />

● ●<br />

●<br />

● ●<br />

● ●<br />

●<br />

● ●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

● ●●●●●●●●●<br />

● ●<br />

●<br />

● ●<br />

● ●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

● ●●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ● ●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

● ●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

● ●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

● ●<br />

●<br />

● ●<br />

● ●<br />

●<br />

● ●<br />

● ●●●●<br />

● ●<br />

● ● ●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

[Figure: scatterplot panels; recoverable axis labels: Comp.1 vs. Comp.2, and dim 4 vs. dim 22; panel label: Task]<br />
●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ● ●<br />

●<br />

●<br />

●<br />

●<br />

● ● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ● ●<br />

● ●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ● ● ●<br />

● ● ● ●<br />

● ●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ● ●<br />

● ● ● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ● ●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

●<br />

● ● ●<br />

● ●<br />

● ●<br />

●<br />

●<br />

●<br />

●<br />


32 Chapter 3. Quality Measures based Visual Analysis of High-Dimensional Data

An overview of our techniques is shown in Table 3.1. For scatterplots with unclassified data, we developed the Rotating Variance Measure, which favors xy-plots showing a high correlation between the two dimensions. For classified data, we propose measures that take the class information into account when computing the ranking value of an image. We developed four methods: a Class Density Measure, a Class Separating Measure, a 1D-Histogram Density Measure, and a 2D-Histogram Density Measure. Their goal is to find the scatterplots in which the classes are best separated. For parallel coordinates with unclassified data, we propose a Hough Space Measure that searches the views for interesting patterns, such as clustered lines. For classified data, we propose two measures: the Overlap Measure, which looks for views with as little overlap between the classes as possible, so that the classes separate well, and the Similarity Measure, which looks for correlations between the lines. All measures except the 1D- and 2D-Histogram Density Measures are computed directly on the visualization images and therefore do not take possible intra- and interclass overplotting of points into account.

As example analysis tasks for unclassified data sets, we chose correlation search in scatterplots (Section 3.1.2) and cluster search (i.e., finding similar lines) in parallel coordinates (Section 3.1.4). If class information is given, the tasks are to find views in which distinct clusters of the data set are also well separated in the visualization (Section 3.1.3), or views that show a high level of inter- and intraclass similarity (Section 3.1.5).

3.1.2 Quality Measures for Scatterplots with Unclassified Data

Our scatterplot measures aim to assess the distribution of the data with respect to the correlation and density of points and the separation of classes. In this section, we therefore propose analysis functions that compute the correlation of points in scatterplots with unclassified data. New methods for measuring the density of the classes and for assessing the separation of classes in scatterplots with classified data are proposed in the next section, Section 3.1.3. If the data is unclassified but well separable, class labels can be assigned automatically using clustering algorithms.

Rotating Variance Measure²

High correlations appear as long, skinny structures in a scatterplot visualization. Because of outliers, even almost perfect correlations can lead to skewed distributions in the plot, and this must be taken into account. The Rotating Variance Measure (RVM) aims at finding linear and nonlinear correlations between pairs of dimensions of a given data set.

To compute the measure on the image representation, we first transform the discrete scatterplot visualization into a continuous density field. For each screen pixel $s$ at position $\mathbf{x} = (x, y)$, the distance to its $k$ nearest sample points $N_s$ in the visualization is computed. To obtain an estimate of the local density $\rho$ at pixel $s$, we define $\rho = 1/r$, where $r$ is the radius of the enclosing sphere of the $k$ nearest neighbors of $s$, given by

$$ r = \max_{i \in N_s} \lVert \mathbf{x} - \mathbf{x}_i \rVert. \tag{3.1} $$

² Implemented and described by our partners from Braunschweig, G. Albuquerque and M. Eisemann, for the collaborative publication [133]. Adapted and slightly changed for the thesis by myself.



Figure 3.2 (panels (a) and (b)): Scatterplot example and its respective density image. For each pixel, we compute the mass distribution along different directions and keep the smallest value, depicted here by the blue line.

Choosing the $k$-th nearest neighbor instead of the nearest one reduces the influence of outliers. $k$ is chosen between 2 and $n - 1$, and the minimum value of $r$ is mapped to 1. We used $k = 4$ throughout the applications in Section 3.1.6. Other density estimators could, of course, be used as well.
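The density estimate of Eq. 3.1 can be sketched in plain Python as follows. This is a minimal sketch under our own assumptions, not the implementation used for the publication: sample points are given in pixel coordinates, the field is evaluated at integer pixel positions, and $r$ is clamped to at least one pixel, which is our reading of "the minimum value of $r$ is mapped to 1". The names `knn_radius` and `density_field` are hypothetical.

```python
import math

def knn_radius(px, py, points, k):
    """Radius r of the circle around pixel (px, py) that encloses its k nearest
    sample points, i.e. the distance to the k-th nearest point (Eq. 3.1)."""
    dists = sorted(math.hypot(px - x, py - y) for x, y in points)
    return dists[k - 1]

def density_field(points, width, height, k=4):
    """Continuous density image rho = 1/r for every pixel of a width x height
    scatterplot. r is clamped to >= 1 pixel (assumption), so rho lies in (0, 1]."""
    return [[1.0 / max(knn_radius(px, py, points, k), 1.0)
             for px in range(width)]
            for py in range(height)]

# Usage: a perfectly correlated diagonal yields a narrow band of high density.
pts = [(i, i) for i in range(10)]
rho = density_field(pts, 10, 10, k=4)
print(round(rho[5][5], 3), round(rho[0][9], 3))
```

The brute-force neighbor search costs one sort per pixel; for real image sizes a spatial index would be the natural replacement, but it is omitted here to keep the sketch short.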

Visualizations containing high correlations should generally have density fields with a narrow band of large values, while views with lower correlation should have a density field consisting of many local maxima spread across the image. We can estimate this amount of spread for every pixel by computing the normalized mass distribution: we take $s$ samples along different lines $l_\theta$ that are centered at the corresponding pixel position and whose length equals the image width (see Figure 3.2). For these sampled lines, we compute the weighted distribution for each pixel position $\mathbf{x}_i$:

$$ \nu_i^{\theta} = \frac{\sum_{j=1}^{s} p_{s_j}^{l_\theta} \, \lVert \mathbf{x}_i - \mathbf{x}_{s_j} \rVert}{\sum_{j=1}^{s} p_{s_j}^{l_\theta}} \tag{3.2} $$

$$ \nu_i = \min_{\theta \in [0, 2\pi]} \nu_i^{\theta} \tag{3.3} $$

where $p_{s_j}^{l_\theta}$ is the $j$-th sample along line $l_\theta$ and $\mathbf{x}_{s_j}$ is its corresponding position in the image.

For pixels located at a maximum of a density image that conveys a real correlation, the distribution value becomes very small, compared to other positions in the image, when the line is orthogonal to the local main direction of the correlation at that position. Note that such a line can be found even for nonlinear correlations. Pixels in density images conveying low correlation, on the other hand, have only large $\nu$ values, whatever the direction of the line.
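Eqs. 3.2 and 3.3 can be sketched for a single pixel as below. We assume the density image is a list of rows with nearest-pixel lookup, that samples falling outside the image contribute zero density, and that angles are sampled in $[0, \pi)$, since a line through $\mathbf{x}_i$ at $\theta$ and at $\theta + \pi$ is the same line. The sample and angle counts are illustrative, and `sample` and `mass_distribution` are hypothetical names.

```python
import math

def sample(field, x, y):
    """Nearest-pixel lookup; positions outside the image have density 0 (assumption)."""
    ix, iy = int(round(x)), int(round(y))
    if 0 <= iy < len(field) and 0 <= ix < len(field[0]):
        return field[iy][ix]
    return 0.0

def mass_distribution(field, x, y, n_samples=32, n_angles=16):
    """nu at pixel (x, y): density-weighted mean distance of the samples along a
    line of length = image width centred at (x, y) (Eq. 3.2), minimised over
    the line direction theta (Eq. 3.3)."""
    width = len(field[0])
    best = float("inf")
    for a in range(n_angles):
        theta = math.pi * a / n_angles
        dx, dy = math.cos(theta), math.sin(theta)
        num = den = 0.0
        for j in range(n_samples):
            t = -width / 2 + width * j / (n_samples - 1)  # signed offset along the line
            p = sample(field, x + t * dx, y + t * dy)
            num += p * abs(t)  # ||x_i - x_sj|| equals |t| for a unit direction
            den += p
        if den > 0:
            best = min(best, num / den)
    return best

# Usage: a horizontal band (strong correlation) versus a uniform field (none).
band = [[1.0 if y == 5 else 0.0 for x in range(11)] for y in range(11)]
uniform = [[1.0] * 11 for _ in range(11)]
print(mass_distribution(band, 5, 5), mass_distribution(uniform, 5, 5))
```

For the band, the vertical line only hits density right next to the pixel, so its weighted mean distance is tiny; in the uniform field every direction sees mass spread along the whole line.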

For each column of the image, we compute the minimum value, and we sum up the results. The final RVM value is therefore defined as

$$ RVM = \frac{1}{\sum_{x} \min_{y} \nu(x, y)}, \tag{3.4} $$

where $\nu(x, y)$ is the mass distribution value at pixel position $(x, y)$.
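Given a mass-distribution image $\nu$ computed per pixel as in Eq. 3.3, Eq. 3.4 reduces to a few lines. In this sketch we assume $\nu$ is stored as a list of image rows; returning infinity when all column minima are zero is our own guard, not part of the original definition, and `rvm` is a hypothetical name.

```python
def rvm(nu):
    """Rotating Variance Measure (Eq. 3.4) from a mass-distribution image nu,
    given as a list of rows nu[y][x]: sum the per-column minima and invert."""
    width = len(nu[0])
    total = sum(min(row[x] for row in nu) for x in range(width))
    return float("inf") if total == 0 else 1.0 / total

# Usage on a toy 2 x 3 image whose column minima are 1.0, 0.5 and 2.0.
nu = [[1.0, 2.0, 2.0],
      [3.0, 0.5, 4.0]]
print(rvm(nu))
```

Because the sum is inverted, views in which every column contains at least one pixel with a small spread receive the highest RVM.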



3.1.3 Quality Measures for Scatterplots with Classified Data

Most known techniques calculate the quality of a projection without taking the class distribution into account. In plots of classified data, we can analyze the class distribution in the projection: good views should show good class separation, i.e., minimal overlap of classes.

In this section, we propose three approaches to rank the scatterplots of multivariate classified data sets in order to determine the best views of the high-dimensional structures: the Class Density Measure, the Class Separating Measure, and the Histogram Density Measures, which come in a 1D and a 2D variant.

Class Density Measure³

The Class Density Measure (CDM) evaluates orthogonal projections, i.e., scatterplots, according to their separation properties. To this end, the CDM computes a score for each candidate plot that reflects the separation of the classes while also taking the density of each class into account. The candidate plots are then ranked according to their score, so that the user can start the exploration process by investigating highly ranked plots.

If we are given only the visualization without the data, we assume that every color used in the visualization represents one class. We therefore first separate the classes into distinct images, so that each image contains only the information of one class. Note that in this case the overplotting of classes influences the computation of the measure. If the data is available, this is no longer a problem, since each class can be plotted into a separate image. Since a continuous representation of each class image is necessary to compute the overlap between the classes, we estimate a continuous, smooth density function based on local neighborhoods. For each screen pixel $s$, the distance to its $k$ nearest neighbors $N_s$ of the same class is computed, and the local density is derived as described in Section 3.1.2.
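When only the rendered image is available, the color-based class separation described above can be sketched as follows. We assume an RGB raster given as rows of color tuples, with a single uniform background color and no anti-aliasing, so that every non-background color corresponds to exactly one class; `split_by_color` is a hypothetical helper. An overplotted pixel only keeps the color drawn on top, which is exactly the overplotting effect noted above.

```python
from collections import defaultdict

def split_by_color(image, background=(255, 255, 255)):
    """Split a rendered scatterplot into one point set per class, assuming every
    non-background colour encodes exactly one class. image[y][x] is an RGB tuple."""
    classes = defaultdict(list)
    for y, row in enumerate(image):
        for x, color in enumerate(row):
            if color != background:
                classes[color].append((x, y))
    return dict(classes)

# Usage: a 2 x 3 image with two red points and one blue point.
W, R, B = (255, 255, 255), (255, 0, 0), (0, 0, 255)
img = [[R, W, B],
       [W, R, W]]
per_class = split_by_color(img)
print({c: len(p) for c, p in per_class.items()})
```

Each resulting point set can then be turned into its own continuous density image.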

Having these continuous density functions available for each class, we estimate the mutual overlap by computing the absolute differences between each pair of density images and summing up the results:

$$ CDM = \sum_{k=1}^{M-1} \sum_{l=k+1}^{M} \sum_{i=1}^{P} \lvert p_k^i - p_l^i \rvert, \tag{3.5} $$

where $M$ is the number of density images, i.e., the number of classes, $p_k^i$ is the $i$-th pixel value in the density image computed for class $k$, and $P$ is the number of pixels. If the pixel values are normalized to $[0, 1]$, the CDM ranges between 0 and $P$ for two classes ($M = 2$). This value is large if the densities at each pixel differ as much as possible, i.e., if one class has a high density value compared to all others. Consequently, the visualization with the least overlap between the classes is given the highest value. A further property of this measure is that it favors not only well-separated but also dense clusters, which ease the interpretation of the data in the visualization. Note that non-overlapping classes in scatterplots produce different density images with our algorithm. This holds even if the clusters have similar shapes: as long as they do not overlap, their density images differ, which results in a high CDM value.
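Eq. 3.5 is a plain pairwise sum and can be sketched directly. We assume every class density image is a list of rows of equal size; `class_density_measure` is a hypothetical name, and the toy images below are our own.

```python
from itertools import combinations

def class_density_measure(images):
    """CDM (Eq. 3.5): sum of absolute per-pixel differences over all pairs of
    class density images k < l. All images must have the same size."""
    return sum(abs(a - b)
               for img_k, img_l in combinations(images, 2)
               for row_k, row_l in zip(img_k, img_l)
               for a, b in zip(row_k, row_l))

# Usage: two classes with disjoint support versus two identical classes.
left = [[1.0, 0.0], [1.0, 0.0]]
right = [[0.0, 1.0], [0.0, 1.0]]
print(class_density_measure([left, right]), class_density_measure([left, left]))
```

With $M = 2$ and values normalized to $[0, 1]$, the disjoint pair reaches the upper bound $P = 4$ mentioned above, while identical densities give 0.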

³ Implemented and described by our partners from Braunschweig, G. Albuquerque and M. Eisemann, for the collaborative publication [133]. Adapted and slightly changed for the thesis by myself.



Class Separating Measure⁴

The CDM introduced above finds views with little overlap between classes and with dense clusters in high-dimensional data sets. It is computed over density images with a rapid falloff function: the local density was defined in Section 3.1.2 as $\rho = 1/r$. By changing this function, we can control the balance between the properties of separation and dense clustering. Choosing a function that increases with $r$ can yield better separated clusters, but with weaker clustering properties.

In our experiments, we found that using $\rho = r$ instead of $\rho = 1/r$ provides a good trade-off between class separability and clustering. As an extension of the CDM, we therefore propose the Class Separating Measure (CSM). The main difference between the two measures lies in the computation of the continuous representation of the scatterplot, henceforth termed distance field for the CSM (with $\rho = r$) and density image for the CDM (with $\rho = 1/r$).

To compute a distance field, the local distance at a screen pixel $s$ is defined as $r$, where $r$ is the radius of the enclosing sphere of the $k$ nearest neighbors of $s$, as described in Section 3.1.2. Once we have the distance field of each class, the CSM is computed as the sum of the absolute differences between them (note that the CDM uses the inverse of the distance instead):

$$ CSM = \sum_{k=1}^{M-1} \sum_{l=k+1}^{M} \sum_{i=1}^{P} \lvert p_k^i - p_l^i \rvert, \tag{3.6} $$

where $M$ is the number of distance field images, i.e., the number of classes, $p_k^i$ is the $i$-th pixel value in the distance field computed for class $k$, and $P$ is the number of pixels.
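To contrast the two representations, the sketch below builds distance fields with $\rho = r$ and sums their pairwise absolute differences as in Eq. 3.6. Pixel-grid evaluation, the toy point sets, and the helper names `distance_field` and `class_separating_measure` are our own assumptions; $k = 2$ is used only because each toy class has just four points.

```python
import math
from itertools import combinations

def distance_field(points, width, height, k=4):
    """Per-pixel radius r to the k-th nearest point of one class (rho = r)."""
    return [[sorted(math.hypot(px - x, py - y) for x, y in points)[k - 1]
             for px in range(width)]
            for py in range(height)]

def class_separating_measure(fields):
    """CSM (Eq. 3.6): the same pairwise sum as the CDM, but over distance fields."""
    return sum(abs(a - b)
               for f_k, f_l in combinations(fields, 2)
               for row_k, row_l in zip(f_k, f_l)
               for a, b in zip(row_k, row_l))

# Usage: the same 2 x 2 cluster of class B placed close to or far from class A.
a = [(1, 1), (2, 1), (1, 2), (2, 2)]
b_near = [(x + 2, y) for x, y in a]
b_far = [(x + 16, y) for x, y in a]
W, H, K = 20, 5, 2
field_a = distance_field(a, W, H, K)
near = class_separating_measure([field_a, distance_field(b_near, W, H, K)])
far = class_separating_measure([field_a, distance_field(b_far, W, H, K)])
print(near, far)
```

Because distance fields keep growing away from each cluster, moving the clusters apart increases the CSM roughly in proportion to their distance, which reflects the measure's bias towards large distances between clusters.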

Comparing the two measures, the Class Separating Measure is biased towards large distances between clusters, while the Class Density Measure is biased towards dense clusters. We consider separation and density of the clusters as two different user tasks. Views with well-separated clusters are frequently not the ones with dense clusters. When a view presents both properties simultaneously, both measures assign it a high value, producing a similar rank for both measures. Users can choose the measure that fits their task, or even combine both measures to find projections supporting both tasks. A comparison between the Class Separating and Class Density Measures on a real example is presented in Section 3.1.6.

Histogram Density Measures⁵

The Histogram Density Measures (1D- and 2D-HDM) are density measures for scatterplots that extend the previously presented approaches by including non-orthogonal views in the ranked result lists. They consider the class distribution of the data points using histograms. Since we are interested in plots that show good class separation, the HDM looks for corresponding histograms with significant separation properties, given by pure histogram bins, i.e., bins containing points of only one class. To determine the best low-dimensional embedding of the high-dimensional data using the HDM, a two-step computation is conducted.

⁴ Implemented and described by our partners from Braunschweig, G. Albuquerque and M. Eisemann, for the collaborative publication [132]. Adapted and slightly changed for the thesis by myself.

⁵ Implemented and described by myself.



First, all 2D scatterplots of the data set are ranked with the 1D-HDM, to determine from the 1D linear projections which dimensions show the classes best separated. To do so, each 1D projection is divided into small equidistant parts, called histogram bins, and ranked by the entropy of these bins. Let $p_c$ be the number of points of class $c$ in one bin. The entropy, i.e., the average information content of that bin, is calculated as

$$ H(p) = -\sum_c \frac{p_c}{\sum_c p_c} \log_2 \frac{p_c}{\sum_c p_c}. \tag{3.7} $$

$H(p)$ is 0 if a bin contains only points of one class, and $\log_2 M$ if it contains equal numbers of points of all $M$ classes. Each projection is ranked with the 1D-HDM:

$$ HDM_{1D} = 100 - \frac{1}{Z} \sum_x \Big( \sum_c p_c \Big) H(p) \tag{3.8} $$

$$ HDM_{1D} = 100 - \frac{1}{Z} \sum_x \Big( \sum_c p_c \Big) \Big( -\sum_c \frac{p_c}{\sum_c p_c} \log_2 \frac{p_c}{\sum_c p_c} \Big), \tag{3.9} $$

where $\frac{1}{Z}$ is a normalization factor chosen to obtain ranking values between 0 and 100, with 100 as the best value:

$$ \frac{1}{Z} = \frac{100}{\log_2 M \sum_x \sum_c p_c}. \tag{3.10} $$
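The computation of Eqs. 3.7 to 3.10 for a single 1D projection, given as a list of values with class labels, can be sketched as follows. The number of bins and the equal-width binning between the minimum and maximum value are our assumptions (the text only requires small equidistant bins), and `bin_entropy` and `hdm_1d` are hypothetical names.

```python
import math

def bin_entropy(counts):
    """H(p) of one histogram bin (Eq. 3.7); counts[c] is p_c for class c."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(n / total * math.log2(n / total) for n in counts if n > 0)

def hdm_1d(values, labels, n_bins=10):
    """1D-HDM (Eqs. 3.8-3.10) of a 1D projection: 100 means every bin is pure."""
    classes = sorted(set(labels))
    m = len(classes)
    if m < 2:
        return 100.0  # a single class is trivially separated
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0
    bins = [[0] * m for _ in range(n_bins)]
    for v, lab in zip(values, labels):
        b = min(int((v - lo) / span * n_bins), n_bins - 1)  # equidistant bins
        bins[b][classes.index(lab)] += 1
    weighted = sum(sum(counts) * bin_entropy(counts) for counts in bins)
    # 1/Z = 100 / (log2(M) * sum_x sum_c p_c); the double sum is the number of points.
    return 100.0 - 100.0 * weighted / (math.log2(m) * len(values))

# Usage: two classes that are fully separated versus fully mixed.
print(hdm_1d([0, 1, 2, 10, 11, 12], list("aaabbb"), n_bins=4))
print(hdm_1d([0, 0, 1, 1], list("abab"), n_bins=2))
```

Since each bin's entropy is weighted by its point count, large mixed bins lower the score more than small ones.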

Figure 3.3: 2D view and rotated projection axes. The projection onto the rotated plane has less overlap, and the structures of the data can be seen even in the projection. This is not possible for a projection onto the original axes.

In some data sets, paraxial projections are not able to show the structure of high-dimensional data. In these cases, a simple rotation of the projection axes can improve the quality of the measure. Figure 3.3 shows an example in which a rotation improves the projection quality: while the paraxial projections of these classes cannot show their structures on the axes, the rotated (dotted) axes show less overlap for a projection onto the $x'$ axis. Consequently, we rotate the projection plane and compute the 1D-HDM for different angles $\theta$. For each plot, we choose the best 1D-HDM value among the different rotation angles. We found experimentally that $\theta = 9m$ degrees, with integer $m \in [0, 20)$, i.e., angles from 0° to 171° in steps of 9°, works well for all our data sets. Figure 3.4 sketches this first step, showing how we measure different rotations of one plot (represented by the distribution histograms) to find its best measure value, which represents the visual quality of the plot.
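The rotation step can be sketched by projecting every scatterplot point onto an axis rotated by $\theta$, i.e., $x' = x\cos\theta + y\sin\theta$, and keeping the best 1D-HDM over $\theta = 9m$ degrees. To stay self-contained, the sketch repeats a compact 1D-HDM with equal-width bins; the bin count, the helper names, and the tie-breaking (the first angle that reaches the best score wins) are our assumptions.

```python
import math

def bin_entropy(counts):
    total = sum(counts)
    return -sum(n / total * math.log2(n / total) for n in counts if n > 0) if total else 0.0

def hdm_1d(values, labels, n_bins=10):
    # Compact 1D-HDM (Eqs. 3.7-3.10) with equal-width bins.
    classes = sorted(set(labels))
    if len(classes) < 2:
        return 100.0
    lo, span = min(values), (max(values) - min(values)) or 1.0
    bins = [[0] * len(classes) for _ in range(n_bins)]
    for v, lab in zip(values, labels):
        bins[min(int((v - lo) / span * n_bins), n_bins - 1)][classes.index(lab)] += 1
    weighted = sum(sum(c) * bin_entropy(c) for c in bins)
    return 100.0 - 100.0 * weighted / (math.log2(len(classes)) * len(values))

def best_rotation(xs, ys, labels, n_bins=10):
    """Best 1D-HDM over projections onto axes rotated by theta = 9m degrees,
    m = 0..19; returns (score, angle in degrees)."""
    best = (-1.0, None)
    for m in range(20):
        theta = math.radians(9 * m)
        proj = [x * math.cos(theta) + y * math.sin(theta) for x, y in zip(xs, ys)]
        score = hdm_1d(proj, labels, n_bins)
        if score > best[0]:
            best = (score, 9 * m)
    return best

# Usage: two classes that overlap on the x axis but separate along the diagonal.
xs = [0, 1, -1, 1, 2, 0]
ys = [0, -1, 1, 1, 0, 2]
labels = list("aaabbb")
print(hdm_1d(xs, labels, n_bins=4), best_rotation(xs, ys, labels, n_bins=4))
```

In this toy example the paraxial projection mixes the classes in two bins, while a rotated axis separates them completely, mirroring the situation of Figure 3.3.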

[Figure 3.4 panels: a scatterplot (dim 8), its class-distribution histograms for all rotations, and the histogram with the best 1D-HDM.]

Figure 3.4: First step of the HDM approach: each plot is ranked for different rotations with the 1D-HDM. The best measure value is taken for the plot.

Second, a subset of the best-ranked dimensions is chosen for further investigation in higher dimensions. All combinations of the selected dimensions enter a PCA computation. PCA [83] transforms a high-dimensional data set with correlated dimensions into a lower-dimensional data set with uncorrelated dimensions, called principal components. For more properties of PCA, please refer back to Section 2.1.2.
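The second step can be sketched as follows. To stay self-contained, the sketch computes the first two principal components with plain power iteration and deflation instead of a linear algebra library; a real implementation would use a proper eigensolver, since power iteration fails if the start vector happens to be orthogonal to the leading eigenvector. The scoring function `score_2d` is a hypothetical placeholder for the 2D-HDM, `min_size=3` follows the subset sizes shown in Figure 3.5, and all names are our own.

```python
import math
from itertools import combinations

def first_two_components(rows):
    """Scores of all samples on the first two principal components, computed
    by power iteration with deflation on the sample covariance matrix."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    X = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(x[a] * x[b] for x in X) / (n - 1) for b in range(d)]
           for a in range(d)]
    scores = []
    for _ in range(2):
        v = [1.0 / (j + 1) for j in range(d)]  # arbitrary start vector
        norm = math.sqrt(sum(c * c for c in v))
        v = [c / norm for c in v]
        for _ in range(500):  # power iteration
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(c * c for c in w))
            if norm == 0:
                break
            v = [c / norm for c in w]
        lam = sum(v[a] * cov[a][b] * v[b] for a in range(d) for b in range(d))
        # Deflation: remove the found component before searching the next one.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
        scores.append([sum(x[j] * v[j] for j in range(d)) for x in X])
    return scores[0], scores[1]

def best_pca_view(data, selected, score_2d, min_size=3):
    """PCA on every subset (size >= min_size) of the selected dimensions; returns
    (best score, subset) according to the 2D scoring function score_2d."""
    best = None
    for size in range(min_size, len(selected) + 1):
        for subset in combinations(selected, size):
            comp1, comp2 = first_two_components([[r[j] for j in subset] for r in data])
            score = score_2d(comp1, comp2)
            if best is None or score > best[0]:
                best = (score, subset)
    return best

# Usage with a stand-in score (variance along Comp.1) instead of the 2D-HDM.
def spread(comp1, comp2):
    return sum(v * v for v in comp1) / len(comp1)

data = [[i % 2, (i * 7) % 3, (i * 5) % 4 * 0.1, 10.0 * i] for i in range(8)]
print(best_pca_view(data, [0, 1, 2, 3], spread))
```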

For every combination of selected dimensions, after the PCA is computed, the first two principal components are plotted and ranked with the 2D-HDM (see Figure 3.5). The 2D-HDM is an extended version of the 1D-HDM, for which a two-dimensional histogram is computed on the scatterplot. The quality is measured exactly as for the 1D-HDM, by a weighted sum of the bin entropies. The measure is normalized between 0 and 100, with 100 for the best visualization of the data points, in which each bin contains points of only one class. Here, the bin neighborhood is also considered: for each bin $p_c$, we sum the information of the bin itself and of its direct neighborhood, denoted $u_c$. Consequently, the 2D-HDM is

$$ HDM_{2D} = 100 - \frac{1}{Z} \sum_{x,y} \Big( \sum_c u_c \Big) \Big( -\sum_c \frac{u_c}{\sum_c u_c} \log_2 \frac{u_c}{\sum_c u_c} \Big) \tag{3.11} $$

with the adapted normalization factor

$$ \frac{1}{Z} = \frac{100}{\log_2 M \sum_{x,y} \big( \sum_c u_c \big)}. \tag{3.12} $$
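Eqs. 3.11 and 3.12 can be sketched for a scatterplot of the first two components as follows. We assume equal-width bins on each axis and interpret the "direct neighborhood" of a bin as its 8 surrounding bins (a 3 × 3 block, clipped at the border); both choices, the bin count, and the name `hdm_2d` are our assumptions.

```python
import math

def hdm_2d(xs, ys, labels, n_bins=10):
    """2D-HDM (Eqs. 3.11, 3.12) of a scatterplot: 100 means every bin
    neighbourhood contains points of one class only."""
    classes = sorted(set(labels))
    m = len(classes)
    if m < 2:
        return 100.0
    x_lo, x_hi, y_lo, y_hi = min(xs), max(xs), min(ys), max(ys)

    def to_bin(v, lo, hi):
        span = (hi - lo) or 1.0
        return min(int((v - lo) / span * n_bins), n_bins - 1)

    counts = [[[0] * m for _ in range(n_bins)] for _ in range(n_bins)]
    for x, y, lab in zip(xs, ys, labels):
        counts[to_bin(y, y_lo, y_hi)][to_bin(x, x_lo, x_hi)][classes.index(lab)] += 1

    weighted = total = 0.0
    for by in range(n_bins):
        for bx in range(n_bins):
            # u_c: class counts of the bin plus its direct 8-neighbourhood.
            u = [0] * m
            for ny in range(max(by - 1, 0), min(by + 2, n_bins)):
                for nx in range(max(bx - 1, 0), min(bx + 2, n_bins)):
                    for c in range(m):
                        u[c] += counts[ny][nx][c]
            s = sum(u)
            if s == 0:
                continue
            h = -sum(n / s * math.log2(n / s) for n in u if n > 0)
            weighted += s * h  # (sum_c u_c) * entropy of u
            total += s         # sum_{x,y} sum_c u_c, used in 1/Z
    return 100.0 - 100.0 * weighted / (math.log2(m) * total)

# Usage: separated corner clusters versus two classes sharing the same bins.
print(hdm_2d([0, 0.1, 1, 0.9], [0, 0.1, 1, 0.9], list("aabb"), n_bins=4))
print(hdm_2d([0, 1, 0, 1], [0, 1, 0, 1], list("aabb"), n_bins=4))
```

Including the neighborhood makes the measure penalize classes that sit in adjacent bins, not only in the same bin.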

[Figure 3.5 panels: the k best separating dimensions selected with the 1D-HDM; PCA(Subset) scatterplots of Comp.1 vs. Comp.2 for subsets such as (5,8,11,12), (8,11,12), (5,8,11) and (5,8,12); the plot with the best 2D-HDM.]

Figure 3.5: Second step of the HDM approach: PCA is computed on the k best selected dimensions and on all possible subsets of at least three dimensions. The first two components are plotted in scatterplots, which are ranked with the 2D-HDM. The best measure value indicates the scatterplot in which the class information is best separated.



3.1.4 Quality Measures for Parallel Coordinates with Unclassified Data

When analyzing parallel coordinates plots, we focus on detecting plots that show either significant correlation between attribute dimensions or good clustering properties in certain attribute ranges. A number of analytical approaches for parallel coordinates generate dimension orderings that try to fulfill these tasks [9, 159]. However, they often do not generate a parallel coordinates plot that is optimal with respect to correlation and clustering properties, because of local effects that most analytical functions do not take into account. We therefore present analysis functions that take into account not only the properties of the data but also the properties of the resulting plot.

Hough Space Measure 6<br />

Our analysis is based on finding patterns such as clusters of lines with similar positions and directions. Our algorithm for detecting these clusters is based on the Hough transform [73]. Straight lines in the image space can be described as y = ax + b. The main idea of the Hough transform is to describe a straight line by its parameters, i.e. the slope a and the intercept b. Due to a practical difficulty (the slope of vertical lines is infinite), the normal representation of a line is used instead:

$$\rho = x \cdot \cos\theta + y \cdot \sin\theta, \qquad (3.13)$$

where ρ is the length of the normal from the origin to the line and θ is the angle between this normal and the x-axis. Using this representation, each non-background pixel in the visualization yields a distinct sinusoidal curve in the ρθ-plane, also called the Hough or accumulator space. An intersection of these curves indicates that the corresponding pixels belong to the line defined by the parameters (ρ_i, θ_i) in the original space. Figure 3.6 shows two synthetic examples of parallel coordinates and their respective Hough spaces: Figure 3.6(a) presents two well-defined line clusters and is more interesting for the cluster identification task than Figure 3.6(b), where no line cluster can be identified. Note that the bright areas in the ρθ-plane represent the clusters of lines with similar ρ and θ.
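The voting step of the Hough transform described above can be sketched in Python with NumPy. This is an illustrative implementation, not the one used for [133]. The θ range [0, π), the ρ range [−d, d] with d the image diagonal, and the function name `hough_accumulator` are our assumptions. The 50 × 50 default mirrors the grid size used in this section.

```python
import numpy as np

def hough_accumulator(image, n_rho=50, n_theta=50):
    """Vote every non-background pixel of `image` into a quantized
    (rho, theta) grid following Eq. (3.13); returns vote counts."""
    ys, xs = np.nonzero(image)               # coordinates of line pixels
    rho_max = np.hypot(*image.shape)         # |rho| never exceeds the diagonal
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    acc = np.zeros((n_rho, n_theta), dtype=np.int64)
    for t, theta in enumerate(thetas):
        # each pixel yields one rho per theta, i.e. a sampled sinusoid
        rho = xs * np.cos(theta) + ys * np.sin(theta)
        # map rho from [-rho_max, rho_max] to a row index of the grid
        rows = ((rho + rho_max) / (2.0 * rho_max) * n_rho).astype(int)
        rows = np.clip(rows, 0, n_rho - 1)
        acc[:, t] = np.bincount(rows, minlength=n_rho)
    return acc
```

For a horizontal line y = 5 in a 10 × 20 image, all 20 pixels fall into a single cell of the column θ = π/2. This concentration produces the kind of bright spot that the measure looks for.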

To reduce the bias towards long lines, e.g. diagonal lines, we scale the pairwise visualization images to an n × n resolution, usually 512 × 512. The accumulator space is quantized into a w × h cell grid, where w and h control how sensitive the algorithm is to line similarity. We use 50 × 50 grids for computing the results presented in Sections 3.1.6 and 3.1.7. Lower values for w and h reduce the sensitivity of the algorithm, because lines with slightly different ρ and θ are mapped to the same accumulator cells.

Based on our definition, good visualizations must contain few, well-defined clusters, which are represented by accumulator cells with high values. To identify these cells, we compute the median value m as an adaptive threshold that divides the accumulator function h(x) into two equal parts:

$$\frac{\sum_x h(x)}{2} = \sum_x g(x), \quad \text{where} \quad g(x) = \begin{cases} h(x) & \text{if } h(x) \le m, \\ m & \text{else.} \end{cases} \qquad (3.14)$$

6 Implemented and described by our partners from Braunschweig, G. Albuquerque and M. Eisemann, for the collaborative publication [133]. Adapted and slightly changed for the thesis by myself.



Figure 3.6: Synthetic examples of parallel coordinates and their respective Hough spaces: (a) presents two well-defined line clusters and is more interesting for the cluster identification task than (b), where no line cluster can be identified. Note that the bright areas in the ρθ-plane represent the clusters of lines with similar ρ and θ.

Using the median value, only a few clusters are selected in an accumulator space with high contrast between the cells (see Figure 3.6(a)), while in a uniform accumulator space many clusters are selected (see Figure 3.6(b)). This adaptive threshold is necessary not only to select possible line clusters in the accumulator space, but also to avoid the influence of outliers and of occlusion between the lines. In the occlusion case, a point that belongs to two or more lines is counted only once in the accumulator space.

The final quality value for a 2D visualization is computed from the number of accumulator cells n_cells that have a higher value than m, normalized by the total number of cells (w · h) to the interval [0, 1]:

$$s_{i,j} = 1 - \frac{n_{\mathrm{cells}}}{w \cdot h}, \qquad (3.15)$$

where i, j are the indices of the respective dimensions. The computed measure s_{i,j} takes higher values for images containing well-defined line clusters (similar lines) and lower values for images containing lines in many different directions and positions.
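Equations 3.14 and 3.15 can be combined into a per-plot score. The following sketch makes two assumptions. First, it reads g as clipping the accumulator values at m. Second, it finds m by bisection, since the thesis does not state how m is computed. The names `adaptive_threshold` and `hough_quality` are hypothetical.

```python
import numpy as np

def adaptive_threshold(acc, iters=60):
    """Find m with sum(min(h, m)) == sum(h) / 2 (Eq. 3.14) by bisection."""
    h = np.asarray(acc, dtype=float).ravel()
    target = h.sum() / 2.0
    lo, hi = 0.0, h.max()          # sum(min(h, m)) grows monotonically in m
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if np.minimum(h, mid).sum() < target:
            lo = mid
        else:
            hi = mid
    return hi

def hough_quality(acc):
    """s = 1 - n_cells / (w * h), n_cells = cells above m (Eq. 3.15)."""
    acc = np.asarray(acc, dtype=float)
    m = adaptive_threshold(acc)
    n_cells = np.count_nonzero(acc > m)
    return 1.0 - n_cells / acc.size
```

A single sharp peak yields a score close to 1. A perfectly uniform accumulator yields 0, because every cell lies above the threshold. This matches the contrast between Figure 3.6(a) and (b).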

Having combined the pairwise visualizations, we can now compute the overall quality measure by summing up the respective pairwise measurements. The overall quality measure of a parallel coordinates visualization containing n dimensions is:

$$\mathrm{HSM} = \sum_{a_i \in I} s_{a_i, a_{i+1}}, \qquad (3.16)$$

where I is a vector containing one possible ordering of the n dimension indices. In this way, we can measure the quality of any given parallel coordinates visualization.
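Equation 3.16 scores a given axis ordering. For small n, the best ordering can be found by brute force. The sketch assumes that the pairwise scores s_{i,j} have already been collected in an n × n matrix `s`. The function names are hypothetical.

```python
from itertools import permutations

import numpy as np

def hsm_for_order(s, order):
    """Eq. (3.16): sum the pairwise scores of neighboring axes in `order`."""
    s = np.asarray(s, dtype=float)
    return float(sum(s[a, b] for a, b in zip(order[:-1], order[1:])))

def best_order_exhaustive(s):
    """Try all n! axis orderings; only practical for small n."""
    n = len(s)
    return max(permutations(range(n)), key=lambda order: hsm_for_order(s, order))
```

The number of orderings grows as n!, so this brute-force search quickly becomes infeasible. This growth is the motivation for the search heuristics discussed in the following paragraph.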

Exhaustively computing all n-dimensional orderings in order to choose the best or worst ones requires a very long computation time and becomes infeasible for large n. In these cases, an algorithm for the Traveling Salesman Problem can be used to search for the best n-dimensional orderings in feasible time, e.g. the A*-Search algorithm [66] or others [12]. Instead of exhaustively combining all possible pairwise visualizations, these kinds of algorithms compose only the best overall visualization.
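The A*-based search cited above is not reproduced here. As a much simpler stand-in, a greedy nearest-neighbor heuristic (a common Traveling Salesman Problem heuristic with no optimality guarantee) builds an ordering from the same pairwise score matrix `s`. The function name is hypothetical.

```python
import numpy as np

def greedy_order(s):
    """Greedy heuristic: start from the best-scoring pair of axes, then
    repeatedly append the unused dimension that scores highest with the
    current last axis. Requires at least two dimensions."""
    s = np.asarray(s, dtype=float)
    n = s.shape[0]
    masked = s.copy()
    np.fill_diagonal(masked, -np.inf)        # an axis cannot neighbor itself
    a, b = np.unravel_index(np.argmax(masked), masked.shape)
    order = [int(a), int(b)]
    unused = set(range(n)) - set(order)
    while unused:
        last = order[-1]
        nxt = max(unused, key=lambda j: s[last, j])
        order.append(nxt)
        unused.remove(nxt)
    return order
```

The heuristic needs only O(n²) score lookups. However, it can miss the best ordering that an exhaustive or A* search would find.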



3.1.5 Quality Measures for Parallel Coordinates with Classified Data

When analyzing parallel coordinates visualizations with class information, we consider two main issues. First, in good parallel coordinates visualizations, the lines that belong to a given class must be quite similar (in inclination and position). Second, visualizations in which the classes can be observed separately and that contain less overlap are also considered good. We developed two measures for classified parallel coordinates that take these issues into account: the Similarity Measure, which encourages inner-class similarities, and the Overlap Measure, which analyzes the overlap between classes. Both are based on the Hough Space Measure for unclassified data presented in Section 3.1.4.

Similarity Measure 7<br />

The Similarity Measure (SM) is a direct extension of the HSM presented above for unclassified data. In visualizations containing class information, the different classes are usually represented by different colors. We separate the classes into distinct images, each containing only the pixels in the respective class color, and compute a quality measure s_k for each class using Equation 3.15. Thereafter, an overall quality value SM is computed as the sum of all class quality measures:

$$\mathrm{SM} = \sum_k s_k. \qquad (3.17)$$

Using this measure, we encourage visualizations with strong inner-class similarities and slightly penalize overlapping classes. Note that, because of the overlap, some classes have many missing pixels, which results in a lower s_k value compared to visualizations with little or no overlap between the classes.
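Equation 3.17 can be sketched as follows. The sketch assumes that one Hough accumulator per class is already available, computed from the image containing only that class's color. For self-containment, it repeats a compact version of the per-accumulator score from Equations 3.14 and 3.15, with m found by bisection (our choice). The names `cell_score` and `similarity_measure` are hypothetical.

```python
import numpy as np

def cell_score(acc, iters=60):
    """Eq. (3.15) for one accumulator, with m from Eq. (3.14) via bisection."""
    h = np.asarray(acc, dtype=float)
    target, lo, hi = h.sum() / 2.0, 0.0, h.max()
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        # keep the invariant sum(min(h, hi)) >= target
        lo, hi = (mid, hi) if np.minimum(h, mid).sum() < target else (lo, mid)
    return 1.0 - np.count_nonzero(h > hi) / h.size

def similarity_measure(class_accumulators):
    """Eq. (3.17): SM = sum_k s_k over one Hough accumulator per class."""
    return sum(cell_score(acc) for acc in class_accumulators)
```

Because SM is a plain sum, it grows with the number of classes. It is therefore comparable between views of the same data set, but not across data sets with different class counts.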

Overlap Measure 8<br />

To penalize overlap between classes, we analyze the difference between the classes in the Hough space (see Section 3.1.4). As in the SM, for the Overlap Measure we also separate the classes into different images and compute the Hough transform of each image. Once we have a Hough space h for each class, we compute the quality measure as the sum of the absolute differences between the classes:

$$\mathrm{OM} = \sum_{k=1}^{M-1} \sum_{l=k+1}^{M} \sum_{i=1}^{P} \left| h_k^i - h_l^i \right|. \qquad (3.18)$$

Here M is the number of Hough space images, i.e. the number of classes, and P is the number of pixels in each image. The measure value is high if the Hough spaces are disjoint, i.e. if there is no large overlap between the classes. Therefore, the visualization with the smallest overlap between the classes receives the highest measure value.
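Equation 3.18 can be written directly as a sum over all class pairs k < l. The sketch assumes the per-class Hough spaces are given as equally sized NumPy arrays, one per class. The function name is hypothetical.

```python
from itertools import combinations

import numpy as np

def overlap_measure(hough_spaces):
    """Eq. (3.18): sum of absolute cell-wise differences over all class pairs."""
    hs = [np.asarray(h, dtype=float).ravel() for h in hough_spaces]
    # combinations(..., 2) enumerates exactly the pairs k < l
    return sum(np.abs(hk - hl).sum() for hk, hl in combinations(hs, 2))
```

Identical Hough spaces give 0. Disjoint ones give the largest value for the given vote counts, which is why less overlap ranks higher.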

7 Implemented and described by our partners from Braunschweig, G. Albuquerque and M. Eisemann, for the collaborative publication [133]. Adapted and slightly changed for the thesis by myself.

8 Implemented and described by our partners from Braunschweig, G. Albuquerque and M. Eisemann, for the collaborative publication [133]. Adapted and slightly changed for the thesis by myself.



Another valuable use of this measure is to encourage or search for similarities between different classes. In this case, overlap between the classes is desired, and the previously computed measure can be inverted to obtain suitable quality values:

$$\mathrm{OM}_{inv} = \frac{1}{\mathrm{OM}}. \qquad (3.19)$$

3.1.6 Application on Real Data Sets

To evaluate our measures, we tested them on a variety of real data sets. We applied our Class Density Measure (CDM), Class Separating Measure (CSM), Histogram Density Measure (HDM), Similarity Measure (SM), and Overlap Measure (OM) to classified data to find views that either separate the classes or show similarities between them. For unclassified data, we applied our Rotating Variance Measure (RVM) and Hough Space Measure (HSM) in order to find linear or non-linear correlations and clusters in the data sets, respectively.

Except for the HDM, we present only relative measures, i.e. all calculated values are scaled so that the best visualization found is assigned 100 and the worst 0. This scaling is intended to make the measures easier for the user to interpret. For the HDM, we present the unchanged measure values, because the HDM allows a direct interpretation, with 100 being the best and 0 the worst possible constellation. Unless otherwise stated, our examples are proofs of concept, and some of the results should be interpreted by domain experts.
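The relative scaling described above can be sketched as a min-max normalization. The thesis states only that the best view maps to 100 and the worst to 0, so a linear mapping in between is our assumption. The handling of all-equal values and the function name `to_relative` are also hypothetical.

```python
import numpy as np

def to_relative(values):
    """Min-max scale raw measure values so the best view gets 100, the worst 0."""
    v = np.asarray(values, dtype=float)
    span = v.max() - v.min()
    if span == 0:
        # all views tie; assigning 100 to every view is our own choice
        return np.full_like(v, 100.0)
    return 100.0 * (v - v.min()) / span
```

Relative values compare views only within a single ranking. Because of this, a value of 100 for one measure does not imply a good view in absolute terms.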

Data Sets

We used the data sets summarized in Table 3.2 to show the properties of the measures. The table gives basic information about each data set. More details about the data sources and the dimension names of each data set can be found in Appendix A.

Table 3.2: Overview of the data sets used to show the properties of the measures.

Data set                              Records   Dimensions⁹   Classes   Source
Cars                                     7404            22         2   partners
Olives                                    572             8         9   [163]
Parkinson’s Disease                       195            11         0   [95, 96]
Wine                                      178            13         3   [53]
Wisconsin Diagnostic Breast Cancer        569            30         2   [131]

Cars contains 7404 cars listed with 23 different attributes, including price, power, fuel consumption, width, and height, automatically collected from a national second-hand car sales website 10. We chose the attribute fuel as the class label, which divides the data into two classes, benzine and diesel. Our goal is to find the similarities and differences between these two classes.

9 The number of dimensions does not include the class attribute.

10 Collected by another institute in Braunschweig and provided to our partners there.



[Figure panels. Best ranked views using RVM: 100 (dim9, dim12), 97 (dim2, dim3), 75 (dim2, dim4). Worst ranked views using RVM: 0 (dim6, dim8), 0.3 (dim7, dim8), 5.6 (dim2, dim8).]

Figure 3.7: Results for the Parkinson’s Disease data set using our RVM measure (Section 3.1.2). Clumpy views with low correlation are penalized (bottom row), while views containing higher correlation between the variables are preferred (top row).

Olives is a classified data set with 572 olive oil samples from nine different regions in Italy [163]. For each sample, the normalized concentrations of eight fatty acids are given. The large number of classes (regions) poses a challenging task for algorithms that try to find views in which all classes are well separated.

Parkinson’s Disease is a data set composed of 195 biomedical voice measurements from 31 people, 23 of whom have Parkinson’s disease [95, 96]. Each of the 12 dimensions is a particular voice measure. The voice recordings from these individuals were taken with the goal of discriminating healthy people from those with Parkinson’s disease.

Wine is a classified data set with 178 instances and 13 attributes describing chemical properties of Italian wines derived from three different cultivars.

The Wisconsin Diagnostic Breast Cancer (WDBC) data set consists of 569 samples with 30 real-valued dimensions each [131]. The data is classified into malignant and benign cells. The task is to find the dimensions that best separate the two classes.

Scatterplot Measures<br />

First, we show the results of the RVM on the Parkinson’s Disease data set 11. The three best and the three worst ranked scatterplots according to the RVM are shown in Figure 3.7, with the RVM value displayed above each plot. High correlations were measured in the plots (dim9, dim12), (dim2, dim3), and (dim2, dim4). Visualizations containing low correlation, by contrast, received a low value, as shown in the second row of this figure, which presents the worst ranked views and their measure values. This example demonstrates that our target pattern, correlated dimensions, is correctly identified by the RVM measure.

11 For easier reading of this paragraph, we renamed the original dimensions. Please refer to Appendix A, Table A.3 for the original dimension names.

[Figure panels. Best ranked views using CDM: 100 (dim4, dim5), 97 (dim1, dim5), 84 (dim1, dim4). Worst ranked views using CDM: 0 (dim6, dim8), 15 (dim6, dim7), 24 (dim7, dim8).]

Figure 3.8: Results for the Olives data set using our CDM measure (Section 3.1.3). The colors depict the different classes (regions) of the data set. Although no view of this data set completely separates all classes, our CDM measure still found views in which most of the classes are mutually separated (top row). In the worst ranked views, the classes clearly overlap with each other (bottom row).

[Figure panels. Best ranked PCA views using the HDM approach: 85.45 PCA(dim(4,5,8)), 84.98 PCA(dim(1,2,4,5)), 84.9 PCA(all data dims).]

Figure 3.9: Results for the Olives data set using our HDM measure (Section 3.1.3). The best ranked plot is the PCA of dim(4,5,8), revealing a good view of all the classes. The second best is the PCA of dim(1,2,4,5), and the third is the PCA of all 8 dimensions. The differences between the last two views are small, because the variance contributed by the additional dimensions (captured by the 3rd eigenvector) is small relative to that of the 2nd. The difference between the last two views and the first view is clearly visible (e.g. when looking at the yellow class).



In Figure 3.8, we show the results for the Olives data set 12 using our CDM measure. Even though no view separates all nine olive classes, the CDM reliably chooses three views that separate the data quite well, in the dimensions (dim4, dim5), (dim1, dim5), and (dim1, dim4). The bottom row of this figure presents the worst ranked projections. In these cases, it is impossible to identify any class structure in the views.

We also applied our HDM technique to this data set. First, the 1D-HDM tries to identify the best separating dimensions of the data set, as presented in Section 3.1.3. The 1D-HDM ranked dim1, dim2, dim4, dim5, and dim8 as the best separating dimensions. We computed all subsets of these dimensions, computed the PCA on these subsets, and ranked the views of the first two PCA components with the 2D-HDM. In the best ranked views, presented in Figure 3.9, the classes are well separated. Compared to the upper row in Figure 3.8, the visualization makes better use of the screen space, which is due to the PCA transformation.
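The subset-and-PCA step described above can be sketched as follows. The sketch assumes that the 1D-HDM selection has already produced `selected_dims`. It also takes the 2D scoring function as a parameter `score(points, labels)`, which in the thesis would be the 2D-HDM. The minimum subset size of three follows Figure 3.5 and the results in Figure 3.9. The function names are hypothetical, and PCA is computed with a standard SVD of the centered data.

```python
from itertools import combinations

import numpy as np

def pca_2d(X):
    """Project the rows of X onto the first two principal components (SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:2].T

def rank_pca_subsets(X, labels, selected_dims, score, min_size=3):
    """Compute PCA on every subset of the selected dimensions and rank the
    resulting 2D views with `score(points, labels)`, best first."""
    results = []
    for size in range(min_size, len(selected_dims) + 1):
        for subset in combinations(selected_dims, size):
            view = pca_2d(X[:, list(subset)])
            results.append((score(view, labels), subset))
    results.sort(key=lambda r: r[0], reverse=True)
    return results
```

With the five dimensions selected for Olives, this produces 16 candidate views (10 + 5 + 1 subsets). This count remains small enough to score exhaustively.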

[Figure panels. Best ranked views using CSM: 100 (dim7, dim13), 97 (dim7, dim10), 93 (dim7, dim12). Worst ranked views using CSM: 0 (dim3, dim5), 0.05 (dim1, dim5), 0.08 (dim1, dim4).]

Figure 3.10: Results for the Wine data set using our CSM measure (Section 3.1.3). The best ranked plots show a large distance between the centers of the class clusters, while the worst ranked views show only cluttered data.

Comparing our CSM and CDM measures, we can observe that they produce distinct results on the same data sets. Applying the CSM to the Wine data set 13 reveals views that show a good separation between the classes. The best ranked plots are shown in the upper row of Figure 3.10: (dim7, dim13), (dim7, dim10), and (dim7, dim12). They show a large distance between the centers of the class clusters. The worst ranked views, in contrast, show only cluttered data. In comparison, the result of the CDM measure on

12 For easier reading of this paragraph, we renamed the original dimensions. Please refer to Appendix A, Table A.2 for the original dimension names.

13 For easier reading of this paragraph, we renamed the original dimensions. Please refer to Appendix A, Table A.4 for the original dimension names.



[Figure panels. Best ranked views using CDM: 100 (dim7, dim10), 89 (dim1, dim7), 88 (dim7, dim13). Worst ranked views using CDM: 0 (dim3, dim5), 0.04 (dim4, dim8), 0.07 (dim8, dim9).]

Figure 3.11: Results for the Wine data set using our CDM measure (Section 3.1.3). Note that the second best ranked view, (dim1, dim7) (CDM = 89), is not considered good by the CSM measure (CSM = 58).

the Wine data set is depicted in Figure 3.11. The best ranked plots (dim7, dim10), (dim1, dim7), and (dim7, dim13) show denser clusters, as expected from the ranking criterion of this measure. Note that the second best ranked view, (dim1, dim7) (CDM = 89), is not considered good by the CSM measure, receiving a lower rank and quality value (CSM = 58). Comparing Figure 3.10 and Figure 3.11, we can observe that the CSM favors large distances between the clusters, while the CDM assigns high values to views that show dense but separated clusters, even if the distances between them are much smaller.

In some cases, looking only at the best and worst ranked plots is not enough. By arranging all the scatterplots in a scatterplot matrix (SPLOM), the analyst can look at all orthogonal views of a data set at once. In our system, the scatterplots are shown in the upper right half of the SPLOM, while the other half displays the quality value of each plot. To guide the analysis, the user can fade out lower ranked views, which helps to focus on views that are more likely to carry information. One drawback is that the number of scatterplots grows quadratically with the number of dimensions, so the SPLOM does not scale to a very large number of dimensions. Figure 3.12 shows an example. Both SPLOMs show the WDBC data set 14, but the upper SPLOM shows the results for the RVM, while the bottom SPLOM shows the results for the CDM measure. The threshold for both SPLOMs was set to 0.95¹⁵, so all plots with a lower value have

14 Please refer to Appendix A, Table A.5 for details about the original dimension names of the data set.

15 Please note that the SPLOM shows the measure values between 0 and 1, while all the other results presented before were on a scale from 0 to 100.



been faded out. As the enlarged detail shows, different views come into focus depending on the chosen measure. The RVM considers plots with a high degree of correlation more important. The CDM, by contrast, focuses on separating the designated classes, here the malignant and benign cells. Which pattern is more important depends on the user's task.
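The fading step can be sketched as a filter over the upper triangle of a pairwise score matrix. Footnote 15 states that the SPLOM shows values between 0 and 1, but not how they are scaled. Here we assume a min-max scaling of the raw scores. The function name `views_to_keep` and the matrix layout (`scores[i, j]` for the plot of dimensions i and j) are hypothetical.

```python
import numpy as np

def views_to_keep(scores, threshold=0.95):
    """Return the (i, j) pairs of the upper triangle whose [0, 1]-scaled
    measure value reaches the threshold; all other views would be faded out."""
    s = np.asarray(scores, dtype=float)
    iu, ju = np.triu_indices(s.shape[0], k=1)   # one entry per scatterplot
    vals = s[iu, ju]
    lo, hi = vals.min(), vals.max()
    scaled = (vals - lo) / (hi - lo) if hi > lo else np.ones_like(vals)
    return [(int(i), int(j)) for i, j, v in zip(iu, ju, scaled) if v >= threshold]
```

For d dimensions, the filter inspects d(d − 1)/2 plots, the same quadratic growth that limits the scalability of the SPLOM itself.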

Figure 3.12: Results on the WDBC data set for the RVM (top) and the CDM (bottom). In this example, views with a quality value of less than 0.95 have been faded out. This way, many irrelevant views can be hidden, reducing the number of plots that the user has to inspect in detail to a manageable number.



Parallel Coordinates Measures

To demonstrate the value of our approaches for parallel coordinates, we present the best and worst ranked visualizations according to our measures on different data sets. The corresponding visualizations are shown in Figures 3.13, 3.14, and 3.15. For better comparability, the visualizations have been cropped after the 4th dimension. In all experiments, we used a 50 × 50 Hough accumulator. The algorithms are quite robust with respect to this size: using more cells generally only increases computation time and has little influence on the result.

Figure 3.13 shows the ranked results for the Parkinson’s Disease data set 16 using our Hough Space Measure.

The HSM algorithm prefers views in which the lines are more similar in distance and inclination, resulting in the prominent narrow band in the visualization of the Parkinson’s Disease data set. Such bands correspond to clusters in the projected views of these dimensions, here between dim3 and dim12 as well as between dim6 and dim11.

[Figure panels. Best ranked views using HSM: 100, 97, 97. Worst ranked views using HSM: 0, 0.7, 1.1.]

Figure 3.13: Results for the non-classified version of the Parkinson’s Disease data set. Best and worst ranked visualizations using our HSM measure for non-classified data (see Section 3.1.4). Top row: the three best ranked visualizations and their respective normalized measures. Well-defined clusters in the data set are favored. Bottom row: the three worst ranked visualizations. The large amount of spread makes interpretation difficult. Note that the user task related to this measure is not to find possible correlations between the dimensions, but to detect well-separated clusters.

Applying our Similarity Measure to the Cars data set, we can see that there seem to be barely any good views that split the clusters of the data set (see Figure 3.14). We verified this by exhaustively looking at all pairwise projections. The only dimension in which the classes can be mostly separated, and in which at least some form of cluster can be reliably found, is dim6: there, cars using diesel (represented in red) generally have lower values than cars using benzine (represented in black). Figure 3.14 shows the best ranked results in the top row. Additionally, the similarity of the majority of records in dim15, dim18, and dim3 can be detected. Apparently, cars using diesel are cheaper. This might be due to the age of the diesel cars, but age was unfortunately not included in the database. The worst ranked views according to the SM (see Figure 3.14, bottom row), on the other hand, are barely interpretable; at least, we were unable to extract any useful information from them.

In Figure 3.15, the results of our Overlap Measure applied to the WDBC data set are

16 For easier reading of this paragraph, we renamed the original dimensions. Please refer to Appendix A, Table A.3 for the original dimension names.



best ranked views us<strong>in</strong>g SM<br />

100 98<br />

98<br />

17 6 15 18 17 6 20 18<br />

3 20 18 15<br />

worst ranked views us<strong>in</strong>g SM<br />

0 0.1 0.2<br />

9 1 19 12<br />

5 19 1 9 9 1 12 19<br />

Figure 3.14: Results <strong>of</strong> the SM for the Cars data set. Cars us<strong>in</strong>g benz<strong>in</strong>e are shown <strong>in</strong> black,<br />

diesel <strong>in</strong> red. Best and worst ranked visualizations us<strong>in</strong>g our Hough Similarity Measure (Section<br />

3.1.5) for parallel coord<strong>in</strong>ates. Top row: The three best ranked visualizations and their respective<br />

normalized measures. Bottom row: The three worst ranked visualizations.<br />

best ranked views us<strong>in</strong>g OM<br />

100 99 99<br />

25 9 24 29<br />

22 9 24 29 25 9 22 29<br />

worst ranked views us<strong>in</strong>g OM<br />

0 0.1 0.2<br />

17 18 31 21<br />

13 31 18 17<br />

13 17 18 31<br />

Figure 3.15: Results <strong>of</strong> the OM for the WDBC data set. Malign nuclei are colored black while<br />

healthy nuclei are red. Best and worst ranked visualizations us<strong>in</strong>g our Overlap Measure (Section<br />

3.1.5) for parallel coord<strong>in</strong>ates. Top row: The three best ranked visualizations. Despite good similarity,<br />

which are similar to clusters, visualizations are favored that m<strong>in</strong>imize the overlap between<br />

the classes, so that the di erence between malign and benign cells becomes more clear. Bottom<br />

row: The three worst ranked visualizations. The overlap <strong>of</strong> the data complicates the analysis and<br />

the <strong>in</strong>formation is useless for the task <strong>of</strong> discrim<strong>in</strong>at<strong>in</strong>g malign and benign cells.<br />

shown. This result is very promis<strong>in</strong>g. In the top row, show<strong>in</strong>g the best plots, the malign<br />

and benign are well separated. It seems that the dimensions dim22 (radius (worst)), dim9<br />

(concave po<strong>in</strong>ts (mean)), dim24 (perimeter (worst)), dim29 (concave po<strong>in</strong>ts (mean)) and<br />

dim25 (area (worst)) separate the two classes well.



3.1.7 Evaluation of the Measures' Performance Using Synthetic Data

The work presented by Johansson and Johansson [82] introduces a system for dimensionality reduction. The system combines user-defined quality metrics through weighted functions so that as many important structures as possible are preserved in the reduced data set. The analyzed structures are clustering properties, outliers and dimension correlations. We used the synthetic data set presented in their paper to test our Hough Space Measure (HSM). It contains 1320 data items and 100 variables, 14 of which contain significant structures.
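To make the idea of combining quality metrics through user-defined weights concrete, the following Python sketch ranks dimensions by a weighted sum of min-max normalized per-dimension metric scores. It is a generic illustration under our own assumptions, not the formulation used by Johansson and Johansson [82]. The function `rank_dimensions`, the metric names and all scores are hypothetical.

```python
def rank_dimensions(metric_scores, weights, top=5):
    """Rank dimensions by a user-weighted sum of min-max normalized quality metrics.

    metric_scores: {metric name: {dimension: raw score}}, higher meaning better.
    weights: {metric name: importance weight chosen by the user}; missing metrics count 0.
    """
    combined = {}
    for metric, scores in metric_scores.items():
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0  # guard against a constant metric
        w = weights.get(metric, 0.0)
        for dim, s in scores.items():
            # Normalize each metric to [0, 1] before weighting so scales are comparable
            combined[dim] = combined.get(dim, 0.0) + w * (s - lo) / span
    return sorted(combined, key=combined.get, reverse=True)[:top]


# Hypothetical per-dimension scores for two structure types
scores = {
    "clustering": {"A": 0.9, "B": 0.2, "C": 0.8},
    "outliers": {"A": 0.1, "B": 0.9, "C": 0.3},
}
print(rank_dimensions(scores, {"clustering": 0.8, "outliers": 0.2}, top=2))  # ['A', 'C']
```

Shifting the weights toward the outlier metric changes the selection. This is the kind of user control the weighted combination is meant to provide.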

The HSM algorithm prefers views in which the lines are more similar in distance and inclination. We computed our HSM on this synthetic data set and present the result in Figure 3.16. The top row shows the best ranked 4-dimensional parallel coordinates plots for clustered data points, and the bottom row shows the worst ranked plots. At the top, the clusters of lines are clearly visible, whereas no structures are visible at the bottom. The best plots contain five dimensions: A, C, G, I and J. Four of these five dimensions are also identified by [82] as the best dimensions for clustering; [82] uses user-defined quality measures to determine the best dimensions according to different criteria. All five of our resulting dimensions are contained in their best 9 dimensions for showing clustered data points. This supports the assumption that our measures rank plots in a way similar to how users would rank them.

[Figure 3.16 panels: top row, best ranked views using HSM (normalized measures 100, 99.3, 98.8); bottom row, worst ranked views using HSM (0, 0, 0.2).]

Figure 3.16: Results of the HSM for the synthetic data set from [82], presenting the best and worst ranked visualizations using our HSM measure for unclassified data (see Section 3.1.4). Top row: the three best ranked visualizations and their respective normalized measures. Well-defined clusters in the data set are favored. Bottom row: the three worst ranked visualizations. The large amount of spread hampers interpretation. Note that the user task related to this measure is not to find high correlation between the dimensions but to detect well-separated clusters.

To show the effectiveness of our scatterplot measures and to explain their differences, we analyzed their results on a self-generated synthetic data set, synthetic2. We created a 10-dimensional data set with two classes. By using just two classes, we aim to show the fundamental differences between the measures that allow hidden patterns to be detected.

We hid target patterns in three dimensions to test how these projections are ranked by the measures. The patterns were created as follows. The first pattern, in subspace (2, 5), contains two classes with means at $m_1 = (6, 14)$ and $m_2 = (13, 6)$. Each class contains 500 samples from a multivariate normal distribution with the covariance matrix

$$C_1 = \begin{pmatrix} 3 & 2.7 \\ 2.7 & 3 \end{pmatrix}$$

of the variables. In dimension 6, we defined two classes with means at $m_3 = 6$ and $m_4 = 13$, respectively. Each class has 500 random samples from a normal distribution with standard deviation std = 1.5. With this definition of the dimensions, three patterns occur, in subspaces (2, 5), (2, 6) and (5, 6).

In the other 7 dimensions, we defined random patterns. These were generated systematically by taking, for every dimension, the mean $m_d = 10$ and 1000 samples from a normal distribution. The standard deviation starts at std = 0.5 and increases by 0.5 for each dimension. Therefore, the last random dimension has std = 3.5.
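The construction of synthetic2 can be approximated with the NumPy sketch below. It follows the parameters stated above: the means, the covariance matrix $C_1$, the standard deviations and the sample counts. Three details are our assumptions: the first class receives $m_1$ and $m_3$ while the second receives $m_2$ and $m_4$, the random seed is arbitrary, and the function name `make_synthetic2` is hypothetical. The samples therefore will not match the original data set.

```python
import numpy as np


def make_synthetic2(seed=0):
    """Generate a data set shaped like synthetic2: 10 dimensions, 2 classes of 500 points.

    Column d-1 holds dimension d of the text. Assumption (not stated in the text):
    class 0 receives m1 and m3, class 1 receives m2 and m4.
    """
    rng = np.random.default_rng(seed)
    n = 500  # points per class
    X = np.empty((2 * n, 10))
    labels = np.repeat([0, 1], n)

    # Target pattern in subspace (2, 5): correlated bivariate normals sharing C1
    C1 = np.array([[3.0, 2.7], [2.7, 3.0]])
    X[:n, [1, 4]] = rng.multivariate_normal([6.0, 14.0], C1, size=n)
    X[n:, [1, 4]] = rng.multivariate_normal([13.0, 6.0], C1, size=n)

    # Dimension 6: two univariate normal classes with std 1.5
    X[:n, 5] = rng.normal(6.0, 1.5, size=n)
    X[n:, 5] = rng.normal(13.0, 1.5, size=n)

    # Seven class-independent noise dimensions: mean 10, std 0.5, 1.0, ..., 3.5
    noise_dims = [0, 2, 3, 6, 7, 8, 9]
    for i, d in enumerate(noise_dims):
        X[:, d] = rng.normal(10.0, 0.5 * (i + 1), size=2 * n)
    return X, labels


X, labels = make_synthetic2()
```

With this class assignment, subspace (2, 6) places the two classes at roughly (6, 6) and (13, 13), i.e., two clusters along a rising diagonal. This matches the correlated (2, 6) pattern discussed in this section.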

Figure 3.17: Matrix for the synthetic data set, with scatterplots above the main diagonal and parallel coordinates plots below it.

In Figure 3.17, we present the scatterplot matrix of the synthetic data set, showing the scatterplots above the main diagonal and the parallel coordinates plots below it. We ranked all these plots with our measures for scatterplots and parallel coordinates. The results are presented in Figure 3.18: for every measure, we show a point chart containing the sorted measure results, with the target patterns marked in red. All measures ranked one of the target patterns as the best plot.
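Each point chart in Figure 3.18 requires three steps: normalize the scores of all views, sort them in decreasing order, and locate the target patterns. For the 10 dimensions of synthetic2, there are 45 axis pairs. The Python sketch below shows this bookkeeping. The raw scores are invented for illustration and do not come from any of the measures. The min-max normalization to the range 0 to 100 is our assumption about how the values were scaled.

```python
from itertools import combinations


def normalize_0_100(scores):
    """Min-max normalize a {view: raw score} dict to the range [0, 100]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0  # guard against all scores being equal
    return {view: 100.0 * (s - lo) / span for view, s in scores.items()}


def target_ranks(scores, targets):
    """1-based position of each target view when views are sorted by decreasing score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {t: ordered.index(t) + 1 for t in targets}


# Hypothetical raw scores: every axis pair of a 10-D data set gets a baseline value,
# and the three target subspaces of synthetic2 get higher values.
pairs = list(combinations(range(1, 11), 2))
raw = {pair: 1.0 for pair in pairs}
raw.update({(2, 5): 5.0, (5, 6): 4.5, (2, 6): 4.0})

normalized = normalize_0_100(raw)
ranks = target_ranks(normalized, [(2, 5), (2, 6), (5, 6)])
print(len(pairs), ranks)  # 45 {(2, 5): 1, (2, 6): 3, (5, 6): 2}
```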

The scatterplot measures for classified data, CDM and CSM, ranked all three target patterns as the best projections of the data set. This confirms our assumption that these measures search for the projections with the best class separability and the densest classes. The RVM, designed for data sets without classes, was computed on the same data set without class information. (This means that RVM was measured on plots like those in Figure 3.17, in which the data points are not colored by class.) The best scatterplot ranked by RVM is (2, 5), which contains the densest target pattern. RVM is designed to find the scatterplots with the highest correlations, and subspace (2, 5) indeed contains the target pattern with the highest correlation. The second target pattern, in (2, 6), shows two clusters with high correlation and is also found by the RVM.

[Figure 3.18 panels: left column, scatterplot measures RVM, CDM, CSM and 1D-HDM; right column, parallel coordinates measures HSM, OM and SM. Each point chart shows the measure values (0 to 100; 40 to 100 for 1D-HDM) against the position of the plot in the sorted order (0 to 40).]

Figure 3.18: Results of the 7 measures for classified and unclassified data. The left column shows the results of the scatterplot measures, and the right column shows those of the parallel coordinates measures. The measure values are sorted in decreasing order, and the target patterns are marked with red crosses.

[Figure 3.19 scatterplot: Comp.1 (horizontal axis, about −5 to 10) versus Comp.2 (vertical axis, about −5 to 5).]

Figure 3.19: Scatterplot of the first two components of the PCA over dimensions 2, 5 and 6.

The 1D-HDM ranked all target patterns best, each with a value of 100. Unfortunately, this synthetic data set is not suitable for testing the 2D-HDM: the patterns are defined along the data dimensions, so the 1D-HDM already finds the best projection. Computing the PCA and searching for a better projection of the principal components is unnecessary because the value of 100 cannot be improved. Applying the PCA to the best dimensions selected by the 1D-HDM (2, 5 and 6), we obtain the plot shown in Figure 3.19. These first two principal components are also ranked with 100 by the 2D-HDM. Note that the resulting plot is not visually better than the orthogonal projection (2, 5), and the PCA yields no additional information.
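A PCA projection like the one in Figure 3.19 can be computed from the singular value decomposition of the centered data, restricted to the chosen dimensions. The NumPy sketch below is illustrative only. The thesis does not state which PCA implementation was used, and the toy data, with dimension 5 built as a noisy multiple of dimension 2, is hypothetical.

```python
import numpy as np


def pca_2d(X, dims):
    """Project the given (1-based) dimensions of X onto their first two principal components.

    Returns the (k, 2) component scores and the fraction of variance per component.
    """
    sub = X[:, [d - 1 for d in dims]]
    centered = sub - sub.mean(axis=0)
    # SVD of the centered data: rows of vt are the principal axes, by decreasing variance
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return centered @ vt[:2].T, explained


# Hypothetical data: dimension 5 is a noisy linear function of dimension 2
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
X[:, 4] = 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

scores, explained = pca_2d(X, (2, 5, 6))
```

Because the principal axes are a rotation of the selected dimensions, a class structure that is already axis-aligned gains nothing from this step. This is consistent with the observation about Figure 3.19.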

The parallel coordinates measures are designed to target different patterns. For unclassified data, HSM ranks best those parallel coordinates plots whose lines have similar positions and directions, i.e., clusters. For classified data, SM looks for such clusters while taking the classes into account, and OM is designed to find parallel coordinates plots in which the classes overlap least. The point charts in the right column of Figure 3.18 show that all parallel coordinates measures ranked one of our target patterns as the best. HSM analyzed the data without class information and ranked (5, 6), in which two classes are visible, as the best plot. OM also ranked (5, 6) as the best because this plot has the smallest overlap between the two classes. SM ranked two target patterns in its top 3: (5, 6) as the best and (2, 6) as the third best. In both plots, the lines within each class have almost the same positions and directions.

This evaluation is only a starting point towards evaluating every possible parameter combination. In the future, a complete statistical analysis of the correlation among the measures and of their correlation with the ground truth will be necessary. In the following, we briefly outline the basic steps of this future evaluation process:



1. Define ground truth. The ground truth should be generated in a synthetic data set with two independent variables, such as the density and the separability of the classes.

2. Vary the number of classes. The synthetic data sets should have different numbers of classes.

3. Vary the number of dimensions. The synthetic data sets should have different numbers of dimensions, simulating different types of high-dimensional data: small data sets (2 to 9 dimensions), medium data sets (10 to 49 dimensions), and large data sets (50 to 100 dimensions).

4. Statistical analysis. Statistically analyze the correlation among the measures and their correlation with the ground truth.
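For step 4, one possible choice is Spearman's rank correlation between a measure's scores and a known ground-truth quantity such as class separability. This choice is our assumption, since the text does not fix a statistic. A rank correlation fits because only the ranking of the views matters. The dependency-light Python sketch below computes it with average ranks for ties; all values are hypothetical.

```python
import numpy as np


def average_ranks(values):
    """Rank values in ascending order (1 = smallest); tied values share their mean rank."""
    values = np.asarray(values, dtype=float)
    ranks = np.empty(len(values))
    ranks[np.argsort(values, kind="mergesort")] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman_rho(a, b):
    """Spearman rank correlation: the Pearson correlation of the average ranks."""
    return float(np.corrcoef(average_ranks(a), average_ranks(b))[0, 1])


# Hypothetical example: known separability of five synthetic views vs. one measure's scores
ground_truth = [0.9, 0.1, 0.5, 0.7, 0.3]
measure = [95.0, 12.0, 61.0, 58.0, 20.0]
print(round(spearman_rho(ground_truth, measure), 2))  # 0.9
```

Computing this value for every measure and every generated data set would give the correlation table that step 4 asks for.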

3.1.8 Conclusion and Future Work

In this section, we presented several methods to aid and potentially speed up the visual exploration process for different visualization techniques. In particular, we automated the ranking of scatterplot and parallel coordinates visualizations of classified and unclassified data for the purposes of finding correlations and separating clusters. In the next section, a ground truth is generated by letting users choose the most relevant visualizations from a manageable test set. To validate our methods, we compare this ground truth with the automatically generated ranking. We recognize some limitations: it is not always possible to find well-separating views, owing to a growing number of classes and to some multivariate relations. This is a general problem and is not specific to our techniques.

The limitations of the approach presented above are, of course, determined by the task, the data complexity, and the measures applied to find the requested patterns. Tasks may be of different types, such as finding outliers, significant patterns, or different types of correlations between the dimensions. The complexity of the data can be described by the number of dimensions, the number of contained classes, and the clarity of the patterns (noise, over-plotting, and distribution of the data). This complexity strongly influences the ability of the measures to detect the required patterns. There are a number of measures in the domain of high-dimensional data visualization that address different types of tasks and have different levels of applicability to different data sets. Creating a data-task-measure taxonomy for our domain is beyond the scope of this thesis; however, we strongly recommend it for future research. In Section 4.2, we present the results of a data-measure taxonomy that focuses on one task, namely class separation in visualization.

Our current approach is therefore to describe systematically the functionality of the presented measures in terms of their ability to detect hidden patterns in the data for a particular task. Our results should be interpreted accordingly.

A comparison with other existing measures should be considered in future work. Furthermore, issues such as over-plotting, which have been disregarded so far, need to become part of the study. Scalability concerns will need to be addressed in future research, taking the data complexity into account and using heuristics to reduce the search space for target patterns.



3.2 Quality Measures and Human Perception – An Empirical Study<br />

Quality measures have been devised to automatically extract interesting visual representations out of a large number of available candidates in the exploration of high-dimensional databases. The measures permit, for instance, searching within a large set of scatterplots (e.g., in a scatterplot matrix) and selecting the views that contain the best separation among clusters. The rationale behind these techniques is that automatic selection of “best” views is not only useful but also necessary when the number of potential projections exceeds the limit of human interpretation. While useful as a concept in general, such metrics have so far received limited validation in terms of human perception. In this section, we present a perceptual study investigating the relationship between human interpretation of clusters in 2D scatterplots and the measures automatically extracted from them. Specifically, we compare a series of selected metrics and analyze how well they predict human detection of clusters. A thorough discussion of the results follows, with reflections on their impact and directions for future research.

Our empirical evaluation is based on a user study in which users had to select projections of attribute combinations well suited for classifying the data under inspection. The study then compares the scores of the selected scatterplots with the scores obtained by the selected quality measures to analyze their correlation. The outcome of the study primarily allows us to validate the assumption that selecting the views best ranked by quality measures is a viable way to simulate the selection made by users. Furthermore, the study allows us to compare the performance of the measures employed and to kick-start a benchmarking process for quality measures, in which metrics are compared against a baseline represented by the results obtained.
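One simple way to quantify how well a measure simulates the users' selection is to check how many of the user-picked views appear among the measure's best-ranked views. The Python sketch below is a hypothetical illustration of such a top-k agreement score, not the analysis actually performed in the study. The function name, view scores and selections are invented.

```python
def topk_agreement(measure_scores, user_selected, k):
    """Fraction of the user-selected views that are among the measure's k best-ranked views."""
    ranked = sorted(measure_scores, key=measure_scores.get, reverse=True)
    return len(set(ranked[:k]) & set(user_selected)) / len(user_selected)


# Hypothetical scores of five scatterplots (axis pairs) and the two views a user picked
scores = {(1, 2): 90.0, (1, 3): 15.0, (2, 3): 70.0, (1, 4): 40.0, (2, 4): 5.0}
picked = [(1, 2), (1, 4)]
print(topk_agreement(scores, picked, k=2), topk_agreement(scores, picked, k=3))  # 0.5 1.0
```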

In summary, the main contributions of this section are:

• A validation of the hypothesis that quality measures can simulate the selection of best views by human beings;

• A comparison among a set of promising and established measures;

• The provision of a first benchmark framework, through which it is possible to compare new quality metrics.

The rest of this section is organized as follows. Section 3.2.1 describes the measures employed in the study in detail. Section 3.2.2 describes the experiment design, and Section 3.2.3 presents the results. Section 3.2.4 discusses the results obtained in the study, offering a vision of how they can be interpreted and exploited in the future. Section 3.2.5 describes how to set up a framework for the user-based evaluation of quality metrics, as suggested in this section. Finally, Section 3.2.6 provides the conclusions.

3.2.1 Measures

For this study, we selected quality metrics from [129] and from Section 3.1.3 ([133]) that were developed specifically for scatterplots with classified data. In both cases, the authors propose automatic analysis methods to extract potentially relevant visual structures from a set of candidate visualizations.

Our study is based on the Class Density Measure (CDM) and the Histogram Density Measure (HDM) presented in Section 3.1.3. These two measures were also described in [133].



In [129], Sips et al. present similar work. They provide measures for ranking scatterplots with classified and unclassified data, and they propose two additional quantitative measures of class consistency. One is based on the distance to the cluster centroids, and the other on the entropies of the spatial distributions of the classes. The paper also describes an initial small user study in which user selections are compared with the outcomes of the proposed methods. From this work, we adopt the centroid-distance-based class consistency measure, referred to below as the Distance Consistency Measure (DCM). The authors also present a measure called Class Density Measure that, despite sharing its name with our measure presented in Section 3.1.3, differs from our Class Density Measure. It is in fact similar to the HDM and is therefore not included in the analysis.

For a better overview, the metrics are summarized in Table 3.3.

Table 3.3: Overview of the analyzed measures with the reference for additional details.

Measure | Reference
Distance Consistency Measure (DCM) | [129]
1D Histogram Density Measure (1D-HDM) | 3.1.3 & [133]
2D Histogram Density Measure (2D-HDM) | 3.1.3 & [133]
Class Density Measure (CDM) | 3.1.3 & [133]

The following is based on the assumption that each cluster in the data is uniquely labeled (either manually or through some form of n-dimensional clustering algorithm) and that for each point it is possible to know to which cluster it pertains. Finally, in the visualizations shown here, and in those used in the experiment, each cluster is colored with a unique hue.

We do not provide extensive formal specifications and details of the metrics. For additional details and further discussion of their limits and capabilities, please refer to the original papers [129] and [133] and to Section 3.1.3.

Distance Consistency Measure<br />

The Distance Consistency Measure (DCM) presented by Sips et al. in [129] is based on the distance of data points to their cluster centroids. The measure assumes that a clustering model has been computed in the n-dimensional space. It computes a value for a given 2D projection by projecting the points and the centroids onto the selected 2D space.

More precisely, the algorithm counts how many points violate the centroid distance property: in the n-dimensional space, the distance of any point to its own centroid must always be lower than its distance to any other cluster centroid. However, when the data is projected onto a specific 2D space, this property can be violated. For a given projection, the measure is therefore calculated as the proportion of data points that violate the centroid distance property.

The Distance Consistency Measure (DCM) based on the centroid distance is consequently calculated as follows [129]:

$$\mathrm{DCM} = \frac{\left|\left\{ x' \in v(X) : CD\left(x', \mathrm{centr}'\left(c_{\mathrm{clabel}(x)}\right)\right) \neq \mathrm{true} \right\}\right|}{k} \tag{3.20}$$

where $x'$ is the 2D projection of the data point $x$, and $v(X)$ is the 2D view (projection) of the data set $X$. The term $\mathrm{centr}'(c_{\mathrm{clabel}(x)})$ is the projection of the centroid of the class of $x$ (given by $\mathrm{clabel}(x)$), and $k$ is the number of data points. $CD(x', \mathrm{centr}'(c_{\mathrm{clabel}(x)}))$ is the centroid distance function. It is true if the distance of $x'$ to its class centroid is minimal compared with its distances to all other centroids. In other words, the measure is the fraction of points that do not satisfy this property.
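Equation (3.20) translates directly into a few lines of NumPy. The sketch below assumes axis-parallel 2D views given by two column indices, with class labels playing the role of the clustering model. Since such a projection is linear, projecting the n-dimensional centroids is the same as averaging the projected points. Lower values indicate fewer violations. The function name and the toy data are ours, not from [129].

```python
import numpy as np


def distance_consistency(X, labels, dims):
    """Fraction of points whose own class centroid is not the nearest centroid in a 2D view.

    X: (k, n) data; labels: (k,) class ids; dims: the two column indices of the view.
    Centroids are computed in the n-dimensional space and then projected.
    """
    classes = np.unique(labels)
    centroids = np.array([X[labels == c].mean(axis=0) for c in classes])
    P = X[:, dims]          # projected points x'
    C = centroids[:, dims]  # projected centroids centr'(c)
    # Distance of every projected point to every projected centroid: shape (k, #classes)
    dist = np.linalg.norm(P[:, None, :] - C[None, :, :], axis=2)
    nearest = classes[np.argmin(dist, axis=1)]  # ties resolve to the first class
    return float(np.mean(nearest != labels))


# Hypothetical data: the classes differ only in dimension 0
rng = np.random.default_rng(2)
labels = np.repeat([0, 1], 100)
X = rng.normal(size=(200, 3))
X[labels == 1, 0] += 10.0

separating = distance_consistency(X, labels, [0, 1])
mixed = distance_consistency(X, labels, [1, 2])
```

The view containing dimension 0 yields almost no violations. The view built only from noise dimensions misassigns roughly half of the points.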

Histogram Density Measure (1D and 2D)<br />

The Histogram Density Measure (HDM) approach presented <strong>in</strong> Section 3.1.3 is describ<strong>in</strong>g<br />

two quality measures for scatterplots with class <strong>in</strong>formation.<br />

For comput<strong>in</strong>g the 1D Histogram Density Measure (1D-HDM), data is projected<br />

over onto axis and a histogram is calculated to describe the distribution <strong>of</strong> the data po<strong>in</strong>ts<br />

over it. S<strong>in</strong>ce there are po<strong>in</strong>ts perta<strong>in</strong><strong>in</strong>g to di erent classes (i.e., clusters), the measure is<br />

based on the analysis <strong>of</strong> the amount <strong>of</strong> overlap among po<strong>in</strong>ts <strong>of</strong> di erent classes <strong>in</strong> the same<br />

histogram b<strong>in</strong>. The measure is <strong>in</strong>tended to isolate plots that show good class separations.<br />

Consequently, HDM looks for correspond<strong>in</strong>g histograms that show significant separation,<br />

and this property holds when the histogram b<strong>in</strong>s conta<strong>in</strong> only po<strong>in</strong>ts <strong>of</strong> one class.<br />

To measure this property, the approach combines entropy with axis rotation. Several instances of the same 2D projection are computed, each rotated by a different angle. For each instance an average entropy value is computed, and the best value among all rotations is selected as the measure's value. The computation of the entropy values is explained in more detail in Section 3.1.3.
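The rotation-and-entropy procedure can be illustrated with the following minimal sketch. It assumes that each bin's entropy is weighted by the number of points in the bin, normalized by $\log_2$ of the number of classes, and mapped to a 0 to 100 scale on which 100 means every bin is pure; the exact weighting and normalization are those given in Section 3.1.3 and may differ. The names `hdm_1d` and `bin_entropy` and the default bin and angle counts are hypothetical.

```python
import numpy as np


def bin_entropy(counts: np.ndarray) -> float:
    """Shannon entropy (in bits) of the class distribution inside one bin."""
    total = counts.sum()
    if total == 0:
        return 0.0
    p = counts[counts > 0] / total
    return float(-(p * np.log2(p)).sum())


def hdm_1d(points: np.ndarray, labels: np.ndarray,
           n_bins: int = 20, n_angles: int = 36) -> float:
    """Sketch of 1D-HDM on a 0..100 scale; 100 means every bin holds one class."""
    classes = np.unique(labels)
    max_entropy = np.log2(len(classes)) if len(classes) > 1 else 1.0
    best = 0.0
    # Rotating the 2D projection is equivalent to projecting onto a rotated axis.
    for theta in np.linspace(0.0, np.pi, n_angles, endpoint=False):
        proj = points[:, 0] * np.cos(theta) + points[:, 1] * np.sin(theta)
        edges = np.histogram_bin_edges(proj, bins=n_bins)
        # counts[c, b]: number of points of class c in histogram bin b.
        counts = np.stack([np.histogram(proj[labels == c], bins=edges)[0]
                           for c in classes])
        bin_sizes = counts.sum(axis=0)
        # Average bin entropy, weighting each bin by the points it contains.
        avg_entropy = sum(bin_sizes[b] * bin_entropy(counts[:, b])
                          for b in range(n_bins)) / len(points)
        # Keep the best (lowest-entropy) rotation.
        best = max(best, 100.0 * (1.0 - avg_entropy / max_entropy))
    return float(best)
```

Only angles in $[0, \pi)$ are tried, because an axis rotated by a further half-turn yields the mirrored histogram and therefore the same entropy.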

The 2D Histogram Density Measure (2D-HDM) extends the 1D-HDM by computing a two-dimensional histogram on the scatterplot: each bin represents a small square of the 2D projection, and the bin count is the number of data points falling within that square. As in the 1D-HDM, quality is measured as a weighted sum of the entropies of the bins. The measure is normalized between 0 and 100, where 100 corresponds to the best visualization, in which each bin contains points of only one class.

Unlike the 1D-HDM, the 2D-HDM also takes the bin neighborhood into account. For each bin, the class counts $p_c$ of the points in the bin and the counts $u_c$ of the points in its direct neighbors are summed. The full equation is given in Section 3.1.3 and in the original paper [133].

The extension of HDM to 2D can also find projections in which the classes form two concentric circles of different diameters. In this case, every 1D projection shows a large overlap of the classes, even though the circles do not overlap in 2D or nD.
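A corresponding sketch of the 2D-HDM follows. As a simplification, it assumes that a bin's own class counts $p_c$ and the counts $u_c$ of its eight direct neighbors are simply added before the entropy is computed, and that each bin's entropy is weighted by the number of points in the bin itself; the exact equation is the one in Section 3.1.3 and [133]. The sketch does not rotate the projection, and the function name `hdm_2d` and the default grid size are hypothetical.

```python
import numpy as np


def entropy_bits(counts: np.ndarray) -> float:
    """Shannon entropy (in bits) of a vector of class counts."""
    total = counts.sum()
    if total == 0:
        return 0.0
    p = counts[counts > 0] / total
    return float(-(p * np.log2(p)).sum())


def hdm_2d(points: np.ndarray, labels: np.ndarray, n_bins: int = 10) -> float:
    """Sketch of 2D-HDM on a 0..100 scale (100 = every bin neighborhood is pure)."""
    classes = np.unique(labels)
    max_entropy = np.log2(len(classes)) if len(classes) > 1 else 1.0
    x_edges = np.histogram_bin_edges(points[:, 0], bins=n_bins)
    y_edges = np.histogram_bin_edges(points[:, 1], bins=n_bins)
    # p[c, i, j]: number of points of class c in the square bin (i, j).
    p = np.stack([
        np.histogram2d(points[labels == c, 0], points[labels == c, 1],
                       bins=[x_edges, y_edges])[0]
        for c in classes
    ])
    # u[c, i, j]: number of points of class c in the 8 direct neighbors of (i, j).
    padded = np.pad(p, ((0, 0), (1, 1), (1, 1)))
    u = np.zeros_like(p)
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if di == 0 and dj == 0:
                continue
            u += padded[:, 1 + di:1 + di + n_bins, 1 + dj:1 + dj + n_bins]
    weighted = 0.0
    for i in range(n_bins):
        for j in range(n_bins):
            # Entropy of p_c + u_c, weighted by the points in the bin itself.
            weighted += p[:, i, j].sum() * entropy_bits(p[:, i, j] + u[:, i, j])
    return float(100.0 * (1.0 - weighted / (len(points) * max_entropy)))
```

Because the histogram is two-dimensional, two concentric rings whose bins are not adjacent produce pure neighborhoods and hence a high score, whereas any single axis would mix them.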

Class Density Measure

The Class Density Measure (CDM) was also presented in detail in Section 3.1.3. This measure evaluates scatterplots according to how well they separate the classes. The goal is to identify the plots that show minimal overlap between the classes.

To compute the overlap between the classes, the method uses a continuous representation in which the points of each class (cluster) form a separate image. For each class image, a continuous and smooth density function based on local neighborhoods is calculated: for each pixel $p$, the distances to its $k$ nearest neighbors $N_p$ of the same class are computed, and the local density is calculated over the sphere whose radius equals the maximum of these distances.

With a continuous density function available for each class, the mutual overlap can be estimated by computing, at each pixel, the absolute difference between the densities of every pair of classes and summing the results. Section 3.1.3 gives the formulas in more detail. The value of the measure is high if the densities at each pixel differ as much as possible, i.e., if one class has a much higher density than all others. Therefore, the visualization with the least overlap between the classes receives the highest value. A useful property of this measure is that it captures not only well-separated clusters but also clusters with noticeable density differences. This is a considerable advantage, since it can ease the interpretation of the data in the visualization.
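The following minimal sketch illustrates the idea. It assumes that each scatterplot is first rescaled to the unit square (so that plots with different attribute ranges are comparable), that each class image is sampled on a regular `res` × `res` pixel grid, and that the local density at a pixel is $k / (\pi r_k^2)$, where $r_k$ is the distance to the $k$-th nearest point of that class; the exact formulas of Section 3.1.3 may differ. The function names and default parameters are hypothetical.

```python
from itertools import combinations

import numpy as np


def class_density_images(points, labels, res=32, k=5):
    """One density image per class, sampled on a res x res pixel grid."""
    lo, hi = points.min(axis=0), points.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    unit = (points - lo) / span  # rescale the scatterplot to the unit square
    axis = np.linspace(0.0, 1.0, res)
    gx, gy = np.meshgrid(axis, axis)
    pixels = np.column_stack([gx.ravel(), gy.ravel()])  # (res * res, 2)
    images = []
    for c in np.unique(labels):
        pts = unit[labels == c]
        kk = min(k, len(pts))
        # Distance from every pixel to every point of this class.
        d = np.linalg.norm(pixels[:, None, :] - pts[None, :, :], axis=2)
        # Radius of the smallest circle containing the kk nearest neighbors N_p.
        r = np.maximum(np.sort(d, axis=1)[:, kk - 1], 1e-9)
        images.append((kk / (np.pi * r ** 2)).reshape(res, res))
    return images


def class_density_measure(points, labels, res=32, k=5):
    """Sum of absolute per-pixel density differences over all class pairs."""
    images = class_density_images(points, labels, res, k)
    return float(sum(np.abs(a - b).sum() for a, b in combinations(images, 2)))
```

The raw sum is unbounded, so in this sketch it is only meaningful for ranking scatterplots of the same data set against each other; two classes with identical point sets score exactly 0.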

3.2.2 Empirical Evaluation

This section describes the empirical evaluation of the projection quality measures described above. The aim of this evaluation is to assess the degree to which these measures reflect users' perception of a high-quality projection. Our method therefore consists of a user study that creates a baseline, and a series of measures that each judge the quality of the same set of scatterplots. The results report the correlation between each measure and the user-graded quality.

Hypotheses

The hypotheses for the analysis were derived from the features of the four automatic measures.

H1. We expect the 1D-HDM to show the lowest correlation with users' selection, since this measure takes only one-dimensional projections into account when computing the separation quality of the data.

H2. We expect a higher correlation for the 2D-HDM, because it extends its 1D version by creating a 2D histogram and considers the direct neighborhood of each bin when computing quality.

H3. The perceived quality of a projection may also be influenced by the density of clusters with minimal overlap, as captured by the CDM. Here we expect a strong correlation with the measure's ranking.

H4. Finally, we expect a high correlation with users' selection when the consistency of clusters, expressed by the quality of their separation, is computed. This is assessed by the DSC as described previously.

In general, we expect a significant positive correlation between each of these measures and users' selection. However, the measures are also expected to differ in how well they approximate users' perception, which is expressed by the coefficient of determination ($R^2$) of the regression.
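For a simple linear regression with a single predictor and an intercept, the coefficient of determination equals the square of the Pearson correlation coefficient. The sketch below computes both; the input values are purely illustrative and are not data from this study, and the function names are hypothetical.

```python
import numpy as np


def pearson_r(x, y) -> float:
    """Pearson correlation coefficient between two equally long samples."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))


def r_squared(x, y) -> float:
    """Coefficient of determination of the least-squares line y = a*x + b."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    ss_res = (residuals ** 2).sum()
    ss_tot = ((y - y.mean()) ** 2).sum()
    return float(1.0 - ss_res / ss_tot)


# Illustrative values only: normalized measure scores vs. user preference.
measure = [0.1, 0.4, 0.35, 0.8, 0.9]
users = [0.2, 0.3, 0.5, 0.7, 1.0]
r = pearson_r(measure, users)
print(round(r, 3), round(r_squared(measure, users), 3))
```

Computing $R^2$ from the residuals rather than squaring $r$ makes the link to the regression explicit; both agree up to rounding.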



Participants

Participants were 18 undergraduate students from the faculty of natural sciences. All had extensive experience working with computers and scatterplots. Students participated in the experiment voluntarily and received no reward for participating.

Data and Plot Selection

To conduct the empirical evaluation, we used the UCI Wine data set¹⁷, which contains the results of a chemical analysis of three types of wine grown in a specific region of Italy. The data set comprises 178 samples, with the results of 13 chemical analyses recorded for each sample. The 13 attributes were combined pairwise into 78 scatterplots, whose quality was then computed by each of the four measures. The data did not contain any special cluster constellations, outliers, or hidden data points.
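The number 78 follows from choosing unordered pairs of the 13 attributes, $\binom{13}{2} = 13 \cdot 12 / 2 = 78$, so mirrored plots (x versus y and y versus x) are counted once. A minimal Python sketch, in which attributes are identified only by their column index:

```python
from itertools import combinations
from math import comb

N_ATTRIBUTES = 13  # the 13 chemical attributes of the Wine data set

# Every unordered pair (i, j) with i < j yields one scatterplot.
attribute_pairs = list(combinations(range(N_ATTRIBUTES), 2))

assert len(attribute_pairs) == comb(N_ATTRIBUTES, 2) == 13 * 12 // 2
print(len(attribute_pairs))  # 78
```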

The number of scatterplots used in the user study was limited to 18, in order to keep the completion time reasonably short and to allow all scatterplots to be shown at once on a single page, at a size at which all data points remain visible. The 18 scatterplots were selected according to the distribution of the quality values assigned by the measures, as follows:

1. The quality values of each measure were normalized to the range 0 to 1 and assigned to a quantile.

2. The scatterplots were sampled such that the number of projections in higher and lower quantiles was approximately the same for all measures.

3. As a result, the number of quality values in each quantile was 4±1.
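Steps 1 and 3 can be sketched as follows. The sketch assumes min-max normalization and four equal-width intervals on [0, 1] as the quantiles; four intervals with 4±1 plots each is consistent with 18 selected plots, but the number of intervals is not stated in the text. The function names are hypothetical, and the sampling of step 2 is not reproduced.

```python
import numpy as np


def normalize(values) -> np.ndarray:
    """Min-max normalize one measure's quality values to the range [0, 1]."""
    v = np.asarray(values, dtype=float)
    span = v.max() - v.min()
    return np.zeros_like(v) if span == 0 else (v - v.min()) / span


def interval_index(normalized: np.ndarray, n_intervals: int = 4) -> np.ndarray:
    """Assign each normalized value to an equal-width interval (0 = lowest)."""
    idx = (normalized * n_intervals).astype(int)
    # A value of exactly 1.0 belongs to the top interval.
    return np.minimum(idx, n_intervals - 1)


def is_balanced(intervals: np.ndarray, target: int = 4, tolerance: int = 1,
                n_intervals: int = 4) -> bool:
    """Check the 4 +/- 1 condition for a candidate selection of plots."""
    counts = np.bincount(intervals, minlength=n_intervals)
    return bool(np.all(np.abs(counts - target) <= tolerance))
```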

The selected scatterplots were arranged in six columns and three rows and printed on a high-quality color printer. The order of the scatterplots was permuted using the Latin-square method, resulting in 18 different arrangements, one for each participant. An example of the set of scatterplots used in the experiment is shown in Figure 3.20. Two original experiment forms are included in Appendix A.2 (Figure A.1 and Figure A.2).
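The text does not specify which Latin-square construction was used. One common choice, shown here as a sketch, is the cyclic square, in which participant r sees plot (r + p) mod 18 at print position p; every plot then occupies every position exactly once across the 18 participants. Mapping positions to the 3 × 6 grid row by row is likewise an assumption, and the function names are hypothetical.

```python
def cyclic_latin_square(n: int) -> list[list[int]]:
    """Row r gives the plot order for participant r: plot (r + p) mod n at position p."""
    return [[(r + p) % n for p in range(n)] for r in range(n)]


def grid_cell(position: int, n_cols: int = 6) -> tuple[int, int]:
    """Map a print position to (row, column) on a sheet with n_cols columns."""
    return divmod(position, n_cols)


orders = cyclic_latin_square(18)
# Each participant sees every plot once; each plot appears once per position.
assert all(sorted(row) == list(range(18)) for row in orders)
assert all(sorted(col) == list(range(18)) for col in zip(*orders))
print(orders[1][:6], grid_cell(17))  # [1, 2, 3, 4, 5, 6] (2, 5)
```

A cyclic square balances positions but not the immediate predecessor of each plot; a balanced (Williams) design would additionally control such carry-over effects.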

Task

Participants were presented with a scenario built around the wine data set, in which they acted as a wine consultant for three different types of wine. They were told that their challenge was to analyze a large number of attributes describing the wines, such as color saturation and alcohol content. Participants were asked to select projections of attribute combinations that are well suited for classifying the three types of wine. This task had to be carried out using a selected set of scatterplots showing the attributes pairwise. First, participants were asked to select the five projections best suited for separating the wine types and then rank them from 1 to 5 (1 indicating the best representation and 5 the weakest of the five selected scatterplots).

17 Source at UCI: www.archive.ics.uci.edu/ml/datasets/Wine


Figure 3.20: Projections of scatterplots used in the experiment. Participants had to select the best five projections and order them by their quality. The order of the scatterplots was permuted for each participant separately using the Latin-square method.

Procedure

The experiment consisted of two parts. In the first part, participants read a short description of the scenario and the task, and filled out a short standardized form with general questions (such as age, stage of study, and experience with computers and scatterplots).¹⁸ In the second, main part of the experiment, participants performed the task by selecting and ordering the five representations that best separated the three wine types.¹⁹ The best-suited scatterplot is the one that allows a clear distinction of the three wine types by the two attributes. Participants’ effectiveness depended mainly on their ability to read and interpret scatterplots. The group of participants was quite homogeneous with regard to age and previous education and, as expected, their performance showed no significant deviations or anomalies: we verified that no score lay more than three standard deviations above or below the mean. To avoid biasing participants towards any of the measures, we did not instruct them on how to define a high-quality projection or how to look for dense or consistent clusters.
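The anomaly check described above can be sketched as follows. It verifies that no participant’s score lies more than three standard deviations from the mean. The text states neither how a participant’s score is defined nor whether the sample or the population standard deviation was used. The sketch assumes one numeric score per participant and the population standard deviation (`statistics.pstdev`).

```python
from statistics import mean, pstdev


def outside_three_sigma(scores):
    """Indices of scores more than three standard deviations above or below the mean.

    Uses the population standard deviation (an assumption; the study does not say).
    """
    mu = mean(scores)
    sigma = pstdev(scores)
    return [i for i, s in enumerate(scores) if abs(s - mu) > 3 * sigma]


print(outside_three_sigma([0] * 10 + [100]))  # [10]
```

By Samuelson’s inequality, no value can lie more than √(n−1) population standard deviations from the mean. The check can therefore only flag a score when there are at least 11 scores.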

3.2.3 Results

We carried out a linear regression analysis, using the Pearson correlation coefficient to assess the correlation between the users’ classification and the quality that the measures assigned to the selected projections. To make the measures comparable, we normalized the quality values of each measure individually to the range from 0 to 1 across the projections. From the users’ answers, we computed the probability of selecting a projection by counting how many times each projection was selected. These probabilities were weighted by the average ranks assigned by the participants, which resulted in an ordering of the projections that reflects the users’ quality preferences. The dependent variable of the statistical evaluation was the user ranking, and each of the four measures served as the independent variable in a separate computation. The results show significant positive correlation for all four measures (p

¹⁸ Appendix A.2 contains this general question form (in German) in Section A.2.1.

¹⁹ Appendix A.2 contains two examples of the experiment form (Figure A.1 and Figure A.2).



2D-HDM and DCM assigned the highest quality to exactly the projection that the users ranked best. CDM assigned this projection a quality of 99% (rank 2), and 1D-HDM only 68% (rank 4). The projection that users ranked highest is shown in Figure 3.22(a).

The highest-quality projection according to CDM and 1D-HDM is shown in Figure 3.22(b). This projection shows a clear and very dense cluster for one of the wine types; however, it also shows a high overlap between the other two types. Users assigned this projection rank 4.

In the users’ view, the worst projection was one showing high density for all three wine types but also a high overlap, as shown in Figure 3.22(c). Three of the measures confirmed this; only CDM still assigned this projection a quality of 26.3% (rank 11).
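The results paragraph describes two preprocessing steps. First, each measure’s quality values are normalized to [0, 1] across the projections. Second, a user-preference score is built from selection frequencies weighted by average ranks. The exact weighting is not given in the text. The sketch below assumes min-max normalization and a hypothetical rank weight (k + 1 − mean rank) / k, where k = 5 is the number of projections each participant selected, so that an average rank of 1 weighs 1.0 and an average rank of k weighs 1/k. Function names are illustrative.

```python
def min_max_normalize(values):
    """Rescale one measure's quality values to [0, 1] across all projections."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant measure: carries no ordering information
    return [(v - lo) / (hi - lo) for v in values]


def user_preference_scores(rankings, n_projections, k=5):
    """Selection probability of each projection, weighted by its average rank.

    rankings: one list per participant with the selected projection ids, best first.
    The weight (k + 1 - mean_rank) / k is a hypothetical choice, not the study's formula.
    """
    n_users = len(rankings)
    counts = [0] * n_projections
    rank_sums = [0] * n_projections
    for ranking in rankings:
        for position, proj in enumerate(ranking, start=1):
            counts[proj] += 1
            rank_sums[proj] += position
    scores = []
    for count, rank_sum in zip(counts, rank_sums):
        if count == 0:
            scores.append(0.0)  # never selected by any participant
            continue
        probability = count / n_users
        mean_rank = rank_sum / count
        scores.append(probability * (k + 1 - mean_rank) / k)
    return scores
```

Sorting the projections by these scores gives an ordering of user preference, which can then be correlated with each normalized measure.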


(a) Users’ highest-quality ranked projection, confirmed by the DCM and 2D-HDM quality measures.
(b) Highest-quality ranked projection according to the CDM and 1D-HDM measures.
(c) Users’ lowest-quality ranked projection, confirmed by the DCM, 2D-HDM, and 1D-HDM quality measures.

Figure 3.22: Correlation of the measures with the users’ classification for the highest-quality and the lowest-quality projections.

Another interesting observation is that 8 of the 18 projections were not selected by any user.²¹ CDM nevertheless assigned 65% quality to one of these projections, as shown in Figure 3.23(a). Among the other measures, the highest quality assigned to any of these 8 projections was 58% by 1D-HDM, 50% by DCM, and only 40% by 2D-HDM. Surprisingly, the projection shown in Figure 3.23(b) was selected by a user and ranked among the best five, yet the measures ranked it second to last, and CDM even ranked it last.


(a) Not selected by any user, but assigned 65% quality by CDM.
(b) Selected by a user, but ranked second to last by the measures and last by CDM.

Figure 3.23: Surprising study results.

²¹ Figure A.3 in Appendix A.2.3 shows the 8 projections that were not selected by any user.



In summary, 2D-HDM, closely followed by DCM, reflected the users’ quality assignment best: it matched the users’ highest- and lowest-ranked projections and had the highest R² value of the correlation. However, these results should not be taken to mean that density (CDM) is unimportant for quality assignment. Rather, they should motivate combining and improving these measures so that they can adequately support users in their task.
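The summary compares the measures by the R² of their correlation with the user preferences. Here is a minimal sketch of that comparison. It assumes one normalized score per projection for each measure and one for the users, and uses the fact that, for a simple linear regression with a single predictor, R² equals the squared Pearson coefficient r. The significance values reported in the study are not computed here. Outside a sketch, `scipy.stats.pearsonr` returns both r and a two-sided p-value. The measure names in the dictionary are placeholders.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def r_squared(x, y):
    """R^2 of a simple linear regression of y on x, which equals r^2."""
    return pearson_r(x, y) ** 2


def rank_measures(measure_scores, user_scores):
    """Measure names sorted by R^2 against the user scores, best first.

    measure_scores: dict mapping a measure name to its per-projection scores.
    """
    return sorted(measure_scores,
                  key=lambda name: r_squared(measure_scores[name], user_scores),
                  reverse=True)
```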

3.2.4 Discussion

In this section we examine the results of the experiment in more detail and discuss some of their potential implications and ideas for further research. As noted in the results, the outcomes diverge depending on whether a measure takes into account the density of the clusters or the amount of overlap among them. 2D-HDM, together with DCM, reflected users’ preference for high-quality projections better than the other measures. Intuitively, both density and overlap should play a role in the perception of clusters; nonetheless, the results of our experiment suggest that separation is more important. Future research will need to address this issue to establish whether a combination of measures based on both density and separation can outperform the individual measures.
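One way to approach the question of combining density and separation is sketched below. Two measures, already normalized to [0, 1], are blended with a convex weight α. A grid search then looks for the α whose combined score correlates best with the user preferences. Both the combination rule and the grid search are hypothetical illustrations, not a method proposed in this thesis.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient; raises ZeroDivisionError for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def combine(density, separation, alpha):
    """Hypothetical convex combination of two measures normalized to [0, 1]."""
    return [alpha * d + (1 - alpha) * s for d, s in zip(density, separation)]


def best_alpha(density, separation, user_scores, steps=10):
    """Grid-search alpha in {0, 1/steps, ..., 1}; return (alpha, r) with the highest r."""
    best = None
    for i in range(steps + 1):
        alpha = i / steps
        r = pearson_r(combine(density, separation, alpha), user_scores)
        if best is None or r > best[1]:
            best = (alpha, r)
    return best
```

Because α is fitted on the same user judgments it is evaluated against, the resulting correlation is optimistic. A fair comparison with the individual measures would need held-out projections or participants.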

Another open issue not investigated in this study is the influence that different cluster shapes might have on user perception and, at the same time, on the proposed measures. The current results do not allow us to differentiate between cluster shapes, even though the images with highly ranked clusters contain circular shapes.

In relation to this last observation, it is worth noting that the major factor involved in the separation of clusters is the proximity of the points. This is not surprising, as the Gestalt laws of grouping suggest that proximity is the strongest visual feature used by the visual system to extract patterns from images. Nonetheless, we believe it is worth running new studies that investigate the relationship between the other laws of grouping (e.g., closure, similarity, continuation), users’ perception, and additional quality metrics. Along these lines, Section 4.2 presents the results of a qualitative analysis of cluster separation factors, in which plots of a variety of data sets were analyzed manually to identify what kinds of patterns clusters form and how current metrics identify them.

Our experimental task focuses on the perception of clusters. However, the perception of clusters in n-dimensional data spaces is not the only useful task. Outlier detection, for instance, is also relevant: for it, too, it is necessary not only to find suitable metrics but also to run studies similar to ours in order to understand the relationship between user perception and the metric. The same approach can and should be repeated for other user tasks, visual patterns, and metrics. We consider our study only a starting point in this direction; nonetheless, it introduces a well-reasoned experimental design procedure that can be repeated to explore all of the above. For this reason, in the following section we briefly summarize the common elements of our study design so that it can be repeated in future experiments.

Finally, we point out that the current study focuses exclusively on correlating and comparing what metrics and users detect, under the underlying assumption that users’ perception represents a kind of optimum. This assumption requires further investigation, as computational methods might be able to detect interesting patterns that users cannot necessarily perceive visually.



3.2.5 Guidelines

In the following, we briefly outline the basic steps for conducting new user studies that follow the same schema as this study. Our aim is to facilitate the design of similar studies and to encourage related work on the perception of visual patterns and their formalization as computable metrics.

1. Select a visualization technique. The first step is to select a specific visualization technique. In our examples we used scatterplots, one of the most widely used techniques in visualization. Future studies might include other high-dimensional visualization techniques such as those presented in Section 2.2, e.g., treemaps, parallel coordinates, line charts, etc.

2. Select a visual feature. In this phase it is necessary to think about which particular features can be detected in the visualization technique under inspection. Note that some concepts recur across several visualizations but need to be redefined for each specific case (e.g., clustering in scatterplots and in parallel coordinates).

3. Formalize the feature. This is a fundamental step in our design schema. Once a specific feature has been selected, it must be formalized so that it can be computed by an algorithm. In this phase it is advisable to produce more than one measure in order to capture several aspects of the same feature. This also makes it possible to compare the performance of the selected measures in the study and to gain additional information on the visual processes involved in the perception of the feature.

4. Run a rank-based study. Once the feature has been formalized, a study can be run in which users rank the images in terms of the selected feature. The ranks given by the metrics can then be compared with those provided by the users (as suggested in our method and study design).

5. Study and refine. The results of the algorithms can be compared with the results obtained from the users, which serve as the reference against which all measures are evaluated. The goal of this phase is not only to determine which of the metrics performs best, but also to reason about the results in order to (1) hunt for interesting insights into how users perceive the selected feature and (2) design better metrics that capture the desired feature more accurately.
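Step 4 compares the ranks assigned by the metrics with the ranks given by the users. One standard way to quantify that agreement is Spearman’s rank correlation: the Pearson correlation of the two rank vectors, with tied values receiving their average rank. This is a swapped-in statistic for illustration; the study in Section 3.2.3 used Pearson correlation on normalized scores. Outside a sketch, `scipy.stats.spearmanr` computes the same coefficient together with a p-value.

```python
from math import sqrt


def average_ranks(values):
    """Rank values in ascending order (1 = smallest); ties share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # 0-based positions i..j -> mean 1-based rank
        for t in range(i, j + 1):
            ranks[order[t]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)
```

If a metric reports quality (higher is better) while users report ranks (1 is best), the sign of rho flips. Both sequences should be oriented the same way before interpreting the coefficient.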

3.2.6 Conclusion and Future Work

To conclude the research presented in Section 3.2, we recall the contributions mentioned at the beginning. Through a user-centered evaluation design, we showed that some quality measures reflect users’ perception better than others. However, the question remains to what extent users are able to preselect good-quality projections of their multidimensional data in an efficient and unbiased manner. Our results indicate that further development is needed to find the ultimate automatic quality measure. Nevertheless, we have created the first quality benchmark framework with which different metrics can be compared. Another question regarding the future development of similar studies is whether the results of several similar experiments on different visualization techniques and features can be combined to create a uniform model, or at least a better understanding, of how visualization works and how visual patterns can be formalized. While the answer to this question is not clear at the moment, it is evident that every single study at the very least has the potential to improve the understanding and use of the selected technique.

In future work, the same techniques can be applied to other visualization methods, e.g., parallel coordinates, to evaluate the correlation between the specific quality metrics and user perception. Since the current work focuses exclusively on cluster detection, other visual patterns such as outliers could be investigated. As mentioned in Section 3.2.4, it is also important to analyze how well users perform in finding interesting patterns.


4

A Systematization of Quality Metrics in High-Dimensional Data Visualization

“Nothing has such power to broaden the mind as the ability to investigate systematically and truly all that comes under thy observation in life.”

Marcus Aurelius

Contents
4.1 Quality Metrics in High-Dimensional Data Visualization
4.1.1 Background
4.1.2 Methodology
4.1.3 Quality Metrics Pipeline
4.1.4 Systematic Analysis
4.1.5 Examples
4.1.6 Findings
4.1.7 Directions for Further Research
4.1.8 Limitations
4.1.9 Conclusion and Future Work
4.2 Visual Cluster Separation Factors: Sketching a Taxonomy
4.2.1 Introduction
4.2.2 Method
4.2.3 Visual Cluster Separation Taxonomy
4.2.4 Discussion and Further Research

In a number of recent papers, different quality metrics have been proposed to automate the demanding search through large spaces of alternative visualizations (e.g., alternative projections or orderings), allowing the user to concentrate on the most promising visualizations suggested by the quality metrics. Over the last decade, this approach has seen remarkable development; however, there has been little reflection on how these methods relate to each other and how the approach can be developed further. For this purpose, in Section 4.1 we provide an overview of approaches that use quality metrics in high-dimensional data visualization and propose a systematization based on a thorough literature review. We carefully analyze the papers and derive a set of factors for discriminating the quality metrics, the visualization techniques, and the process itself. The process is described through a reworked version of the well-known information visualization pipeline. We demonstrate the usefulness of our model by applying it to several existing approaches that use quality metrics, and we reflect on the implications of our model for future research.

Another aspect worth investigating in the context of quality metrics is their ability to detect different types of structures in high-dimensional data. In Section 4.2 we present the results of an in-depth qualitative evaluation of two cluster separation measures. This evaluation concentrates on scatterplot visualizations (2D, 3D, and SPLOM) and the most popular task, clustering. The qualitative data study converged into a taxonomy of visual cluster separation factors for scatterplots, on which we briefly report in this section. Beyond that, the outcome of the study is used to describe possible next steps that we deem important for advancing research in this area.

Parts of this chapter appeared in the following publications [27, 122].

4.1 Quality Metrics in High-Dimensional Data Visualization

Extracting relevant and meaningful information from high-dimensional data is notoriously complex and cumbersome. The curse of dimensionality is a popular name for the whole set of problems encountered in high-dimensional data analysis, of which finding relevant projections, selecting meaningful dimensions, and removing noise are only a few. Multidimensional data visualization also carries its own set of challenges, above all the limited capability of any technique to scale to more than a handful of data dimensions.

Researchers have been trying to solve these problems through a number of automatic data analysis and visualization approaches that cover the whole spectrum of possibilities, from fully automatic to fully interactive. Visualization researchers discovered early on that the search for interesting patterns in this kind of data can be carried out with a mixed approach: the machine, guided by quality metrics, automatically searches through a large number of potentially interesting projections, while the user interactively steers the process and explores the output through visualization.
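The mixed approach described above has two parts: the machine scores candidate projections, and the user inspects the best ones. It can be sketched for the simplest search space, the d(d−1)/2 axis-parallel scatterplots of a labeled data set. The score used here is the fraction of points whose nearest class centroid in the 2D view is the centroid of their own class. This is an illustrative choice in the spirit of class-consistency measures, not the exact definition of any metric discussed in this thesis, and the function names are hypothetical.

```python
from itertools import combinations
from math import dist


def centroid_consistency(points, labels):
    """Fraction of points whose nearest class centroid belongs to their own class."""
    classes = sorted(set(labels))
    centroids = {}
    for c in classes:
        members = [p for p, label in zip(points, labels) if label == c]
        centroids[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    hits = 0
    for p, label in zip(points, labels):
        # On equal distances, min() keeps the first class in sorted order.
        nearest = min(classes, key=lambda c: dist(p, centroids[c]))
        if nearest == label:
            hits += 1
    return hits / len(points)


def rank_scatterplots(data, labels, top_k=5):
    """Score every axis-parallel 2D projection (pair of dimensions); return the top_k."""
    n_dims = len(data[0])
    scored = []
    for i, j in combinations(range(n_dims), 2):
        projected = [(row[i], row[j]) for row in data]
        scored.append(((i, j), centroid_consistency(projected, labels)))
    scored.sort(key=lambda item: item[1], reverse=True)  # stable: ties keep pair order
    return scored[:top_k]
```

The result is a list of `((i, j), score)` tuples, best first. These are the candidate views a user would then inspect interactively.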

The pioneering work of Friedman and Tukey in 1974 introduced this idea with their projection pursuit method [54]. They recognized the limits of human beings in exploring the exponentially large set of projections and tackled the high-dimensionality issue by letting an algorithm discover interesting linear projections in 1D (histograms) and 2D (scatterplots) and letting the user evaluate the corresponding output.
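The projection pursuit idea can be sketched in 1D: search over unit directions w, project the data onto w, and keep the direction whose projected distribution maximizes an "interestingness" index. Friedman and Tukey’s own index combined a trimmed spread measure with a local density term. The sketch instead swaps in a simpler index, the absolute excess kurtosis of the projection, which is large for strongly non-Gaussian views such as bimodal ones. It also uses plain random search rather than a numerical optimizer. Both choices are assumptions made for illustration.

```python
import math
import random


def excess_kurtosis(values):
    """Population excess kurtosis (0 for a Gaussian, -2 for a symmetric two-point distribution)."""
    n = len(values)
    m = sum(values) / n
    var = sum((v - m) ** 2 for v in values) / n
    if var == 0:
        return 0.0
    m4 = sum((v - m) ** 4 for v in values) / n
    return m4 / var ** 2 - 3.0


def random_unit_vector(d, rng):
    """Uniformly distributed direction on the unit sphere in d dimensions."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def projection_pursuit_1d(data, n_trials=500, seed=0):
    """Random search for the 1D projection with the largest |excess kurtosis|."""
    rng = random.Random(seed)
    d = len(data[0])
    best_w, best_index = None, -1.0
    for _ in range(n_trials):
        w = random_unit_vector(d, rng)
        projected = [sum(wi * xi for wi, xi in zip(w, row)) for row in data]
        index = abs(excess_kurtosis(projected))
        if index > best_index:
            best_w, best_index = w, index
    return best_w, best_index
```

For two-dimensional data whose first coordinate is ±1 (two clusters) and whose second is spread uniformly, the best direction found lies close to the first axis, the projection in which the two groups separate.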

During the last few years, this paradigm has attracted growing interest, and an increasing number of techniques have been published in key data visualization conferences and journals. Quality metrics have been used for very disparate goals, such as searching for interesting projections, reducing clutter, and finding meaningful abstractions. However, the initial idea of quality metrics has been elaborated and expanded so far, and in so many different directions, that it is hard to arrive at a coherent and unified picture. A reader of one of these papers may well appreciate the value of a single technique without having a way to place it in a larger context. Also, researchers who approach this area of investigation for the first time and want to develop new techniques may have a hard time appreciating the whole spectrum of possibilities and directions related to the use of quality metrics.

In this section, we take first steps towards filling this gap. Through a literature review, we provide a systematization of the use of quality metrics in high-dimensional data analysis. We analyzed numerous papers containing quality metrics and went through an iterative process that led to the definition of a number of factors and a quality metrics pipeline, which is inspired by the traditional information visualization pipeline [36].

The extracted factors and the pipeline have the following interrelated goals:

1. putting the existing methods into a common framework;
2. easing the generation of new research in the field;
3. spotting relevant gaps to bridge with future research.

In this section, we provide an extensive explanation of the methodology we followed, the results we obtained, and their practical use. In particular, we demonstrate through a number of selected examples how existing approaches can be described with the proposed models. We also identify a number of interesting gaps and give guidelines on how to carry out new research in this area. To the best of our knowledge, despite the numerous techniques that can be categorized under the umbrella of quality-metrics-driven visualization, this is the first attempt in this direction.

Definitions

In order to make the goal and scope of our work clear, we provide some initial definitions.

Information Visualization Pipeline: a reference model that describes how data are transformed into visualizations through a series of processing steps, as defined in [36].

Quality Metric: a metric calculated at any stage of the information visualization pipeline that captures properties useful for extracting meaningful information about the data. (Please note that we use the terms metric and measure as synonyms in this thesis.)

High-Dimensional Data: any data set whose dimensionality is too high to easily extract meaningful relations across the whole set of dimensions. In the context of this thesis, any dimensionality higher than 10 is considered high-dimensional.

Our focus is on the analysis of methods that apply quality metrics at any stage of the information visualization pipeline as a way to facilitate the detection and presentation of interesting patterns in high-dimensional data.

Examples

We first discuss a few short examples of the approaches covered in our review to familiarize the reader with the concepts presented in this section and to convey their heterogeneity. They cover a broad selection of the factors, denoted in italics, which will be presented in detail in Section 4.1.4.

Example 1

We start with a familiar example, presented in Section 3.1.6 and published in [133], where high-dimensional data sets are analyzed by computing an interestingness score for every scatterplot generated from all possible combinations of axis pairs of the original data. The score is calculated by running image processing algorithms on each scatterplot in order to detect images that contain clusters. The system returns a list of scatterplots, such as those presented in Figure 4.1, sorted by relevance according to the chosen quality measure.

68 Chapter 4. A Systematization of Quality Metrics in High-Dimensional Data Visualization

[Figure 4.1 panel: Best ranked views using CDM, with scores 100, 97, and 84]

Figure 4.1: (Top row of Figure 3.8) Ranking projections according to the Class Density Measure (CDM), favoring projections with minimal overlap between predefined classes (i.e., the colors) [133].
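To make the generate-and-rank idea of Example 1 concrete, the following minimal Python sketch scores every axis-parallel 2D projection and sorts the projections best-first. It is not the method of [133]: the Class Density Measure works on rendered images, whereas here we assume a simple data-space stand-in, namely the fraction of points whose nearest neighbor belongs to the same class. The function names and the data are hypothetical.

```python
import math
from itertools import combinations


def nn_class_agreement(points, labels):
    """Fraction of points whose nearest neighbour shares their class label.

    Stand-in separation score (1.0 = no class overlap among neighbours);
    NOT the image-based Class Density Measure of [133].
    """
    agree = 0
    for i, p in enumerate(points):
        # Brute-force nearest neighbour search (O(n^2), fine for a sketch).
        nearest = min((j for j in range(len(points)) if j != i),
                      key=lambda j: math.dist(p, points[j]))
        agree += labels[nearest] == labels[i]
    return agree / len(points)


def rank_axis_pairs(rows, labels):
    """Score all axis pairs (one scatterplot each) and sort them best-first."""
    n_dims = len(rows[0])
    scored = []
    for a, b in combinations(range(n_dims), 2):
        projected = [(row[a], row[b]) for row in rows]
        scored.append(((a, b), nn_class_agreement(projected, labels)))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical data: dimension 0 separates the classes, dimensions 1-2 are noise.
rows = [(0.0, 0.5, 0.3), (0.1, 0.2, 0.9), (0.2, 0.8, 0.1),
        (5.0, 0.4, 0.7), (5.1, 0.9, 0.2), (5.2, 0.1, 0.6)]
labels = ["a", "a", "a", "b", "b", "b"]
ranking = rank_axis_pairs(rows, labels)
print(ranking)
```

The projections involving dimension 0 receive the maximum score, while the pure-noise pair (1, 2) ends up last, mirroring how a ranked list of views surfaces the most class-separating scatterplots first.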

Example 2

Peng et al. [112] provide algorithms to reorder the axes of multidimensional data visualizations (parallel coordinates, scatterplot matrices, glyphs, recursive patterns) in order to reduce clutter and make interesting patterns more clearly visible. For each visualization, a specific quality metric calculated in the data space is used to find the best ordering. In Figure 4.2, we present an example of scatterplot matrix reordering.

Figure 4.2: Clutter reduction achieved through axis reordering in a scatterplot matrix (initial visualization on the left, reordered on the right) [112].
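A minimal sketch of metric-driven axis reordering in the spirit of Example 2, assuming an exhaustive search and a simple stand-in quality metric: the summed Pearson correlation between adjacent axes of a parallel coordinates plot. Peng et al. [112] use their own clutter measures for each visualization technique; the metric, the function names, and the data below are illustrative assumptions.

```python
import math
from itertools import permutations


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def best_axis_order(columns):
    """Try every axis order and keep the one whose neighbouring axes have the
    highest summed correlation (stand-in metric, not the one used in [112])."""
    dims = range(len(columns))
    corr = {(i, j): pearson(columns[i], columns[j])
            for i in dims for j in dims if i != j}
    best_order, best_score = None, -math.inf
    # n! candidate orders: feasible only for a handful of axes.
    for order in permutations(dims):
        score = sum(corr[a, b] for a, b in zip(order, order[1:]))
        if score > best_score:
            best_order, best_score = order, score
    return best_order, best_score


columns = [[1, 2, 3, 4], [4, 3, 2, 1], [1, 2, 3, 5]]  # hypothetical data
order, score = best_axis_order(columns)
print(order, round(score, 3))
```

Exhaustive search is only viable for very few axes: with 10 axes there are already 10! = 3,628,800 orders, which is why heuristic search is typically needed in practice.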

Example 3

Johansson et al. [80] study the abstraction obtained by applying sampling or aggregation algorithms to parallel coordinates and provide quality metrics to judge when the abstraction disrupts relevant patterns in the data. In Figure 4.3 we show an example from their work: on the left, the data set containing 16384 items is displayed with parallel coordinates; on the right, they display an image targeting a visual quality of 0.95 (on a scale of [0, 1]) by displaying only 987 items. The image quality is calculated by a screen metric using distance transforms.

[Figure 4.3 panels: left, “The original data set containing 16384 items.”; right, “Targeting a visual quality of 0.95 retains 987 items.”]

Figure 4.3: Data abstraction algorithm based on sampling, aiming at reducing data size while preserving relevant patterns. Original visualization on the left with 16384 data items. Sampled visualization on the right with 987 items and a visual quality of 0.95 [80].
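The sketch below illustrates the idea of Example 3 under strong simplifying assumptions: data values normalized to [0, 1], a random sample grown in fixed steps, and, instead of the distance-transform screen metric of [80], a hypothetical coverage measure that counts which binned line segments between adjacent parallel coordinates axes are still drawn. Sampling stops as soon as the sample reaches the target quality (0.95, as in Figure 4.3). Function names, bin count, and step size are our own choices.

```python
import random


def occupied_cells(rows, bins=10):
    """Cells (axis pair, start bin, end bin) hit by the polylines of `rows`.

    Assumes every value is normalised to [0, 1]; a coarse stand-in for the
    pixels a parallel coordinates plot would light up (not the metric of [80]).
    """
    cells = set()
    for row in rows:
        b = [min(int(v * bins), bins - 1) for v in row]
        for k in range(len(row) - 1):
            cells.add((k, b[k], b[k + 1]))
    return cells


def sample_to_target(rows, target=0.95, step=50, seed=0):
    """Grow a random sample until it reproduces `target` of the full plot's cells."""
    full = occupied_cells(rows)
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    size = 0
    while True:
        size = min(size + step, len(rows))
        sample = shuffled[:size]
        # A sample's cells are a subset of the full set, so quality is in [0, 1].
        quality = len(occupied_cells(sample)) / len(full)
        if quality >= target or size == len(rows):
            return sample, quality


rng = random.Random(1)
data = [[rng.random() for _ in range(4)] for _ in range(2000)]  # synthetic data
sample, quality = sample_to_target(data)
print(len(sample), round(quality, 3))
```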

All these approaches have in common that they use quality metrics in the context of high-dimensional data visualization; nonetheless, they can differ in a variety of aspects. For instance, in Example 1 the purpose is to find interesting projections, in Example 2 it is to reduce clutter, whereas in Example 3 it is to find the right abstraction level. The approaches can also differ in a number of other aspects, such as the visualization techniques employed, the space in which the quality metrics are calculated, or the level of interaction they provide.

Therefore, the questions are:

Q1. How can we put all the approaches into a common framework that is able to highlight commonalities and differences?

Q2. What are the main factors through which we can describe them?

Q3. How can we learn from the approaches and build on top of them to systematically move the idea of quality-metrics-driven visualization forward?

These are the main questions that motivate our work, and in the following sections we present the results of our investigation.

4.1.1 Background

While other areas also deal with quality metrics (see Section 2.3.2), we decided to focus only on their use in high-dimensional data exploration. Our initial data gathering process included a broader class of papers, including those cited in Section 2.3.2. However, we soon realized that there is no all-encompassing model able to synthesize the relevant aspects and, at the same time, be useful in practice. For this reason, here we focus only on the use of quality metrics for high-dimensional data.



There exist a number of research papers that try to categorize existing work in the visualization area. We briefly mention some recent ones to put our work in a larger context. In Rethinking Visualization [138], Tory and Möller provide a taxonomy to describe scientific and information visualization under the same structure. Ellis and Dix organize a large number of existing clutter reduction techniques into a clutter reduction taxonomy [49]. Yi et al. review a large number of visualization systems to better understand the role of interaction in visualization [160]. Segel and Heer analyze a large body of storytelling visualizations to identify common design patterns [123]. All these papers share with our work the need to bring some order into a complex aspect of data visualization, starting from a detailed analysis of what researchers and practitioners have proposed in the past.

Since our proposed systematization uses a data visualization pipeline as the basis for the analysis of quality metrics, we deem it important to briefly discuss existing data processing pipelines. The information visualization pipeline was presented by Card et al. [36] and is widely accepted as the standard processing model for information visualization. The pipeline transforms data through the following stages: raw data, table data, visual structures, and views. Between consecutive stages an operator is applied, respectively: data transformation, visual mapping, and view transformation. The Data State Reference Model [39] is largely based on the information visualization pipeline and classifies visualizations according to how they use the operators in the pipeline. In this regard it is similar to our work, in that we also use elements of the pipeline to classify the papers we have analyzed. The KDD pipeline [51] was developed in the early nineties to describe the data processing stages involved in knowledge discovery. The data go through several stages (selection, pre-processing, transformation, data mining, interpretation/evaluation) leading to a final stage of knowledge generation. Since quality metrics involve both automatic computation and visualization, we took inspiration from this model; however, we decided not to use it as the basis for our work because visualization does not explicitly appear in the intermediary steps of the process. Keim et al. [88] and Bertini et al. [23] present alternative pipelines that show how automated data analysis algorithms can be included in the data visualization process. These papers are also sources of inspiration for our work, as they focus on the integration of automated algorithms and data visualization.

4.1.2 Methodology

We followed an iterative data gathering, coding, and modeling approach inspired by the methods used in grounded theory analysis [130]. We started from a small set of papers about quality metrics that we knew from our own experience and used this initial list to derive a first set of descriptive factors. After that, we expanded the list by analyzing the references contained in the first set of papers and by searching relevant visualization venues. In particular, we used Google Scholar¹ to search for references to and from the collected papers. We also tried to expand our list by targeted keyword search, but it did not produce satisfactory results, mainly because many quality metrics papers do not mention the term “quality metrics” in their text.

At this stage, we decided to narrow down the scope of our study and focus on quality metrics for high-dimensional data analysis. We discarded the papers that (1) did not explicitly address high-dimensional data, or (2) did not propose quality metrics systems or algorithms. For instance, we discarded a number of interesting papers on the use of quality metrics for generic data visualizations [79] and for graph drawing [45], as well as discussions of generic aspects of quality metrics [26].

¹ http://scholar.google.com/

The first two authors² independently went through the current list of papers, completed a table with the current version of the classification, and took notes on the modifications and additions necessary to accommodate new aspects discovered during the analysis. After this first phase, the two lists and the notes were compared in order to reach a consensus on the table factors and the paper coding. The third author³ played the devil’s advocate role at this stage to confirm that the factors were explicative, understandable, and relevant. A third set of additional papers was gathered and coded at this point to test the classification further.

We then proceeded to define a visualization pipeline able to capture the data visualization processes described in the papers. We started from the traditional information visualization pipeline [36] because it is widely known and helps capture key elements of quality-metrics-driven visualizations (details in Section 4.1.3).

We generated the quality metrics pipeline iteratively, using the set of gathered papers and the descriptive table of quality metrics factors as reference. In particular, (1) we built a first draft of the new pipeline; (2) we went through the whole list of papers and checked whether the pipeline was able to describe every aspect involved in the process; (3) where discrepancies were found, we refined the pipeline accordingly. As a final step, we double-checked that every paper in the list could be described by a specific instance of the pipeline. Similarly to the procedure followed in the first phase, we let one of the authors who was not involved in the model generation phase³ again play devil’s advocate and refine the model at intermediary steps. The work on the pipeline also produced small adjustments that led to the final version of the quality metrics table (Table 4.2).

It is important to note that, while we followed a systematic approach, there is no guarantee that this is the only way to describe quality metrics and their use. Many of the elements introduced in the proposed models are the result of our own experience and are thus necessarily subjective. Nonetheless, the usefulness of the proposed model is demonstrated by its ability to describe the whole set of papers and to identify relevant gaps for future research.

4.1.3 Quality Metrics Pipeline

We briefly recall the main elements of Card et al.’s pipeline [36] and then move on to the description of our extensions.

The original purpose of the infovis pipeline was to model the main steps required to transform data into interactive visualizations. The quality metrics pipeline in Figure 4.4 preserves its main elements: processing steps (horizontal arrows), stages (boxes), and user feedback (with a few naming differences that we explain below). Data transformation transforms data into the desired format. Visual mapping maps data structures into visual structures (visualization axes, marks, graphical properties). View transformation creates rendered views from the visual structures. The whole set of transformations is influenced by the user, who can decide at any time to transform the data (e.g., filter), use different visual structures, and navigate the visualization through different viewpoints.
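As a minimal illustration of the three processing steps just described, the following Python sketch chains a data transformation (here, feature selection), a visual mapping (binding dimensions to the x and y channels), and a view transformation (rasterizing to a small pixel grid). The function names, the VisualStructure class, and the 10×10 grid are hypothetical choices made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class VisualStructure:
    mapping: dict  # visual channel name -> column index in `rows`
    rows: list


def data_transformation(source, keep):
    """Source data -> transformed data: keep only the selected dimensions."""
    return [[row[d] for d in keep] for row in source]


def visual_mapping(table, channels):
    """Transformed data -> visual structures: bind columns to visual channels."""
    return VisualStructure(mapping={c: i for i, c in enumerate(channels)}, rows=table)


def view_transformation(vs, width=10, height=10):
    """Visual structures -> view: rasterise x/y (values in [0, 1]) into pixels."""
    xi, yi = vs.mapping["x"], vs.mapping["y"]
    return {(min(int(r[xi] * width), width - 1), min(int(r[yi] * height), height - 1))
            for r in vs.rows}


source = [[0.1, 0.9, 0.5], [0.8, 0.2, 0.4]]  # hypothetical 3-dimensional data
view = view_transformation(visual_mapping(data_transformation(source, [2, 0]), ["x", "y"]))
print(view)
```

Each function corresponds to one horizontal arrow of the pipeline; a quality metrics layer would call these functions repeatedly with different parameters (for example, different `keep` subsets or channel assignments) and compare the resulting outputs.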

² Enrico Bertini and myself.

³ Daniel Keim.



[Figure 4.4 diagram: a “Quality-Metrics-Driven Automation” layer above the pipeline Source Data → (Data Transformation) → Transformed Data → (Visual Mapping) → Visual Structures → (View Transformation / Rendering) → Views]

Figure 4.4: Quality metrics pipeline. The pipeline provides an additional layer, named quality-metrics-driven automation, on top of the traditional information visualization pipeline [36]. The layer obtains information from the stages of the pipeline (the boxes) and influences the processes of the pipeline through the metrics it calculates. The user is always in control.

The infovis pipeline captures extremely well the key elements of interactive visualization across a variety of domains and visual techniques. However, when we focus on the visualization of high-dimensional data patterns, a practical problem arises. While the whole set of processes is still valid, the number of possible combinations at each step is so high that it is impractical to find the most effective ones interactively. An example in the spirit of Mackinlay’s seminal analysis [99] helps to clarify the problem: if the original data has dimensionality n = 10 (still a fairly low number) and the number of available visual parameters is k = 4 (e.g., a scatterplot with the visual primitives x-axis, y-axis, size, and color; see Figure 4.5), the number of alternative mappings at the visual mapping stage is already more than 5000, given by the number of k-permutations, i.e., the number of sequences without repetition:

$$\frac{n!}{(n-k)!} = \frac{10!}{6!} = 10 \cdot 9 \cdot 8 \cdot 7 = 5040.$$

Figure 4.5: Mapping a 10-dimensional data set to a scatterplot with four visual primitives (x-axis, y-axis, size, and color) has over 5000 possible alternative mappings.
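The count of alternative mappings can be checked directly. The short Python snippet below computes the number of k-permutations with the closed formula and by brute-force enumeration of ordered assignments of k = 4 distinct dimensions to the four visual primitives; `math.perm` requires Python 3.8 or later.

```python
import math
from itertools import permutations

n, k = 10, 4  # data dimensions; visual primitives (x-axis, y-axis, size, color)

# Closed formula n! / (n - k)!.
closed_form = math.factorial(n) // math.factorial(n - k)
# Enumerate every ordered choice of k distinct dimensions, one per primitive.
enumerated = sum(1 for _ in permutations(range(n), k))

print(closed_form, enumerated, math.perm(n, k))
```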

The main function of quality metrics algorithms is to aid the user in the selection of promising combinations. Typically, the algorithms search through large sets of possibilities and suggest one or more solutions to be evaluated by the user. To describe these steps, we created an additional layer in Figure 4.4, which we call quality-metrics-driven automation and which depicts how quality metrics fit into the process. The metrics draw information from the stages of the pipeline (green upward arrows) and influence the processing steps (blue downward arrows) with their computation. The user remains in control of the whole process, letting the machine perform the computationally hard tasks. We named the new pipeline the quality metrics pipeline.

The concept of generating alternatives and evaluating them is at the core of the method. Regardless of the purpose, all the systems we have encountered follow a common general pattern:



1. Create alternatives (projections, mappings, etc.);

2. Evaluate alternatives (rank views, orderings, etc.);

3. Produce a final representation (ranked list of views, small multiples, etc.).
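The three-step pattern shared by the surveyed systems can be written as a small generic function. In the following sketch, the function name, the per-dimension variances, and the variance-sum metric are all hypothetical: alternatives are created lazily, each one is scored by a quality metric, and the best ones are returned as a ranked list.

```python
import heapq
from itertools import combinations


def quality_metrics_search(alternatives, metric, top_k=3):
    """Generic create -> evaluate -> present loop (hypothetical helper)."""
    # Step 2: evaluate every alternative produced in step 1.
    scored = ((metric(alt), alt) for alt in alternatives)
    # Step 3: present the top_k alternatives as a ranked list, best first.
    return heapq.nlargest(top_k, scored, key=lambda pair: pair[0])


variances = [0.1, 2.0, 0.5, 3.0, 1.0]  # hypothetical per-dimension variances
# Step 1: create alternatives, here every pair of axes for a scatterplot.
pairs = combinations(range(len(variances)), 2)
ranked = quality_metrics_search(pairs, lambda p: variances[p[0]] + variances[p[1]], top_k=2)
print(ranked)
```

Swapping the generator of alternatives (projections, orderings, samples) or the metric changes the purpose of the system without changing this overall structure.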

As we will show in Section 4.1.5, systems with disparate purposes can be described by this same model.

Processing

In the following, we provide details about specific features of the processing steps of the quality metrics pipeline.

1. Data Transformation (source data → transformed data). In the original pipeline, the main role of this step is to put the data in a tabular format, hence the original name of its output, tabular data. Since here we focus on high-dimensional data, we assume the source data to be already in a tabular format, and we rename the output transformed data. At this stage, data transformation is responsible for the generation of alternative data subsets or derivations. Common operations include feature selection, projection, aggregation, and sampling.

2. Visual Mapping (transformed data → visual structures). Visual mapping is the core stage of the pipeline, where data dimensions are mapped to visual features to form visual structures. Distinct mappings of data features to visual features provide alternatives that can again be evaluated in terms of quality metrics. The most common type of operation at this stage is the generation of orderings, by assigning data dimensions to visualization axes in different orders. In general, alternatives can be generated by considering the full set of visual features (e.g., color, size, shape).

3. Rendering/View Transformation (visual structures → views). Rendering transforms visual structures into views by specifying graphical properties that turn these structures into pixels. We added the word Rendering to the pipeline to emphasize the role of the image space, since many quality metrics are calculated directly in the image space, considering the pixels generated in the visualization process. At this stage, alternative views of the same structures can be generated automatically. Surprisingly, as we discuss in Section 4.1.6, in the context of our inquiry this stage is rarely used.
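To illustrate what computing a metric directly in the image space can mean, the sketch below rasterizes a scatterplot onto a pixel grid and measures overplotting, i.e., the fraction of points hidden because their pixel is already occupied. This is a deliberately simple, hypothetical measure chosen for illustration; the grid size and the assumption that coordinates are normalized to [0, 1] are ours.

```python
def overplotting_ratio(points, width=100, height=100):
    """Fraction of points that fall on an already occupied pixel (hidden marks).

    Hypothetical image-space metric; assumes x and y are normalised to [0, 1]
    and that every point is drawn as a single pixel.
    """
    occupied = set()
    hidden = 0
    for x, y in points:
        pixel = (min(int(x * width), width - 1), min(int(y * height), height - 1))
        if pixel in occupied:
            hidden += 1
        else:
            occupied.add(pixel)
    return hidden / len(points)


cluttered = [(0.5, 0.5)] * 4 + [(0.1, 0.1)]
spread = [(0.0, 0.0), (1.0, 1.0)]
print(overplotting_ratio(cluttered), overplotting_ratio(spread))
```

Comparing such a value across alternative views (e.g., different axis pairs or sampling rates) is one way a view-stage metric could drive automatic selection.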

Quality Metrics Computation

Quality metrics can draw information from any of the stages of the pipeline. As we describe later in Section 4.1.4, quality metrics can be calculated in the data space, in the image space, or in a combination of the two. Metrics calculated at the view stage draw information from the rendered image, whereas the others draw information from the data space (and, in a few cases, from elements of the visual structures). Many different kinds of metrics are possible. Our analysis of quality metrics features in Section 4.1.4 provides numerous additional details.



Quality Metrics Influence

As described above, quality metrics algorithms generate alternatives and organize them into a final representation. At the data processing stage, they can for instance generate 1D, 2D, or nD projections (e.g., [52, 59, 126]), data samples (e.g., [24, 80]), or alternative aggregates (e.g., [42]). At the visual mapping stage, the layer generates alternative orderings or mappings between data and visual properties (e.g., [112, 120]). At the view stage, the layer can generate modifications of the current view, such as changing the point of view, highlighting specific items, or distorting the visual space (e.g., [8]).

User Influence

The quality metrics layer is not intended to replace the user with the machine. While the users can always influence all the stages of the pipeline, their main responsibility becomes steering the process, e.g., by setting quality metrics parameters, and exploring the resulting views. It is worth noting that the process is not necessarily a linear flow through the steps. As will be evident from the examples in Section 4.1.5, in many cases complex iteration takes place.

4.1.4 Systematic Analysis

Through our paper review, we identified two main areas of investigation. First, we classify the papers according to quality metrics criteria that help explain their key features. Second, we provide a more detailed categorization of the visualization techniques we have come across.

Quality Metrics

We identified a number of factors that describe the methods encountered in the literature review. Each factor has a number of possible values, and each paper can assume one or more of these values (see Table 4.2).

In the following, we describe the main factors we extracted from our analysis.

What is measured<br />

This factor describes what the quality metric measures. In our analysis we grouped the metrics into the following categories:

Clustering metrics measure the extent to which the visualization or the data contain groupings, that is, well-separated clusters that can be easily identified. Clustering is loosely defined here because we encountered many alternative approaches: by clustering we mean any measure, in data or image space, that is able to capture groupings.

Correlation relates to two or more data dimensions and captures the extent to which systematic changes in one dimension are accompanied by changes in other dimensions. Simple Pearson correlation between two variables is one of the most commonly used measures in this category, but global correlation among multiple data dimensions is also used [82].
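As a reminder of what the most common measure in this category computes, the following is a minimal sketch of Pearson's r for two dimensions in plain Python. The function name `pearson` is ours, and constant (zero-variance) inputs are deliberately not handled.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation r between two equally long sequences of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Co-variation of the two variables and spread of each around its mean.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
print(pearson([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0
```

Values near +1 or -1 indicate that systematic changes in one dimension go along with changes in the other; values near 0 indicate no linear relationship.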


4.1.4 Systematic Analysis 75<br />

Outlier metrics capture the extent to which the data segment under inspection contains elements that behave differently from the large majority of the data, i.e., outliers.
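One simple way to turn this idea into a number, shown here only as a hypothetical example, is the share of items in a segment whose z-score exceeds a threshold. The threshold of 3 standard deviations and the name `outlier_fraction` are our assumptions; the reviewed papers use their own outlier definitions.

```python
import numpy as np


def outlier_fraction(values: np.ndarray, z_threshold: float = 3.0) -> float:
    """Share of items whose z-score magnitude exceeds z_threshold."""
    std = values.std()
    if std == 0:
        return 0.0  # a constant segment has no deviating items
    z = (values - values.mean()) / std
    return float(np.mean(np.abs(z) > z_threshold))


segment = np.array([10.0] * 50 + [11.0] * 49 + [100.0])
print(outlier_fraction(segment))  # 0.01: one item out of 100 deviates strongly
```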

Complex patterns metrics capture aspects that cannot easily be assigned to any of the classes described above. We found a number of papers with such measures and grouped all of them in this class. An example is Graph-Theoretic Scagnostics [151], a technique that characterizes scatterplots with features like “stringy” or “skinny”.

Image quality refers to metrics whose purpose is not necessarily to find specific patterns but rather to assess the degree of organization of a visualization or, as some of the papers call it, the amount of clutter.

Feature preservation metrics compare a reference state with its representation in the visualization, or the features in the data with those in the visualization, with the intent of preserving the features of interest as much as possible. A subset of these papers focuses on classified data, searching for projections where the original classes are well separated [129, 133]. The same category includes papers that measure the information loss due to data abstraction techniques such as sampling and aggregation [24, 42, 80].
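For classified data, a simple sketch of a class separation score, in the spirit of centroid-based class consistency, is the fraction of points whose nearest class centroid in a projection is the centroid of their own class. This is our simplified variant with a hypothetical function name; the exact formulation in [129] may differ.

```python
import numpy as np


def centroid_class_consistency(points: np.ndarray, labels: np.ndarray) -> float:
    """Fraction of points whose nearest class centroid belongs to their own class.

    points: (n_items, n_dims) coordinates of one projection; labels: (n_items,) class ids.
    """
    classes = np.unique(labels)
    centroids = np.stack([points[labels == c].mean(axis=0) for c in classes])
    # distances[i, k] = Euclidean distance from point i to the centroid of class k
    distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    nearest_class = classes[np.argmin(distances, axis=1)]
    return float(np.mean(nearest_class == labels))


points = np.array([[0.0, 3.0], [0.2, 0.2], [5.0, 0.1], [5.1, 2.9]])
labels = np.array([0, 0, 1, 1])
print(centroid_class_consistency(points[:, [0]], labels))  # 1.0: classes separate on dimension 0
print(centroid_class_consistency(points[:, [1]], labels))  # 0.5: classes overlap on dimension 1
```

A projection search would keep the dimensions with the highest score, since there the original classes are best separated.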

Note that in this categorization we classified the techniques according to their main target. This, however, does not prevent a metric of one type from also detecting patterns of another type. For instance, clustering and correlation, as well as complex patterns and image quality, may overlap in this way.

Where it is measured (data/image space)<br />

In our review we found a thoroughly mixed set of approaches with respect to where the metrics are calculated: data space or image space. Metrics calculated in data space detect data features directly in the data, without using information from the view that will display the results. For instance, the Rank-by-Feature technique [126] ranks 1D and 2D projections according to a number of statistical properties calculated only in data space. Metrics calculated in image space bypass the analysis of the data and work directly on the rendered image. These methods often employ sophisticated image processing techniques, like our work presented in Section 3.1.2 and [133], where interesting scatterplots are ranked using a Hough transformation. A mixed-space approach, where data and image space are used at the same time, is also possible. We found two distinct cases. Bertini and Santucci [24] present a measure that compares features in the data space to features in the image space, with the intent of preserving as many data features as possible in the final image. Peng et al. [112] measure clutter in relation to the ordering of visualization axes: these calculations need data features (outliers, correlations) and visualization features (e.g., axis adjacency) at the same time. Note that the entries in Table 4.2 where both data and image space are present do not necessarily imply the mixed approach described above. More often, they simply mean that alternative approaches coexist within the same paper.
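To make the data/image distinction concrete, the following sketch computes a metric on a crude "rendered" scatterplot instead of on the data. Rendering is approximated by counting points per pixel with `np.histogram2d`, and the overplotting ratio is a hypothetical illustrative metric, not one taken from the reviewed papers.

```python
import numpy as np


def rasterize(x: np.ndarray, y: np.ndarray, size: int = 32) -> np.ndarray:
    """Stand-in for rendering: count how many points fall on each pixel of a size x size grid."""
    counts, _, _ = np.histogram2d(x, y, bins=size)
    return counts


def overplotting_ratio(image: np.ndarray) -> float:
    """Image-space metric: share of points drawn on a pixel that another point already covers."""
    n_points = image.sum()
    occupied_pixels = np.count_nonzero(image)
    return float(1.0 - occupied_pixels / n_points)


rng = np.random.default_rng(0)
spread = rng.uniform(size=(100, 2))
stacked = np.repeat([[0.0, 0.0], [1.0, 1.0]], 50, axis=0)
# In data space `stacked` has Pearson r = 1, yet the image shows only two marks.
print(overplotting_ratio(rasterize(spread[:, 0], spread[:, 1])))
print(overplotting_ratio(rasterize(stacked[:, 0], stacked[:, 1])))  # 0.98
```

The same 100 points yield very different image-space scores depending on how they land on pixels, information that a purely data-space metric never sees.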

Purpose<br />

Purpose describes the main reason for using quality metrics, that is, the goal to be achieved with the metric. We identified the following purposes.

Projection aims at finding subsets of the original dimensions in which interesting patterns reside, e.g., by analyzing all possible 2D projections of a multidimensional data set and checking whether interesting groupings exist in each scatterplot.
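The exhaustive search over 2D projections can be sketched as follows: enumerate all n(n-1)/2 dimension pairs, score each with an interchangeable quality metric, and keep the best ones. The function names are hypothetical, and the absolute correlation is only a placeholder for whatever data- or image-space metric a technique actually uses.

```python
import itertools
from typing import Callable

import numpy as np

Score = Callable[[np.ndarray, np.ndarray], float]


def rank_2d_projections(
    data: np.ndarray, score: Score, top_k: int = 3
) -> list[tuple[tuple[int, int], float]]:
    """Score every axis-parallel 2D projection (pair of dimensions) and return the best top_k."""
    n_dims = data.shape[1]
    scored = [
        ((i, j), score(data[:, i], data[:, j]))
        for i, j in itertools.combinations(range(n_dims), 2)  # n_dims * (n_dims - 1) / 2 pairs
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


def abs_correlation(x: np.ndarray, y: np.ndarray) -> float:
    """Placeholder interestingness score; any other quality metric can be plugged in."""
    return abs(float(np.corrcoef(x, y)[0, 1]))


rng = np.random.default_rng(1)
data = rng.normal(size=(300, 5))
data[:, 3] = 0.9 * data[:, 1] + rng.normal(scale=0.1, size=300)  # plant one related pair
print(rank_2d_projections(data, abs_correlation, top_k=2))
```

The quadratic number of candidates is what makes automated ranking attractive once the number of dimensions grows.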


76 Chapter 4. A Systematization of Quality Metrics in High-Dimensional Data Visualization

Ordering aims at finding, where possible, an ordering of the visualization axes that eases the visual detection of interesting patterns. Parallel coordinates are a classic example, where the order of the axes greatly influences the chances of detecting interesting patterns in the data.
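As an illustration of ordering, here is a sketch that searches all axis permutations of a parallel coordinates plot for the one maximizing the summed absolute correlation between adjacent axes. The criterion and the name `best_axis_order` are our assumptions; exhaustive search is only feasible for a handful of dimensions because the number of orders grows factorially.

```python
import itertools

import numpy as np


def best_axis_order(data: np.ndarray) -> tuple[tuple[int, ...], float]:
    """Exhaustively find the axis order maximizing the summed |Pearson r| of adjacent axes."""
    corr = np.abs(np.corrcoef(data, rowvar=False))
    n_dims = corr.shape[0]
    best_order: tuple[int, ...] = tuple(range(n_dims))
    best_score = -1.0
    for order in itertools.permutations(range(n_dims)):
        if order[0] > order[-1]:
            continue  # the reversed order has identical adjacencies, so skip it
        score = float(sum(corr[a, b] for a, b in zip(order, order[1:])))
        if score > best_score:
            best_order, best_score = order, score
    return best_order, best_score


rng = np.random.default_rng(2)
x = rng.normal(size=500)
d2 = x + rng.normal(scale=0.5, size=500)
d3 = d2 + rng.normal(scale=0.5, size=500)
data = np.column_stack([x, rng.normal(size=500), d2, d3])  # column 1 is unrelated noise
order, score = best_axis_order(data)
print(order, round(score, 2))
```

The chain 0-2-3 ends up on adjacent axes, so the correlation structure becomes visible as bundles of nearly parallel line segments.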

Abstraction aims at maintaining or controlling a certain degree of data representation quality when data reduction techniques are used to increase the scalability of a visualization. Sampling and aggregation are the two main types of abstraction techniques we encountered. For instance, in [42] the authors propose a data abstraction technique that makes it possible to measure the information loss due to abstraction and to find a trade-off between data loss and data reduction.
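The trade-off between data reduction and data loss can be illustrated with a hypothetical loss measure: the total variation distance between the binned distribution of the full data and that of a random sample. This is not the measure proposed in [42]; the function name and the choice of 20 bins are assumptions.

```python
import numpy as np


def histogram_loss(full: np.ndarray, reduced: np.ndarray, bins: int = 20) -> float:
    """Total variation distance between binned distributions of full and reduced data (0 = no loss)."""
    edges = np.histogram_bin_edges(full, bins=bins)  # identical bins for both versions
    p, _ = np.histogram(full, bins=edges)
    q, _ = np.histogram(reduced, bins=edges)
    p = p / p.sum()
    q = q / q.sum()
    return float(0.5 * np.abs(p - q).sum())


rng = np.random.default_rng(3)
full = rng.normal(size=10_000)
for fraction in (0.01, 0.1, 0.5):
    sample = rng.choice(full, size=int(fraction * full.size), replace=False)
    print(f"keep {fraction:.0%} of items -> loss {histogram_loss(full, sample):.3f}")
```

Stronger reduction (smaller samples) generally yields a higher loss, which is the curve a user would inspect to pick an abstraction level.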

Visual mapping aims at finding interesting mappings between the original data features and the visual features of the visualization technique, such as color, size, or shape.

View optimization aims at modifying parameters of the view with the intent of producing better visualizations in which, for example, data segments with a high degree of interest are highlighted.

Interaction<br />

The last column of the table indicates which papers offer the possibility to interact with the quality-metrics-based automation. We identified two main classes of interaction: threshold selection and metrics selection. By threshold selection we mean the possibility of setting thresholds in the quality metrics computation mechanism (e.g., the data abstraction level in [42] or the density estimation smoothing parameter in [52]). By metrics selection we mean systems in which the user can either switch from one metric to another or combine them into an integrated one (e.g., [42, 82]). Note that some of the papers may contain interaction capabilities and still be marked as not interactive because they do not provide direct interaction with the quality metrics mechanisms.
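Both interaction classes can be sketched together: the user's weights select or combine metrics, and a threshold filters the candidates. All names and scores here are hypothetical, and the sketch assumes each metric has already been normalized to [0, 1].

```python
def combine_metrics(
    scores: dict[str, dict[str, float]], weights: dict[str, float]
) -> dict[str, float]:
    """Metrics selection: weighted average of normalized scores (scores[metric][candidate])."""
    total_weight = sum(weights.values())
    candidates = next(iter(scores.values()))
    return {
        c: sum(w * scores[m][c] for m, w in weights.items()) / total_weight
        for c in candidates
    }


def select_candidates(combined: dict[str, float], threshold: float) -> list[str]:
    """Threshold selection: keep candidates scoring at least `threshold`, best first."""
    kept = [c for c, s in combined.items() if s >= threshold]
    return sorted(kept, key=lambda c: combined[c], reverse=True)


scores = {
    "correlation": {"A-B": 0.9, "A-C": 0.2, "B-C": 0.5},
    "clustering": {"A-B": 0.1, "A-C": 0.8, "B-C": 0.5},
}
print(select_candidates(combine_metrics(scores, {"correlation": 1.0, "clustering": 0.0}), 0.5))
print(select_candidates(combine_metrics(scores, {"correlation": 0.0, "clustering": 1.0}), 0.5))
```

Setting a weight to zero amounts to switching a metric off, while intermediate weights integrate several metrics into one score.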

Visualization

The original table we designed to classify the full set of papers (see Table 4.2 below) contains a rough categorization of visualization techniques into three main classes: scatterplots (SP), parallel coordinates (PC), and others (which include a fairly large number of different techniques). While this categorization helps in understanding how these techniques are distributed over the whole set of papers (SP and PC account for 80% of the total), it says nothing about the key features of the visualization techniques, especially those closely related to the use of quality metrics.

We define layout dimensionality as the number of data axes a visualization has. A data axis is the visualization feature that establishes which position a single visual mark takes in the visualization. For instance, scatterplots have dimensionality two because they can accommodate two spatial dimensions.

The visualization techniques are classified into 1D, 2D, 3D, 4D, and nD, where nD stands for techniques that can accommodate an arbitrary number of dimensions (with obvious scalability limits when the number of dimensions grows too large).

It is worth noting that, in general, every visualization has additional visual features to which data features can be mapped, e.g., color and size, but here we focus on the layout because it is the variable that most characterizes every visualization technique and that has the biggest impact on the use of quality metrics. Table 4.1 shows the dimensionality of all the techniques we identified in the review.

The visualization techniques that are not in the nD class necessarily need an additional mechanism for the analysis of high-dimensional data. Typically, as discussed below, they are organized in a higher-level structure that accommodates several projections. Those that can accommodate an arbitrary number of dimensions (nD) all need some kind of ordering mechanism.

Table 4.1: Visualization techniques categorized by their layout dimensionality (i.e., the number of axes of the visualization).

Visualization | Layout Dimensionality
histogram | 1D
jigsaw map [150] | 1D
scatterplot | 2D
pixel bar charts [87] | 4D
dimensional stacking [91] | nD
matrix [22] | nD
parallel coordinates [78] | nD
radvis [72] | nD
scatterplot matrix [37] | nD
star glyphs [128] | nD
table lens [115] | nD

While this is not explicitly discussed in any of the reviewed papers, we noticed that a quality-metrics-driven approach often needs some kind of (implicit or explicit) meta-visualization. By meta-visualization we mean a visualization of visualizations; more specifically, a layout strategy that arranges single visualizations into an organized form. For instance, when a quality-metrics-driven technique produces a number of interesting scatterplots as output, they need to be organized into a schema that facilitates their comprehension and analysis (e.g., a list sorted by interestingness). From our analysis we identified the following main meta-visualization strategies:

List: a layout strategy that organizes visualizations in an ordered linear fashion (often sorted to reflect quality metrics rankings);

Matrix: a layout strategy that organizes visualizations in a grid format, where grid entries are arranged according to some data features (e.g., columns and rows represent data dimensions); it is often also called Small Multiples, Trellis, Lattice, or Facets.
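The two strategies can be sketched for pairwise views scored by some quality metric: the list sorts views by score, while the matrix places each view at the cell given by its two dimensions. The function names and scores are hypothetical.

```python
import numpy as np


def as_list(scores: dict[tuple[int, int], float]) -> list[tuple[tuple[int, int], float]]:
    """List layout: one entry per visualization, sorted by quality score (best first)."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def as_matrix(scores: dict[tuple[int, int], float], n_dims: int) -> np.ndarray:
    """Matrix layout: cell (i, j) holds the view of dimensions i and j; NaN marks empty cells."""
    grid = np.full((n_dims, n_dims), np.nan)
    for (i, j), score in scores.items():
        grid[i, j] = grid[j, i] = score  # a scatterplot matrix shows each pair twice
    return grid


scores = {(0, 1): 0.2, (0, 2): 0.9, (1, 2): 0.5}
print(as_list(scores))  # [((0, 2), 0.9), ((1, 2), 0.5), ((0, 1), 0.2)]
print(as_matrix(scores, 3))
```

In the list the position encodes the ranking, whereas in the matrix the position is fixed by the data dimensions and the score can only be shown through other visual means.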

It is worth noting that some basic visualization techniques can be considered meta-visualizations themselves. A notable example is the scatterplot matrix, which shows a set of scatterplots organized in a matrix layout.

In general there is a strong interplay between visualizations and meta-visualizations. As mentioned above, techniques with a fixed dimensionality need to be organized in a meta-visualization. The meta-visualization influences the ordering of the visualizations



Table 4.2: Quality metrics papers classified according to quality metrics factors (sorted by purpose). The columns are: visualization technique (SP, PC, other); what is measured (clustering, correlation, outliers, complex patterns, image quality, feature pres.); where it is measured (data space, image space); purpose (projection, ordering, abstraction, visual mapping, view optimization); and interaction. The classified papers, in table order:

A Projection Pursuit Algorithm for Exploratory Data Analysis - Friedman & Tukey [54]
A Rank-by-Feature Framework for Unsupervised Multidimensional Data Exploration Using Low Dimensional Projections - Seo & Shneiderman [126]
Finding and Visualizing Relevant Subspaces for Clustering High-Dimensional Astronomical Data Using Connected Morphological Operators - Ferdosi et al. [52]
Graph-Theoretic Scagnostics - Wilkinson et al. [151]
Selecting good views of high-dimensional data using class consistency - Sips et al. [129]
Coordinating computational and visual approaches for interactive feature selection and multivariate clustering - Guo [59]
Exploring High-D Spaces with Multiform Matrices and Small Multiples - MacEachren et al. [98]
Improving the Visual Analysis of High-Dimensional Datasets Using Quality Measures - Albuquerque et al. [8]
Interactive Hierarchical Dimension Ordering, Spacing and Filtering for Exploration of High Dimensional Datasets - Yang et al. [158]
Interactive Dimensionality Reduction Through User-defined Combinations of Quality Metrics - Johansson & Johansson [82]
Pargnostics: Image-Space Metrics for Parallel Coordinates - Dasgupta & Kosara [43]
Combining automated analysis and visualization techniques for effective exploration of high-dimensional data - Tatu et al. [133]
High-Dimensional Visual Analytics: Interactive Exploration Guided by Pairwise Views of Point Distributions - Wilkinson et al. [152]
Clutter Reduction in Multi-Dimensional Data Visualization Using Dimension Reordering - Peng et al. [112]
Similarity Clustering of Dimensions for an Enhanced Visualization of Multidimensional Data - Ankerst et al. [9]
Measuring Data Abstraction Quality in Multiresolution Visualizations - Cui et al. [42]
Quality Metrics for 2D Scatterplot Graphics: Automatically Reducing Visual Clutter - Bertini & Santucci [24]
A Screen Space Quality Method for Data Abstraction - Johansson & Cooper [80]
Enabling Automatic Clutter Reduction in Parallel Coordinate Plots - Ellis & Dix [48]
Pixnostics: Towards measuring the value of visualization - Schneidewind et al. [120]

Legend: SP = scatter plot (& matrix), PC = parallel coordinates, feature/class pres. = feature/class preservation, S = select metric, T = set threshold.



and in some cases also the content. For instance, the matrix layout requires that the visualization within a grid cell correspond to the data values it represents.

Finally, meta-visualizations can themselves be influenced by quality metrics. All layout strategies have some degree of freedom in terms of reordering, and an optimal reordering (according to some given goal) can only be achieved by searching the space of solutions (e.g., as presented in [112]).

4.1.5 Examples<br />

In this section, we provide four selected examples from our review to show how our proposed model can describe existing approaches in this area. We selected the examples so as to cover as many interesting aspects as possible. In particular, we picked papers with different purposes because they guarantee a larger variety of features. For completeness, we provide all the other quality metrics pipelines in Appendix A.3, in the same order in which the papers are listed in Table 4.2.

The first example comes from our own work presented in Section 3.1.4 and Section 3.1.5, published in [133]. The main goal of this work is to find interesting projections of n-dimensional data using image processing techniques. The section presents several measures, but here we focus only on the part dealing with parallel coordinates and one specific metric, the Similarity Measure.

The basic idea of the method is to generate all possible 2D combinations of the original dimensions and to evaluate them in terms of their ability to form clusters in a 2-axis parallel coordinates representation. Every pair of axes is evaluated individually using a standard image processing technique (the Hough transform), which makes it possible to discriminate between uniform and chaotic distributions of line angles and positions (for details on the Hough transform, please refer back to Figure 3.6). Once interesting pairs have been extracted, they are joined together to form groups of parallel coordinates of a desired (user-defined) size (e.g., in Figure 3.14, groups of 4-dimensional parallel coordinates are formed).
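The following is a rough sketch of the idea, not the procedure of [133]. Two simplifying assumptions are made: each polyline segment's Hough coordinates are computed analytically instead of detecting lines in a rendered image, and a hypothetical entropy-based concentration score stands in for the Similarity Measure. The greedy chaining of pairs into one group of a user-defined size is likewise our own illustrative choice.

```python
import itertools
import math

import numpy as np


def normalize(column: np.ndarray) -> np.ndarray:
    """Min-max scale one dimension to [0, 1], as on a parallel coordinates axis."""
    span = column.max() - column.min()
    return (column - column.min()) / span if span > 0 else np.zeros(column.shape)


def hough_concentration(a: np.ndarray, b: np.ndarray, bins: int = 16) -> float:
    """Score a 2-axis parallel coordinates plot: near 1 for few line bundles, lower if chaotic.

    Item k is the segment from (0, u_k) to (1, v_k); its Hough coordinates are computed directly.
    """
    u, v = normalize(a), normalize(b)
    theta = np.arctan2(1.0, -(v - u))  # angle of the line's normal vector, in (0, pi)
    rho = u * np.sin(theta)  # distance of the line from the origin, in [0, 1]
    accumulator, _, _ = np.histogram2d(theta, rho, bins=bins, range=[[0.0, math.pi], [0.0, 1.0]])
    p = accumulator.ravel() / accumulator.sum()
    p = p[p > 0]
    entropy = -np.sum(p * np.log(p))
    return float(1.0 - entropy / math.log(bins * bins))


def build_group(data: np.ndarray, size: int, bins: int = 16) -> list[int]:
    """Greedily chain high-scoring axis pairs into one parallel coordinates group of `size` axes."""
    n_dims = data.shape[1]
    score = {
        (i, j): hough_concentration(data[:, i], data[:, j], bins)
        for i, j in itertools.permutations(range(n_dims), 2)  # axis order changes the lines
    }
    group = list(max(score, key=score.get))
    while len(group) < size:
        remaining = [d for d in range(n_dims) if d not in group]
        if not remaining:
            break
        group.append(max(remaining, key=lambda d: score[(group[-1], d)]))
    return group


rng = np.random.default_rng(4)
g = rng.integers(0, 2, size=400).astype(float)  # two hidden groups of items
data = np.column_stack([
    g + 0.01 * rng.normal(size=400),
    (1 - g) + 0.01 * rng.normal(size=400),
    g + 0.01 * rng.normal(size=400),
    rng.uniform(size=400),  # unstructured dimension
])
print(round(hough_concentration(data[:, 0], data[:, 1]), 2))
print(round(hough_concentration(data[:, 0], data[:, 3]), 2))
print(build_group(data, size=3))
```

Pairs whose segments form a few tight bundles concentrate in a few accumulator cells and score high, while the unstructured dimension spreads its lines over many cells and is left out of the group.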

Figure 4.6 presents the pipeline for this example. We can recognize three main elements: (A) all 2D parallel coordinates are generated in the data transformation phase; (B) all the alternatives are evaluated in the image space at the view stage; (C) the algorithm combines the interesting segments into a list of parallel coordinates (like those in Figure 3.14) using the visual mapping stage.

[Figure 4.6 diagram: the visualization pipeline (Source Data, Data Transformation, Transformed Data, Visual Mapping, Visual Structures, View Transformation, Rendering, Views) with a Quality-Metrics-Driven Automation layer whose steps are marked A, B, and C.]

Figure 4.6: Quality metrics pipeline for the first example from [133]: (A) generation of alternatives; (B) evaluation of alternatives (image space); (C) creation of the final representation.

The technique uses parallel coordinates (PC) as the principal visualization technique and a list as the meta-visualization. It measures clustering properties in the image space, and its main purpose is to find interesting projections. Interaction with the metrics is very limited, if not absent.



The second example comes from the work of Johansson and Johansson on interactive feature selection [82]. The technique ranks every single dimension by its importance using a combination of correlation, outlier, and clustering features calculated on the data. This ranking is the basis for an interactive threshold selection tool with which the user can decide how many dimensions to keep, weighing the choice against the corresponding information loss shown in the chart (see Figure 4.7). Once the user selects the desired number of dimensions, the system presents the result with parallel coordinates and automatically finds a good ordering using the same data features calculated for ranking the dimensions. The user can also choose different weighting schemes to focus more on correlation, outliers, or clusters. Figure 4.8 shows the results for clustering (top) and correlation (bottom).
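The threshold selection chart can be sketched as follows, assuming that a dimension's combined importance I(x_j) is a user-weighted sum of its per-metric importances, and that the information lost when dropping dimensions is the sum of their I(x_j) divided by the total over all dimensions. The per-metric importance values and function names are hypothetical; how [82] derives importance from each quality metric is not reproduced here.

```python
def combined_importance(
    per_metric: dict[str, dict[str, float]], weights: dict[str, float]
) -> dict[str, float]:
    """I(x_j): user-weighted sum of each dimension's importance under every quality metric."""
    dims = next(iter(per_metric.values()))
    return {d: sum(w * per_metric[m][d] for m, w in weights.items()) for d in dims}


def information_loss_curve(importance: dict[str, float]) -> list[float]:
    """Entry k: share of total importance lost when only the k most important dimensions are kept."""
    ranked = sorted(importance, key=importance.get, reverse=True)
    total = sum(importance.values())
    return [sum(importance[d] for d in ranked[k:]) / total for k in range(len(ranked) + 1)]


per_metric = {  # hypothetical per-dimension importances for two metrics
    "correlation": {"a": 0.6, "b": 0.3, "c": 0.1, "d": 0.0},
    "outliers": {"a": 0.1, "b": 0.1, "c": 0.5, "d": 0.3},
}
importance = combined_importance(per_metric, {"correlation": 1.0, "outliers": 1.0})
curve = information_loss_curve(importance)
print([round(loss, 2) for loss in curve])  # [1.0, 0.65, 0.35, 0.15, 0.0]
```

Plotting `curve` against k gives a chart like the one in Figure 4.7; changing the weights re-ranks the dimensions and reshapes the curve before the user settles on how many to keep.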

[Figure 4.7 image: line chart of information lost (y-axis) versus the number of variables to keep in the reduced data set (x-axis); the black line shows the combined loss over all quality metrics, the blue, red, and green lines show the loss in cluster, correlation, and outlier structures, and a red vertical line marks the currently selected number of variables.]

Figure 4.7: Interactive chart to select the number of dimensions to keep vs. information loss [82].

ordered is identified. The unordered v<br />

right border <strong>of</strong> the ordered variables,<br />

forms a highly correlated pair. This c<br />

pairs with highest correlation conta<strong>in</strong><br />

positioned at the leftmost or rightmos<br />

ordered variables, until all variables a<br />

The variable order<strong>in</strong>gs enhanc<strong>in</strong>g<br />

based on the quality values calculat<br />

connection with the cluster and outlie<br />

the same way. An example <strong>of</strong> the ord<br />

structures, is shown <strong>in</strong> figure 5, whe<br />

reta<strong>in</strong>ed after dimensionality reductio<br />

formed as follows:<br />

1. Initially the clusters are sorted i<br />

quality value, as shown <strong>in</strong> figu<br />

based on three clusters, c 0 , c 1 a<br />

Figure 4.8: Top: best order<strong>in</strong>g to enhance cluster<strong>in</strong>g. Bottom: best order<strong>in</strong>g to enhance correlation<br />

[82].<br />

2. In the first iteration all variable<br />

first cluster, c 0 , are positioned<br />

c 0 <strong>in</strong>cludes variable 6. This va<br />

Figure 4.9 shows the pipel<strong>in</strong>e for this example. Aga<strong>in</strong> we have three ma<strong>in</strong>iselements:<br />

hence not taken <strong>in</strong>to conside<br />

(A) every s<strong>in</strong>gle dimension is ranked by the quality metrics directly from the source figure Fig. data. represents 3. Thethe visual positions aidso<br />

The reason why the source data is needed is that the importance measure3. <strong>of</strong> In aderstand<strong>in</strong>g thes<strong>in</strong>gle<br />

<strong>of</strong> the impor<br />

subsequent iterations the<br />

dimension is computed 3.4 Variable tak<strong>in</strong>gOrder<strong>in</strong>g<br />

<strong>in</strong>to account the full set <strong>of</strong> dimensions (see the Spaper r is the<br />

reduced and for<br />

correlation<br />

<strong>of</strong> any cluster,<br />

betw<br />

c j<br />

c 1 , red for <strong>in</strong>stance, and positive variables <strong>in</strong> blue. 2, 3<br />

the unit, TheN order is the <strong>of</strong> variables total number <strong>in</strong> multivariate <strong>of</strong> data visualization items <strong>in</strong> has theadata largeset impact and D is variables and I out 3 and are10the arecluster,<br />

also part<br />

the range<br />

on how<br />

<strong>of</strong> the<br />

easilyvariable we can perceive<br />

conta<strong>in</strong><strong>in</strong>g<br />

different<br />

thestructures one-dimensional<br />

<strong>in</strong> the data.<br />

unit.<br />

The<br />

A k-<br />

proposed system comb<strong>in</strong>es several quality metrics to f<strong>in</strong>d a dimensionalityunit<br />

reduction<br />

dimensional<br />

4. The reorder<strong>in</strong>g <strong>of</strong> variables <strong>in</strong> S<br />

is considered<br />

that can bedense regarded<br />

if<br />

as<br />

itsadensity good representation<br />

is higher than<br />

<strong>of</strong><br />

the<br />

sequence <strong>of</strong> connected variable<br />

thresholds the orig<strong>in</strong>al <strong>of</strong> all data one-dimensional set, focus<strong>in</strong>g the units structures <strong>of</strong> which that it areis<strong>of</strong>composed.<br />

<strong>in</strong>terest for variables travers<strong>in</strong>g the border p<br />

With<strong>in</strong> the particular the proposed analysis task system at hand. theF<strong>in</strong>d<strong>in</strong>g cluster<strong>in</strong>g one appropriate algorithm variable has been rectangles) areI(x found, and S redu<br />

order<strong>in</strong>g enhanc<strong>in</strong>g all <strong>in</strong>terest<strong>in</strong>g structures at once may, however, be<br />

j )=w corr I c<br />

i = 1 this is achieved by switch<br />

slightly modified to further speed-up the cluster detection by us<strong>in</strong>g


4.1.5 Examples 81<br />

details); (B) the user selects the dimensions guided by the quality metrics, so that both the user and the quality metrics influence the data transformation process; (C) the system finds the best ordering according to the weighting scheme proposed by the user, producing one specific visual mapping. The view is then presented to the user.
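Steps (A) and (B) can be illustrated with a minimal sketch. It assumes that the combined importance of a dimension is a weighted sum of its per-metric importance values (cluster, correlation, outlier), and that the information lost by keeping k dimensions is the share of total importance carried by the removed ones. The function names and the toy numbers are hypothetical. The sketch does not reproduce how [82] computes the individual metric values.

```python
from typing import Dict, List, Tuple


def combined_importance(metric_values: Dict[str, List[float]],
                        weights: Dict[str, float]) -> List[float]:
    """Weighted sum of per-metric importance values, one result per dimension.

    Assumption (illustrative, not the exact formula of [82]): metric_values maps
    a metric name to one non-negative value per dimension; weights are user-set.
    """
    n_dims = len(next(iter(metric_values.values())))
    total = [0.0] * n_dims
    for name, values in metric_values.items():
        w = weights.get(name, 0.0)  # metrics without a weight are ignored
        for j, v in enumerate(values):
            total[j] += w * v
    return total


def rank_dimensions(importance: List[float]) -> List[int]:
    """Dimension indices sorted from most to least important (step A)."""
    return sorted(range(len(importance)), key=lambda i: importance[i], reverse=True)


def information_loss_curve(importance: List[float]) -> List[Tuple[int, float]]:
    """(k, lost) pairs: share of total importance lost when only the k
    highest-ranked dimensions are kept, for k = 1..n (step B)."""
    ranked = sorted(importance, reverse=True)
    total = sum(ranked)
    curve, kept = [], 0.0
    for k, value in enumerate(ranked, start=1):
        kept += value
        lost = max(0.0, (total - kept) / total) if total > 0 else 0.0
        curve.append((k, lost))
    return curve


if __name__ == "__main__":
    metrics = {"cluster": [0.9, 0.1, 0.4, 0.0],
               "correlation": [0.2, 0.8, 0.3, 0.1]}
    importance = combined_importance(metrics, {"cluster": 1.0, "correlation": 0.5})
    print(rank_dimensions(importance))  # [0, 2, 1, 3]
    for k, lost in information_loss_curve(importance):
        print(k, round(lost, 3))
```

Plotting the (k, lost) pairs for each metric and for the combined importance gives the kind of line chart shown in Figure 4.7. Changing a weight simply recomputes the curve.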

Figure 4.9: Quality metrics pipeline for the second example from [82]: (A) dimensions ranked by their importance; (B) selection of the number of dimensions to retain vs. information loss; (C) creation of the final mapping with ordering.

This technique uses parallel coordinates as its principal visualization. There is no meta-visualization to organize alternative results in a schema, but the interactive chart functions as a way to pilot the generation of alternatives. It measures clustering, correlation and outliers in the data space, and its main purpose is to find interesting projections and orderings. Interaction plays a central role in the selection of the number of dimensions and in the weighting scheme.
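A greedy heuristic can approximate the ordering step. It starts from the most correlated pair of dimensions. It then repeatedly attaches, at the left or right end, the unplaced dimension most strongly correlated with the current end. The sketch below is our own illustration in this spirit and uses the absolute Pearson correlation as the quality value. It is an assumption, not necessarily the exact procedure of [82], and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equally long columns; 0.0 if one is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def greedy_correlation_order(columns: List[Sequence[float]]) -> List[int]:
    """Order dimensions so that strongly correlated ones tend to be adjacent.

    Assumption: |r| is the quality value; ties are broken by dimension index.
    """
    d = len(columns)
    if d < 2:
        return list(range(d))
    corr = [[abs(pearson(columns[i], columns[j])) for j in range(d)] for i in range(d)]
    # Seed the ordering with the most correlated pair of dimensions.
    first, second = max(((i, j) for i in range(d) for j in range(i + 1, d)),
                        key=lambda p: corr[p[0]][p[1]])
    order = [first, second]
    remaining = set(range(d)) - {first, second}
    while remaining:
        # Find the unplaced dimension best correlated with either border axis.
        best_value, best_dim, at_left = -1.0, -1, False
        for k in sorted(remaining):
            for anchor, left in ((order[0], True), (order[-1], False)):
                if corr[k][anchor] > best_value:
                    best_value, best_dim, at_left = corr[k][anchor], k, left
        if at_left:
            order.insert(0, best_dim)
        else:
            order.append(best_dim)
        remaining.remove(best_dim)
    return order


if __name__ == "__main__":
    x0 = [1, 2, 3, 4, 5]
    x1 = [5, 3, 4, 1, 2]
    x2 = [1, 2, 3, 4, 6]
    print(greedy_correlation_order([x0, x1, x2]))  # [1, 0, 2]
```

Orderings for clusters or outliers could follow the same greedy scheme by swapping in a different pairwise quality value.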

The third example is taken from the work of Cui et al. on data abstraction quality [42]. This paper proposes a technique to create abstracted visualizations in a user-controlled manner. The system features data abstraction metrics (the Histogram Difference Measure and the Nearest Neighbor Measure) and controllers that let the user find a trade-off between abstraction level and information loss. In particular, the data abstraction quality is calculated by comparing features of the original data to features in the sampled or aggregated data.
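The sketch below makes the comparison of original and abstracted data concrete with two simplified measures in the spirit of HDM and NNM:

- a histogram-based similarity: one minus the mean total-variation distance between per-dimension histograms;
- a nearest-neighbor similarity: one minus the average distance from each original record to its nearest abstracted record, normalized by the bounding-box diagonal.

Both formulas, the bin count and the function names are our own assumptions for illustration. They are not the exact definitions given in [42].

```python
import math
from typing import List, Sequence

Record = Sequence[float]


def _histogram(values: Sequence[float], lo: float, hi: float, bins: int) -> List[float]:
    """Normalized histogram over [lo, hi] with equal-width bins."""
    width = (hi - lo) / bins if hi > lo else 1.0
    counts = [0] * bins
    for v in values:
        b = int((v - lo) / width)
        counts[max(0, min(b, bins - 1))] += 1  # clamp values on or outside the border
    return [c / len(values) for c in counts]


def histogram_similarity(original: List[Record], abstract: List[Record], bins: int = 10) -> float:
    """Assumed HDM-like score: 1 minus the mean (over dimensions) total-variation
    distance between histograms of the original and the abstracted records."""
    dims = len(original[0])
    distance = 0.0
    for d in range(dims):
        col = [r[d] for r in original]
        lo, hi = min(col), max(col)
        p = _histogram(col, lo, hi, bins)
        q = _histogram([r[d] for r in abstract], lo, hi, bins)
        distance += 0.5 * sum(abs(a - b) for a, b in zip(p, q))
    return 1.0 - distance / dims


def nearest_neighbor_similarity(original: List[Record], abstract: List[Record]) -> float:
    """Assumed NNM-like score: 1 minus the average distance from each original
    record to its nearest abstracted record, divided by the bounding-box diagonal."""
    dims = len(original[0])
    diagonal = math.sqrt(sum(
        (max(r[d] for r in original) - min(r[d] for r in original)) ** 2
        for d in range(dims)))
    if diagonal == 0:
        return 1.0
    avg = sum(min(math.dist(r, a) for a in abstract) for r in original) / len(original)
    return 1.0 - avg / diagonal


if __name__ == "__main__":
    original = [(float(i), float(i % 3)) for i in range(30)]
    sample = original[::3]  # a crude abstraction: keep every third record
    print(histogram_similarity(original, sample), nearest_neighbor_similarity(original, sample))
```

The histogram score reacts to changes in relative density. The nearest-neighbor score reacts to regions of the original data that the abstraction no longer represents.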

Figure 4.10: Visual abstraction of a scatterplot matrix from [42].

Figure 4.11 shows the pipeline for this example. We have two main elements: (A) the data abstraction quality measures are calculated by comparing the source data to the transformed data; (B) the user selects the desired abstraction quality and receives feedback on its quality by steering the data transformation process.

82 Chapter 4. A Systematization of Quality Metrics in High-Dimensional Data Visualization

Figure 4.11: Quality metrics pipeline for example three from [42]: (A) data features compared between the original data and the abstracted data; (B) instantiation of the desired abstraction level guided by quality metrics.

The paper applies the technique to scatterplots and parallel coordinates, but it is generic enough to be applied to many other techniques. There is no meta-visualization to organize alternative results, but, similarly to the second example, an interactive chart is used to set an abstraction quality threshold (see Figure 4.12). It measures feature preservation, and its main purpose is abstraction. Interaction plays a central role in the selection of the right abstraction level.
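The threshold mechanism can be sketched under two assumptions. First, the data abstraction level (DAL) is the fraction of records kept. Second, a quality function maps each DAL to a value in [0, 1]. Under these assumptions the system proposes the smallest DAL whose quality reaches the user's threshold. The (DAL, quality) pairs are the points such a chart would plot. The toy quality curve and the function names are hypothetical.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def quality_curve(levels: Iterable[float],
                  quality: Callable[[float], float]) -> List[Tuple[float, float]]:
    """(DAL, quality) pairs sorted by DAL, i.e. the points of the chart."""
    return [(dal, quality(dal)) for dal in sorted(levels)]


def choose_abstraction_level(levels: Iterable[float],
                             quality: Callable[[float], float],
                             threshold: float) -> Optional[float]:
    """Smallest DAL whose abstraction quality reaches the threshold,
    or None if no candidate level is good enough."""
    for dal, q in quality_curve(levels, quality):
        if q >= threshold:
            return dal
    return None


def toy_quality(dal: float) -> float:
    """Hypothetical quality curve: quality grows as more records are kept."""
    return 1.0 - (1.0 - dal) ** 3


if __name__ == "__main__":
    levels = [0.05, 0.1, 0.2, 0.5, 1.0]
    print(choose_abstraction_level(levels, toy_quality, 0.85))  # 0.5
```

Returning the smallest acceptable level reflects the trade-off described above: the most compact abstraction whose information loss the user still tolerates.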

Figure 4.12: Visual abstraction chart with threshold setting for the abstraction level and feedback on the abstraction quality [42].

As a fourth example we choose the paper by Yang et al. [158]. They use quality metrics to support an interactive dimension management system for high-dimensional data. Their interactive hierarchical dimension management system, called DOSFA (Dimension Ordering, Spacing, and Filtering Approach), supports automatic and interactive dimension ordering, filtering and spacing. An example can be seen in Figure 4.13. On the left-hand side the data is presented unchanged. On the right-hand side the data is visualized after DOSFA has been applied, i.e., after the data has been ordered, spaced and filtered. Different orders of dimensions can show different patterns in the data to the user. Depending on the task, a similarity-oriented or an importance-oriented order is needed.

An annotated pipeline with these steps is presented in Figure 4.14. It contains four main steps:

(A) a hierarchical structure of the dimensions is constructed by grouping similar dimensions into clusters and similar clusters into larger clusters;
(B) in the data transformation process, dimensions are filtered based on their similarity and importance;
(C) the dimension ordering influences the mapping, either by determining the ordering of the visualization's dimensions or by mapping the more important dimensions to more prevalent visualization positions or to more pre-attentive visual attributes;
(D) the quality of the dimensions also influences the view transformation step by determining the spacing between the dimensions in the parallel coordinates according to their similarity.

Figure 4.13: Left: star glyphs representing the original data set. Right: visualized data after DOSFA was applied [158].

Figure 4.14: Quality metrics pipeline for example four from [158]: (A) construct a hierarchical structure of dimensions by clustering; (B) filter dimensions by similarity and importance; (C) map the dimension ordering to the visualization; (D) influence the view according to the measured quality (spacing the parallel coordinates according to their similarity). The user can steer all these steps after interacting with the clustered dimensions shown in an InterRing visualization.

All these “best” settings can be calculated automatically by the system, and the result is presented to the user. It is also possible to present the dimension hierarchies to the user with InterRing [159]. The user can interact with InterRing, triggering the filtering, ordering, and spacing for the final result. This is represented by the user-interaction arrows on the lower level of the pipeline.
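Steps (B) and (D) can be illustrated with a small sketch. It assumes that the similarity between two dimensions is their absolute Pearson correlation, and that the caller supplies importance (for instance, the variance of each dimension).

- Filtering visits dimensions from most to least important and drops any dimension that is too similar to one already kept.
- Spacing places neighboring parallel-coordinate axes closer together the more similar they are.

The threshold, the minimum gap and the function names are hypothetical choices, not the parameters used in [158].

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equally long columns; 0.0 if one is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def filter_dimensions(columns: List[Sequence[float]], importance: List[float],
                      max_similarity: float = 0.9) -> List[int]:
    """Step (B), assumed form: keep dimensions in order of decreasing importance,
    skipping any whose |r| with an already kept dimension exceeds max_similarity."""
    kept: List[int] = []
    for j in sorted(range(len(columns)), key=lambda i: importance[i], reverse=True):
        if all(abs(pearson(columns[j], columns[k])) <= max_similarity for k in kept):
            kept.append(j)
    return kept


def axis_positions(columns: List[Sequence[float]], order: List[int],
                   min_gap: float = 0.1) -> List[float]:
    """Step (D), assumed form: axis x-positions in [0, 1] where each gap is
    min_gap + (1 - |r|), so similar neighbouring dimensions sit closer together."""
    if len(order) < 2:
        return [0.0] * len(order)
    gaps = [min_gap + 1.0 - abs(pearson(columns[a], columns[b]))
            for a, b in zip(order, order[1:])]
    total = sum(gaps)
    positions = [0.0]
    for gap in gaps:
        positions.append(positions[-1] + gap / total)
    return positions


if __name__ == "__main__":
    cols = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [5, 3, 4, 1, 2]]
    print(filter_dimensions(cols, importance=[2.0, 8.0, 2.0]))  # [1, 2]
    print(axis_positions(cols, [0, 1, 2]))
```

In the usage example, dimension 0 is dropped because it is a perfectly correlated copy of the more important dimension 1.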

This paper applies quality metrics in data space to improve scatterplots, parallel coordinates and star glyphs for high-dimensional data. It measures correlation to find the best dimension ordering, projection, and view optimization for the data sets. The user can steer the process by influencing all the pipeline steps.
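Step (A) groups similar dimensions into clusters and similar clusters into larger ones. It can be sketched as a plain agglomerative clustering of dimensions over the dissimilarity 1 - |r|. Average linkage is an assumption made for this illustration, since the original system may build its hierarchy differently. The function names are hypothetical. The result is a nested tuple of dimension indices that a radial display such as InterRing could render.

```python
import math
from typing import List, Sequence, Tuple, Union

Tree = Union[int, Tuple["Tree", "Tree"]]  # a leaf is a dimension index


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equally long columns; 0.0 if one is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def cluster_dimensions(columns: List[Sequence[float]]) -> Tree:
    """Average-linkage agglomerative clustering of dimensions (assumed linkage)
    using the dissimilarity 1 - |r|; returns a nested tuple of indices."""
    d = len(columns)
    if d == 0:
        raise ValueError("need at least one dimension")
    dist = [[1.0 - abs(pearson(columns[i], columns[j])) for j in range(d)]
            for i in range(d)]
    clusters: List[Tuple[Tree, List[int]]] = [(i, [i]) for i in range(d)]
    while len(clusters) > 1:
        best = None  # (average dissimilarity, index a, index b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ma, mb = clusters[a][1], clusters[b][1]
                avg = sum(dist[i][j] for i in ma for j in mb) / (len(ma) * len(mb))
                if best is None or avg < best[0]:
                    best = (avg, a, b)
        _, a, b = best
        # Merge the two closest clusters into a new internal node.
        merged = ((clusters[a][0], clusters[b][0]), clusters[a][1] + clusters[b][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return clusters[0][0]


if __name__ == "__main__":
    cols = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [5, 3, 4, 1, 2]]
    print(cluster_dimensions(cols))  # (2, (0, 1))
```

Cutting this tree at a chosen dissimilarity yields groups of similar dimensions. Each group can then be represented by one dimension, which is the role filtering plays in step (B).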

These four examples cover many aspects discussed in the previous sections, especially metrics calculated in the data vs. image space, different purposes, different measure types, different uses of the pipeline, and different interaction levels. Many of the papers we have reviewed have similar elements and functions; nonetheless, others deviate considerably from these ones. While we cannot provide the full set of examples in this section, in Section 4.1.6 we discuss some findings that stem from the analysis of the whole set, including those with uncommon approaches, and we list all the quality metrics pipelines in Appendix A.3.



4.1.6 Findings

In the following, we discuss some major trends we have observed during our analysis.

From the visualization point of view, we already discussed the role of meta-visualizations, that is, visualizations whose purpose is to accommodate other visualizations. During the paper review we found very few explicit discussions of this aspect, which we deem extremely relevant. Many of the papers we have analyzed seem to assume that providing a simple list of interesting visualizations will automatically solve the user's task. To the best of our knowledge, the only work that analyzes the issue explicitly and in great depth is the Trellis display [18], which organizes the display in a way that makes patterns among views apparent. We believe a deeper investigation of this issue is needed.

Interestingly, some of the papers we reviewed do take care of the navigation issue, that is, how to explore configurations automatically found by the algorithm. These papers usually provide an additional visualization that permits navigating from one configuration to another. For instance, Johansson et al. provide a line chart visualization to interactively show alternative projections in parallel coordinates [82]. Similarly, “hierarchical dimension ordering” [158] uses the InterRing visualization to let the user navigate through alternative subsets of dimensions organized in a hierarchical fashion. Finally, the Rank-by-Feature framework [126] uses color-coded interactive lists and scatterplot matrices to provide a preview of the statistical properties of each view.

We also noticed a lack of systematic approaches to the ordering problem: every paper proposes its own method. The whole topic of seriation, introduced in the early work of Bertin [22] and discussed in depth by Hahsler et al. [62], deserves deeper investigation and acknowledgment. Additionally, innovative ways of ordering data dimensions may exist, like the Eulerian tours and Hamiltonian decompositions presented by Hurley et al. [75], which explore the possibility of repeating the axes to reduce dependency on a specific order.
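The sketch below makes the idea of repeating axes concrete. For an even number n of dimensions, it builds n/2 axis orderings: Hamiltonian paths of the complete graph whose vertices are the dimensions. Together these orderings place every pair of dimensions side by side exactly once. Concatenating them yields one parallel-coordinates axis sequence that shows all pairwise relations. The sketch uses the classical zig-zag construction, assumed here for illustration. It is not claimed to be the exact procedure of [75], and the function names are hypothetical.

```python
from typing import List, Set, Tuple


def zigzag_paths(n: int) -> List[List[int]]:
    """Split all pairs of n dimensions (n even) into n/2 axis orderings.

    Each ordering is a Hamiltonian path of the complete graph K_n; the base
    path is 0, 1, n-1, 2, n-2, ... and the others are shifted copies mod n.
    """
    if n < 2 or n % 2:
        raise ValueError("this construction assumes an even n >= 2")
    base, low, high = [0], 1, n - 1
    while len(base) < n:
        base.append(low)
        low += 1
        if len(base) < n:
            base.append(high)
            high -= 1
    return [[(v + k) % n for v in base] for k in range(n // 2)]


def adjacent_pairs(order: List[int]) -> Set[Tuple[int, int]]:
    """Unordered pairs of axes that are neighbours in one ordering."""
    return {(min(a, b), max(a, b)) for a, b in zip(order, order[1:])}


if __name__ == "__main__":
    paths = zigzag_paths(4)
    print(paths)  # [[0, 1, 3, 2], [1, 2, 0, 3]]
    # Concatenating the paths gives one axis sequence with repeated axes.
    print([axis for path in paths for axis in path])
```

For n = 4, the two orderings together cover all six pairs of axes exactly once.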

In Section 4.1.4, we listed the meta-visualizations that we have found, namely lists and matrices (small multiples). We believe this list can be expanded as novel solutions are developed. One promising idea appears in a few papers that we did not include in the review because they do not specifically use quality metrics: arranging iconic versions of the generated visualizations in a scatterplot view (e.g., using MDS or similar techniques). Such a technique is proposed, for instance, in the work of Yang et al., where pixel-based icons are laid out with an MDS projection in a scatterplot [156].

Another issue we noticed in our analysis is the limited use of the visual mapping and view transformation functions in the pipeline. More specifically, visual mapping is almost exclusively used as a way to generate alternative orderings, taking into account only the mapping between the original data dimensions and the visualization axes. But alternative mappings can also be generated by linking data dimensions to the whole spectrum of visual features like color, size, shape, etc. This is common in several systems based on visual languages, such as ggplot2 [1], Tableau [3], and Protovis [2]. Pixnostics [120] is the only technique in our review presenting this kind of process supported by quality metrics.

View transformation is also rarely used in the quality metrics pipeline. The only example we found is the use of quality metrics to automatically select focus area parameters in the table lens [8]. The automatic selection of interesting points of view in 3D scatterplots, for example, is one clear case where the use of quality metrics at the view transformation stage would be beneficial. Another one is the automatic highlighting of interesting items in a view (e.g., visual boosting in pixel-based visualizations [109]).

Finally, the purposes we have considered can be roughly classified into two broad higher-level purposes: finding interesting visualizations and scaling visualizations to larger data sets. When considering these goals, it is evident that clustering, correlation, outliers, and complex patterns mainly support the first goal, whereas image quality and feature preservation tend to support the second one. One interesting open issue is whether the use of quality metrics in high-dimensional data is confined to these two general purposes. One purpose which, to the best of our knowledge, is totally unexplored is the use of quality metrics to automatically or semi-automatically compare different visual techniques for the same data.

4.1.7 Directions for Further Research

In the following, we present a selected set of research issues we deem important for the advancement of quality-metrics-driven data visualization.

Evaluation and applications

Surprisingly, none of the papers we have analyzed reported on user evaluation. While we are convinced that quality metrics are useful and need to be further developed, we also realize that the whole idea has not yet been tested. Usefulness is therefore one of the most important aspects to consider, followed by usability issues. To the best of our knowledge, there are no studies reporting on the use of the quality metrics approach in real-world settings. Observational studies or even simple case studies would greatly improve the approach and would most likely direct research to specific issues that are hard to anticipate without observation.

Perceptual tuning

All the metrics that work in the image space try to simulate the human pattern recognition machinery to some extent. They try to partially substitute human vision with image processing algorithms, with the (implicit) assumption that algorithm rankings will match user rankings. This assumption needs a much deeper investigation. Our study presented in Section 3.2 and published in [134], where quality metric rankings of clusters in scatterplots are compared to human rankings, represents a first step in this direction. In addition, it is necessary to validate and tune the image space metrics so that their parameters take models of human perception into account. Excellent examples of initial steps in this direction are the papers [81, 94, 116], where the perception of visual patterns has been tuned according to user studies aimed at modeling the way humans perceive them.

Metrics systematization

During our review we collected a very large number of alternative quality metrics, some calculated in data space, some in image space. While this proliferation of metrics is a sign of the richness of this approach, it is currently very hard to compare them and understand which one is suitable for a given task. Some authors provide a number of metrics in the same environment, letting the user choose which one to use. Nonetheless, we fear that this approach with limited guidance may not be effective for end users, especially if there is a lack of understanding of the level of redundancy between one metric and another. Similarly, given the above-mentioned dichotomy between data space and image space metrics, it is hard if not impossible to state which approach yields the best results in which contexts. On a side note, the mixed approach of giving the user the possibility to combine several metrics into a composite one needs much more investigation, validation, and guidance.
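One simple way to start quantifying the redundancy between two metrics, and to build the kind of composite metric mentioned above, is sketched below under our own assumptions: redundancy is measured as the Spearman rank correlation between the scores two metrics assign to the same set of views, and a composite is a weighted sum of min-max normalized scores. Both choices, and the example scores, are hypothetical illustrations rather than validated procedures.

```python
from typing import Dict, List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the two rank vectors.
    Values near 1 suggest the two metrics rank the views (largely) redundantly.
    Undefined (division by zero) if one metric gives all views the same score."""
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    sa = sum((x - ma) ** 2 for x in ra) ** 0.5
    sb = sum((y - mb) ** 2 for y in rb) ** 0.5
    return cov / (sa * sb)


def composite(scores: Dict[str, Sequence[float]], weights: Dict[str, float]) -> List[float]:
    """Weighted sum of min-max normalised metric scores, one value per view."""
    n = len(next(iter(scores.values())))
    out = [0.0] * n
    for name, vals in scores.items():
        lo, hi = min(vals), max(vals)
        for i, v in enumerate(vals):
            out[i] += weights[name] * ((v - lo) / (hi - lo) if hi > lo else 0.0)
    return out


# Hypothetical scores of two metrics for the same four scatterplot views.
metric_a = [0.9, 0.4, 0.7, 0.1]
metric_b = [0.8, 0.3, 0.6, 0.2]
print(spearman(metric_a, metric_b))
print(composite({"a": metric_a, "b": metric_b}, {"a": 0.5, "b": 0.5}))
```

A high rank correlation would indicate that offering both metrics to the user adds little; a low one would indicate that a composite actually mixes different notions of quality, which is exactly where guidance is needed.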

Scalability

Image space and data space quality metrics have different scalability issues. Quality metrics in image space have the advantage of being independent of the original data size (e.g., [42]); that is, their computational complexity only depends on the screen dimensions. However, as data grows in size, virtually all visualizations experience some degree of degradation that may influence the discriminatory power of the metric. For instance, visualizations with a lot of clutter might hinder the discovery of the desired patterns. Quality metrics in data space, on the other hand, are expected to be more robust in terms of pattern detection, but their computation is directly affected by data size. A thorough investigation of these issues, and of how to find a compromise between the two, is clearly an interesting subject for future research.
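Both effects can be seen in a toy example. The sketch below assumes a hypothetical image-space metric, the fraction of non-empty pixels ("fill rate"), computed on a rendered grid of hit counts. Once the image exists, the metric only reads the width x height grid, so duplicating the data a thousand times leaves its cost and value unchanged; at the same time, a dense data set saturates every pixel, after which the metric can no longer discriminate between views. Rendering itself still touches every point once.

```python
from typing import List, Sequence, Tuple

Image = List[List[int]]


def rasterize(points: Sequence[Tuple[float, float]], width: int, height: int,
              bounds: Tuple[Tuple[float, float], Tuple[float, float]]) -> Image:
    """Rendering step: accumulate hit counts per pixel. This touches every point
    once, so its cost grows with the data, but it happens before the metric."""
    (x0, x1), (y0, y1) = bounds
    img = [[0] * width for _ in range(height)]
    for x, y in points:
        col = min(max(int((x - x0) / (x1 - x0) * width), 0), width - 1)
        row = min(max(int((y - y0) / (y1 - y0) * height), 0), height - 1)
        img[row][col] += 1
    return img


def fill_rate(img: Image) -> float:
    """Image-space metric: fraction of non-empty pixels. It only reads the
    width x height grid, so its cost does not depend on the number of points."""
    cells = [c for row in img for c in row]
    return sum(1 for c in cells if c > 0) / len(cells)


unit = ((0.0, 1.0), (0.0, 1.0))
sparse = [(0.1, 0.1), (0.9, 0.9)]
dense = [(i / 100, j / 100) for i in range(100) for j in range(100)]
print(fill_rate(rasterize(sparse, 4, 4, unit)))          # two of 16 pixels lit
print(fill_rate(rasterize(sparse * 1000, 4, 4, unit)))   # same image, 1000x the points
print(fill_rate(rasterize(dense, 4, 4, unit)))           # saturated: every pixel lit
```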

4.1.8 Limitations

Our work has some important limitations to take into account, first of all its subjective nature. We are by no means suggesting that this is the only way to describe the current state of quality metrics in high-dimensional visualization. There are no doubt a number of equally good alternative ways to describe it; this chapter provides a much-needed starting point. We encourage the reader to use it as a source of inspiration for further research and as a way to understand the status of the field.

Similarly, while we did our best to follow a thorough methodology (see Section 4.1.2), there might be relevant papers we overlooked. Even though we tried to be very broad and inclusive, our background heavily influences the review. In particular, given our focus on Computer Science, we might have missed relevant literature from Statistics. However, we feel confident that at this point of our review any additional paper would not change the structure or the elements of our model. In other words, the real goal of our review was not to include every possible paper on the discussed matter, but rather to have enough coverage to build a coherent and useful picture.

4.1.9 Conclusion and Future Work

We presented a systematic analysis of quality metrics as a way to support the exploration of high-dimensional data sets. Quality metrics have been used in a variety of contexts and for a variety of purposes. With this work we started a collection of these disparate systems under one umbrella and provided a way to reason about their characteristic features. Specifically, we presented an analysis of the visualization techniques, the quality metrics, and the processing pipeline. The analysis has two main outcomes. First, it makes it possible to describe the methods in detail and to capture their key components. Second, as shown in Section 4.1.6 and Section 4.1.7, it makes it possible to spot interesting research gaps and promising directions for future research. While we consider this work just an initial step, we hope it will spur new ideas and support researchers and practitioners in the development of interesting new applications and novel techniques.



4.2 Visual Cluster Separation Factors: Sketching a Taxonomy⁴

The quality metrics systematization presented in the previous section was followed by a qualitative analysis of concrete measures from this large pool. Here we turned our focus to the quality metrics for scatterplots that are designed to identify the visualizations that best represent the clusters in classified data. That is, they rank scatterplot views that separate the data classes well higher than views with mixed classes. Our idea was to use two of these metrics to identify the best visualizations independently of the particular data set. Simultaneously, we wanted to give advice as to whether it would be best to use a 2D scatterplot, a 3D scatterplot, or a SPLOM for a specific data set. We therefore computed the measures for different data sets and, surprisingly, found that they are not robust with respect to the different cluster shapes encountered in the analyzed data. Led by this insight, we analyzed all the cases more deeply, using open and axial coding of failure reasons to build up a taxonomy of visual cluster separation factors. We named this process a qualitative evaluation.

The next sections sketch the methodology and the results of this evaluation. Section 4.2.1 presents the introductory ideas that led to this work. Section 4.2.2 gives a short description of the methodology, followed by the taxonomy axes in Section 4.2.3. Section 4.2.4 concludes with a discussion of the limitations of this work and of possible future research making use of the developed taxonomy.

4.2.1 Introduction

An impressive number of quality measures, dimension reduction techniques, and visualizations for high-dimensional data have been developed in the past. The more of them exist, the harder it is for users to find the right choice for their tasks. The literature does not provide any guidance on how to choose the right visualization or dimension reduction technique for complex multidimensional data. Quality metrics were designed to filter the high number of representations and provide an interesting selection to the user. Sedlmair et al. [122] investigate to what extent the existing measures can accomplish this task. They choose the 2D-HDM (Section 3.1.3) and the DCM [129] (also used in the empirical evaluation in Section 3.2 and described in Section 3.2.1) as quality metrics to judge the different data projections with regard to their ability to represent the clusters in classified data sets.
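To make the idea of a centroid-based measure concrete, the sketch below assumes a distance-consistency formulation: the score is the fraction of points whose nearest class centroid is the centroid of their own class. This is our simplified reading for illustration; the exact definition used in [122] and in Section 3.2.1 may differ in details such as normalization. The second example, one class enclosing another, shows a visually clear separation that such a score cannot capture because both centroids coincide.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def centroid_consistency(points: Sequence[Point], labels: Sequence[str]) -> float:
    """Fraction of points whose nearest class centroid is the centroid of their
    own class (1.0 = all points consistent). Ties go to the class seen first."""
    groups: Dict[str, List[Point]] = {}
    for p, c in zip(points, labels):
        groups.setdefault(c, []).append(p)
    cents = {c: tuple(sum(axis) / len(ps) for axis in zip(*ps)) for c, ps in groups.items()}
    hits = 0
    for p, c in zip(points, labels):
        nearest = min(cents, key=lambda k: math.dist(p, cents[k]))
        hits += nearest == c
    return hits / len(points)


# Two compact, well-separated classes.
blobs = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(centroid_consistency(blobs, ["a"] * 3 + ["b"] * 3))

# A ring around a compact core: visually separable, but both centroids coincide.
core = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]
ring = [(5.0, 0.0), (-5.0, 0.0), (0.0, 5.0), (0.0, -5.0)]
print(centroid_consistency(core + ring, ["core"] * 4 + ["ring"] * 4))
```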

The measures were designed for 2D scatterplots and were extended by the authors to also work on scatterplot matrices (SPLOMs) and 3D scatterplots. This decision is motivated by the fact that the scatterplot is a widely used technique to display high-dimensional projections, and SPLOMs are often used to see more than two dimensions of the data. Since analysts also quite often work with 3D scatterplots, this technique was also included in the study. To obtain the lower-dimensional embeddings, different dimension reduction techniques were used: the well-known PCA, robust PCA, MDS, and t-SNE [143]. Initial experiments on different data sets showed that, with these visualizations and projection techniques, the measures are not able to detect different cluster shapes in the data projections. Compared to human judgement, they surprisingly produced mismatches, ranking visualizations high when the human rank was low and ranking visualizations low when the human rank was high. This implies that good visualizations are sometimes missed and bad visualizations are ranked high, both of which should be avoided.

⁴ This chapter is based on a collaboration with UBC in which I participated in a project on quality measures, led by Prof. T. Munzner and M. Sedlmair. The work resulted in a joint EuroVis publication [122]. Since I was not the lead in this project, this chapter presents the methodology and the results only briefly; a deeper description can be found in the paper itself. Please note that the full taxonomy of cluster separation factors and data characteristics is not my contribution. Since I was part of the qualitative analysis, I would like to recall the results in my thesis and provide a personal outlook on further research ideas at the end of this chapter.

These surprising outcomes shifted the focus of the study from guiding users to the right choice of a visualization technique and a dimension reduction technique for their data, to an in-depth analysis of different visual separability factors. A sketch of the methodology of the systematic study of the differences between the computed measures and human judgement is presented in the next section.
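The two failure cases described above, good views ranked low and bad views ranked high, can be flagged mechanically once both a human score and a measure score exist for every view. The sketch below is a hypothetical helper for such a comparison: it treats a view as a "false high" if the measure places it in its top k while humans place it in their bottom k, and as a "false low" in the opposite case. The top/bottom-k rule and the example ratings are our assumptions; the study itself analyzed mismatches qualitatively through manual inspection and coding.

```python
from typing import Dict, List, Tuple


def find_mismatches(human: Dict[str, float], measure: Dict[str, float],
                    k: int) -> Tuple[List[str], List[str]]:
    """Flag disagreements between human judgement and a measure.
    false_highs: views in the measure's top k but in the humans' bottom k.
    false_lows:  views in the measure's bottom k but in the humans' top k.
    Both score dicts use 'higher = better cluster separation'."""
    def top(s: Dict[str, float]) -> set:
        return set(sorted(s, key=s.get, reverse=True)[:k])

    def bottom(s: Dict[str, float]) -> set:
        return set(sorted(s, key=s.get)[:k])

    return sorted(top(measure) & bottom(human)), sorted(bottom(measure) & top(human))


human = {"v1": 5, "v2": 4, "v3": 3, "v4": 2, "v5": 1}          # hypothetical ratings
measure = {"v1": 0.2, "v2": 0.9, "v3": 0.5, "v4": 0.4, "v5": 0.95}
print(find_mismatches(human, measure, k=2))
```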

4.2.2 Method

To discover the divergences between human judgement and measure ranks, a qualitative data study was conducted. The first two authors of the paper⁵ manually inspected over 800 visualizations (combinations of 75 data sets, 4 dimension reduction techniques, and 3 visualizations: 2D scatterplot, 3D scatterplot, and SPLOM) and judged their quality in displaying data clusters. Their judgements were compared with the measure ranks, and the mismatches were analyzed. “The investigators generated a detailed set of characteristics that influenced cluster separability in general, and specific reasons why the measures failed in the cases where they found a mismatch. Based on separability characteristics and failure reasons, we generate a higher-level taxonomy of factors, which we iteratively refined in multiple passes” [122]. This was done “not only by considering its explanatory clarity and power, but also by mapping the ranges where each measure was successful along the factor axes, and by placing some of the studied data sets along them. Figure 4.15 shows the measure success ranges on a simplified version of the taxonomy” [122]. The study consisted of four stages: (1) choosing variables for study; (2) generating data set instances and computing measures; (3) open coding and measure evaluation; and (4) axial coding and taxonomy building. Details can be found in the paper [122].

[Figure 4.15: Simplified version of the cluster separation taxonomy. Within-Class factors (Count, Size, Density, Clumpiness, Outlier, Shape: Isotropy and Curvature, Centroid) and Between-Class factors (Class/Point Count, Variance of Count, Size, Density, and Shape, Mixture, Split, Inner-Outer Position, Class Separation) are drawn as axes on which the success ranges of the Centroid and Grid measures are marked. Data sets placed along the axes: gaussian (synthetic, MDS), fisheries (real, MDS), spambase (real, PCA), hiv (real, t-SNE), shuttle (real, MDS), entangled (synthetic, t-SNE).]

Figure 4.15: Taxonomy of factors in visual cluster separation, where factor axes are marked to show the ranges where existing measures are successful; gaps represent failure cases. The centroid measure (CDM) is marked in blue and the grid measure (2D-HDM) in red. All positions are approximate estimates. Marked along the factor axes are six data sets that are exemplified in the paper. (Used with permission from [122].)

⁵ Michael Sedlmair and myself.

4.2.3 Visual Cluster Separation Taxonomy

Class separation in a visualization is influenced by different characteristics of the data set. Figure 4.16 presents the factors that affect visual cluster separation. They are grouped into “Within-Class factors”, which are determined by the structure or appearance of a single class, and “Between-Class factors”, which represent interactions between two or more classes [122].

[Figure 4.16: Taxonomy diagram. Within-Class Factors, grouped by the categories Scale, Point Distance, Shape, and Position: Count (few to many), Size (small to large), Density (sparse to dense), Clumpiness (equidistant, uniformly random, one dense spot, many dense spots, clumpy), Outlier (none to many), Isotropy, Curvature, Centroid (evocative to misleading). Between-Class Factors: Class/Point Count (few classes with many points vs. many classes with few points), Variance of Count, Size, Density, and Shape (similar to different), Mixture (random vs. equidistant or interwoven), Split (contiguous vs. split), Inner-Outer Position (non-existent vs. existent), Class Separation (full overlap, partial overlap, adjacent, separate, distant).]

Figure 4.16: A taxonomy of data characteristics with respect to class separation in scatterplots. Some factors are organized as axes (arrows), while others are binned. Between-Class factors often result from the variance of Within-Class factors (horizontal dependencies), and factors at the top can strongly influence factors below them (vertical dependencies). Class Separation is therefore dependent on all other factors (used with permission from [122]).

In brief, we recall the characteristics determining these factor groups. Four categories, namely scale, point distance, shape, and position, structure the two factor groups; they influence each other from first to last. Within these categories, the Within-Class factors are count, size, density, clumpiness, outlier, shape, and centroid. These factors describe the structure and appearance of single classes. The variance of the Within-Class factors across multiple classes determines the Between-Class factors, sketched on the right side of the figure. In this study, the following Between-Class factors influencing the perceived cluster shapes were identified: class/point count, variance of (point) count, variance of (class) size, variance of (class) density, mixture (of classes), split (of classes), variance of shape, inner-outer position, and class separation. Since the arrows indicate the influence of the factors horizontally from left to right and vertically from top to bottom, the factor positioned in the lower right corner, class separation, can be strongly influenced by all the other factors.
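The horizontal dependency, Between-Class factors arising as the variance of Within-Class factors, can be illustrated numerically. The taxonomy itself does not define the factors as formulas, so the sketch below uses crude proxies of our own choosing: count is the number of points, size is the bounding-box area, and density is count divided by size; the corresponding Between-Class values are the population variances of these proxies across classes. Classes with zero area get infinite density, which would make the variance undefined, so the example avoids degenerate classes.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def within_class_factors(points: Sequence[Point], labels: Sequence[str]) -> Dict[str, Dict[str, float]]:
    """Crude per-class proxies: count, size (bounding-box area), density (count / size)."""
    groups: Dict[str, List[Point]] = {}
    for p, c in zip(points, labels):
        groups.setdefault(c, []).append(p)
    factors = {}
    for c, ps in groups.items():
        xs, ys = [p[0] for p in ps], [p[1] for p in ps]
        size = (max(xs) - min(xs)) * (max(ys) - min(ys))
        factors[c] = {"count": len(ps), "size": size,
                      "density": len(ps) / size if size > 0 else float("inf")}
    return factors


def between_class_variance(factors: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Population variance of each Within-Class proxy across classes, i.e. rough
    stand-ins for 'Variance of Count', 'Variance of Size', 'Variance of Density'."""
    result = {}
    for name in next(iter(factors.values())):
        vals = [f[name] for f in factors.values()]
        mean = sum(vals) / len(vals)
        result[name] = sum((v - mean) ** 2 for v in vals) / len(vals)
    return result


a = [(0, 0), (1, 0), (0, 1), (1, 1)]                                  # 4 points, area 1
b = [(5, 5), (7, 5), (5, 7), (7, 7), (6, 6), (6, 5), (5, 6), (7, 6)]  # 8 points, area 4
f = within_class_factors(a + b, ["a"] * 4 + ["b"] * 8)
print(f)
print(between_class_variance(f))
```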

The two quality measures have different strengths, so they perform differently when encountering these different factors. Figure 4.15 marks the measures' performances along the factor axes. This clearly shows the gaps where current measures cannot achieve good results. The study identifies that the centroid factor is influenced by many other factors, and that the centroid-based measure (CDM) alone cannot identify all the different constellations of visual classes. CDM is vulnerable with respect to shape, clumpiness, outliers, variance of count, of size, or of density, and inner-outer position [122]. Similarly, HDM also encountered a number of problems while identifying classes in visualizations. The biggest issue is with narrow, adjacent classes that coexist in the same grid cell and span different cells. The measure turned out to be sensitive to the grid size, contrary to previous results from the literature. The most difficult factor was class separation, with the measure failing on overlapping classes. Depending on the grid, the classes were sometimes rated good even though they presented a high overlap or a class split.
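The grid-size sensitivity reported for HDM can be reproduced with a simplified grid measure. The sketch below is not the exact 2D-HDM definition from Section 3.1.3; it assumes a related entropy-based score in which each grid cell contributes its class entropy, weighted by the share of points it holds, and the result is rescaled so that 1.0 means every occupied cell is pure. Two narrow, adjacent class strips are rated perfectly separated on a 2 x 2 or 10 x 10 grid but completely mixed on a 3 x 3 grid, because at that resolution both strips fall into the same column of cells.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Sequence, Tuple


def grid_purity(points: Sequence[Tuple[float, float]], labels: Sequence[str], cells: int,
                bounds=((0.0, 1.0), (0.0, 1.0))) -> float:
    """1 - (point-weighted mean class entropy of the grid cells) / log2(#classes).
    1.0: every occupied cell holds a single class; 0.0: every cell is maximally mixed."""
    (x0, x1), (y0, y1) = bounds
    bins: Dict[Tuple[int, int], Counter] = defaultdict(Counter)
    for (x, y), c in zip(points, labels):
        i = min(max(int((x - x0) / (x1 - x0) * cells), 0), cells - 1)
        j = min(max(int((y - y0) / (y1 - y0) * cells), 0), cells - 1)
        bins[(i, j)][c] += 1
    n_classes = len(set(labels))
    if n_classes < 2:
        return 1.0
    weighted_entropy = 0.0
    for counts in bins.values():
        m = sum(counts.values())
        entropy = -sum((k / m) * math.log2(k / m) for k in counts.values())
        weighted_entropy += m / len(points) * entropy
    # clamp tiny negative values caused by floating-point rounding
    return max(0.0, 1.0 - weighted_entropy / math.log2(n_classes))


# Two narrow, adjacent classes (vertical strips at x = 0.45 and x = 0.55).
ys = [0.1, 0.3, 0.5, 0.7, 0.9]
pts = [(0.45, y) for y in ys] + [(0.55, y) for y in ys]
lab = ["a"] * 5 + ["b"] * 5
for cells in (2, 3, 10):
    print(cells, round(grid_purity(pts, lab, cells), 3))
```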

The goal of this taxonomy is to guide others in designing, using, and evaluating cluster separability measures. Other researchers can test different data sets and map their features onto the taxonomy axes. This will give an overview of the coverage of relevant factors by particular measures and help in improving existing measures or developing more reliable ones in the future.

4.2.4 Discussion and Further Research

This study shows that, so far, measures have been developed and validated on far too few and too simple data sets. The real world is much more complex, and since data complexity keeps rising, a more systematic development of the measures is needed. As we saw in the previous section, real data sets exhibit further aspects that are not yet covered by existing measures. In the following, we present a list of issues that emerged as a result of this study and that we deem important for further research in the area of quality metrics.

Taxonomy-based evaluation and systematization

A large number of metrics for cluster separation in scatterplots have been developed. They all try to discover good views displaying the data clusters. We believe that there are two main reasons why such a variety of measures exists for this task: the different strengths of the measures and the missing unified picture of existing approaches. First, the measures have different strengths with respect to the factors of the taxonomy. They cannot cover the entire spectrum of data characteristics and therefore focus on just a subset of them. Using the metrics for areas that they cannot cope with will lead to wrong results. Therefore, guided by the taxonomy presented before, an evaluation of the existing metrics is needed that can help users choose the right measure depending on their data. Second, the variety of measures makes the development of new ones difficult, since a unifying picture is missing. Guided by the taxonomy axes, the existing approaches can be evaluated and their ranges of success can be marked on these axes. This analysis would provide a good systematization of current approaches, spotting the data characteristics that have to be addressed in the future and leading researchers through the variety of approaches.

Taxonomy-based measure development

After the gaps of existing measures are identified, new research can be conducted to cover the data characteristics missing so far. We believe that it is hard to develop one single measure that covers all these factors, but having different measures and being aware of their coverage along these axes helps in avoiding false rankings in the future.



New taxonomies for different visualization techniques

While this taxonomy focuses on one prominent visualization technique, the scatterplot, there are also metrics designed for other high-dimensional visualization techniques, as categorized in Section 4.1.4. Different visualization techniques will need different factors to characterize different patterns (e.g., cluster separation). Even though building a taxonomy like this is laborious, it can benefit the development of metrics for these techniques.

New taxonomies for different quality metric factors

We have seen in Section 4.1.4 that different patterns are quantified by measures, and a systematization of the factors that influence them is missing for these other patterns as well. Correlation, outliers, complex patterns, image quality, and feature preservation all lack such a taxonomy. Having all these taxonomies, which would be the ideal scenario, it would be possible to identify interrelations between different patterns and how they are represented in visualizations. We believe that these insights can help in combining measures to identify more than one pattern.

Metrics for dimension reduction properties<br />

Dimension reduction techniques are often used to reduce the dimensionality of data sets before displaying them on the screen. The metrics are always applied to the dimension-reduced data sets, so artifacts introduced by these techniques cannot be ruled out. A study of how different data characteristics are maintained or obscured by these techniques can be conducted by comparing different techniques, or the same technique with different parameter settings, on the same data set. As far as we know, there are no studies reporting on this type of analysis, and we believe it to be an interesting topic for future research. Quality measures could also be designed to automatically detect structural changes caused by a change of parameters or technique. Properties like noise invariance, rotation invariance, and scalability with respect to data points and dimensions can be explored by new quality metrics.
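To make the idea of detecting structural changes concrete, the following sketch compares two 2D projections of the same points by how many of each point's k nearest neighbors they share. This neighborhood-preservation score is our own illustrative choice, not one of the metrics surveyed in this chapter; the function names and the toy layouts are hypothetical.

```python
import math

def knn_sets(points, k):
    """For each point, the set of indices of its k nearest neighbors (Euclidean distance)."""
    neighbor_sets = []
    for i, p in enumerate(points):
        # Sort all other points by distance to p; ties are broken by index.
        ranked = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbor_sets.append({j for _, j in ranked[:k]})
    return neighbor_sets

def knn_preservation(proj_a, proj_b, k=5):
    """Mean fraction of k nearest neighbors shared by two projections of the same points.

    Illustrative metric: 1.0 means identical local neighborhoods; lower values mean
    that switching technique or parameters has reshuffled the neighborhoods.
    """
    if len(proj_a) != len(proj_b):
        raise ValueError("both projections must contain the same points")
    na, nb = knn_sets(proj_a, k), knn_sets(proj_b, k)
    return sum(len(a & b) / k for a, b in zip(na, nb)) / len(proj_a)

# Usage: a uniformly scaled copy keeps every neighborhood, a shuffled layout does not.
layout = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0), (11.0, 10.0), (10.0, 11.0)]
scaled = [(2 * x, 2 * y) for x, y in layout]
shuffled = [layout[0], layout[3], layout[1], layout[4], layout[2], layout[5]]
print(knn_preservation(layout, scaled, k=2), knn_preservation(layout, shuffled, k=2))
```

Because the score only looks at neighbor ranks, it is by construction invariant to uniform scaling and rotation, two of the properties mentioned above.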


92 Chapter 4. A Systematization of Quality Metrics in High-Dimensional Data Visualization


5

Visual Subspace Analysis of High-Dimensional Data

"Visual ideas combined with technology combined with personal interpretation equals photography. Each must hold its own; if it doesn't, the thing collapses."

Arnold Newman

Contents

5.1 Visual Exploration for Subspace Clustering . . . 94

5.1.1 Motivation . . . 94

5.1.2 Subspace Clustering Algorithms . . . 96

5.1.3 Task Definition and Design Space for Visual Subspace Cluster Analysis . . . 99

5.1.4 The ClustNails System . . . 101

5.1.5 Use Case and System Comparison . . . 106

5.1.6 Conclusions and Future Work . . . 109

5.2 Visual Analytics of Subspace Search . . . 110

5.2.1 Introduction . . . 110

5.2.2 Subspace Analysis . . . 112

5.2.3 Proposed Analytical Workflow . . . 113

5.2.4 Application . . . 120

5.2.5 Discussion and Possible Extensions . . . 124

5.2.6 Conclusions . . . 127

Subspace clustering addresses an important problem in clustering multidimensional data. In sparse multidimensional data, many dimensions are irrelevant and obscure the cluster boundaries. Subspace clustering helps by mining clusters that are present only in locally relevant subsets of dimensions. However, understanding the results of subspace clustering is not trivial for analysts. In addition to the grouping information, the relevant sets of dimensions and the overlaps between groups, both in terms of dimensions and records, need to be analyzed. In Section 5.1, we present an interactive visualization system called ClustNails to analyze, navigate, relate, and understand subspace clustering results. Real-world data sets are used to demonstrate the functionality of the system.

Additionally, high-dimensional data spaces often consist of combined features that measure different properties. In this case, the particular relationships between the various properties may not be clear to the analysts a priori, since they can only be revealed if appropriate feature combinations (subspaces) of the data are taken into consideration. Considering just a single subspace is, however, often not sufficient, since different subspaces may show complementary, conjoint, or contradicting relations between data items. Useful information may consequently remain embedded in sets of subspaces of a given high-dimensional input data space.

Relying on the notion of subspaces, we propose in Section 5.2 a novel method for the visual analysis of high-dimensional data in which we employ an interestingness-guided subspace search algorithm to detect a candidate set of subspaces. Using properly defined subspace similarity functions, we provide an interactive exploration environment to compare and relate subspaces with respect to their topological similarities and dimension similarities. Real and synthetic data sets are used to demonstrate our approach.

Parts of this chapter appeared in the following publications: [135, 136].

5.1 Visual Exploration for Subspace Clustering

In this section, we introduce a visual subspace cluster analysis system called ClustNails. It integrates several novel visualization techniques with various user interaction facilities to support the navigation and interpretation of subspace clustering results. We demonstrate the effectiveness of the proposed system by analyzing real-world data sets and comparing it to other existing visual subspace cluster analysis systems.

This section is organized as follows. In Section 5.1.1, we elaborate on the aspects that motivated our research in this area. In Section 5.1.2, we introduce the subspace clustering problem and point to important overview articles in this area. In Section 5.1.3, we explain the challenges in designing effective visualization tools for subspace clustering analysis tasks. In Section 5.1.4, we provide an overall view of the system as well as detailed visualization and ordering techniques. In Section 5.1.5, we validate the system with real-world data sets and compare it with a state-of-the-art system, and Section 5.1.6 concludes.

5.1.1 Motivation

Clustering is one of the most prominent techniques used to analyze large and complex data sets, and visualization is often helpful in understanding the output of a given clustering method. A clustering algorithm assesses the relationships among objects of a data set by organizing objects into clusters, such that objects within a cluster are similar to each other but dissimilar from objects in other clusters. Clustering has a wide range of applications in areas such as business intelligence, pattern recognition, image or document analysis, and bioinformatics. With the fast development of modern technologies, vast amounts of high-dimensional data are generated. This poses new challenges for clustering that require specialized solutions.

The need for subspace clustering stems from the well-known "curse of dimensionality", that is, the enormous challenges that arise in data analysis whenever the data under analysis has a high number of dimensions. As the number of dimensions grows, relations among data points become more complex and interesting patterns become harder to uncover. Computation also becomes an issue, as the number of combinations increases steeply with data dimensionality.

The need for subspace clustering derives essentially from two distinct but related issues: (1) how similarity among data items changes as the number of data dimensions grows, and (2) the relevance of different dimensions in different clusters.

Several studies have analyzed the counterintuitive behavior of similarity functions in high-dimensional data [28, 69]. In summary, they are organized around the problem of finding the nearest and farthest points to a given query point and show that, as the number of data dimensions d increases, the difference between the two distances grows more slowly than the distance to the nearest point. That is:

$$\lim_{d \to \infty} \frac{\mathrm{dist}_{\max} - \mathrm{dist}_{\min}}{\mathrm{dist}_{\min}} = 0, \tag{5.1}$$

meaning that the discrimination between the nearest and farthest points becomes irrelevant. In turn, a progressive degradation of the quality of data clustering can be expected, because distances between data points become progressively meaningless.
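The effect described by Equation 5.1 can be observed with a small simulation. The sketch below assumes points drawn uniformly from the unit hypercube, Euclidean distance, and one random query per trial; these are our choices for illustration, and the precise conditions under which the limit holds are those given in [28, 69]. The printed mean relative contrast should shrink as d grows; we deliberately do not quote specific values.

```python
import math
import random

def relative_contrast(query, points):
    """(dist_max - dist_min) / dist_min for one query point, as in Equation 5.1."""
    dists = [math.dist(query, p) for p in points]
    return (max(dists) - min(dists)) / min(dists)

def mean_relative_contrast(dim, n_points=100, trials=10, seed=0):
    """Average relative contrast for uniform random data in [0, 1]^dim (assumed distribution)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        query = [rng.random() for _ in range(dim)]
        points = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
        total += relative_contrast(query, points)
    return total / trials

for d in (2, 10, 100, 1000):
    print(f"d = {d:4d}: mean relative contrast = {mean_relative_contrast(d):.3f}")
```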

The second problem is related to the fact that clusters are often present only in subsets of the dimensions of the original data space, which is, of course, more probable when the number of dimensions is high. These clusters might be hard to detect when considering the whole data space, because the irrelevant dimensions can introduce noise and mislead the clustering algorithm. This effect can be explained through a simple diagram like the one shown in Figure 5.1.

The figure shows the distribution of data points in a 3D space and illustrates the concept of a subspace cluster: given three dimensions x, y, and z, clusters may exist in different subspaces. A standard clustering algorithm like k-means would have problems finding the clusters, because they are not clearly separated in the 3D space. But when considering 2D projections of these data on (x, y), (x, z), and (y, z), respectively, the clusters become apparent. Subspace clustering techniques aim to find these clusters, which might otherwise remain hidden if a traditional clustering algorithm were applied. For each cluster, subspace clustering yields (1) the objects belonging to the cluster, and (2) the subset of dimensions that constitute the cluster. Depending on the type of subspace clustering method, there are two forms of output: a partitioning of the data into disjoint clusters, or clusters that allow overlapping elements. Overlap may also exist between the sets of dimensions constituting the clusters.
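The two-part output described above (objects plus dimensions per cluster) maps naturally onto a small data structure. The following sketch is a hypothetical representation, not the one used in ClustNails; it also checks whether a result is a partitioning and which clusters share dimensions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SubspaceCluster:
    """One subspace cluster: its member records and the dimensions it lives in."""
    records: frozenset  # indices of the data records belonging to the cluster
    dims: frozenset     # indices of the dimensions that constitute the cluster

def is_partitioning(clusters):
    """True if the record sets are pairwise disjoint (unassigned records are allowed)."""
    seen = set()
    for c in clusters:
        if seen & c.records:
            return False
        seen |= c.records
    return True

def shared_dimensions(clusters):
    """Dimensions shared by each pair of clusters; pairs without common dimensions are omitted."""
    shared = {}
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            common = clusters[i].dims & clusters[j].dims
            if common:
                shared[(i, j)] = common
    return shared

# Usage with hypothetical dimension indices 0 = x, 1 = y, 2 = z:
result = [
    SubspaceCluster(frozenset({0, 1, 2}), frozenset({0, 1})),  # cluster in (x, y)
    SubspaceCluster(frozenset({3, 4}), frozenset({1, 2})),     # cluster in (y, z)
]
print(is_partitioning(result), shared_dimensions(result))
```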

Figure 5.1: Data projected in several subspaces.

Designing effective visualizations to help analyze the clustering result is not trivial. In addition to the cluster membership information, the relevant sets of dimensions and the overlaps of memberships and dimensions need to be considered. Although a number of techniques (e.g., parallel coordinates [55, 78], scatterplot matrices [17], heat maps [47]) exist for visualizing traditional clustering results, little research has been carried out on visualizing subspace clustering results. There is a need for effective systems that allow the comparison and analysis of clusters in arbitrary subspace projections, supporting both an overview and an in-depth study of the subspace clustering results.

In this section, we present ClustNails, a novel visualization system for mining subspace clusters and analyzing the results. The system takes high-dimensional data as input and applies a subspace clustering algorithm, selected by the user from a set of algorithms, to group the objects into clusters. The system displays the subspace clustering results using two appropriately designed visual representations: Spikes and HeatNails. These representations support the interpretation of the results of subspace clustering algorithms by visualizing characteristics of the clustering results from different perspectives. Appropriate ordering techniques are integrated with the visualization to help extract meaningful patterns from the clustering results.

The main contributions of this section are:

• an integrated data analysis and visualization tool for mining patterns in multidimensional data using subspace clustering algorithms;

• a characterization of subspace cluster analysis tasks and the resulting design space;

• two novel visualization techniques, Spike and HeatNail, for analyzing subspace clustering results;

• appropriate ordering techniques for pattern extraction.

5.1.2 Subspace Clustering Algorithms

Given a set $X$ of data points in some multidimensional space $D$, a subspace clustering algorithm aims to find a subset $X_k$ of data points together with a subset $D_k$ of dimensions such that the points in $X_k$ are closely clustered in the subspace spanned by the dimensions in $D_k$.

The most critical part of subspace clustering is the subspace generation. Given a d-dimensional space, there are $2^d$ possible subsets of dimensions. It is computationally infeasible to examine each possible subset to find subspaces of interest for a predefined pattern. Since exhaustive search is clearly not viable, every algorithm is based on some kind of heuristic that speeds up the search in such a huge combinatorial space. A number of subspace clustering algorithms with strategies for narrowing down the search space have been proposed in the past, and some of them are enumerated in Section 2.3.1. As suggested by Parsons et al. [110], the existing algorithms can be categorized into bottom-up and top-down strategies.
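The growth of the search space is easy to quantify. The $2^d$ subsets include the empty set, so $2^d - 1$ non-empty candidate subspaces remain; already for $d = 50$ this exceeds $10^{15}$. The helper names below are ours; the enumeration is only meant for very small d.

```python
from itertools import combinations

def count_subspaces(d):
    """Number of non-empty subsets of d dimensions: 2**d - 1 candidate subspaces."""
    return 2 ** d - 1

def enumerate_subspaces(dims):
    """Yield every non-empty subset of dims, smallest first (only feasible for tiny d)."""
    for size in range(1, len(dims) + 1):
        yield from combinations(dims, size)

print(list(enumerate_subspaces(["x", "y", "z"])))
for d in (3, 10, 20, 50):
    print(d, count_subspaces(d))
```

For the three dimensions of Figure 5.1, the seven subspaces include the three 2D projections (x, y), (x, z), and (y, z) in which the clusters become visible.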

The bottom-up approaches implement a so-called "downward closure property" (or monotonicity property), which means that if a subspace $S$ contains a cluster, then any subspace $T \subseteq S$ must also contain a cluster. The property is used for pruning: if a subspace $T$ does not have high enough density, then any superspace $S \supseteq T$ can be excluded from the search space. A common implementation of a bottom-up approach starts from one-dimensional dense subspaces, iteratively considering an increasing number of dimensions and combining adjacent dense units until no more new dense units are found. A typical algorithm will have three major steps:



1. generate dense units (subspaces) using an Apriori-like approach;

2. assign cluster membership to each object;

3. remove outliers whose distance to the cluster center is higher than a critical value.
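Step 1 can be sketched as a level-wise search that uses the downward closure property for pruning. To keep the sketch short, we abstract the density test into a caller-supplied predicate on whole subspaces instead of working with adjacent grid units; the function name and the toy predicate are hypothetical, and steps 2 and 3 are not covered.

```python
from itertools import combinations

def bottom_up_dense_subspaces(n_dims, is_dense):
    """Level-wise (Apriori-like) search for dense subspaces.

    is_dense(subspace) is an assumed, caller-supplied monotone predicate: every subset
    of a dense subspace must be dense too (the downward closure property).
    """
    level = [frozenset([j]) for j in range(n_dims) if is_dense(frozenset([j]))]
    found = list(level)
    while level:
        previous = set(level)
        # Join: unite two dense q-dimensional subspaces that differ in exactly one dimension.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == len(a) + 1}
        # Prune: drop candidates with a non-dense q-dimensional subset, then test the rest.
        level = [
            c for c in candidates
            if all(c - {j} in previous for j in c) and is_dense(c)
        ]
        found.extend(level)
    return found

def toy_density(subspace):
    """Hypothetical predicate: dense exactly inside dimensions {0, 1, 2} and inside {3, 4}."""
    return subspace <= {0, 1, 2} or subspace <= {3, 4}

print(sorted(sorted(s) for s in bottom_up_dense_subspaces(5, toy_density)))
```

The prune step is where the downward closure property pays off: a candidate such as {0, 3} is never tested at the next level once it fails, and no superspace of it is ever generated.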

The top-down approach starts with an initial configuration where data is clustered using the full feature space with equally weighted dimensions. Each dimension is then assigned a weight for each cluster to characterize the relevance of the dimension to the cluster. Subsequently, the annotated clusters are re-clustered, taking into account the weights assigned in the preceding step. Sampling techniques are typically used to improve performance, as the approach involves multiple iterations of re-clustering in the full set of dimensions.
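One way to picture the per-cluster weights of the top-down approach is sketched below. Inverse-variance weighting is our own illustrative assumption, not the weighting scheme of any specific top-down algorithm; the function names are hypothetical.

```python
from statistics import pvariance

def dimension_weights(cluster_points, eps=1e-9):
    """Relevance weight of each dimension for one cluster; the weights sum to 1.

    Illustrative assumption: a dimension is more relevant the less the cluster
    members vary along it, so the weight is proportional to the inverse variance.
    """
    n_dims = len(cluster_points[0])
    inverse = [1.0 / (pvariance([p[j] for p in cluster_points]) + eps) for j in range(n_dims)]
    total = sum(inverse)
    return [w / total for w in inverse]

def weighted_distance(a, b, weights):
    """Weighted Euclidean distance, usable when re-clustering with the new weights."""
    return sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)) ** 0.5

# A cluster that is tight along dimension 0 and spread out along dimension 1.
members = [(1.0, 0.0), (1.0, 5.0), (1.1, 10.0)]
print(dimension_weights(members))
```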

All of these approaches require some kind of parametrization. Bottom-up approaches generally require specifications of threshold densities and bin size. Top-down approaches require a specification of the desired number of clusters (similar to k-means) and of the average number of dimensions included in a subspace.

In this chapter, we use Proclus, which is one of the most established algorithms and has demonstrated advantages over a number of subspace clustering techniques [102]. Proclus [4] takes a top-down approach and extends the traditional k-medoid clustering algorithm. The k-medoid algorithm starts with an initial partition and then iteratively assigns objects to medoids, computes the quality of the clustering, and improves the partition and the medoids. Proclus extends k-medoid by associating medoids with subspaces and improves both partitions and subspaces iteratively.

Taking two input parameters, the number of clusters k and the average number of dimensions l, the algorithm proceeds in three phases. (1) In the initialization phase, a set of medoid candidates is selected by picking a representative sample from the entire data and choosing the candidates from this sample using a greedy method. (2) In the iterative phase, the medoids are improved and a subspace for each medoid is computed. This is done in the following steps. First, a random set of k medoids is selected from the candidates, and the optimal set of dimensions is determined for each medoid. Then, all the objects are assigned to the nearest medoid. If the current clustering is better than the previous best one, it is kept. The bad medoids are then determined and replaced with random candidates, and these steps are repeated until the clustering does not change anymore. (3) In the last phase, the cluster refinement phase, once the best medoids are found, the clustering is improved by determining optimal dimension sets for the medoids and reassigning the objects to clusters. Algorithm 1 presents the pseudocode from [4] describing the algorithmic steps in more detail.
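The step in which "the optimal set of dimensions is determined for each medoid" corresponds to FindDimensions in Algorithm 1. The following sketch reflects our reading of the description in [4]: for each medoid, compute the average distance $X_{ij}$ along every dimension j between the medoid and its locality, standardize these values per medoid into z-scores, and pick the $k \cdot l$ lowest scores overall, with at least two dimensions per medoid. The function name, data layout, and tie-breaking are our assumptions.

```python
import math

def find_dimensions(medoids, localities, l):
    """Assign k*l dimensions in total to k medoids (at least two each), Proclus-style.

    medoids[i] is a point (tuple of floats); localities[i] is the non-empty list of
    points near medoids[i] (the sets L_i of Algorithm 1). Requires >= 2 dimensions.
    """
    k, d = len(medoids), len(medoids[0])
    scores = []  # (z-score, medoid index, dimension index)
    for i, (m, loc) in enumerate(zip(medoids, localities)):
        # X_ij: average distance along dimension j between the medoid and its locality.
        x = [sum(abs(p[j] - m[j]) for p in loc) / len(loc) for j in range(d)]
        y = sum(x) / d
        sigma = math.sqrt(sum((xj - y) ** 2 for xj in x) / (d - 1))
        for j, xj in enumerate(x):
            # Standardize so that medoids with different overall spreads are comparable.
            scores.append(((xj - y) / sigma if sigma > 0 else 0.0, i, j))
    dims = [set() for _ in range(k)]
    # Every medoid first receives its two lowest-scoring dimensions ...
    for i in range(k):
        for _, _, j in sorted(s for s in scores if s[1] == i)[:2]:
            dims[i].add(j)
    # ... and the remaining k*l - 2k picks go to the globally lowest scores left.
    rest = sorted(s for s in scores if s[2] not in dims[s[1]])
    for _, i, j in rest[: max(0, k * l - 2 * k)]:
        dims[i].add(j)
    return dims

meds = [(0.0, 0.0, 0.0), (10.0, 10.0, 10.0)]
locs = [[(0.1, 0.1, 5.0), (-0.1, 0.1, -5.0)], [(15.0, 10.1, 10.1), (5.0, 9.9, 9.9)]]
print(find_dimensions(meds, locs, 2))
```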

A number of reviews and surveys compare and classify subspace clustering approaches. The survey by Parsons et al. [110] mentioned above organizes the techniques in a hierarchy of algorithmic strategies and provides a small experiment on representative algorithms of each class. Kriegel et al. present a more thorough systematization and an updated survey [90], in which the broader problem of clustering high-dimensional data is discussed. The recent work of Müller et al. [102] presents a systematic and unique evaluation of subspace clustering algorithms in terms of the quality of the generated output and performance. According to [102], Proclus is one of the best partitioning algorithms and has a good runtime compared to other techniques. We rely on this finding and use Proclus in our experiments.



Algorithm 1 PROCLUS(No. of Clusters: k, Avg. Dimensions: l)

{C_i is the i-th cluster}
{D_i is the set of dimensions associated with cluster C_i}
{M_current is the set of medoids in the current iteration}
{M_best is the best set of medoids found so far}
{N is the final set of medoids with associated dimensions}
{A, B are constant integers}

/* 1. Initialization Phase: select a set of B · k medoid candidates */
S = random sample of size A · k
M = GREEDY(S, B · k)

/* 2. Iterative Phase: improve medoids and compute a subspace for each medoid */
BestObjective = ∞
M_current = random set of medoids {m_1, m_2, ..., m_k} ⊆ M
repeat
    /* Approximate the optimal set of dimensions */
    for each medoid m_i ∈ M_current do
        let δ_i be the distance from m_i to the nearest other medoid
        L_i = points in the sphere centered at m_i with radius δ_i
    end for
    L = {L_1, ..., L_k}
    (D_1, D_2, ..., D_k) = FindDimensions(k, l, L)
    {Form the clusters}
    (C_1, ..., C_k) = AssignPoints(D_1, ..., D_k)
    ObjectiveFunction = EvaluateClusters(C_1, ..., C_k, D_1, ..., D_k)
    if ObjectiveFunction < BestObjective then
        BestObjective = ObjectiveFunction
        M_best = M_current
        compute the bad medoids in M_best
    end if
    compute M_current by replacing the bad medoids in M_best with random points from M
until (termination criterion)

/* 3. Cluster Refinement Phase: improve quality of the partitions and subspaces */
L = {C_1, ..., C_k}
(D_1, D_2, ..., D_k) = FindDimensions(k, l, L)
(C_1, ..., C_k) = AssignPoints(D_1, ..., D_k)
N = (M_best, D_1, D_2, ..., D_k)
return N
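For readers who want to experiment, the sketch below follows the three phases of Algorithm 1 in runnable form. It is a simplified illustration, not the reference implementation of [4]. The simplifications are our own assumptions: FindDimensions picks the l dimensions with the smallest average spread per medoid instead of the z-score selection, only the medoid with the smallest cluster counts as bad, the loop stops after a fixed number of non-improving rounds, and outlier handling is omitted. All names and default values (A, B, patience) are hypothetical.

```python
import random

def seg_dist(a, b, dims):
    """Manhattan segmental distance: mean absolute difference over the given dimensions."""
    return sum(abs(a[j] - b[j]) for j in dims) / len(dims)

def greedy(sample, n):
    """GREEDY: farthest-point selection of n well-separated medoid candidates."""
    all_dims = range(len(sample[0]))
    chosen = [sample[0]]
    while len(chosen) < min(n, len(sample)):
        chosen.append(max(sample, key=lambda p: min(seg_dist(p, c, all_dims) for c in chosen)))
    return chosen

def find_dims(medoid, points, l):
    """Simplified FindDimensions (assumption): the l dimensions with the smallest spread."""
    d = len(medoid)
    spread = [sum(abs(p[j] - medoid[j]) for p in points) / len(points) for j in range(d)]
    return sorted(range(d), key=lambda j: spread[j])[:l]

def assign_points(data, medoids, dims):
    """AssignPoints: each point joins the medoid that is closest in that medoid's subspace."""
    clusters = [[] for _ in medoids]
    for p in data:
        best = min(range(len(medoids)), key=lambda i: seg_dist(p, medoids[i], dims[i]))
        clusters[best].append(p)
    return clusters

def evaluate(clusters, medoids, dims):
    """EvaluateClusters: mean segmental distance of points to their medoid (lower is better)."""
    total = sum(seg_dist(p, m, D) for c, m, D in zip(clusters, medoids, dims) for p in c)
    return total / sum(len(c) for c in clusters)

def proclus(data, k, l, A=20, B=5, patience=10, seed=0):
    """Simplified PROCLUS for k >= 2; returns (medoids, dimension lists, clusters)."""
    rng = random.Random(seed)
    all_dims = range(len(data[0]))
    # 1. Initialization phase: sample A*k points, keep B*k well-spread candidates M.
    candidates = greedy(rng.sample(data, min(len(data), A * k)), B * k)
    current = rng.sample(candidates, k)
    best, best_dims, best_obj, bad, stale = None, None, float("inf"), [], 0
    # 2. Iterative phase: stop after `patience` rounds without improvement.
    while stale < patience:
        dims = []
        for i, m in enumerate(current):
            delta = min(seg_dist(m, current[o], all_dims) for o in range(k) if o != i)
            locality = [p for p in data if seg_dist(m, p, all_dims) <= delta]
            dims.append(find_dims(m, locality, l))
        clusters = assign_points(data, current, dims)
        objective = evaluate(clusters, current, dims)
        if objective < best_obj:
            best, best_dims, best_obj, stale = list(current), dims, objective, 0
            # Simplification: the medoid with the smallest cluster is the only "bad" one.
            bad = [min(range(k), key=lambda i: len(clusters[i]))]
        else:
            stale += 1
        current = list(best)
        pool = [c for c in candidates if c not in best] or candidates
        for i in bad:
            current[i] = rng.choice(pool)
    # 3. Refinement phase: recompute dimensions from the clusters, then reassign.
    clusters = assign_points(data, best, best_dims)
    dims = [find_dims(m, c, l) if c else D for m, c, D in zip(best, clusters, best_dims)]
    return best, dims, assign_points(data, best, dims)
```

Because every improvement strictly lowers the objective over a finite set of candidate medoid sets, the loop is guaranteed to terminate.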



5.1.3 Task Definition and Design Space for Visual Subspace Cluster Analysis

Subspace cluster visualization remains a challenging task due to the multiple types of information contained in subspace clustering results, such as subspaces, cluster membership of objects, and overlap between subspaces and clusters. Existing subspace visualization techniques have been detailed in Section 2.4.2. To develop effective visualization systems for subspace cluster analysis, it is necessary to take into consideration the different tasks that are involved in the data analysis and to use them as a basis for exploring the design space. We next describe the main tasks that an appropriate subspace cluster visualization technique needs to address and thereby provide a generic and reusable characterization. We also analyze the design space and provide (1) a classification and (2) a reasoned analysis of common design alternatives, from which a baseline design space is derived. This analysis serves as a baseline not only for the design of our proposed subspace cluster visualization system, but also for comparing it with existing approaches and for identifying empty areas in this design space for future work.

Scope of Subspace Cluster Analysis

Clustering abstracts a larger data set into a smaller number of groups that are presumably more amenable to analysis and interpretation. Standard clustering algorithms rely on a fixed set of dimensions used in the similarity function of the clustering algorithm. Typically, the selection of these dimensions is done outside of the clustering algorithm. Subspace clustering methods, on the other hand, provide an extended output that also includes the set of dimensions relevant to finding the groups, possibly described with weights indicating the importance of each dimension for the found result. Depending on the subspace method used, clusters can overlap in their dimensions as well as in their records. In principle, the subspace clustering can be analyzed without considering the identified dimensions. In our work, we are interested in jointly analyzing the clustering results and the sets of selected dimensions, to provide enhanced analysis capabilities.

Tasks

The analysis of properties and relationships within and among clusters is an important task in cluster analysis. We break this general analysis task down into a series of subtasks:

T1 Reveal properties of individual clusters

When analyzing clustering output, it is necessary to understand the main features of each generated cluster. In particular, once the clustering output has been generated and a visualization is constructed to represent it, the following information needs to be perceivable:

T1.1 How many records does the cluster contain?

T1.2 How many dimensions are involved and what are their weights?

T1.3 How are the data values distributed in each of the contained dimensions? (homogeneity of cluster members, central and outlier elements, subgrouping of clusters)
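Subtasks T1.1 to T1.3 map onto simple per-cluster statistics that a visualization would need to encode. The sketch below computes such a profile; the function name, the input layout, and the chosen statistics (mean, median, standard deviation, range) are our own assumptions rather than what ClustNails computes.

```python
from statistics import mean, median, pstdev

def cluster_profile(data, members, dim_weights):
    """Answer T1.1 to T1.3 for a single subspace cluster.

    data: list of records (sequences of numbers); members: record indices in the
    cluster; dim_weights: {dimension index: weight} for the cluster's subspace.
    """
    profile = {
        "n_records": len(members),                          # T1.1
        "n_dims": len(dim_weights),                         # T1.2
        "weights": dict(sorted(dim_weights.items(), key=lambda kv: -kv[1])),
        "distribution": {},                                 # T1.3
    }
    for j in dim_weights:
        values = [data[i][j] for i in members]
        profile["distribution"][j] = {
            "mean": mean(values),
            "median": median(values),
            "std": pstdev(values),
            "range": (min(values), max(values)),
        }
    return profile

records = [(1, 10), (2, 20), (3, 30), (100, 0)]
print(cluster_profile(records, [0, 1, 2], {0: 0.7, 1: 0.3}))
```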



T2 Enable cluster comparison

Once the output has been considered and each cluster has been characterized visually, it is important to display the information in a way that allows meaningful comparisons among the clusters. It is important to understand how similar (or distant) clusters are, which translates into:

T2.1 How do clusters differ with respect to contained records and involved dimensions?

T2.2 Is there overlap between records and dimensions, or are they distinct?

T3 Indicate the quality of the generated cluster output

Subspace clustering algorithms, like many methods that operate on multidimensional spaces, rely heavily on heuristics and depend on parameterization. For this reason, clustering outputs are not always optimal. Even though research in subspace clustering has substantially improved clustering quality, it is still important to be able to judge the output quality by considering the following:

T3.1 How good is the clustering quality produced by a given algorithm?

T3.2 How sensitive is the output with respect to parameter variations?

We take these task considerations as a baseline for developing the ClustNails system presented in the next section. While we have not formally evaluated the degree to which ClustNails fulfills each of these criteria, we find that they are at the core of the functionality that ClustNails offers.

Design Space<br />

In terms of the previously described tasks, the information entities to be visualized are: Elements (data records, clusters, dimensions), Relationships (membership of records in clusters, overlap of clusters with respect to records and dimensions), and Attributes (cluster size, dimension distribution, dimension weight, etc.).

We identify two main categories of visualization solutions for representing subspace clustering output: Cluster-Centric (CC) and Data-Centric (DC). Cluster-centric solutions focus first on the representation of the clusters, with the intent of enabling their comparison. Data-centric solutions focus on the representation of the data values, with the intent of easing the interpretation of each cluster in terms of its internal distributions.

There is a natural tension between these two extremes. Cluster-centric solutions scale much better in terms of the number of data items and dimensions. Their higher level of abstraction eases comparison between cluster features, but at the expense of limiting their interpretation. Conversely, data-centric views ease cluster interpretation but do not scale well with respect to data size and dimensionality.

In our analysis of the design space, we explored several alternative visual designs and isolated some basic ones for both approaches. Discussing them briefly helps to better motivate our proposed final solution.

Record-Centric Designs<br />

In record-centric designs, each visual item represents a record. A 2D scatterplot projection is often used to identify clusters of data elements in traditional clustering; however, it is not clear how to extend this design so that it includes information about cluster dimensions. Parallel coordinates plots (PCP) could, in principle, be extended to represent subspace clusters by drawing lines between adjacent axes only when both axes belong to the cluster being drawn. However, this creates complicated ordering problems, with potential extreme cases where the polyline of a whole record might not be drawn at all because its axes are never adjacent. Also, PCP do not scale well to data of even moderate dimensionality, and high dimensionality is precisely the main focus of subspace clustering. Heat maps (or matrix/tabular representations) can be extended more easily by using different color scales for included and non-included dimensions. In addition, their design allows easy reordering of records and dimensions, so that the structure of the clusters can be perceived more easily.

Cluster-Centric Designs<br />

In cluster-centric designs, each visual item represents a cluster. A 2D scatterplot projection is possible, such as the one presented in VISA [14] (see also Figure 5.7). The clusters are projected with MDS, or similar techniques, taking into account their similarity according to some predefined criterion (e.g., the number of shared dimensions). This solution allows clusters to be grouped according to their similarity, but their visibility and interpretation are often hindered by the amount of overlap between the items. A matrix comparing each cluster with every other cluster in terms of shared dimensions and records is also possible, but its effectiveness depends on how well rows and columns are ordered, and it is not necessarily the most compact design. Finally, icons or glyphs can provide a rich representation of each cluster, such that every single icon conveys information about the cluster's dimensions, records, and weights in an integrated fashion.

In ClustNails, we integrate the best of the two approaches in a multiple-view user interface (see Figure 5.5). A cluster-centric view based on sorted icons supports cluster understanding and comparison (T1.1, T1.2, and T2.2). A data-centric view based on sorted and compressed heat maps supports interpreting and comparing the clusters in terms of their data distribution (T1.3, T2.1, T2.2). Together, these views also help in interpreting the quality of the generated output (T3.1 and T3.2). In the following section, we describe the whole system and its views in detail.

5.1.4 The ClustNails System<br />

ClustNails is designed as an interactive visualization tool for subspace clustering analysis. It integrates a number of subspace clustering algorithms with novel visual representations and ordering techniques to help analysts generate subspace clusters from multidimensional data and identify interesting patterns in the visualizations. We next provide an overview of the design and main functionalities of the system, as well as a detailed description of the visualization and ordering techniques applied.

Overview<br />

ClustNails integrates the OpenSubspace library of Weka [106], which contains a range of subspace clustering algorithms including Clique, Doc, Fires, Proclus, MineClus, INSCY, P3c, Schism, Statpc, and Subclu. The system takes multidimensional data as input, clusters the objects using a user-selected subspace clustering algorithm, and displays the clustering result in a multi-view user interface. A number of ordering functions allow the analyst to examine the results and compare clusters from different perspectives. Various user interactions let the user select clustering algorithms, parameters, and the order of the clustering results in the visualization panels. A linking-and-brushing function is implemented so that dimensions/clusters of interest can be highlighted in different views. By placing the mouse cursor over an item (record, dimension, or cluster) in a visualization panel, the analyst can see detailed information about the item in a tooltip.

[Figure 5.2 image. DATA SPACE: high-dimensional data (records x1 … xm, dimensions D0 … D9) → Subspace Clustering Algorithm → subspace cluster. VISUAL SPACE: Subspace Cluster Visualization (subspace cluster view), with cluster and dimension ordering.]

Figure 5.2: Workflow of subspace cluster analysis using the ClustNails system.

Figure 5.2 illustrates the workflow supported by our tool. As Figure 5.2 (left) shows, the system loads a $d$-dimensional data set as input, and a user-selected clustering algorithm computes the subspace clusters. These are provided as a list of clusters, each containing a subset of records and a subset of dimensions. As Figure 5.2 (middle) shows, each cluster is quantified in terms of its number of instances and its number of associated dimensions. This information, together with the records of each subspace cluster, is visualized in a multiple-view visualization panel, which includes a Spikes view for cluster-centric analysis (top) and a HeatNails view for record-centric analysis (bottom). Figure 5.2 (right) shows that the order of clusters, dimensions, and records can be rearranged in each view for easy comparison between clusters. Next, we describe the different views and the supported ordering strategies.
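To make the workflow concrete, the Python sketch below models the output of the clustering step as a list of clusters. Each cluster pairs a record subset with a dimension subset, and the sketch derives the per-cluster counts that are then visualized. The class and function names are hypothetical and are not taken from ClustNails or OpenSubspace.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubspaceCluster:
    """One entry of a subspace clustering result (hypothetical structure).

    records:    indices of the records assigned to the cluster (X_k)
    dimensions: indices of the dimensions spanning its subspace (D_k)
    """
    records: frozenset
    dimensions: frozenset


def summarize(clusters):
    """Per-cluster (number of records, number of dimensions): the two
    quantities each cluster is characterized by before being visualized."""
    return [(len(c.records), len(c.dimensions)) for c in clusters]


# Toy result of some subspace clustering algorithm on a 10-dimensional data set.
# Record 1 belongs to both clusters; many algorithms allow such overlap.
result = [
    SubspaceCluster(records=frozenset({0, 1, 19}), dimensions=frozenset({0, 2, 7})),
    SubspaceCluster(records=frozenset({1, 39}), dimensions=frozenset({1, 4, 5, 8})),
]
print(summarize(result))  # [(3, 3), (2, 4)]
```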

Visualization Components

Visualization of Clusters: the Spikes View

The Spikes view is a cluster-oriented view that provides a matrix of thumbnails, each representing a subspace cluster. Each cluster is visualized in a circular area that contains radial spikes. The spikes represent the individual dimensions (the subspace) that define the given cluster, and the length of each spike is scaled according to the weight (importance) of that dimension for the cluster (see below for the definition). The radial sequence of dimensions is identical for every spike glyph. The number of records in the cluster is represented by the area of the inner circle.

Subspace clustering algorithms provide as output a subset of dimensions $D_k$ for each cluster $SC_k$, as well as the set of instances (records) $X_k$ of this cluster. Given a dimension $m$ within the set of dimensions $D_k$ of a subspace cluster $SC_k$, we define the weight of that dimension in that cluster as:

$$w_m^k = \frac{\sum_{x_i \in X_k} \left| x_i^m - c_m^k \right|}{|X_k|}, \qquad (5.2)$$



where $c_m^k$ is the center of the points in $X_k$ along dimension $m$, $x_i^m$ is the value in dimension $m$ of the point $x_i$ of this cluster, and $|X_k|$ is the number of elements in $SC_k$. The smaller $w_m^k$ is, the more compactly the points are grouped around the center in dimension $m$. Dimensions with smaller weights therefore have better clustered points and are considered more important for a cluster. We normalize the weights $w_m^k$ of all dimensions of all clusters to the interval [0, 1] and map the resulting values inversely to spike length: the lower $w_m^k$ (the more important the dimension), the longer the corresponding spike.
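A minimal Python sketch of Equation 5.2 and the inverse spike mapping follows. The text only calls $c_m^k$ the "center", so the sketch assumes the arithmetic mean; a medoid would fit the definition equally well. It also assumes a linear inverse mapping $1 - \hat{w}$ after joint min-max normalization. The function names are hypothetical.

```python
from statistics import mean


def dimension_weights(data, records, dimensions):
    """Equation 5.2: w_m^k = sum_{x_i in X_k} |x_i^m - c_m^k| / |X_k|.

    data:       list of records, each a list of numbers
    records:    indices of the records of cluster k (X_k)
    dimensions: indices of the dimensions of cluster k (D_k)
    ASSUMPTION: the center c_m^k is the arithmetic mean along dimension m.
    """
    weights = {}
    for m in dimensions:
        values = [data[i][m] for i in records]
        center = mean(values)
        weights[m] = sum(abs(v - center) for v in values) / len(values)
    return weights


def spike_lengths(all_weights, max_length=1.0):
    """Min-max normalize the weights of all dimensions of all clusters jointly
    to [0, 1], then map them inversely: the smaller the weight (the more
    compact the dimension), the longer the spike."""
    flat = [w for ws in all_weights for w in ws.values()]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # guard against all weights being equal
    return [
        {m: max_length * (1.0 - (w - lo) / span) for m, w in ws.items()}
        for ws in all_weights
    ]


data = [[1.0, 10.0], [3.0, 10.0], [2.0, 40.0]]
w = dimension_weights(data, records=[0, 1, 2], dimensions=[0, 1])
print(spike_lengths([w]))  # [{0: 1.0, 1: 0.0}]
```

Dimension 0 is tightly grouped (mean absolute deviation 2/3), so it receives the longest spike. Dimension 1 is spread out (40/3), so its spike shrinks to zero.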

Note that, owing to our definition of $w_m^k$, the relationship between weights and importance is inverse. We reflect this with an inverse mapping between weights and the size of the visual attribute (the spikes). Also note that if the given subspace clustering algorithm natively outputs weights for each dimension, those weights can be mapped to spike length instead.

Figure 5.3: Two subspace clusters visualized as spikes. The clusters share common dimensions, but the importance of these dimensions differs between the clusters. Dim29 and dim32 show shorter spikes in the left cluster than in the right cluster, as they are considered less important for the definition of that cluster according to our measure $w_m^k$. Furthermore, the left cluster has fewer dimensions and more objects than the right cluster.

In the Spikes view, each subspace cluster is represented as a circle. Each spike in a circle represents a dimension contained in that subspace. The length of the spike represents the weight of the dimension for that particular cluster (the longer, the more important). The order of the dimensions is identical for each cluster. The area of the inner circle indicates the number of records in the cluster. Figure 5.3 illustrates the Spikes view.

The resulting Spikes view allows users to quickly recognize overlapping dimensions by comparing the spike patterns of different clusters. To support this comparison, the background is divided into pie sectors that are colored alternately in two colors (gray and light red). This makes it easier to compare spike angles across two different clusters.
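The glyph geometry can be sketched as follows. This is a hypothetical Python helper, not ClustNails code, and it rests on three assumptions. First, spikes start at the rim of the inner circle. Second, the inner circle's radius scales with the square root of the cluster size, so that its area is proportional to the number of records. Third, there is one background sector per dimension, centered on that dimension's spike angle.

```python
import math


def spike_glyph(lengths, dimension_order, n_records, max_records,
                radius=1.0, inner_fraction=0.3):
    """Geometry of one Spikes-view glyph (hypothetical layout).

    lengths:         {dimension: spike length in [0, 1]} for this cluster only
    dimension_order: global dimension order shared by all glyphs, so a given
                     dimension always points in the same direction
    n_records:       cluster size; inner-circle AREA is proportional to it,
                     so its radius grows with the square root
    """
    step = 2 * math.pi / len(dimension_order)
    inner_r = inner_fraction * radius * math.sqrt(n_records / max_records)
    spikes = []
    for pos, dim in enumerate(dimension_order):
        if dim not in lengths:
            continue  # dimension is not part of this subspace: no spike
        angle = pos * step
        # Spikes start at the inner circle and extend toward the outer radius.
        r = inner_r + lengths[dim] * (radius - inner_r)
        spikes.append((dim, r * math.cos(angle), r * math.sin(angle)))
    # One background sector per dimension, centered on its spike angle,
    # alternating between two colors to ease angle comparison across glyphs.
    sectors = [((pos - 0.5) * step, (pos + 0.5) * step,
                "gray" if pos % 2 == 0 else "lightred")
               for pos in range(len(dimension_order))]
    return inner_r, spikes, sectors


inner_r, spikes, sectors = spike_glyph({"a": 1.0, "c": 0.5}, ["a", "b", "c", "d"],
                                       n_records=25, max_records=100)
```

Because every glyph uses the same `dimension_order`, a dimension shared by two clusters appears at the same angle in both glyphs. This is what makes spike patterns comparable.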

Visualization of Records: the HeatNails View

The HeatNails view is an extended heat map displaying the data values and dimensions. Rows represent dimensions, and columns represent data items (records). Each HeatNails cell represents the value of a record in one dimension. Data items are grouped by cluster; the clusters are aligned next to each other and separated by black lines. Data values are normalized globally and mapped to an appropriate color scale. A yellow-to-green color scale is used for dimensions that are members of the given cluster, while a gray scale is used for the remaining dimensions of the data set per cluster (see Figure 5.4 (bottom)). This allows effective visual perception of the distribution of values across dimensions. It also shows how dimensions relate to clusters, that is, whether each dimension is included in the cluster definition.
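A sketch of how the HeatNails cells could be assembled follows. It reads "normalized globally" as per-dimension min-max normalization over the whole data set; a single minimum and maximum across all values would be another reading. It only records which color scale a cell uses rather than computing actual colors. All names are hypothetical.

```python
def heatnails_cells(data, clusters):
    """Build HeatNails columns (one per record, grouped by cluster).

    data:     list of records (lists of numbers); rows of the view = dimensions
    clusters: list of (record_indices, dimension_set) pairs
    Each cell is (normalized value, scale), where scale is "color"
    (yellow-to-green) for dimensions in the cluster's subspace and "gray"
    otherwise. ASSUMPTION: per-dimension min-max normalization over all data.
    """
    n_dims = len(data[0])
    lo = [min(r[m] for r in data) for m in range(n_dims)]
    hi = [max(r[m] for r in data) for m in range(n_dims)]
    columns = []
    for cluster_id, (records, dims) in enumerate(clusters):
        for i in records:
            column = []
            for m in range(n_dims):
                span = (hi[m] - lo[m]) or 1.0  # constant dimension -> all zeros
                value = (data[i][m] - lo[m]) / span
                column.append((value, "color" if m in dims else "gray"))
            columns.append((cluster_id, i, column))
    return columns


data = [[0.0, 5.0], [10.0, 5.0], [5.0, 15.0]]
cols = heatnails_cells(data, clusters=[([0, 2], {0}), ([1], {1})])
for cluster_id, record, column in cols:
    print(cluster_id, record, column)
```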

Figure 5.4: HeatNails visualization. Bottom: distribution of values for all dimensions (rows) and records (columns). Top: histograms of the values of all dimensions per cluster, for comparison purposes.

We also provide a summary representation of the values of the dimensions occurring in the clusters. The distribution of dimension values of each cluster is discretized into a histogram and visualized in color (for included dimensions) and in gray scale (for dimensions not included). This allows easy comparison between clusters with respect to data values. Figure 5.4 (top) shows these histogram views. Finally, depending on the clustering algorithm, records may be members of multiple clusters. We indicate this by marking the cluster IDs of multi-cluster members at the bottom of the display. Like the Spikes view, the HeatNails view allows quick recognition of overlapping dimensions across clusters, here by means of the color and gray-scale patterns. Both the Spikes and HeatNails views incorporate linking-and-brushing functionality: clicking on any set of dimensions/clusters of interest in one view highlights the same dimensions/clusters in all other views.
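Finding the records that need multiple cluster IDs at the bottom of the display amounts to inverting the cluster-to-record mapping. A minimal Python sketch follows; the helper name is hypothetical.

```python
from collections import defaultdict


def multi_cluster_members(clusters):
    """Map each record that belongs to more than one cluster to the list of
    its cluster IDs (ascending), i.e. the labels to mark below the display.

    clusters: list of record-index collections; a cluster's ID is its position
    """
    memberships = defaultdict(list)
    for cluster_id, records in enumerate(clusters):
        for i in records:
            memberships[i].append(cluster_id)
    return {i: ids for i, ids in memberships.items() if len(ids) > 1}


print(multi_cluster_members([[0, 1, 2], [2, 3], [2, 0]]))  # {0: [0, 2], 2: [0, 1, 2]}
```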

Ordering Heuristics

Ordering is implemented to support the perception of structural similarities between clusters with respect to dimensions and value distributions. As the ordering problems for clusters, dimensions, and records are typically NP-complete combinatorial optimization problems [9], we rely on heuristics to order dimensions, records, clusters, and values in the various displays. Our essential idea is to place similar or closely related objects together to help the analyst find interesting patterns.

Dimension Ordering

To find a global ordering of the dimensions, we compute for each dimension a frequency value denoting the number of subspace clusters that use this dimension. The sequence starts with the dimension used most frequently by the set of subspace clusters. Subsequent positions are filled greedily: the dimension that co-occurs most frequently with the previously positioned dimension is placed next. If no co-occurring dimension is found, the most frequent of the remaining dimensions is positioned next in the ordering vector. The dimension ordering can be applied to both the Spikes view and the HeatNails view.
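The greedy dimension ordering can be sketched in Python as follows. The text does not say how ties are broken. In this sketch, ties in co-occurrence fall back to frequency, and full ties fall back to the dimensions' natural sort order; both choices are assumptions. Dimensions used by no cluster are left out and could simply be appended at the end.

```python
from collections import Counter
from itertools import combinations


def order_dimensions(subspaces):
    """Greedy global dimension ordering.

    subspaces: list of dimension sets, one per subspace cluster.
    Start with the dimension used by most clusters; then repeatedly append the
    remaining dimension that co-occurs most often with the last placed one,
    falling back to the most frequent remaining dimension if none co-occurs.
    Dimensions used by no cluster are not included.
    """
    freq = Counter(d for dims in subspaces for d in dims)
    cooc = Counter()
    for dims in subspaces:
        for a, b in combinations(sorted(dims), 2):
            cooc[a, b] += 1
            cooc[b, a] += 1

    remaining = sorted(freq)  # sorted, so max() breaks full ties by sort order
    order = []
    while remaining:
        nxt = None
        if order:
            last = order[-1]
            # Highest co-occurrence with the last dimension; ties -> higher frequency.
            cand = max(remaining, key=lambda d: (cooc[last, d], freq[d]))
            if cooc[last, cand] > 0:
                nxt = cand
        if nxt is None:
            nxt = max(remaining, key=lambda d: freq[d])
        order.append(nxt)
        remaining.remove(nxt)
    return order


print(order_dimensions([{0, 1, 2}, {1, 2}, {1, 3}, {4}]))  # [1, 2, 0, 3, 4]
```

In the example, dimension 1 is used by three clusters and starts the sequence. Dimension 2 co-occurs with it twice and follows. Dimension 0 co-occurs with 2. Dimensions 3 and 4 have no co-occurrence with the last placed dimension, so they are appended by frequency.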

Subspace Cluster Ordering

A useful visual representation of subspace clustering results should arrange similar subspaces next to each other to reduce the user's visual search time. We propose the following ordering strategy. Using the dimension weights defined in Equation 5.2, we define a measure for the global interestingness $I^g_{SC_k}$ of a cluster $SC_k$:

$$I^g_{SC_k} = \frac{\sum_{m \in D_k} w_m^k}{|D_k|}, \qquad (5.3)$$

where $w_m^k$ is the weight of dimension $m \in D_k$ of $SC_k$, and $|D_k|$ is the number of dimensions in this subspace cluster. That is, the global interestingness of a cluster $k$ is the average weight of the dimensions it contains. This measure is used to determine the first cluster in the ordering. We then use the subspace cluster distance (Equation 5.4) employed in [14] to find the most similar cluster, which is placed next to the initial cluster. This distance function is a convex sum of a subspace distance and an object distance:

$$d(SC_i, SC_j) = \beta \left( 1 - \frac{|D_i \cap D_j|}{|D_i \cup D_j|} \right) + (1 - \beta) \left( 1 - \frac{|X_i \cap X_j|}{\min\{|X_i|, |X_j|\}} \right) \qquad \text{[14]} \quad (5.4)$$

where $\beta \in [0, 1]$ balances the two terms, $|D_i \cap D_j|$ is the number of common dimensions of the two subspace clusters $i$ and $j$, and $|X_i \cap X_j|$ is the number of objects they share. We continue this placement until all clusters are placed.
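The cluster ordering can be sketched in Python as follows, combining Equations 5.3 and 5.4. The text leaves two details open, and the sketch fills them with assumptions. First, the seed is taken to be the cluster with the lowest $I^g$, since smaller weights mean more compact dimensions. Second, each subsequent cluster is the unplaced one nearest to the cluster placed last. The default $\beta = 0.5$ is likewise an arbitrary choice, and all names are hypothetical.

```python
def global_interestingness(weights):
    """Equation 5.3: mean weight w_m^k over the dimensions m in D_k."""
    return sum(weights.values()) / len(weights)


def subspace_distance(c_i, c_j, beta=0.5):
    """Equation 5.4. c_i, c_j are (records, dimensions) pairs of sets;
    beta in [0, 1] (0.5 is an arbitrary default for this sketch)."""
    (x_i, d_i), (x_j, d_j) = c_i, c_j
    dim_term = 1 - len(d_i & d_j) / len(d_i | d_j)
    obj_term = 1 - len(x_i & x_j) / min(len(x_i), len(x_j))
    return beta * dim_term + (1 - beta) * obj_term


def order_clusters(clusters, interestingness, beta=0.5):
    """Greedy ordering: seed with one cluster, then chain nearest neighbours.

    ASSUMPTIONS (not fixed by the text): the seed is the cluster with the
    LOWEST I^g (smallest average weight = most compact subspace), and each
    next cluster is the unplaced one closest to the cluster placed last.
    """
    remaining = list(range(len(clusters)))
    order = [min(remaining, key=lambda k: interestingness[k])]
    remaining.remove(order[0])
    while remaining:
        last = clusters[order[-1]]
        nxt = min(remaining, key=lambda k: subspace_distance(last, clusters[k], beta))
        order.append(nxt)
        remaining.remove(nxt)
    return order


clusters = [({0, 1, 2}, {0, 1}), ({5, 6}, {3, 4}), ({1, 2, 3}, {0, 1, 2})]
print(order_clusters(clusters, interestingness=[0.1, 0.5, 0.3]))  # [0, 2, 1]
```

Clusters 0 and 2 share two of three dimensions and two of three records, giving a distance of 1/3, so they end up adjacent. Cluster 1 shares nothing with either and is placed last.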

Record Ordering

Two different record ordering strategies are implemented in HeatNails. The first orders the records from min to max with respect to their values in the dimension that has the largest variance among all dimensions. The second orders the records by their Euclidean distance from a selected starting record, computed across the dimensions contained in the given subspace. The starting record, in turn, may either be selected by the user or chosen automatically as the record that shows the largest variance over all dimensions.
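Both record ordering strategies are short enough to sketch directly in Python. The sketch computes the "largest variance" dimension over the records being ordered, and the automatic start record's variance over its raw values across all dimensions. Both readings are assumptions, as are the function names.

```python
import math
from statistics import pvariance


def order_by_max_variance_dimension(data, records):
    """Strategy 1: sort records ascending by their value in the dimension with
    the largest variance (variance taken over `records`; an assumption)."""
    n_dims = len(data[0])
    m_star = max(range(n_dims),
                 key=lambda m: pvariance([data[i][m] for i in records]))
    return sorted(records, key=lambda i: data[i][m_star])


def order_by_distance(data, records, dims, start=None):
    """Strategy 2: sort records by Euclidean distance to a starting record,
    measured only over the subspace dimensions `dims`. Without a user-chosen
    start, use the record whose values vary most across all dimensions."""
    if start is None:
        start = max(records, key=lambda i: pvariance(data[i]))

    def dist(i):
        return math.sqrt(sum((data[i][m] - data[start][m]) ** 2 for m in dims))

    return sorted(records, key=dist)


data = [[1.0, 9.0, 0.0], [2.0, 1.0, 0.0], [3.0, 5.0, 0.0]]
print(order_by_max_variance_dimension(data, [0, 1, 2]))  # [1, 2, 0]
print(order_by_distance(data, [0, 1, 2], dims=[0, 1]))   # [0, 2, 1]
```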

Value Ordering

A value ordering facility is implemented in the HeatMap view and is visible in the top summary row of the HeatNails view. Each row shows the distribution of values in a given dimension. To that end, we sort the values from min to max and group them into a user-selectable number of bins. In this view, the distribution of values per dimension and cluster is indicated in the form of a color-coded histogram. The histograms help in understanding the distribution of data values within each dimension, and may help explain why a particular dimension was or was not selected by the clustering algorithm.
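Because the values are sorted before binning, one possible reading of the value ordering is a "sorted and compressed" row. The sorted values are split into a user-chosen number of equal-size groups, and each group is summarized by one value that is then color-coded. The Python sketch below implements that reading. The grouping rule, the use of the mean, and the function name are assumptions; an equal-width histogram would be an equally plausible alternative.

```python
def sorted_value_bins(values, n_bins):
    """Sort values min -> max and compress them into n_bins contiguous groups
    of (nearly) equal size; each group is summarized by its mean, which would
    then be mapped to a color. Requires len(values) >= n_bins >= 1, so that
    no group is empty."""
    ordered = sorted(values)
    n = len(ordered)
    bins = []
    for b in range(n_bins):
        lo = b * n // n_bins
        hi = (b + 1) * n // n_bins
        chunk = ordered[lo:hi]
        bins.append(sum(chunk) / len(chunk))
    return bins


print(sorted_value_bins([4, 1, 3, 2, 6, 5], n_bins=3))  # [1.5, 3.5, 5.5]
```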

Summary and Discussion of the ClustNails System Design

ClustNails is an integrated system for visual subspace cluster analysis. Its design features (1) a number of subspace clustering algorithms from which the user can choose and (2) different visual representations for the most important aspects of the output of automatic subspace cluster analysis.



Regarding (1), we provide access to a number of state-of-the-art algorithms contained in the OpenSubspace library [106]. The list of integrated algorithms is extensive.

Regarding (2), we composed a visual display covering three aspects. The Spikes view is inspired by star glyphs and distinguishes clusters from each other in terms of included dimensions. The radial shape of the glyphs in the Spikes view is visually dominant and allows fast perception of cluster properties. Sorting the cluster glyphs by similarity relieves users (at least partially) of sequential visual search. The Spikes view is complemented by the HeatNails view, a dimension-oriented detail view that we provide as a coordinated view below the cluster glyphs. The HeatNails view builds on heat maps and the pixel paradigm to show the maximum possible information. It allocates possibly only one pixel per record dimension (bottom view) or per histogram bin of a cluster dimension (top view). The overall layout of the three views follows an overview-first approach, from the most aggregated view at the top (the Spikes view of clusters) to the most detailed view at the bottom (the HeatNails record view). The histogram view showing the distribution of dimensions per cluster is located in the middle.

We designed this integrated layout with the different subspace clustering output parameters in mind, and arranged the views according to the level of detail they provide. While we believe our system design is justified by these considerations, we recognize that other multidimensional visualization techniques exist that could serve as alternative views in our layout. Parallel coordinates in conjunction with color coding could be one option. A dedicated user study, as part of future work, could explore design alternatives and compare them with each other.

5.1.5 Use Case and System Comparison<br />

We apply the ClustNails system to a real-world data set to demonstrate its applicability and illustrate the different types of analysis one can perform with it. We then compare it with the state-of-the-art system VISA [14] to validate the effectiveness of the system and its design.

Use Case: USDA Food Composition Data Set

We analyzed the USDA Food Composition data set¹, which contains a full collection of raw and processed foods characterized by their nutrient composition. The data comprise more than 7000 records and 44 dimensions. We selected Proclus for the clustering task and set the number of clusters to 15 and the average number of dimensions to 8. Figure 5.5 shows the result generated by the system with these settings.

From Figure 5.5 we can see that clusters C11, C12, C13, and C14 (highlighted in red) all share the same two dimensions, water and calories, although the cluster sizes vary from 4 to 24 records. All these records share some common features: high water content and low calories. To gain a better understanding of the clustering result, one can drill down to each record by checking the data table or the details-on-demand information displayed in tooltips on mouse-over. It is not difficult to find that these groups mostly consist of foods that are commonly regarded as “healthy”. Foods of a similar nature, e.g., lima and mango beans, various types of low-fat dairy products, and soups, are placed in the same groups, which suggests that the clustering makes good sense.

¹ http://www.ars.usda.gov/

[Figure 5.5 image: subspace clusters C0 to C14]

Figure 5.5: Visualization of the subspace clusters of the USDA Food Composition data set generated by Proclus.

Using the value ordering function in the HeatNails view, we can further explore the distribution of data values inside each cluster and look for interesting patterns (see Figure 5.6). We note that most of the data values in the dimensions not selected by Proclus have relatively large variance. This is not surprising, as subspace clustering algorithms are typically designed to reduce the sparsity of data by discarding dimensions with large variances.
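The variance observation can be made concrete with a minimal Python sketch. As a hypothetical simplification, it assumes that value ordering sorts each dimension's values independently within one cluster, and that values are normalized to [0, 1] so that variances are comparable across dimensions. The function names, the toy records, and the `selected` set are illustrative only and are not taken from ClustNails.

```python
from statistics import pvariance

def value_order(cluster_rows, dims):
    """Sort each dimension's values independently within one cluster.
    Assumption: this mimics a per-dimension value ordering; it is not
    taken from the ClustNails code."""
    return {d: sorted(row[d] for row in cluster_rows) for d in dims}

def variance_by_dim(cluster_rows, dims):
    """Population variance of each dimension over the cluster's records."""
    return {d: pvariance([row[d] for row in cluster_rows]) for d in dims}

# Hypothetical cluster with values normalized to [0, 1]; 'water' and
# 'calories' play the role of the dimensions selected by the algorithm.
cluster = [
    {"water": 0.91, "calories": 0.10, "protein": 0.05},
    {"water": 0.89, "calories": 0.12, "protein": 0.60},
    {"water": 0.90, "calories": 0.11, "protein": 0.30},
]
dims = ["water", "calories", "protein"]
selected = {"water", "calories"}

ordered = value_order(cluster, dims)
variances = variance_by_dim(cluster, dims)
# Rank the unselected dimensions by variance, largest first.
unselected = sorted((d for d in dims if d not in selected),
                    key=variances.get, reverse=True)
print(ordered["protein"])  # [0.05, 0.3, 0.6]
print(unselected)          # ['protein']
```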

Figure 5.6: Sorted view of clusters C0 to C14 (value ordering function applied).

Looking at how the same two dimensions are distributed across the other clusters in the sorted view, it is not difficult to identify clusters, like C10, that have similar trends over the two dimensions but stronger patterns in other dimensions (exceptionally low values for both total lipids and proteins, discussed later); hence the two dimensions are not selected to characterize the cluster. This type of information is not only useful for understanding the cluster analysis result, but it also adds transparency to data mining algorithms, which are usually hidden from the user as black boxes. On closer inspection, we can identify a cluster that also shares the two dimensions, but with an inverse trend, that is, low water content and high calories (C6). The detailed information reveals that this cluster represents a whole set of different candies (probably not the most recommendable food for a diet).

Another interesting cluster is C10, which is characterized by exceptionally low values for both total lipids and proteins. All the other records, excluding the ones in C1, have either consistently high values or higher variances in one of these two dimensions. The records in C10 represent various kinds of beverages, such as alcoholic beverages, teas, and fruit-based

108 Chapter 5. Visual Subspace Analysis of High-Dimensional Data

toppings. C1 is characterized by the same trend, but it forms a different cluster with exceptionally low values for other nutrients, such as various kinds of fats and vitamin B12. All the foods in C1 are again beverages.

Comparing C10 to C1, one can see that C10 in fact has a very similar distribution of values in the dimensions that make up C1's subspace. This is a clear example in which the output of the algorithm is not optimal and merging the two clusters would make sense.
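An analyst could back up such a merge suggestion with a simple numeric check. The following Python sketch is a hypothetical heuristic and is not part of ClustNails or Proclus. It compares the per-dimension mean and standard deviation of two clusters on the dimensions of one cluster's subspace. The tolerance `tol` is an arbitrary illustrative value, and the toy values only stand in for C1 and C10.

```python
from statistics import mean, pstdev

def similar_in_subspace(rows_a, rows_b, dims, tol=0.05):
    """Hypothetical merge heuristic: report True when, on every dimension
    of the given subspace, both the means and the standard deviations of
    the two clusters differ by at most `tol` (values assumed in [0, 1])."""
    for d in dims:
        a = [r[d] for r in rows_a]
        b = [r[d] for r in rows_b]
        if abs(mean(a) - mean(b)) > tol or abs(pstdev(a) - pstdev(b)) > tol:
            return False
    return True

# Toy stand-ins for C1, C10, and an unrelated cluster (made-up values).
c1 = [{"fat": 0.01, "b12": 0.00}, {"fat": 0.02, "b12": 0.01}]
c10 = [{"fat": 0.015, "b12": 0.005}, {"fat": 0.02, "b12": 0.00}]
other = [{"fat": 0.60, "b12": 0.40}, {"fat": 0.70, "b12": 0.50}]
c1_subspace = ["fat", "b12"]

print(similar_in_subspace(c10, c1, c1_subspace))    # True
print(similar_in_subspace(other, c1, c1_subspace))  # False
```

Comparing only means and spreads is deliberately crude. A distribution test per dimension would be a more careful alternative.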

Comparison with VISA

Figure 5.7: Visualization of the subspace clusters in the VISA [14] framework discussed in Subsection 5.1.5. Cluster view (left), record view (right).

Figure 5.7 shows the representation in VISA [14] of the same subspace clusters as used in our use case above (same data set, same clustering result). In the cluster view, the 15 clusters are projected to a 2D scatterplot using MDS based on their dimension similarity (left screenshot in Figure 5.7). Each cluster is represented by a circle scaled according to the cluster size. The record-centric view shows the result as a heat map (right screenshot in Figure 5.7), where rows represent records and columns represent dimensions. Different color codes are used in the heat map: black for unselected dimensions, brightness for interestingness, and hue for data values. We recognize the following benefits of the ClustNails design compared with VISA:

• Overlap

Circles of different sizes in the VISA MDS projection can cause occlusion and lead to over-cluttered displays. For example, only 9 out of 15 clusters are visible in the cluster view in Figure 5.7. The Spikes and HeatNails views avoid overlap. One may argue that scatterplots scale better, but in practice the number of clusters in a result is usually small, because a large number of clusters in many cases implies poor performance of the clustering algorithm [90]. The scatterplot visualization, on the other hand, suffers from occlusion problems regardless of the number of clusters. Moreover, the ClustNails glyphs provide richer information for each cluster, as described next.

• Richer information

VISA shows only the number of records and dimensions of each cluster and maps the similarities between clusters to distances. The Spikes view in ClustNails extends this basic encoding with additional information about each cluster. This permits a user to (1) draw richer information from the result and (2) detect and understand the similarities between clusters more easily. Specifically, the spikes show the individual dimensions of each subspace and their corresponding importance, which makes it possible to relate one cluster to another. The linking-and-brushing technique implemented in the Spikes view helps highlight the dimensions shared among clusters.

• Ordering supports comparison

The ClustNails ordering techniques place similar clusters, dimensions, and records close to each other, which makes similarities and dissimilarities between the clusters easier to detect. No ordering technique is implemented in the current version of VISA. There, the similarity of clusters can only be seen in the cluster view, as the 2D distance between clusters in the projection.

• Scalability

The heat map solution implemented in VISA was originally designed to display a limited number of records that belong to a small subset of clusters. The compression techniques we propose for the thumbnail view of HeatNails scale to a much larger number of records and are thus not limited to representing only a subset of the data. Subspace clustering algorithms can produce hundreds of subspace clusters for some parameter settings. To analyze and understand whether the result makes sense, the clusters need to be displayed and compared. Our histogram views can be used to visualize this output. They can also be arranged linearly in multiple rows, or a two-dimensional ordering heuristic could be developed to make the technique scale.

• Non-member dimensions

In VISA, all data values in the unselected dimensions are colored black; hence the information in these segments is missing from the visualization. This may be detrimental to data understanding, as the information contained in those segments provides evidence of why the clustering algorithm did not select a given dimension to characterize the cluster. The algorithm's choice can be justified if the visualization shows extreme values or large variances in the unselected dimensions. Our design displays these dimensions in gray scale so that they can be used to understand the result.
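The ordering benefit above rests on placing clusters with similar subspaces next to each other. The text does not spell out the ClustNails ordering algorithm, so the following Python sketch only illustrates one common, simple heuristic: greedy nearest-neighbor ordering. It assumes that cluster similarity is the Jaccard similarity of the clusters' dimension sets. The function names and the toy subspaces are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two dimension sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def greedy_order(subspaces):
    """Greedy nearest-neighbour ordering: start with the first cluster and
    repeatedly append the unplaced cluster most similar to the last one."""
    names = list(subspaces)
    if not names:
        return []
    order = [names[0]]
    remaining = set(names[1:])
    while remaining:
        last = subspaces[order[-1]]
        # sorting the candidates makes tie-breaking deterministic
        nxt = max(sorted(remaining), key=lambda n: jaccard(last, subspaces[n]))
        order.append(nxt)
        remaining.remove(nxt)
    return order

clusters = {
    "C0": {"fat", "protein"},
    "C1": {"water", "calories"},
    "C2": {"fat", "protein", "b12"},
    "C3": {"water", "calories", "sugar"},
}
print(greedy_order(clusters))  # ['C0', 'C2', 'C1', 'C3']
```

Greedy ordering is cheap but only locally optimal. Seriation methods that optimize a global criterion can produce better orderings at a higher cost.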

5.1.6 Conclusions and Future Work

Subspace clustering addresses an important problem in clustering multidimensional data. The algorithms successfully reduce the noise in multidimensional data by revealing clusters that exist only in subsets of the dimensions. Visualization of subspace clustering results is challenging: in addition to the information contained in traditional clustering results, the subsets of dimensions that define clusters and the overlap between dimensions and records need to be represented in an understandable and uncluttered way. ClustNails was presented as an interactive data analysis and visualization tool for subspace clustering analysis. It provides several novel visualization and ordering techniques to help analysts extract subspace clusters from data and then analyze the results. The system implements linked and ordered cluster-centric (Spikes) and record-centric (HeatNails) views. We demonstrated the effectiveness of our system design in the analysis of real-world data and in a comparison with an existing visual subspace cluster analysis system (VISA).

One extension of the system is particularly needed in future work: support for parameter selection. This is a difficult problem, given that each algorithm has its own parameters and different settings may generate very different results. Another extension could be the development of a so-called "agreement matrix" over a set of results that shows the parts most results agree on. The agreement matrix could then be used to evaluate the quality of individual outputs and to help the analyst understand the consensus among different algorithms and parameter settings. A further research direction might be improving the scalability of the ClustNails system. While we have not performed a formal evaluation, we assume that scalability is restricted to dozens of clusters and dimensions, depending on the resolution of the given display. Some results may contain hundreds of clusters and thousands of dimensions, for which scalable solutions are needed.
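The proposed agreement matrix is not defined in detail here. One standard construction with the same intent is the co-association matrix used in consensus clustering, in which entry (i, j) is the fraction of results that place records i and j in the same cluster. The Python sketch below assumes that every result is a flat partition of the same records. Subspace clustering results, which may assign a record to several clusters or to none, would need a richer definition. The name `agreement_matrix` and the toy results are hypothetical.

```python
def agreement_matrix(labelings):
    """Co-association matrix: entry [i][j] is the fraction of clustering
    results that assign records i and j to the same cluster.
    Assumes every result is a flat partition of the same n records."""
    n = len(labelings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        if len(labels) != n:
            raise ValueError("all results must label the same records")
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    m = len(labelings)
    return [[c / m for c in row] for row in counts]

# Three hypothetical results (e.g., different parameter settings) on 4 records.
results = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
A = agreement_matrix(results)
print(A[0][1])  # 1.0: records 0 and 1 are grouped together in every result
```

Entries close to 1 mark record pairs on which most results agree. Entries near 0.5 point to unstable parts of the clustering.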

5.2 Visual Analytics of Subspace Search

Many methods are currently available for the explorative analysis of high-dimensional data spaces. Proposed automatic approaches include dimensionality reduction and cluster analysis, whereas visual-interactive methods aim to provide effective visual mappings to show, relate, and navigate this data.

As described before, analyzing high-dimensional data is notoriously difficult, as interesting patterns may occur in any possible subspace. We address two important research directions to discover the patterns hidden in such data spaces. The first was pursued in the previous section, where we developed a visual interactive system that analyzes the results of subspace clustering algorithms to make them easier to understand. The second direction goes one step back, before the clustering step, and identifies important subspaces where patterns may occur. Looking at a single interesting subspace is often not sufficient, since different subspaces may show confirmatory, complementary, conjoint, or contradicting relations between data items. We propose a novel method for the visual analysis of interesting subspaces that points out these types of relations. Based on appropriately defined subspace similarity functions, we visualize the subspaces and provide navigation facilities to interactively explore large sets of subspaces. Our approach allows users to effectively compare and relate subspaces with respect to the involved dimensions and clusters of objects. We apply our approach to synthetic and real data sets and thereby demonstrate its support for understanding high-dimensional data from different perspectives, effectively yielding a more complete view of high-dimensional data.

5.2.1 Introduction

For large feature spaces, interesting patterns may often be located only in subspace projections of the data. As insights may not be confined to a single subspace, a relevant analysis should also consider multiple subspaces and their interrelations. Especially for high-dimensional data, we can expect to have different views of the same data [58, 107], i.e., the same objects might group differently given different subspace perspectives (see Figure 5.8 for an illustration). The existence of alternative relevant subspaces may stem from the data description process, e.g., when features (dimensions) describing different semantic properties of the data are combined during preprocessing. For instance, in demographic analysis, households are often described by an array of many variables, combinations of which constitute different conceptual domains, such as wealth, mobility, or health. Likewise, it may be the combination of otherwise semantically unrelated dimensions that results in interesting patterns. In the Data Mining community, a class of so-called Subspace Analysis algorithms has been proposed to cope with the problem of identifying interesting subspaces and clusters in a high-dimensional data set. To date, however, there has been very limited focus on the presentation and interpretation of the generated output. Furthermore, subspace analysis often produces highly redundant results that need to be processed further to obtain meaningful results [101].

Figure 5.8: Alternative data distributions and groupings from [103] in two different subspaces of a larger high-dimensional data space (domain here: demographic data analysis). The panels are labeled "traveling subspace" and "health subspace", with axes income, blood pressure, traveling frequency, and age. Our proposed visual analysis method integrates the notion of alternative subspaces into the analysis process and links it to the task of comparative cluster analysis.

We propose an initial step towards the use of visual analytics as a way to explore alternative views generated by subspace analysis algorithms. We define an analytical pipeline made of algorithmic and visual components that makes it possible to single out and explore alternative views in the data. After being analyzed by a subspace search algorithm, the data is structured and further processed in an interactive visualization environment to reduce redundancy.

The main contribution of this section is the operative definition and implementation of this multistep pipeline. The pipeline makes it possible to sift through an exponential number of subspace candidates and to reduce the problem to a handful of relevant views. More specifically, we

1. introduce a mechanism to deal with subspace redundancy by defining topological and dimensional subspace similarity and by allowing flexible and interactive subspace aggregation;

2. provide a well-reasoned interactive visualization environment that makes it possible to compare and assess alternative views by visually comparing topological and dimensional similarities, and to strike a balance between visual complexity and level of detail.
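The two kinds of similarity named in the list can be illustrated before their formal definition. The following Python sketch rests on our own assumptions and is not the definition used in this work. Dimensional similarity is taken as the Jaccard overlap of the two dimension sets. Topological similarity is taken as the average Jaccard overlap of each object's k-nearest-neighbor set in the two subspaces, using Euclidean distance restricted to each subspace. All names and the toy data are hypothetical.

```python
import math

def dimensional_similarity(s1, s2):
    """Jaccard overlap of the dimension sets of two (non-empty) subspaces."""
    s1, s2 = set(s1), set(s2)
    return len(s1 & s2) / len(s1 | s2)

def knn(data, dims, i, k):
    """Indices of the k nearest neighbours of object i, using Euclidean
    distance restricted to the given dimensions (ties broken by index)."""
    def dist(j):
        return math.sqrt(sum((data[i][d] - data[j][d]) ** 2 for d in dims))
    others = [j for j in range(len(data)) if j != i]
    return set(sorted(others, key=dist)[:k])

def topological_similarity(data, s1, s2, k=2):
    """Average Jaccard overlap of every object's k-NN sets in s1 and s2."""
    total = 0.0
    for i in range(len(data)):
        a, b = knn(data, s1, i, k), knn(data, s2, i, k)
        total += len(a & b) / len(a | b)
    return total / len(data)

# Dimensions 0 and 1 group objects as {0, 1} and {2, 3}; dimension 2 groups
# them as {0, 2} and {1, 3}.
data = [(0.0, 0.1, 0.9), (0.1, 0.0, 0.0), (0.9, 1.0, 1.0), (1.0, 0.9, 0.1)]
print(topological_similarity(data, (0,), (1,), k=1))  # 1.0
print(topological_similarity(data, (0,), (2,), k=1))  # 0.0
```

The example shows why both measures are needed. Subspaces (0,) and (1,) share no dimension, yet they induce identical neighborhoods.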

We evaluate our method through two case studies. The first is based on synthetic data and checks whether the tool does what it is supposed to do. The second is based on real-world data and demonstrates how the tool can help find and interpret alternative views in high-dimensional data. We believe these results show the potential of visual analytics in the context of automated mining algorithms. They furthermore show how visual analytics can enhance the understanding of the results of automated data analysis methods and lead to new questions concerning more effective or more efficient algorithms.

The remainder of this section is structured as follows. In Section 5.2.2, we discuss the concepts underlying the class of subspace search algorithms that are important for our approach. In Section 5.2.3, we introduce our analytic methodology and suggested workflow, detailing the algorithmic and visual-interactive components employed. Section 5.2.4 demonstrates the application of our tool to synthetic and real data sets and shows its usefulness for the problem at hand. In Section 5.2.5, we discuss advantages and limitations of our methodology, and we conclude in Section 5.2.6.

5.2.2 Subspace Analysis

In this section, we discuss the challenges for visual subspace analysis in more detail and explain how we tackle them with our new interactive, explorative framework supported by subspace search algorithms.

As is commonly known in subspace clustering, dealing with high-dimensional data in its subspace projections faces two main challenges. The first, serious challenge is reasonable scalability with regard to the dimensionality of the data set. A $d$-dimensional data set has $\sum_{k=1}^{d} \binom{d}{k} = 2^d - 1$ possible non-empty subspaces $S \subseteq \{1, \dots, d\}$, so many subspace clustering approaches do not scale well for very high-dimensional data. Every algorithm has to employ some strategy and heuristics to cope with such an exponential search space.
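The count of non-empty subspaces, 2^d − 1, can be checked by brute-force enumeration for small d. The following Python sketch does so with the standard library and then prints the count for 44 dimensions, the dimensionality of the USDA data set used earlier, where enumeration is clearly infeasible. The helper `all_subspaces` is illustrative only.

```python
from itertools import combinations
from math import comb

def all_subspaces(d):
    """Yield every non-empty subset of the dimensions {1, ..., d}."""
    dims = range(1, d + 1)
    for k in range(1, d + 1):
        yield from combinations(dims, k)

for d in (3, 10, 16):
    enumerated = sum(1 for _ in all_subspaces(d))
    by_formula = sum(comb(d, k) for k in range(1, d + 1))
    assert enumerated == by_formula == 2 ** d - 1
    print(d, enumerated)  # 3 7, then 10 1023, then 16 65535

# Enumeration is hopeless for larger d, e.g., 44 dimensions as in the USDA data:
print(2 ** 44 - 1)  # 17592186044415
```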

The second, closely related challenge is dealing with high redundancy, which stems from the high similarity of the exponentially many subspaces. If two subspaces share a high proportion of dimensions, they are likely to exhibit a very similar clustering structure [58]. A large search result with high redundancy is, however, not beneficial for the user, as it masks the complete information and is hard to interpret.

A core task in the analysis of high-dimensional data is to apply a clustering method to reduce data complexity and identify groups of data for comparison. Different clustering algorithms follow different clustering notions. For example, there are density-based (e.g., DBSCAN [50]) and compactness-based (e.g., k-means) clustering methods, and their outcomes often depend crucially on non-intuitive parameter settings. Usually several clustering attempts are required until the user has a usable result. The high runtimes of subspace clustering processes (see Section 2.4.2) are therefore not tolerable in such a workflow. Consequently, we decided to start the visual data exploration one step before the actual clustering process and to decouple subspace search from the actual clustering. Dedicated subspace search algorithms [16, 38, 84] have been designed to efficiently filter and rank the possible subspaces according to specific quality criteria (or interestingness measures, see also below). After subspace search has taken place, an arbitrary clustering approach can be used to cluster the data in the identified subspaces.
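The decoupling means that any clustering algorithm can be run on the data projected onto a subspace returned by the search. The following Python sketch uses plain k-means (Lloyd's algorithm) as a stand-in choice, with a deterministic farthest-point initialization for reproducibility. DBSCAN or any other algorithm could take its place. The subspace (0, 1) and the toy records are hypothetical.

```python
import math

def project(data, dims):
    """Restrict every record to the dimensions of one subspace."""
    return [tuple(row[d] for d in dims) for row in data]

def kmeans(points, k, iters=20):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        # next center: the point farthest from all centers chosen so far
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # assignment step: nearest center for every point
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # update step: move each center to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster runs empty
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels

data = [(0.0, 0.1, 0.5), (0.1, 0.0, 0.9), (0.9, 1.0, 0.1), (1.0, 0.9, 0.7)]
# Dimensions 0 and 1 form the (hypothetical) subspace found by the search step.
labels = kmeans(project(data, (0, 1)), k=2)
print(labels)  # [0, 0, 1, 1]
```

Because the clustering step is a separate function, the analyst can swap the algorithm or its parameters per subspace without rerunning the search.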

The use of subspace search for our purposes has several advantages. (1) It helps to effectively filter out those subspaces that, owing to low interestingness, do not need to be considered by the user. (2) Subspace search approaches are designed to reduce the search space efficiently, and they do not need to compute clusters. (3) Subspace search approaches themselves also rely on certain assumptions about what makes a subspace interesting, but these assumptions do not necessarily lead to very different subspaces across approaches. Therefore, the results are not as biased as they are for different clustering algorithms, which enables the user to obtain valuable results with a single subspace search approach. For example, the quality assessment based on the k-NN distance [16] favors neither the DBSCAN nor the k-means clustering notion. (4) Integrating subspace search into the high-dimensional analysis offers the user the opportunity to obtain a visual, intuitive overview of the clustering structure before even starting the actual clustering. Thus, the user can assess whether the data has the potential to deliver valuable clustering results at all. The user can also decide which subspaces are to be clustered and which clustering notion to follow in each subspace (since the notion does not need to be the same for all), and can more easily determine meaningful parameter settings for clustering approaches.

Subspace search methods guide their search process by specific interestingness scores that are defined heuristically. For example, the method proposed in [38] uses as its interestingness score the variation of the density of objects across a regular cell-based partitioning of a given subspace. The underlying assumption is that the higher the variation of density, the higher the probability that the subspace shows a meaningful structure. As another example, the SURFING method [16] relies on the histogram of the k-nearest-neighbor distances of all objects in a given subspace. It considers subspaces with non-uniform distance distributions more interesting, as these indicate the presence of strong clusterings. Here the underlying assumption is that in subspaces that show meaningful structures (e.g., clusters), the k-NN distances vary strongly across objects. These and other measures aim to identify subspaces that show a high "contrast" with respect to the distribution of objects, thereby making it possible to spot meaningful structure in the subspaces.
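The two kinds of interestingness score described above can be sketched in Python. Both functions are simplified stand-ins written for illustration and are not the published formulas of [38] or SURFING [16]. The first returns the coefficient of variation of object counts over a regular grid. The second returns the coefficient of variation of the objects' k-NN distances. Both assume values normalized to [0, 1]. The data set is made up: dimension 0 contains a tight group plus two outliers, and dimension 1 is evenly spread.

```python
import math
from collections import Counter
from statistics import mean, pstdev

def grid_density_variation(data, dims, bins=4):
    """Coefficient of variation of object counts over a regular grid on the
    subspace (values assumed in [0, 1]); empty cells count as zero."""
    cells = Counter(
        tuple(min(int(row[d] * bins), bins - 1) for d in dims) for row in data
    )
    counts = list(cells.values()) + [0] * (bins ** len(dims) - len(cells))
    return pstdev(counts) / mean(counts)

def knn_distance_contrast(data, dims, k=1):
    """Coefficient of variation of each object's k-NN distance in the subspace."""
    knn_dists = []
    for i, p in enumerate(data):
        dists = sorted(
            math.dist([p[d] for d in dims], [q[d] for d in dims])
            for j, q in enumerate(data) if j != i
        )
        knn_dists.append(dists[k - 1])
    return pstdev(knn_dists) / mean(knn_dists)

xs = [0.10, 0.11, 0.12, 0.13, 0.14, 0.15, 0.60, 0.95]          # clustered
ys = [0.0625, 0.1875, 0.3125, 0.4375, 0.5625, 0.6875, 0.8125, 0.9375]  # uniform
data = list(zip(xs, ys))

print(grid_density_variation(data, (0,)) > grid_density_variation(data, (1,)))  # True
print(knn_distance_contrast(data, (0,)) > knn_distance_contrast(data, (1,)))    # True
```

Both scores rank the clustered dimension above the evenly spread one, which is the kind of "contrast" these measures aim to capture.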

Subspace search methods also typically contain heuristics for abandoning uninteresting subspaces early, as exhaustive search would be prohibitively expensive. SURFING, for example, uses a bottom-up strategy that searches subspaces in order of increasing dimensionality: it tests additional dimensions for subspaces already known to be interesting. The list of currently interesting subspaces is continuously pruned to keep only the most interesting subspaces and to speed up the search. SURFING has no dimensionality bias, assumes no specific clustering structure, and is, in practice, parameter-free. Due to these properties, we rely on this method in our proposed approach, using the implementation provided to us by the original authors, but other subspace search algorithms could easily be used as well.
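The bottom-up idea of extending interesting subspaces one dimension at a time while pruning the candidate list can be sketched generically in Python. This is a plain level-wise beam search and not SURFING's actual pruning rule. The `quality` function, the beam width, and the toy score with a planted interesting subspace {0, 2, 4} are all hypothetical.

```python
def bottom_up_search(n_dims, quality, beam=3, max_dim=None):
    """Level-wise (bottom-up) subspace search: start from all 1-D subspaces,
    keep only the `beam` best of each level, and extend every kept subspace
    by one more dimension. `quality` maps a frozenset of dimensions to a score."""
    max_dim = max_dim or n_dims
    level = [frozenset([d]) for d in range(n_dims)]
    found = []
    for _ in range(max_dim):
        # prune: keep the `beam` most interesting subspaces of this level
        # (ties broken by the sorted dimension list for determinism)
        level = sorted(level, key=lambda s: (-quality(s), sorted(s)))[:beam]
        found.extend(level)
        # extend each surviving subspace by one additional dimension
        level = list({s | {d} for s in level for d in range(n_dims) if d not in s})
        if not level:
            break
    return sorted(found, key=quality, reverse=True)

# Toy score: reward the planted dimensions {0, 2, 4}, penalize the others.
planted = {0, 2, 4}
def toy_quality(s):
    return len(s & planted) - 0.5 * len(s - planted)

result = bottom_up_search(6, toy_quality, beam=2, max_dim=3)
print(sorted(result[0]))  # [0, 2, 4]
```

The beam width trades completeness for speed. A narrow beam may discard a subspace whose low-dimensional projections look uninteresting.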

Overall, using the results of a subspace search algorithm as a starting point for our visualization has many advantages. Subspace search methods such as SURFING employ efficient search strategies that tackle the efficiency challenge of subspace analysis. However, they typically do not solve the challenge of high redundancy. Our proposed visual analytics workflow, introduced next, starts precisely at this point.

5.2.3 Proposed Analytical Workflow

We propose a carefully designed visual analytics workflow for the subspace-based exploration of high-dimensional data that combines algorithmic subspace search with visual-interactive representations for user-based filtering and exploration. Our approach starts (1) with an automatic subspace search step, in which a subspace search algorithm selects a large number of interesting subspaces. Current subspace search methods provide an algorithmic handling of the problem of finding interesting subspaces. However, they often produce too many subspaces, which may also be redundant and thereby overwhelm the interactive analysis (see also Section 5.2.2). We therefore employ similarity-based grouping of subspaces (2) and perform the interactive exploration of interesting subspaces based on a few group representatives. Appropriate visual representations and interactions support the visual interactive analysis (3) for a better understanding of the subspace search results, including support for comparative cluster analysis.

Figure 5.9 depicts our proposed analytical workflow. We next detail the technical design decisions made for each of the analysis steps, including a discussion of alternatives.

114 Chapter 5. Visual Subspace Analysis of High-Dimensional Data

[Figure 5.9: HD Data → Subspace Search (e.g., SURFING) → Interesting Subspaces → Subspace Grouping and Filtering (e.g., hierarchical clustering based on subspace similarity) → Redundancy-Reduced View → Subspace Interaction (e.g., coloring clusters) → Cluster-Colored View]

Figure 5.9: Our proposed analysis pipeline. A subspace selection algorithm is applied to automatically identify a candidate set of interesting subspaces. A filtering step reduces the potentially large and redundant set of automatically obtained subspaces to a user-selectable number of representative subspaces. Visual-interactive user exploration then proceeds on the subspace representatives. Subspace analysis is also supported by comparative cluster views, allowing users to identify meaningful similar, complementary, or even conflicting clustering structures in the set of subspaces.

Generation of interesting subspace candidates

To search for interesting subspaces of high-dimensional data, we propose to use a subspace search algorithm. We employ automatic subspace search as a tool to serve our main purpose, which is to explore high-dimensional data effectively. The advantages of choosing subspace search, and in particular SURFING, have already been discussed in detail in Section 5.2.2. We observe that subspace search algorithms typically output a huge number of subspaces that are often rather redundant with respect to the reported interestingness index and whose sets of involved dimensions show high overlap. Since the examination of all subspaces is infeasible, a common approach is to filter the subspaces based on a certain threshold. This, however, ignores the fact that the top-ranked subspaces might be only slight variations (i.e., have a high overlap of dimension sets) of the same subspace and are therefore redundant to each other. Meanwhile, interesting subspaces with substantially different dimension sets, as compared to the top-ranked results, could be found at much later ranking positions and run the risk of being neglected in the analysis. Therefore, we apply a grouping step based on an appropriately defined notion of subspace similarity, as described next.

Similarity-based subspace grouping and filtering

Given a large number of candidate subspaces, we apply hierarchical grouping and filtering to yield a smaller set of sufficiently different, yet individually interesting groups of subspaces for interactive analysis. Our filtering and grouping operation is based on a custom similarity function defined on pairs of subspaces according to two main criteria: (1) the overlap of the sets of dimensions that constitute the respective subspaces, and (2) the resemblance of the data topology in the respective subspaces.

Similarity based on dimension overlap<br />

Subspaces can be similar regarding their constituent dimensions. We use the Tanimoto similarity [117] on bit vectors indicating the contained (active) dimensions of a subspace (1 denotes an active dimension, 0 an inactive one). The Tanimoto similarity is then computed as the number of dimensions contained in both subspaces (AND-ing of the bit vectors) divided by the total number of different dimensions occurring in either subspace (OR-ing of the bit vectors).
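The definition above can be written down directly. In this minimal sketch, subspaces are given as strings of the dimension letters A to L used for the 12D synthetic data set. Those labels, the helper names, and the choice to treat two empty subspaces as identical are illustrative assumptions.

```python
DIMS = "ABCDEFGHIJKL"  # 12 dimension labels, as in the synthetic example

def to_bits(subspace: str) -> list[int]:
    """Bit vector over DIMS: 1 = active dimension, 0 = inactive."""
    return [1 if d in subspace else 0 for d in DIMS]

def tanimoto(a: list[int], b: list[int]) -> float:
    """|A AND B| / |A OR B| for two bit vectors of equal length."""
    both = sum(x & y for x, y in zip(a, b))    # dimensions in both subspaces
    either = sum(x | y for x, y in zip(a, b))  # dimensions in either subspace
    # Assumption: two empty subspaces are treated as identical.
    return both / either if either else 1.0

def tanimoto_distance(a: list[int], b: list[int]) -> float:
    """Distance form used for layouts: 1 - similarity."""
    return 1.0 - tanimoto(a, b)
```

For example, CFGIJK and CFGIK share the five dimensions C, F, G, I, K and together cover six dimensions, so their Tanimoto similarity is 5/6.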


5.2.3 Proposed Analytical Workflow 115

Similarity based on data topology<br />

We also compare subspaces with regard to their data distribution. Specifically, we consider the similarity of k-NN relationships in the respective subspaces. For efficiency reasons, we compute the k-nearest-neighbor lists (k = 5) for a sample of 5% of the contained data points. The similarity between two subspaces is then evaluated as the average percentage of agreement between the k-NN lists in the two subspaces. This score measures the similarity of the k-NN topology of the data, where k is a parameter that the user can adapt to the data set at hand. Note that other similarity measures are also possible in principle. For instance, the data could be clustered and the similarity between subspaces evaluated according to the resemblance of the obtained clusterings by an appropriate measure such as the Rand index [114].
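Both topology measures can be sketched in a few lines. Several details are assumptions not fixed by the text: Euclidean distance restricted to the subspace dimensions, a seeded random sample, and ties broken by record index. `data` is a list of records, and a subspace is a tuple of dimension indices. The `rand_index` function is the textbook pair-counting Rand index, offered only as an illustration of the clustering-based alternative.

```python
import math
import random
from itertools import combinations

def knn(data, idx, dims, k):
    """Indices of the k nearest neighbors of data[idx], using only `dims`."""
    def dist(j):
        return math.sqrt(sum((data[idx][d] - data[j][d]) ** 2 for d in dims))
    others = [j for j in range(len(data)) if j != idx]
    return set(sorted(others, key=dist)[:k])  # stable sort: ties by index

def knn_topology_similarity(data, dims_a, dims_b, k=5, sample_frac=0.05, seed=0):
    """Average fraction of shared k-NN list entries over a sample of points."""
    rng = random.Random(seed)
    n_sample = max(1, round(sample_frac * len(data)))
    sample = rng.sample(range(len(data)), n_sample)
    overlaps = [len(knn(data, i, dims_a, k) & knn(data, i, dims_b, k)) / k
                for i in sample]
    return sum(overlaps) / len(overlaps)

def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two clusterings agree
    (both 'same cluster' or both 'different clusters')."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum((labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
                for i, j in pairs)
    return agree / len(pairs)
```

Two subspaces whose dimensions induce identical distances yield a similarity of 1.0, while an unrelated noise dimension typically yields a value close to 0.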

These two distance functions are the basis for the subspace grouping step in our analytical workflow, as follows:

1. Subspace grouping: We apply hierarchical agglomerative grouping of subspaces based on the topological distance function, using Ward's minimum variance method [144]. Based on the dendrogram representation of the obtained hierarchical grouping, the user chooses the hierarchy depth level and thereby the number of groups. This way, the user can easily decide how many clusters are desired for the analysis.

2. Subspace filtering: Based on the previously obtained grouping of subspaces, we filter one subspace from each group as its representative: for each group, we consider the subspaces with the lowest dimensionality and choose the one that exhibits the highest interestingness score. We note that other rules for filtering representatives are possible, but we find that this rule is robust and effective for users, as it keeps the dimensionality as low as possible.
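Both steps can be sketched without external libraries. The grouping uses naive O(n^3) agglomerative clustering with Ward's criterion, implemented through the Lance-Williams recurrence on squared dissimilarities (the convention R calls "ward.D2"). Because 1 minus k-NN agreement is not guaranteed to be a Euclidean distance, the variance interpretation of Ward's method holds only approximately here. The dendrogram is cut by a requested number of groups. All function names are hypothetical. In practice, `scipy.cluster.hierarchy.linkage(..., method="ward")` would be the more usual choice.

```python
import math

def ward_linkage(dist):
    """Naive agglomerative clustering with Ward's criterion (Lance-Williams).

    dist: symmetric n x n dissimilarity matrix, e.g. 1 - k-NN topology similarity.
    Returns merges as (a, b, height, new_id); new cluster ids start at n.
    """
    n = len(dist)
    # Lance-Williams is applied to squared dissimilarities ("ward.D2" convention).
    d = {(i, j): dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)}
    size = {i: 1 for i in range(n)}
    merges, next_id = [], n
    while len(size) > 1:
        (a, b), d_ab = min(d.items(), key=lambda kv: kv[1])
        new, next_id = next_id, next_id + 1
        for c in size:
            if c in (a, b):
                continue
            d_ac = d[tuple(sorted((a, c)))]
            d_bc = d[tuple(sorted((b, c)))]
            na, nb, nc = size[a], size[b], size[c]
            d[(c, new)] = ((na + nc) * d_ac + (nb + nc) * d_bc - nc * d_ab) / (na + nb + nc)
        # Forget every pair that involves one of the two merged clusters.
        d = {key: v for key, v in d.items() if a not in key and b not in key}
        size[new] = size.pop(a) + size.pop(b)
        merges.append((a, b, math.sqrt(max(d_ab, 0.0)), new))
    return merges

def cut_into_groups(n, merges, n_groups):
    """Replay the merges until only `n_groups` clusters remain."""
    clusters = {i: [i] for i in range(n)}
    for a, b, _height, new in merges:
        if len(clusters) <= n_groups:
            break
        clusters[new] = clusters.pop(a) + clusters.pop(b)
    return sorted(sorted(c) for c in clusters.values())

def pick_representative(group, subspaces, score):
    """Lowest dimensionality first, then highest interestingness score."""
    lowest = min(len(subspaces[i]) for i in group)
    candidates = [i for i in group if len(subspaces[i]) == lowest]
    return max(candidates, key=lambda i: score[i])
```

For example, for the group `[0, 1, 2]` with subspaces CF, BD and CFG and scores 0.3, 0.8 and 0.9, the rule ignores the higher-scoring 3D subspace CFG and returns BD.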

These steps, together with both distance functions, take us further towards our goal of understanding the different kinds of relationships between subspaces. Subspaces can complement, confirm, or contradict each other, and being aware of these relations can be crucial for further mining tasks.

[Figure 5.10: 2×2 matrix of contained dimensions (similar / not similar) against data topology (similar / not similar). Topology similar: redundant (dimensions similar), confirmatory (dimensions not similar). Topology not similar: dominant dimensions (dimensions similar), complementary (dimensions not similar).]

Figure 5.10: Filtering cases that can be supported by our two defined subspace similarity functions.

Four basic cases can be identified, each of which might be relevant for a given subspace analysis task:

1. Subspaces that are similar in both their contained dimension sets and their data topology (redundant subspaces);

2. Subspaces that are dissimilar in both their contained dimensions and their data topology (complementary subspaces);

3. Subspaces that are similar with regard to data topology but dissimilar regarding their contained dimensions (confirmatory subspaces: we confirm the same data relationships in different subspaces); and

4. Subspaces that are similar with regard to their contained dimensions but dissimilar regarding topology (this is generally not expected but could indicate the existence of one or a few dimensions that are, by their nature, very dominant for the data topology).

Figure 5.10 illustrates these four basic filtering cases.
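Once both similarities are available for a pair of subspaces, the four cases reduce to a lookup on two thresholded values. The text does not specify thresholds. The 0.5 defaults and the function name below are hypothetical and would in practice be tuned or chosen interactively.

```python
def relationship(dim_sim, topo_sim, dim_threshold=0.5, topo_threshold=0.5):
    """Classify a subspace pair into the four cases of Figure 5.10.

    dim_sim:  Tanimoto similarity of the dimension sets, in [0, 1].
    topo_sim: k-NN topology similarity, in [0, 1].
    Thresholds are illustrative assumptions, not values from the text.
    """
    dims_similar = dim_sim >= dim_threshold
    topo_similar = topo_sim >= topo_threshold
    if dims_similar and topo_similar:
        return "redundant"
    if not dims_similar and not topo_similar:
        return "complementary"
    if topo_similar:  # similar topology, different dimensions
        return "confirmatory"
    return "dominant dimensions"  # similar dimensions, different topology
```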

Visual-interactive design

After hierarchical aggregation and/or filtering of the potentially redundant set of subspaces has taken place, we apply a set of analytical views for exploring and comparing the subspaces. Our displays are based on (1) scatterplot-oriented representations of individual subspaces or groups of subspaces, (2) similarity-based or linear list layouts for sets of subspaces, and (3) additional informative views (parallel coordinates and color-coding for the comparison of groups in the data).

The proposed design is the result of several iterations over alternative solutions, in which we explored and compared several representations. Two design choices are worth discussing here: (1) the design of a visual representative for subspaces and (2) their layout. We decided to represent subspaces with scatterplots because they allow for the identification and comparison of groups in the data. More abstract representations (like simple colored marks) would require less space but would not allow the rich topological comparison provided by scatterplots. In contrast, more complex representations, e.g., parallel coordinates, would directly show the dimensions included in the subspace but would make the overall display much more cluttered. As for the layout, we tried several tree and graph layouts to make the relationships between the subspaces and their shared dimensions explicit; however, we found that this rarely provides interesting insights and makes the visualization too cluttered to be of any use.

Figure 5.11: Subspace representation by 2D scatterplots with dimension glyph. We can see the visual representations of two 5D subspaces (left) and one 4D subspace (right).

To represent each subspace in a uniform way, independent of its dimensionality, we decided to plot each subspace as a 2D scatterplot. The scatterplot representation can be generated by any appropriate projection technique, such as PCA [83], MDS [41], or t-SNE [143], to name a few. We currently use MDS; however, we experimented with other dimension reduction techniques and found that they could be used alternatively. To convey the involved subspace dimensions, we add an index glyph to the respective scatterplot (see Figure 5.11).
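The projection step can be illustrated with classical (Torgerson) MDS, one of several MDS variants. The text does not say which variant is used, so this choice is an assumption. The dependency-free sketch double-centers the squared distance matrix (for example, Euclidean distances over the subspace dimensions) and extracts the leading eigenvectors by power iteration with deflation. Power iteration finds the eigenvalue of largest magnitude, so for strongly non-Euclidean input a negative eigenvalue may stop the loop early. A linear-algebra library would be the more robust choice in practice.

```python
import math
import random

def classical_mds(dist, n_components=2, iters=500, seed=0):
    """Classical (Torgerson) MDS of an n x n distance matrix via power iteration."""
    n = len(dist)
    rng = random.Random(seed)
    # Double-center the squared distances: B = -1/2 * J * D^2 * J.
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    mean = [sum(row) / n for row in sq]  # row means equal column means (symmetric)
    grand = sum(mean) / n
    B = [[-0.5 * (sq[i][j] - mean[i] - mean[j] + grand) for j in range(n)]
         for i in range(n)]
    coords = [[0.0] * n_components for _ in range(n)]
    for c in range(n_components):
        # Random unit start vector, so that repeated eigenvalues are still found.
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # remaining spectrum is numerically zero
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue, including its sign.
        bv = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(v[i] * bv[i] for i in range(n))
        if lam <= 1e-12:
            break  # no further positive eigenvalue; remaining coordinates stay 0
        for i in range(n):
            coords[i][c] = v[i] * math.sqrt(lam)
        # Deflate: remove the found component before extracting the next one.
        B = [[B[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords
```

For distances that come from points in a plane, the returned 2D coordinates reproduce all pairwise distances up to rotation and reflection.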

Figure 5.12: (1) Linearly sorted view of subspaces for the 12D synthetic data set from [52], showing the full result of SURFING, consisting of 296 subspaces. The subspace selected in this view is shown (2) in a single subspace view to enable interaction and (3) in a parallel coordinates view with the subspace dimensions as the first axes (highlighted) and all the other data dimensions as the last axes.

The analytical views are combined and linked in an application that consists of the following components:

Linearly sorted view of subspaces

To obtain a first overview of the output of the subspace search algorithm, we present all the subspaces in a linear view. The MDS scatterplots representing the individual subspaces are sorted left-to-right and top-down according to the interestingness index provided by the subspace search method. This view is also used as a detail view for groups of topologically similar subspaces. Figure 5.12(1) illustrates the subspaces of the synthetic data set, which is described later in Section 5.2.4.
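The row-major ordering of this view amounts to sorting by interestingness and mapping each rank to a grid cell. The function name and the fixed column count are illustrative assumptions. In an actual display, the number of columns would follow from the window width.

```python
def grid_layout(subspaces, scores, columns):
    """Map each subspace to a (row, column) cell, most interesting first,
    filling rows left-to-right and then top-down."""
    order = sorted(range(len(subspaces)), key=lambda i: scores[i], reverse=True)
    return {subspaces[i]: divmod(rank, columns) for rank, i in enumerate(order)}
```

With two columns, the highest-scoring subspace lands at (0, 0), the second at (0, 1), and the third starts the next row at (1, 0).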

Subspace group view<br />

In this view, the groups of subspaces formed by hierarchical agglomerative grouping are shown. Each group is represented by one selected subspace from that group, chosen with the filtering method described above. Figure 5.13 shows the dendrogram produced by the hierarchical grouping algorithm for all 296 subspaces visible in the linearly sorted view. Each node in the dendrogram represents a cluster at a certain similarity. A larger image of the dendrogram can be seen in Appendix A.4.

The user can navigate through this hierarchy (using the hierarchical navigation buttons shown in Figure 5.16(6)) and specify a certain similarity threshold for clustering.


[Figure 5.13: dendrogram over the 296 subspaces of the synthetic data set, with leaves labeled by their dimension sets (e.g., CFGIJK, CDGIK, BD, CK); horizontal axis: Distance (Similarity), 0 to 30.]

Figure 5.13: Hierarchical agglomerative grouping of the 296 interesting subspaces. The red line shows the threshold for the six groups shown in the subspace group view. Each group is marked by a colored rectangle. The colors are maintained in Figure 5.14.

This threshold is indicated by the red line in the dendrogram figure. It results in the six groups visible in the subspace group view presented in Figure 5.14, which is also shown in the overview in Figure 5.16(1).

Figure 5.14: Subspace group view for the 12D synthetic data set with six subspace groups.

The representative subspace of each group is visualized by an MDS plot, and these plots are shown side by side. A dimension histogram on top of each plot indicates the distribution of dimensions contained in the subspaces of that group, where the length of each bar encodes the frequency of the respective dimension. The last bar encodes the percentage of all subspaces contained in this group. It is colored orange so that it is easily distinguished from the others.
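The values behind such a histogram are simple counts. This sketch assumes that a dimension's bar length is the fraction of the group's subspaces containing that dimension, and that the orange bar is the group's share of all subspaces returned by the search. The text does not state the exact normalization, and the function name is hypothetical.

```python
from collections import Counter

def dimension_histogram(group, total_subspaces, dims="ABCDEFGHIJKL"):
    """Per-dimension frequency within a group, plus the group's overall share.

    group: subspaces of one group, e.g. ["CF", "CFG", "CDF"].
    Returns (freq, share): freq[d] is the fraction of the group's subspaces
    that contain d; share is len(group) / total_subspaces (the orange bar).
    """
    counts = Counter(d for s in group for d in s)
    freq = {d: counts[d] / len(group) for d in dims}
    share = len(group) / total_subspaces
    return freq, share
```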

Each group of subspaces from the preceding view can be expanded, and its member subspaces can be seen and compared in detail (as Figure 5.16(5) illustrates). This gives a better understanding of the current similarity threshold and lets the user expand or further collapse the group structure based on the visually perceived similarity between subspaces. The user can also investigate how similar the distribution of dimensions is among different groups of subspaces. To this end, clicking on the dimension histogram icon of one particular group cross-highlights the dimensions of the selected group that are also contained in other clusters. In this example, the dimension glyph of the green group has been clicked. In summary, the subspace group view allows a global comparison of non-redundant subspaces and their similarities concerning the contained data topology.
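The cross-highlighting interaction comes down to intersecting dimension sets. In this minimal sketch, groups are given as lists of dimension-letter strings keyed by a hypothetical group name (the colors used in the figures). For every other group, it returns the dimensions that the view would highlight.

```python
def cross_highlight(selected_group, other_groups):
    """For every other group, the dimensions it shares with the selected group."""
    selected_dims = {d for s in selected_group for d in s}
    return {name: sorted(selected_dims & {d for s in group for d in s})
            for name, group in other_groups.items()}
```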

Dimension-based subspace similarity view<br />

We also support the comparative analysis of all subspaces based on their similarity regarding the set of active dimensions. To this end, a global MDS layout is generated from the Tanimoto distances between the subspaces, as described at the beginning of this section. Figures 5.15 and 5.16(4) illustrate the subspace similarity view. For a high number of subspaces, this view can only provide an impression of the similarity relationships, but zooming in reveals more details. The agglomerative grouping based on the topological distance function could be used to reduce the number of subspaces displayed in this view. The subspace group view (based on data topology distance) and the dimension-similarity view (based on Tanimoto distance) are linked by color-coding (outer frame coloring). Thereby, we can compare subspaces by both their topological and their dimension-overlap-based similarity.

Figure 5.15: Dimension-based subspace similarity MDS view of the 296 subspaces selected by the subspace search algorithm.

Additional views and cluster comparison support<br />

We also integrated details-on-demand for each subspace by a parallel coordinates view (as Figures 5.12(3) and 5.16(3) illustrate). Highlighting the contained dimensions helps to understand the differences between the subspaces in more detail. The subspace dimensions are placed as the first, highlighted axes of the parallel coordinates view. The other dimensions are added in random order and drawn in a lighter gray. This enables comparison with the rest of the data set and helps to understand the distribution of the subspace dimensions compared to the rest of the data.
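The axis ordering just described can be sketched as follows. Each axis is returned together with a flag telling the renderer whether to draw it highlighted or in light gray. The function name and the optional seed are assumptions added for reproducibility.

```python
import random

def axis_order(subspace_dims, all_dims, seed=None):
    """Parallel-coordinates axis order: the subspace dimensions first
    (highlighted), then the remaining dimensions in random order (light gray)."""
    rest = [d for d in all_dims if d not in subspace_dims]
    random.Random(seed).shuffle(rest)
    return [(d, True) for d in subspace_dims] + [(d, False) for d in rest]
```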

Furthermore, interactive exploration of the subspaces is enhanced by a single subspace view, which provides an enlarged view of a selected subspace scatterplot (as Figures 5.12(2) and 5.16(2) illustrate). This view also allows the user to manually select clusters of objects with a lasso tool. Cross-coloring of the selected points in the other subspaces and in the parallel coordinates plot thus allows comparative exploration of grouping structures, a core problem in making effective use of alternative subspaces.
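One common way to implement lasso selection is to close the lasso stroke into a polygon and test each scatterplot point with the even-odd (ray-casting) rule. The text does not describe how its lasso tool is implemented, so this sketch and its function names are assumptions. The selection yields record indices, so the same index set can drive the cross-coloring in every other subspace view and in the parallel coordinates plot.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd rule: count crossings of a ray cast from (x, y) towards +x."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the lasso
        if (y1 > y) != (y2 > y):  # edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def lasso_select(points, polygon):
    """Indices of the 2D scatterplot points enclosed by the lasso polygon."""
    return [i for i, (x, y) in enumerate(points) if point_in_polygon(x, y, polygon)]
```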

Figure 5.16: All linked views: (1) Subspace group view for the 12D synthetic data set with six subspace groups. (2) Single subspace view showing the representative subspace of the first group. (3) Details-on-demand in the parallel coordinates view for the selected subspace. (4) The MDS layout of the subspace search results based on their dimension similarity. (5) Group detail view for three subspace groups (orange, green, purple). (6) Hierarchical navigation buttons.

5.2.4 Application<br />

We now demonstrate the analytical capabilities of our proposed approach by applying it to synthetic and real-world data in two scenarios, which serve different purposes. First, we use synthetic data as a proof of concept and to exemplify the suggested workflow, showing how relevant subspaces can conveniently be identified. Then, we describe an exploratory setting in which interesting findings are obtained in alternative subspaces of a real-world data set.

Application Scenario 1: Synthetic Data

To show the power of the proposed approach, we used a 750-record sample of the first 12D synthetic data set presented in [52] (data set No. 2). This data set consists of four 3D Gaussian clusters and two 6D Gaussian clusters. The remaining dimensions contain uniformly distributed random noise. The first step of our approach is to determine the interesting subspaces of the high-dimensional data set by running an automatic subspace search using SURFING (see Section 5.2.3). This subspace search returns a total of 296 subspaces identified as interesting, out of the 4095 possible subspaces. To get a first impression of these subspaces, we use the linearly sorted view of subspaces shown in Figure 5.12. It relies on MDS representations of the data in the subspaces, sorted by interestingness score in decreasing order.
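As a quick check of the candidate count: a 12-dimensional data set has one candidate subspace per non-empty subset of its dimensions. The 296 subspaces returned by the search therefore amount to roughly 7% of all candidates.

```latex
\sum_{k=1}^{12} \binom{12}{k} \;=\; 2^{12} - 1 \;=\; 4095,
\qquad
\frac{296}{4095} \;\approx\; 0.072
```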

The view shows the diversity of the subspaces identified during the automatic step. The first elements in the first row of the view are very similar in terms of point distribution (showing mostly scattered and spherical point distributions). At later positions, however, we also see other kinds of point distributions, including parallel stripe patterns and stripes mixed with spherical patterns. In a normal (non-visual) analysis that relies only on the top-ranked subspaces by interestingness score, the analyst might miss some of these different characteristics of the subspaces.

Judging by the shapes of the MDS projections, the overview also confirms that the subspace search returned many redundant subspaces. The next step is therefore to group the subspaces according to their similarity, allowing the user to abstract to a smaller number of relevant subspaces and compare them in detail. We used our similarity function based on the data topology to create a hierarchical agglomerative clustering with Ward's minimum variance method [144]. This method turned out to give good results, in the sense that it produced clusters of subspaces that are well separated from each other. The resulting clustering dendrogram is shown in Figure 5.13. Figure 5.16(1) shows that, by setting a similarity threshold, the user can reduce the number of subspaces considerably and in a meaningful way. The navigation buttons in Figure 5.16(6) allow the user to move through the dendrogram levels and find the desired level of redundancy. Here the dendrogram was cut at 0.73 (value range (0,1)). As a result, six groups are found and visualized by their representatives. The number of groups can be varied by selecting different similarity levels in the dendrogram hierarchy. For this data set, we quickly found that six groups is the right level of detail for our further investigation.
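The grouping step can be sketched as follows. This is a minimal, stdlib-only illustration of Ward's agglomerative clustering on a precomputed dissimilarity matrix, with the dendrogram cut at a fixed height; it is not the system's implementation. We assume a dissimilarity in [0, 1] derived from the topological similarity (for example 1 - similarity) and treat the cut value as a height on that scale; both are assumptions, as is the function name `ward_cut`.

```python
import math
from itertools import combinations


def ward_cut(dist, threshold):
    """Agglomerate items with Ward's method and cut the dendrogram at `threshold`.

    dist: symmetric n x n matrix of pairwise dissimilarities between subspaces
          (hypothetically 1 - topological similarity, values in [0, 1]).
    Returns the resulting groups as sorted lists of item indices.
    """
    n = len(dist)
    clusters = {i: [i] for i in range(n)}  # cluster id -> member indices
    d = {(a, b): dist[a][b] for a, b in combinations(range(n), 2)}
    next_id = n
    while len(clusters) > 1:
        (a, b), h = min(d.items(), key=lambda kv: kv[1])
        # Ward merge heights never decrease, so stopping at the first merge above
        # the threshold gives the same groups as cutting the dendrogram there.
        if h > threshold:
            break
        na, nb = len(clusters[a]), len(clusters[b])
        merged = clusters.pop(a) + clusters.pop(b)
        for k, members in clusters.items():
            nk = len(members)
            dka = d[(min(k, a), max(k, a))]
            dkb = d[(min(k, b), max(k, b))]
            # Lance-Williams update for Ward linkage (the form documented for scipy's 'ward')
            d[(k, next_id)] = math.sqrt(
                ((na + nk) * dka ** 2 + (nb + nk) * dkb ** 2 - nk * h ** 2)
                / (na + nb + nk)
            )
        # Forget all distances involving the two clusters that were just merged
        d = {p: v for p, v in d.items() if a not in p and b not in p}
        clusters[next_id] = merged
        next_id += 1
    return sorted(sorted(m) for m in clusters.values())


# Two tight pairs of subspaces that are far from each other
D = [[0.0, 0.1, 0.9, 0.9],
     [0.1, 0.0, 0.9, 0.9],
     [0.9, 0.9, 0.0, 0.1],
     [0.9, 0.9, 0.1, 0.0]]
print(ward_cut(D, 0.73))  # [[0, 1], [2, 3]]
```

Raising or lowering `threshold` plays the role of stepping through dendrogram levels: a very low cut keeps every subspace separate, and a very high cut merges everything into one group.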

We investigate the components of each group of subspaces in more detail. Figure 5.16(5) shows the group detail view of the orange, green, and purple subspace groups framed in Figure 5.16(1). Topologically similar subspaces are grouped together. In this way, the analyst is given an overview of the existing groups and, if needed, can further compare individual group components.

On top of the scatterplots, a dimension histogram indicates the distribution of dimensions in each group. The last bar of the histogram is marked in orange and represents the percentage of subspaces contained in this group. It is scaled logarithmically, so that this bar is also visible for groups with few elements. A click on the dimension histogram of one group representative highlights its dimensions in all the other representatives. In Figure 5.16(1) (enlarged in Figure 5.14), the green group was clicked. To understand why the green- and gray-framed groups are split, we can consult the additional view in Figure 5.16(4). It shows an MDS layout of all interesting subspaces based on dimension-overlap (Tanimoto) similarity; in this view, the closeness of two subspaces corresponds to their dimension similarity. We see that the green- and gray-framed cluster groups are located on the far left side of the plot. This shows that their subspaces are similar in terms of dimensions; since they nevertheless fall into different groups, they must differ in topological similarity according to our similarity measure. The reason is that all the subspaces of the gray-framed group contain dimension d12, while none of the subspaces in the green-framed group contain it. This is visible from the bars in the dimension histogram of the gray-framed group (see Figure 5.16(1)): since d12 is not highlighted, it is not contained in the marked green-framed group, which suggests that this dimension is responsible for the different data distribution.
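The group comparisons above rest on simple set operations over the dimensions of each subspace. The sketch below, assuming subspaces are given as Python sets of dimension names, computes the Tanimoto (dimension-overlap) similarity, a per-group dimension histogram, a log-scaled group-size bar, and the dimensions that separate two groups (as d12 separates the gray- and green-framed groups). The exact logarithmic scaling used by the system is not stated; `log(1 + n) / log(1 + total)` is a hypothetical choice, and all function names are illustrative.

```python
import math
from collections import Counter


def tanimoto(a, b):
    """Dimension-overlap (Tanimoto) similarity of two subspaces given as sets of dimensions."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def dimension_histogram(group):
    """Count, for each dimension, how many subspaces of the group contain it."""
    return Counter(dim for subspace in group for dim in subspace)


def log_share(group_size, total):
    """Height in [0, 1] of the group-size bar; log scaling keeps small groups visible.

    The scaling log(1 + n) / log(1 + total) is a hypothetical choice.
    """
    return math.log1p(group_size) / math.log1p(total)


def discriminating_dims(group_a, group_b):
    """Dimensions contained in every subspace of group_a and in no subspace of group_b."""
    in_all_a = set.intersection(*map(set, group_a))
    in_any_b = set().union(*group_b)
    return in_all_a - in_any_b


gray = [{"d1", "d12"}, {"d3", "d12"}]
green = [{"d1", "d3"}, {"d1", "d4"}]
print(discriminating_dims(gray, green))  # {'d12'}
```

With 296 subspaces in total, a single-member group gets a linear share of about 0.3 percent but a log-scaled bar of about 0.12, which is the visibility effect the text describes.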

We can also go one step further in the detailed comparison of subspaces by cross-color-coding clusters of points in the MDS representation. Our lasso tool allows the user to manually mark clusters of points in the MDS subspace representation, which makes it possible to cross-compare the groupings among different subspaces. For example, we manually marked six separate clusters of points in the pink-framed subspace group (group number two in Figure 5.16(1)) and assigned distinct colors. By analyzing the distribution of colors among the subspace group representatives, we see that other subspaces merge some of these clusters and spread others. This is also true for the purple-framed group representative: the dark blue and pink point clusters (the uppermost ones in the originally colored subspace) still form clusters in the purple subspace, but some of their points become noise there.
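Cross-color-coding needs two pieces: deciding which projected points fall inside a lasso stroke, and tallying how the resulting colors distribute over the point clusters of another subspace. The sketch below assumes the lasso is stored as a closed polygon in the 2D MDS coordinates and uses the standard even-odd (ray casting) point-in-polygon test; the data layout and function names are hypothetical, and the actual tool may work differently.

```python
from collections import Counter


def inside_lasso(point, polygon):
    """Even-odd (ray casting) test: is `point` inside the closed lasso `polygon`?"""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can cross the ray.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:  # the crossing lies on the ray going to the right
                inside = not inside
    return inside


def paint(points_2d, lasso, color, colors):
    """Assign `color` to every point (by index) whose 2D position lies inside the lasso."""
    for idx, p in enumerate(points_2d):
        if inside_lasso(p, lasso):
            colors[idx] = color
    return colors


def color_distribution(colors, clusters):
    """For each point cluster of another subspace, count the lasso colors it contains.

    Several colors in one cluster indicate a merge; one color over several clusters a spread.
    """
    return [Counter(colors[i] for i in cluster if i in colors) for cluster in clusters]
```

Because colors are keyed by point index rather than by position, the same labels can be carried into any other subspace and compared there.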

Summing up, we can see how our visual analytics workflow helps to deal with the large number of possibly interesting subspaces in a natural, overview-first manner. In a first step, the SURFING approach reduced the number of subspaces of the 12-dimensional data set from 4095 to 296 interesting ones. Since this set of subspaces still showed high redundancy, in the next step we grouped them using our topological similarity measure. Based on the grouped subspaces, further investigations could take place to compare the relations and distributions among data points within the subspaces.

Application Scenario 2: Exploration/Discovery

We will now demonstrate the exploratory functionality of our proposed approach on a real data set. We again analyze the USDA Food Composition data set², a full collection of raw and processed foods characterized by their composition in terms of nutrients. The database contains more than 7000 records and 44 dimensions. After removing missing values and outliers and normalizing the data, 722 records (foods) remained, for which we selected 18 interpretable dimensions.
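The cleaning step is described only briefly. As a minimal sketch, assuming missing values are encoded as `None`, outliers are records with any value more than three standard deviations from its column mean, and normalization means min-max scaling to [0, 1] (none of these rules is specified in the text), the preprocessing could look like this:

```python
import statistics


def clean_and_normalize(rows, z_max=3.0):
    """Hypothetical preprocessing: drop incomplete rows, drop z-score outliers, min-max scale."""
    # 1) Drop records with any missing value (encoded as None).
    complete = [r for r in rows if all(v is not None for v in r)]
    if not complete:
        return []
    cols = list(zip(*complete))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) for c in cols]

    # 2) Drop records with a value more than z_max standard deviations from its column mean.
    def is_inlier(r):
        return all(s == 0 or abs(v - m) / s <= z_max for v, m, s in zip(r, means, stds))

    kept = [r for r in complete if is_inlier(r)]
    if not kept:
        return []

    # 3) Min-max normalize each dimension to [0, 1]; constant columns map to 0.0.
    cols = list(zip(*kept))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(r, lo, hi)] for r in kept]
```

The order matters: outliers are detected before scaling, because a single extreme value would otherwise compress all other records into a narrow band of the normalized range.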

From this input data set, the SURFING algorithm returned 216 interesting subspaces for further exploration. To obtain a first impression of this data, we investigated the linearly sorted view (see Figure 5.17 for a cut-out). Many subspaces, in particular those ranked with a high interestingness index, showed a rather skewed distribution of points in our projection representation, concentrating along the edges of the diagrams. Only later in the ranking did we observe projections forming more structure that could be meaningful. The red-framed subspace in Figure 5.17 seems to be very interesting, forming long, clear stripes. With the help of the single subspace view, we further investigated this subspace (Iron, Manganese, Vit D) by coloring each stripe with a different color, and compared the formation of these clusters across the other subspaces. Most of them appeared to be overspread by the cyan class (see Figure 5.17, right).

At the same time, it is clear that a high level of redundancy is still present and that further grouping is necessary. Therefore, we continued with our next analytical step, subspace grouping by agglomerative hierarchical clustering. We obtained different groups of subspaces and found that these clearly striped clusters only appear in subspaces containing Vit D.

We therefore reset the coloring and started a new interactive analysis step, beginning with this stage of our workflow. After testing different filtering thresholds and comparing the topological and dimension-based similarity relations, we obtained 12 groups and considered this number suitable for subsequent analysis (see Figure 5.19(1)).

² http://www.ars.usda.gov/

Figure 5.17: Linearly sorted view cut-out of subspaces for the 18D USDA Food Composition data set. The view holds the full result of SURFING, consisting of 216 subspaces. We see a rather high level of redundancy. Subspaces exhibiting more structure are found in particular at the middle and end positions of the ranking. Relying only on the numerically top-ranked results, we would have omitted such interesting cases from the analysis.


Figure 5.18: (A) Interesting spotted subspace (Carbohydrate, Fiber) presenting two clusters. (B) Subspace (Carbohydrate, Lipid, Protein) in the same cluster group as (A), where the cluster structure changes. (C) Green-marked third cluster in the subspace from (B). (D) Subspace (Fiber, Protein, Vit D) of the orange-framed subspace group, where the alternative clustering of points is visible.

From the reduced number of representative subspaces, one particular subspace stood out to us (see Figure 5.19(1) for the group representatives and Figure 5.18(A) for the spotted subspace). This subspace shows the most structure and allows us to discern two point clusters (pink and blue). We selected this specific subspace group (framed brown in Figure 5.19) for further analysis. Cross-coloring is used to highlight its group components, which are shown at the bottom of the figure. The subspaces of the group are visibly topologically similar; consequently, this subspace is a valid representative.

In addition, we observe that there are some subspaces in this group where the clustering changes. One example is shown in Figure 5.18(B). We assigned the green color to the outlying points on the left side, since they seem to form a different structure. In the group view (see Figure 5.19(1)), we can see that this green cluster spreads over five of the 12 subspace group representatives. After a closer look at the components of the orange subspace group, we spotted a sharply defined green cluster (see Figure 5.18(D), highlighted in Figure 5.19(2)). By highlighting the dimensions of the orange group, we can see that the brown group has a dominant dimension (Protein) that is not contained in any subspace of the orange group. We can therefore assume that this dimension is decisive for the clustering of the points. In the dimension-based similarity view (MDS layout in Figure 5.19(3)), the subspaces of the brown and orange groups are far apart from each other, which supports our finding that the groups contain different dimensions. Likewise, we can see that the components of the brown group are scattered across the MDS layout. This is because the group's subspaces are dissimilar in terms of their dimensions, while their topological similarity is dominated by the shared dimension (Protein).


Figure 5.19: (1) Grouped view of subspaces for the 18D USDA Food Composition data set with 12 group representatives. (2) The brown and orange group components shown in the components view. (3) MDS layout of all subspaces with cross-colored group representatives.

Summing up, we demonstrated how our interactive exploratory workflow can be applied to real data. Unlike in the previous scenario, the clusters are not known in advance in real data sets, so several interactive attempts are needed to investigate the large number of interesting subspaces provided by the subspace search algorithm. With the help of the topological similarity functionality, we could group the redundant subspaces and take a closer look at their topological changes. Using the different linked views of our approach helped us to identify different subspaces that present alternative clusterings.

5.2.5 Discussion and Possible Extensions

We now summarize the main goals of our system and then discuss its limitations and possible extensions.



Summarizing the Main Goals of our Approach

Our approach supports visual-interactive analysis of high-dimensional data from multiple perspectives based on the notion of automatic subspace search. Its core assumption is that useful information can be extracted in a comparative way from several different subspaces residing in a larger high-dimensional data space. This assumption is the major driving force behind the subspace search and subspace clustering algorithms developed in the Data Mining community over the past few years. We exploit algorithmic subspace search in an encompassing visual-interactive system. Our approach is designed around Shneiderman's Visual Information-Seeking Mantra [127], applied to the problem of analyzing potentially large sets of subspaces. Modern subspace search methods such as SURFING efficiently identify candidate subspaces that are expected to exhibit informative structure, without restriction to a specific kind of structure. Interactively detecting and understanding relevant structures in subspaces is an explicit goal of our system. Our interactive support allows users to condense and compare subspaces, and even groups in the data, which closes the analytical loop from the algorithmic search for subspaces to sense-making by the user. Subspace search algorithms are very useful as a starting point. However, since interestingness is assessed heuristically, the search methods alone cannot solve the analytical problems at hand. To this end, capable visual analytics systems need to be built on top of the output of the subspace search algorithm. We therefore designed, implemented, and applied an encompassing system based on a subspace search method (as an example, we used SURFING). It allows users to explore high-dimensional data while taking into account the curse of dimensionality and the possibility of finding alternative clusters in different subspaces.

Limitations and Possible Extensions

We identify the following limitations and improvement opportunities for our approach:

• Computational scalability

We designed and tested our system with data sets of moderately high dimensionality (tens of dimensions). For higher-dimensional data, we will have to deal with scalability issues in (1) the computational complexity of the subspace search and (2) the scalability of the visual representation of subspaces. Regarding (1), the search space grows exponentially with dimensionality: a d-dimensional data set has 2^d - 1 non-empty axis-parallel subspaces. Subspace search algorithms will probably need more aggressive filtering mechanisms to keep the number of searched subspaces tractable. A dynamically adjustable threshold could be useful here. However, we still need to ensure that no relevant results are excluded; to this end, a sensitivity analysis is needed.

• Visual scalability

Regarding (2), scalable visual representations are also needed for higher-dimensional data. They need to scale with both the number of subspaces and the representation of each subspace. Hierarchical grouping of subspaces is already included in our system to scale with the number of subspaces. The linearly sorted view per se does not scale to many subspaces, yet it can be restricted to the representative subspaces obtained from hierarchical grouping. Subspaces are visually represented by a projection that shows the data points and an index view that shows the contained dimensions. In particular, the latter only scales to a limited number of dimensions. How to design set-oriented views that compare many sets of dimensions is a challenging problem whose solution would improve our tool.

• Projection-based subspace representation

We currently represent the subspaces by MDS projections of the data residing in the respective subspaces. However, projection typically induces a loss of information, which could be incorporated into our visualization, e.g., by showing the stress values in an overlay visualization [121]. In our experiments, MDS performed very well compared to PCA. Yet, it would be interesting to test other projections. Other subspace representations besides scatterplots could also be considered, in essence similar to Value-and-Relation displays [157]. Likewise, many different similarity notions could be employed to group and compare subspaces, such as notions based on stress measures, implicit clustering structures, relations to outliers, or scagnostics features [151]. Testing them in different application domains is valuable future work. We note that our analytical approach can easily accommodate alternative subspace search algorithms, representations, and filtering options.

• Interpretable dimensions

To relate subspaces and data groups in subspaces, the analyst needs to be aware of the meaning of the dimensions of the respective subspace. Our index-based glyph does not convey information about the type of dimension. More semantically meaningful dimension representations would be useful. Details-on-demand functions could be added to help the user interpret the involved dimensions and the properties of the data points more efficiently.

• Definition of interestingness and sensitivity to noise

Subspace search algorithms heuristically identify subspaces as interesting based on certain properties of object relations. Depending on the user and the application, additional interestingness formulations are possible and should be supported. Following best practices in data analysis, we applied a data cleaning step (outlier and missing value removal) to our test data before feeding it into our system. The SURFING algorithm is not robust with respect to missing values, whereas it seems to be robust with respect to outliers. The original paper does not discuss this aspect, and we did not investigate it further. The projections used to represent data distributions in subspaces are sensitive to outliers and may produce clamped distributions if the data is not pre-processed. We postpone the analysis of this problem to future work.

• Automatic support for cluster comparison

Adding automatic clustering of data points in subspaces as a post-processing step would be useful. Equipped with automatic clustering, we could color-code the found clusters. This could lead to new visually oriented interestingness measures for selecting interesting subspaces in the future. User interaction with the subspace search output could also be a useful analytical feature for refinement. One option would be to allow expert users to split or merge subspaces, or to construct new subspaces by adding or removing dimensions.

• Usability and user adoption

Our current system design targets users with expertise in data mining. End-user applications, e.g., in market segment analysis, could benefit from subspace analysis. However, we recognize that the interface of our system would possibly need to be customized for end-users. Our experience in collaborating with data mining experts showed that the tool can be useful not only for data exploration but also as an evaluation tool to assess the output generated by subspace analysis algorithms.
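As an illustration of the stress values mentioned under projection-based subspace representation, the sketch below compares the pairwise distances of points in a subspace with those in its 2D projection. It computes one common normalized stress, sqrt(Σ(d_ij - e_ij)² / Σ d_ij²), plus a per-point error sum that an overlay could color by. Other definitions (e.g., Kruskal's stress-1) normalize differently, this is not the method of [121], and the function names are hypothetical.

```python
import math
from itertools import combinations


def _dist(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def stress(high, low):
    """Normalized stress between distances in the subspace (`high`) and in the projection (`low`).

    0 means every pairwise distance is preserved exactly.
    """
    num = den = 0.0
    for i, j in combinations(range(len(high)), 2):
        d_ij, e_ij = _dist(high[i], high[j]), _dist(low[i], low[j])
        num += (d_ij - e_ij) ** 2
        den += d_ij ** 2
    return math.sqrt(num / den) if den > 0 else 0.0


def per_point_stress(high, low):
    """Sum of squared distance errors for each point, e.g. to drive a color overlay."""
    errs = [0.0] * len(high)
    for i, j in combinations(range(len(high)), 2):
        e = (_dist(high[i], high[j]) - _dist(low[i], low[j])) ** 2
        errs[i] += e
        errs[j] += e
    return errs
```

Per-point errors localize the information loss: points whose neighborhoods are distorted by the projection stand out, whereas the single global value only says how much distortion there is overall.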

5.2.6 Conclusions

We presented an encompassing visual-interactive system for subspace-based analysis of high-dimensional data. Subspace-based analysis can constitute a new paradigm for high-dimensional data analysis, since informative structures in the data can be found and compared in different subspaces of a larger high-dimensional input space. We defined, implemented, and demonstrated an analytical workflow based on automatic subspace search. A larger set of automatically identified interesting subspaces is grouped for interactive exploration by the user, and a custom subspace similarity function allows the subspaces to be compared. Our approach is able to effectively pin down several interesting views and helps to arrive at specific findings regarding similarities of groups in the data. We discussed a set of possible extensions of the system, which could be addressed in future work.




6 Conclusion and Future Work

“The important thing is not to stop questioning. Curiosity has its own reason for existing.”

Albert Einstein

Contents

6.1 Summary of Contributions and Future Work

This chapter takes a step back from the concrete solutions presented in each chapter, positioning the work within the big picture of pattern finding in high-dimensional data, pointing out the contributions of this thesis, and concluding the work. Future work is presented here with respect to this big picture, since each chapter contains its own conclusions and identifies particular further research directions.

Many domains nowadays produce high-dimensional data sets that can hide numerous important patterns. Finding these patterns is a complex task, but at the same time one of high significance. Using visual representations to show the numerical data, automation to compute the interesting elements, and interaction to navigate the information spaces can help to spot the valuable patterns in the data. Due to the high dimensionality of the data, none of these approaches is powerful enough on its own, and since the accuracy and complexity of the results for certain data analysis questions are not known in advance, systems combining visualization, automation, and interaction are needed to investigate this data.

6.1 Summary of Contributions and Future Work

We presented two main research directions for searching for interesting patterns in high-dimensional data: (1) reducing the data dimensionality by projections, ranking the projections, and visualizing the best ones; and (2) looking into different subspaces of the data and spotting the interesting ones for further analysis by comparing them in terms of dimensions, records, and clusters.

For (1), we presented new quality measures that judge the quality of projections automatically, as well as a systematization of existing quality metrics for high-dimensional data. For our newly developed quality metrics, we chose correlation and clustering from the large spectrum of possible patterns. Chapter 3 presented two types of metrics, namely data quality metrics and image quality metrics. Both types were developed for two visualization techniques, scatterplots and parallel coordinates. The new measures were applied to several synthetic and real data sets to demonstrate their properties. Because these automatic measures should reflect the user's preferences, we conducted an empirical evaluation of four state-of-the-art measures to assess how well they correspond to those preferences. This study helped us develop guidelines for further metric development. To see the big picture regarding recently proposed quality measures for high-dimensional data visualizations, we conducted a literature review and presented in Chapter 4 a systematization of the existing measures, identifying a number of characteristic factors and developing a quality metrics pipeline to illustrate the process. The goal is to place the existing methods into a common framework, thus easing the generation of new research in the field and identifying important gaps for future research to bridge.
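The core idea of direction (1), scoring every candidate view with an automatic quality measure and showing the best-ranked views first, can be illustrated with a small sketch. It scores each axis-parallel 2D scatterplot of a data set and sorts the views by score. As a stand-in quality measure it uses the absolute Pearson correlation between the two axes; this is an assumption made for illustration and not one of the measures (such as RVM, CDM, or HDM) developed in Chapter 3. The names `pearson` and `rank_scatterplots` are hypothetical.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if an axis is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def rank_scatterplots(rows, names):
    """Score every pair of dimensions (one scatterplot each), best first.

    rows  -- list of records, each a sequence of numeric values
    names -- dimension names, one per column
    """
    scored = []
    for i, j in combinations(range(len(names)), 2):
        xs = [row[i] for row in rows]
        ys = [row[j] for row in rows]
        # Stand-in quality measure: strength of the linear correlation.
        scored.append((abs(pearson(xs, ys)), names[i], names[j]))
    scored.sort(reverse=True)
    return scored


if __name__ == "__main__":
    rows = [(1, 2, 5), (2, 4, 1), (3, 6, 4), (4, 8, 2)]
    for score, x, y in rank_scatterplots(rows, ["a", "b", "c"]):
        print(f"{x} vs {y}: {score:.3f}")
```

With this toy data the view (a, b) is ranked first because b = 2a. Swapping in a different scoring function changes which pattern (correlation, cluster separation, and so on) the ranking favors, while the ranking loop stays the same.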

Based on the outcome of these two chapters, we see the following main directions for future research:

• developing new quality metrics for purposes such as view optimization or visual mapping optimization;

• developing new quality metrics for "non-standard" visualization techniques for high-dimensional data, such as pixel-based techniques and glyphs;

• running user evaluations to test the applicability of quality metrics in real-world settings;

• using quality metrics to explore properties of projection techniques, such as noise or rotation invariance and scalability with respect to the number of data points or data dimensions;

• using quality metrics to select the features used for building classification models.

For (2), we presented visual analytics approaches for understanding the relations between subspaces that contain important patterns. We recognize four main research directions for using visualization and interaction to understand patterns identified by subspace algorithms:

1. Interactive subspace clustering result exploration

Probably the simplest way to use visualization in conjunction with subspace algorithms is to visualize their results. The workflow in Figure 6.1 illustrates the required steps: the data is processed by a subspace clustering algorithm, and the results are visualized with interactive facilities for exploring them.

[Workflow diagram: HD Data → Subspace Clustering → Visualization]

Figure 6.1: Interactive exploration of subspace clustering results.

In Section 5.1 we presented ClustNails, a tool that visualizes subspace clustering results and supports the comparison of different subspace clusters with respect to their data distribution and dimension overlap. We proposed ordering strategies for dimensions and clusters to ease cluster comparison. Brushing and linking of dimensions and clusters support the exploration.
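How an ordering strategy can place related subspace clusters next to each other may be sketched as follows. This is a hypothetical greedy heuristic, not the ordering used by ClustNails (see Section 5.1 for that). Each cluster is reduced to its set of relevant dimensions; the ordering starts with the cluster that spans the most dimensions and then repeatedly appends the remaining cluster with the highest Jaccard overlap of dimensions with the previously placed one. The function names and the example clusters are illustrative.

```python
def jaccard(a, b):
    """Overlap of two dimension sets (1.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def order_clusters(clusters):
    """Greedy ordering of subspace clusters by shared dimensions.

    clusters -- dict mapping a cluster name to its set of dimension indices
    """
    remaining = set(clusters)
    # Start with the cluster spanning the most dimensions (ties: by name).
    current = min(remaining, key=lambda k: (-len(clusters[k]), k))
    order = [current]
    remaining.remove(current)
    while remaining:
        # Append the cluster that shares the most dimensions with the last one.
        current = min(remaining,
                      key=lambda k: (-jaccard(clusters[current], clusters[k]), k))
        order.append(current)
        remaining.remove(current)
    return order


if __name__ == "__main__":
    clusters = {"C1": {0, 1, 2, 3}, "C2": {7, 8}, "C3": {0, 1, 2}, "C4": {7, 8, 9}}
    print(order_clusters(clusters))  # ['C1', 'C3', 'C2', 'C4']
```

A greedy nearest-neighbor chain like this is cheap but not optimal; it only guarantees that each cluster is followed by its most similar unplaced neighbor.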



2. Interactive subspace search result exploration

One problem shared by all subspace clustering algorithms is the exponential number of possible subspaces. To address this issue, we decoupled clustering from subspace search, restricting the number of interesting subspaces that are inspected for clusters (see Figure 6.2).
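The size of this search space is easy to quantify: a data set with d dimensions has 2^d - 1 non-empty axis-parallel subspaces, one for every non-empty subset of its dimensions. The short sketch below counts them, optionally only those with at least a given number of dimensions; the function name `count_subspaces` is hypothetical.

```python
from math import comb


def count_subspaces(d, min_dims=1):
    """Number of axis-parallel subspaces with at least `min_dims` of `d` dimensions."""
    # Choose k of the d dimensions, for every admissible subspace size k.
    return sum(comb(d, k) for k in range(min_dims, d + 1))


if __name__ == "__main__":
    for d in (5, 12):
        print(d, count_subspaces(d))  # prints "5 31", then "12 4095"
```

For the 12D synthetic data set of Figure 5.12 this already gives 4095 candidate subspaces, of which the SURFING result shown there contains 296; at 50 dimensions the count exceeds 10^15.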

[Workflow diagram with the stages: HD Data, Subspace Search, Visualization, Clustering]

Figure 6.2: Interactive exploration of subspace search results.

In Section 5.2 (see Figure 6.2 for the specific workflow) we addressed this research direction and presented a subspace search approach that identifies alternative views (valid groups of clusters) in the data. We first run a subspace search algorithm on the data to identify potentially interesting subspaces and then visualize them for better understanding. Because the detected subspaces share many dimensions and are therefore highly redundant, we proposed grouping and filtering functions based on topological and dimension similarities. The groups can be navigated, and interactive lasso tools are available for marking clusters manually. Different views can then be identified by comparing subspaces according to the marked clusters. In future work, automatic clustering could be used to increase the number of identified views.
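Grouping redundant subspaces by dimension similarity can be sketched as single-linkage agglomerative grouping cut at a distance threshold, in the spirit of the hierarchical grouping shown in Figure 5.13. The distance used here, the Jaccard distance between the dimension sets of two subspaces, is an assumption for illustration; the topological and dimension similarity functions actually proposed are defined in Section 5.2. Cutting a single-linkage dendrogram at threshold t yields the connected components of the graph that links every pair of subspaces at distance at most t, which is what the union-find structure below computes. All names are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 minus (size of intersection / size of union) for non-empty sets."""
    return 1.0 - len(a & b) / len(a | b)


def group_subspaces(subspaces, threshold):
    """Single-linkage grouping of subspaces, cut at `threshold`.

    subspaces -- list of non-empty sets of dimension indices
    Returns a sorted list of groups, each a list of subspace indices.
    """
    parent = list(range(len(subspaces)))

    def find(i):
        # Follow parent links to the group root, halving paths on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the groups of every pair whose distance is within the threshold.
    for i, j in combinations(range(len(subspaces)), 2):
        if jaccard_distance(subspaces[i], subspaces[j]) <= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(subspaces)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    subspaces = [{0, 1, 2}, {0, 1, 2, 3}, {5, 6}, {5, 6, 7}, {9}]
    print(group_subspaces(subspaces, 0.4))  # [[0, 1], [2, 3], [4]]
```

Raising the threshold merges more subspaces into fewer groups, which mirrors moving the cut line in a dendrogram; one representative per group could then be kept to filter the redundant set.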

3. Visual comparison of subspace clustering results

[Workflow diagram: HD Data is processed by several Subspace Clustering runs (...), whose results are compared (Comparing) in a Visualization]

Figure 6.3: Visual comparison of subspace clustering results using visualization.

A further research direction would be to use visualization to compare different results of subspace algorithms (see Figure 6.3). Different results can be obtained either by running different subspace clustering algorithms on the same data set or by running one algorithm with different parameter settings. Visualization can then be used to:

• identify what we call a "common sense clustering", meaning clusters that will probably pop out independently of the algorithm or its parameters;



• compare different clustering results and identify the role of parameters for specific algorithms;

• integrate the feedback that users give while looking at the visualization into the computation of clusters. Clusters can be labeled with the user's preference (like/dislike) or merged, or alternatives to selected clusters can be computed.
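The "common sense clustering" idea can be sketched as a stability check across several results: a cluster counts as common sense if every other result contains a cluster that overlaps it strongly. The sketch compares record sets only (a full subspace cluster also carries a dimension set), uses the Jaccard overlap with a hypothetical threshold `min_overlap`, and runs on three made-up clustering results; none of this is taken from the thesis.

```python
def jaccard(a, b):
    """Overlap of two record sets (1.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def common_sense_clusters(results, min_overlap=0.8):
    """Clusters of results[0] that reappear in every other result.

    results     -- list of clustering results, each a list of record-id sets
    min_overlap -- minimum Jaccard overlap for two clusters to count as a match
    """
    reference, others = results[0], results[1:]
    stable = []
    for cluster in reference:
        # Keep the cluster only if every other result has a close match.
        if all(any(jaccard(cluster, other) >= min_overlap for other in result)
               for result in others):
            stable.append(cluster)
    return stable


if __name__ == "__main__":
    run_a = [{1, 2, 3, 4, 5}, {6, 7, 8}, {9, 10}]
    run_b = [{1, 2, 3, 4, 5, 11}, {6, 7}, {8, 9, 10}]
    run_c = [{1, 2, 3, 4, 5}, {6, 7, 8, 9, 10}]
    print(common_sense_clusters([run_a, run_b, run_c]))  # [{1, 2, 3, 4, 5}]
```

Lowering `min_overlap` admits looser matches, so varying it in an interactive view would show which clusters stay stable and which depend on the chosen algorithm or parameters.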

4. Visually assisted in-line steering of subspace clustering

[Workflow diagram: HD Data, Subspace Clustering, Visualization of the intermediate result, feedback to the clustering, Visualization]

Figure 6.4: Visually assisted in-line steering of subspace clustering.

Another research direction, and probably the most complex one, is to use visualization and user feedback within the algorithmic process itself (see Figure 6.4). This amounts to opening the black box of subspace clustering and providing a steerable clustering tool: the algorithm can be interrupted, its intermediate results can be visualized, and the user's preferences can be integrated in a feedback loop that steers the algorithm.
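The control flow of such a feedback loop can be sketched with a Python generator that pauses after every iteration, hands its intermediate result to the caller, and accepts a modified state through `send()`. To keep the sketch short, a toy one-dimensional k-means stands in for a subspace clustering algorithm; the function names, the merge action, and the data are hypothetical and only illustrate the interrupt-visualize-steer cycle, not an actual steering interface.

```python
def steerable_kmeans(points, centroids, max_iter=20):
    """Toy 1-D k-means that pauses after every iteration.

    Yields (iteration, centroids). The caller may send() back a new
    centroid list (e.g. after merging clusters) or None to continue.
    """
    centroids = list(centroids)
    for it in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda c: abs(p - centroids[c]))
            groups[nearest].append(p)
        # Update step: a centroid with no points keeps its old position.
        new = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
        feedback = yield it, new  # pause: intermediate result for visualization
        if feedback is not None:
            new = list(feedback)  # user steering replaces the state
        if new == centroids:  # converged
            return
        centroids = new


def run_with_feedback(points, centroids, feedback_fn):
    """Drive the algorithm, asking feedback_fn after every iteration."""
    gen = steerable_kmeans(points, centroids)
    history = []
    try:
        it, current = next(gen)
        while True:
            history.append(current)
            it, current = gen.send(feedback_fn(it, current))
    except StopIteration:
        pass
    return history


if __name__ == "__main__":
    points = [1, 2, 3, 10, 11, 12, 20, 21, 22]
    # Simulated user: after the first iteration, merge the two upper clusters.
    merge = lambda it, c: [c[0], (c[1] + c[2]) / 2] if it == 0 else None
    print(run_with_feedback(points, [0, 5, 25], merge))
    # [[1.5, 9.0, 21.0], [2.0, 16.0], [2.0, 16.0]]
```

Here `feedback_fn` plays the role of the user: returning `None` lets the algorithm continue unchanged, while returning a new state steers the remaining iterations.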

In conclusion, the complexity and volume of high-dimensional data require combining the strengths of visualization, automation, and interaction to discover interesting, previously unknown facets of these data sets. This thesis has addressed some of the relevant research questions in this field. At the same time, new questions have arisen that will hopefully motivate other researchers to develop applicable solutions in the near future.


List of Figures

1.1 Multiple valid and interesting groupings of a high-dimensional data set [104].

1.2 Schematic overview of the interrelation of chapters in this thesis.

2.1 High-dimensional visualization techniques taken from [145]. A: Scatterplot matrix showing a histogram for each dimension on the diagonal; selected points are marked in red in all plots. B: Parallel coordinates plot of a seven-dimensional data set; the polyline representing one data point is highlighted in red. C: Star glyphs in an MDS layout. D: Dense pixel displays representing a 14-dimensional data set.

2.2 (A) Scagnostics SPLOM whose axes are scagnostics measures, showing each data scatterplot as a point in the scatterplots of the measures [152]. (B) Scagnostics indices used as quality measures to rank data scatterplots [152].

2.3 Visual interactive feature selection systems. A: Rank-by-Feature Framework presented in [125]. B: Feature selection supported by quality measures [82]. C: DimStiller for feature selection [76].

2.4 Interactive visual analysis systems for clustering high-dimensional data. A: Interactive exploration of hierarchically clustered data along a dendrogram [124]. B: (a) Grouping icons to form clusters based on visual similarity. (b) User-defined grouping of icons [35].

2.5 Interactive visual analysis systems for classification in high-dimensional data. A: Visual classification from [11], illustrating the decision tree for DNA training data with 19 attributes; each attribute value is visualized as a colored pixel arranged in bars. B: Decision tree construction system [142], representing the tree as a node-link diagram with split points displayed on the links and split attributes on the nodes.

2.6 (a) VISA system [14]. Left: MDS projection for the global view of clusters. Right: Matrix of subspace clusters for the in-depth view. (b) Heidi Matrix [141] over a subspace.

2.7 Visualization techniques applied in Ferdosi's work [52]. Left: 1D subspace. Middle: 2D subspace. Right: Subspace with 3 or more dimensions.

3.1 Working steps for using quality metrics to rank high-dimensional visualizations according to a given task.

3.2 Scatterplot example and its corresponding density image. For each pixel we compute the mass distribution along different directions and store the smallest value, here depicted by the blue line.

3.3 2D view and rotated projection axes. The projection onto the rotated plane has less overlap, and the structures of the data can be seen even in the projection. This is not possible for a projection onto the original axes.

3.4 First step of the HDM approach: each plot is ranked for different rotations with the 1D-HDM, and the best measure value is taken for the plot.

3.5 Second step of the HDM approach: PCA is computed on the k best selected dimensions and on all possible subsets of more than 3 dimensions. The first two components are plotted in scatterplots, which are ranked with the 2D-HDM. The best measure value indicates the scatterplot in which the class information is best separated.



3.6 Synthetic examples of parallel coordinates and their respective Hough spaces: (a) presents two well-defined line clusters and is more interesting for the cluster identification task than (b), where no line cluster can be identified. Note that the bright areas in the ρθ-plane represent the clusters of lines with similar ρ and θ.

3.7 Results for the Parkinson's Disease data set using our RVM measure (Section 3.1.2). Clumpy views with low correlation are penalized (bottom row), while views containing higher correlation between the variables are preferred (top row).

3.8 Results for the Olives data set using our CDM measure (Section 3.1.3). The different colors depict the different classes (regions) of the data set. Although no view of this data set separates all classes completely, our CDM measure still found views in which most of the classes are mutually separated (top row). In the worst ranked views the classes clearly overlap with each other (bottom row).

3.9 Results for the Olives data set using our HDM measure (Section 3.1.3). The best ranked plot is the PCA of dim(4,5,8), revealing a good view of all the classes; the second best is the PCA of dim(1,2,4), and the third is the PCA of all 8 dimensions. The differences between the last two are small because the additional dimensions contribute little variance to the 3rd eigenvector relative to the 2nd. The difference between the last two views and the first view is clearly visible (e.g., when looking at the yellow class).

3.10 Results for the Wine data set using our CSM measure (Section 3.1.3). The best ranked plots show a large distance between the centers of the class clusters, while the worst ranked views show only cluttered data.

3.11 Results for the Wine data set using our CDM measure (Section 3.1.3). Note that the second best ranked view, (dim1, dim7) (with CDM = 89), is not considered good by the CSM measure (CSM = 58).

3.12 Results on the WDBC data set for the RVM (top) and the CDM (bottom). In this example, views with a quality value below 0.95 have been faded out. In this way many irrelevant views can be hidden, reducing the number of plots the user has to inspect in more detail to a manageable number.

3.13 Results for the non-classified version of the Parkinson's Disease data set. Best and worst ranked visualizations using our HSM measure for non-classified data (see Section 3.1.4). Top row: The three best ranked visualizations and their respective normalized measures; well-defined clusters in the data set are favored. Bottom row: The three worst ranked visualizations; the large amount of spread hampers interpretation. Note that the user task related to this measure is not to find possible correlations between the dimensions but to detect well-separated clusters.

3.14 Results of the SM for the Cars data set. Cars using gasoline are shown in black, diesel in red. Best and worst ranked visualizations using our Hough Similarity Measure (Section 3.1.5) for parallel coordinates. Top row: The three best ranked visualizations and their respective normalized measures. Bottom row: The three worst ranked visualizations.



3.15 Results of the OM for the WDBC data set. Malignant nuclei are colored black, while healthy nuclei are red. Best and worst ranked visualizations using our Overlap Measure (Section 3.1.5) for parallel coordinates. Top row: The three best ranked visualizations. Rather than views with high similarity (which indicates clusters), visualizations that minimize the overlap between the classes are favored, so that the difference between malignant and benign cells becomes clearer. Bottom row: The three worst ranked visualizations. The overlap of the data complicates the analysis and makes these views useless for discriminating malignant from benign cells.

3.16 Results of the HSM for the synthetic data set from [82], presenting the best and worst ranked visualizations using our HSM measure for non-classified data (see Section 3.1.4). Top row: The three best ranked visualizations and their respective normalized measures; well-defined clusters in the data set are favored. Bottom row: The three worst ranked visualizations; the large amount of spread hampers interpretation. Note that the user task related to this measure is not to find high correlation between the dimensions but to detect well-separated clusters.

3.17 Matrix for the synthetic data set with scatterplots above the main diagonal and parallel coordinates plots below.

3.18 Results of the 7 measures for classified and unclassified data. The left column shows the results for the scatterplot measures and the right column those for the parallel coordinates measures. The ranks are sorted in decreasing order, and the target patterns are marked with red crosses.

3.19 Scatterplot of the first two components of the PCA over dimensions 2, 5, and 6.

3.20 Projections of scatterplots used in the experiment. Participants had to select the best five projections and order them by quality. The order of the scatterplots was permuted separately for each participant using the Latin square method.

3.21 Correlation of measures with users' classification, showing the highest R² values for the 2D-HDM measure.

3.22 Correlation of measures with users' classification for highest and one lowest quality projection.

3.23 Surprising study results.

4.1 (Top row of Figure 3.8) Ranking projections according to the Class Density Measure, favoring projections with minimal overlap between predefined classes (i.e., the colors) [133].

4.2 Clutter reduction achieved through axes reordering in a scatterplot matrix (initial visualization on the left, reordered on the right) [112].

4.3 Data abstraction algorithm based on sampling, aimed at reducing data size while preserving relevant patterns. Original visualization on the left with 16384 data items. Sampled visualization on the right with 987 items and a visual quality of 0.95 [80].



4.4 Quality metrics pipeline. The pipeline adds a layer named quality-metrics-based automation on top of the traditional information visualization pipeline [36]. The layer obtains information from the stages of the pipeline (the boxes) and influences the processes of the pipeline through the metrics it calculates. The user is always in control.

4.5 Mapping a 10-dimensional data set to a scatterplot with four visual primitives (x-axis, y-axis, size, and color) yields over 5000 possible alternative mappings.

4.6 Quality metrics pipeline for the first example from [133]: (A) generation of alternatives; (B) evaluation of alternatives (image space); (C) creation of the final representation.

4.7 Interactive chart for selecting the number of dimensions to keep versus information loss [82].

4.8 Top: best ordering to enhance clustering. Bottom: best ordering to enhance correlation [82].

4.9 Quality metrics pipeline for the second example from [82]: (A) dimensions ranked by their importance; (B) selection of the number of dimensions to retain vs. information loss; (C) creation of the final mapping with ordering.

4.10 Visual abstraction of a scatterplot matrix from [42].

4.11 Quality metrics pipeline for the third example from [42]: (A) data features compared between the original data and the abstracted data; (B) instantiation of the desired abstraction level guided by quality metrics.

4.12 Visual abstraction chart with a threshold setting for the abstraction level and feedback on abstraction quality [42].

4.13 Left: star glyphs representing the original data set. Right: visualized data after DOSFA was applied [158].

4.14 Quality metrics pipeline for the fourth example from [158]: (A) construct a hierarchical structure of dimensions by clustering; (B) filter dimensions by similarity and importance; (C) map the dimension ordering to the visualization; (D) influence the view according to the measured quality (spacing the parallel coordinates according to their similarity). The user can steer all these steps after interacting with the clustered dimensions shown in an InterRing visualization.

4.15 Taxonomy of factors in visual cluster separation, where factor axes are marked to show the ranges in which existing measures are successful; gaps represent failure cases. The centroid measure (CDM) is marked in blue and the grid measure (2D-HDM) in red. All positions are approximate estimates. Marked along the factor axes are six data sets that are exemplified in the paper. (Used with permission from [122].)

4.16 A taxonomy of data characteristics with respect to class separation in scatterplots. Some factors are organized as axes (arrows), while others are binned. Between-Class factors often result from the variance of Within-Class factors (horizontal dependencies), and factors at the top can strongly influence factors below them (vertical dependencies). Class Separation therefore depends on all other factors (used with permission from [122]).

5.1 Data projected in several subspaces.

5.2 Workflow of subspace cluster analysis using the ClustNails system.



5.3 Two subspace clusters visualized as spikes. The clusters share common dimensions<br />

but the importance <strong>of</strong> the dimensions for the clusters are di erent.<br />

Dim29 and dim32 <strong>in</strong> the left cluster show smaller pikes than <strong>in</strong> the right<br />

cluster, as they are considered less important for the def<strong>in</strong>ition <strong>of</strong> that cluster<br />

accord<strong>in</strong>g to our measure wk m . Furthermore, the left cluster has fewer<br />

dimensions and more objects than the right cluster. . . . . . . . . . . . . . . 103<br />

5.4 HeatNails visualization. Bottom: show<strong>in</strong>g the distribution <strong>of</strong> dimension<br />

values for all dimensions (rows) and records (columns). Top: show<strong>in</strong>g histograms<br />

for the values <strong>of</strong> all dimensions per cluster for comparison purposes.104<br />

5.5 <strong>Visual</strong>ization <strong>of</strong> the subspace clusters <strong>of</strong> the USDA Food Composition data<br />

set generated by Proclus. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107<br />

5.6 Sorted view (Value order<strong>in</strong>g function applied). . . . . . . . . . . . . . . . . 107<br />

5.7 <strong>Visual</strong>ization <strong>of</strong> the subspace clusters <strong>in</strong> VISA [14] framework discussed <strong>in</strong><br />

Subsection 5.1.5. Cluster view (left), record view (right). . . . . . . . . . . . 108<br />

5.8 Alternative data distributions and group<strong>in</strong>gs from [103] <strong>in</strong> two di erent subspaces<br />

<strong>of</strong> a larger high-dimensional data space (doma<strong>in</strong> here: demographic<br />

data analysis). Our proposed visual analysis method <strong>in</strong>tegrates the notion<br />

<strong>of</strong> alternative subspaces <strong>in</strong>to the analysis process and l<strong>in</strong>ks it to the task <strong>of</strong><br />

comparative cluster analysis. . . . . . . . . . . . . . . . . . . . . . . . . . . 111<br />

5.9 Our proposed analysis pipeline. A subspace selection algorithm is applied to automatically identify a candidate set of interesting subspaces. A filtering step reduces the potentially large and redundant set of automatically obtained subspaces to a user-selectable number of representative subspaces. Visual-interactive user exploration then proceeds on the subspace representations. Subspace analysis is also supported by comparative cluster views, allowing users to identify meaningful similar, complementary, or even conflicting clustering structures in the set of subspaces. . . . 114

5.10 Filtering cases that can be supported by our two defined subspace similarity functions. . . . 115

5.11 Subspace representation by 2D scatterplots with dimension glyphs. The figure shows the visual representations of two 5D subspaces (left) and one 4D subspace (right). . . . 116

5.12 (1) Linearly sorted view of subspaces for the 12D synthetic data set from [52], showing the full result of SURFING, consisting of 296 subspaces. The subspace selected in this view is shown (2) in a single subspace view to enable interaction and (3) in a parallel coordinates view with the subspace dimensions as the first axes (highlighted) and all other data dimensions as the remaining axes. . . . 117

5.13 Hierarchical agglomerative grouping of the 296 interesting subspaces. The red line shows the threshold for 6 groups shown in the subspace group view. Each group is marked by a colored rectangle. The colors are maintained in Figure 5.14. . . . 118

5.14 Subspace group view for the 12D synthetic data set with six subspace groups. . . . 118

5.15 Dimension-based subspace similarity MDS view of the 296 subspaces selected by the subspace search algorithm. . . . 119


138 List <strong>of</strong> Figures<br />

5.16 All linked views: (1) Subspace group view for the 12D synthetic data set with six subspace groups. (2) Single subspace view showing the representative subspace for the first group. (3) Details-on-demand in the parallel coordinates view for the selected subspace. (4) The MDS layout of the subspace search results based on their dimension similarity. (5) Group detail view for the three (orange, green, purple) subspace groups. (6) Hierarchical navigation buttons. . . . 120

5.17 Linearly sorted view cut-out of subspaces for the 18D USDA Food Composition data set. The full result of SURFING consists of 216 subspaces. We see a rather high level of redundancy. Subspaces exhibiting more structure are found in particular at the middle and end positions of the ranking. Relying only on the numerically top-ranked results, we would have omitted such interesting cases from the analysis. . . . 123

5.18 (A) Spotted interesting subspace (Carbohydrate, Fibre) showing two clusters. (B) Subspace (Carbohydrate, Lipid, Protein) in the same cluster group as (A), where the cluster structure changes. (C) Third cluster, marked in green, in the subspace from (B). (D) Subspace (Fiber, Protein, Vit D) of the orange-framed subspace group, where the alternative clustering of points is visible. . . . 123

5.19 (1) Grouped view of subspaces for the 18D USDA Food Composition data set with 12 group representatives. (2) The brown and orange group components are shown in the components view. (3) MDS layout of all subspaces with cross-colored group representatives. . . . 124

6.1 Interactive exploration of subspace clustering results. . . . 130

6.2 Interactive exploration of subspace search results. . . . 131

6.3 Visual comparison of subspace clustering results using visualization. . . . 131

6.4 Visual-assisted in-line steering of subspace clustering. . . . 132

A.1 Empirical study experiment form version A. . . . 153

A.2 Empirical study experiment form version B. . . . 154

A.3 The eight projections that were never ranked by any user among the five best (ranks 1 to 5) for class separability out of the 18 presented plots. . . . 155

A.4 Pipeline for “A Projection Pursuit Algorithm for Exploratory Data Analysis” by Friedman and Tukey [54]: (A) different 2D linear, but not axis-parallel, data projections are computed and evaluated by the quality metric; (B) the best projection direction is chosen by the quality metric, called the “usefulness” index, which measures the quality of a projection axis; the projection direction is varied so that the index is maximized. . . . 156



A.5 Pipeline for “A Rank-by-Feature Framework for Interactive Exploration of Multidimensional Data” by Seo and Shneiderman [126]: (A) projections are generated, and each 1D and 2D projection is evaluated/ranked by a quality metric selected by the user; (B) the best projections are presented; (C) ranking scores are presented in a color-coded grid (“Score Overview”) as well as in a color-coded “Ordered List” for each projection. The user selects one view in the list or grid and can also change the dimension axes, upon which the view adapts. Note that this is a highly interactive visualization of dimensions and quality metric scores rather than a static projection of data records. . . . 156

A.6 Pipeline for “Finding and Visualizing Relevant Subspaces for Clustering High-Dimensional Astronomical Data Using Connected Morphological Operators” by Ferdosi et al. [52]: (A) generation of projections; all projections above 3D are reduced with PCA; the user can change the smoothing parameter, which influences the number of projections; (B) each view is evaluated; the user can select the view to inspect. . . . 157

A.7 Pipeline for “Graph-Theoretic Scagnostics” by Wilkinson et al. [151]: (A) generation of projections; (B) all 2D views are ranked by several metrics; (C) once the metrics have been computed, they are used to create the SPLOM (rows and columns are the metrics), in which the projections are mapped as data points. . . . 157

A.8 Pipeline for “Selecting good views of high-dimensional data using class consistency” by Sips et al. [129]: (A) all 2D projections are ranked with the quality metric; (B) each view is associated with the quality metric value computed in (A); (C) a view transformation decides which scatterplots to highlight (or fade out) depending on the quality values and the set threshold. . . . 157

A.9 Pipeline for “Coordinating computational and visual approaches for interactive feature selection and multivariate clustering” by Guo [59]: (A) all 2D projections are evaluated with the “minimum conditional entropy (MCE)”; (B) the original dimensions are clustered to find an ordering according to their MCE values; (C) the matrix is ordered according to the dimension clustering. The user can 1) select, add to, or subtract from a variable subset that is analyzed further; 2) move the threshold bar for the connecting edges, whereupon clusters are automatically extracted and colored; 3) interact to link, brush, and select elements. . . . 158

A.10 Pipeline for “Exploring High-D Spaces with Multiform Matrices and Small Multiples” by MacEachren et al. [98]: (A) automatic selection of potentially interesting subspaces of variables; the user can also select subspaces manually; (B) all 2D plots are ranked with a quality metric (based on conditional entropy); (C) the matrix view is colored and ordered according to the quality metric values. The user can select a dimension subset to be visualized with other visualization techniques. . . . 158

A.11 Pipeline for “Improving the Visual Analysis of High-dimensional Datasets Using Quality Measures” by Albuquerque et al. [8] for Jigsaw Maps: (A) mapping of dimensions to 2D displays; (B) all 2D plots are ranked with a quality metric to select the best one. . . . 158



A.12 Pipeline for “Improving the Visual Analysis of High-dimensional Datasets Using Quality Measures” by Albuquerque et al. [8] for RadViz: (A) all views are ranked with a quality metric; (B) dimensions are ordered according to the quality values. . . . 158

A.13 Pipeline for “Improving the Visual Analysis of High-dimensional Datasets Using Quality Measures” by Albuquerque et al. [8] for Table Lens: (A) a quality metric is computed on the data; (B) the user can select an area, marking dimensions and records; the view is then transformed according to the user interaction; (C) colors are mapped according to the quality metric values for outliers and correlation. . . . 159

A.14 Pipeline for “Pargnostics: Screen-Space Metrics for Parallel Coordinates” by Dasgupta and Kosara [43]: (A) all 2D views are evaluated according to the metrics; (B) the best pairs are selected to compute the best ordering of dimensions. The user can also influence this decision by selecting interesting plots. . . . 159

A.15 Pipeline for “Combining automated analysis and visualization techniques for effective exploration of high-dimensional data” by Tatu et al. [133] for HDM: (A) all 2D data tables are evaluated according to the 1D-HDM; (B) the best n-dimensional subset is made visible in a 2D plot (with PCA) and evaluated by the 2D-HDM. . . . 159

A.16 Pipeline for “High-Dimensional Visual Analytics: Interactive Exploration Guided by Pairwise Views of Point Distributions” by Wilkinson et al. [152]: (A) generation of projections; (B) all 2D views are evaluated according to the quality metric; (C) a sorted/highlighted view is created using the metrics. The user can navigate through the ranked list, and sort and highlight plots in this view and in the SPLOM view. . . . 159

A.17 Pipeline for “Clutter Reduction in Multi-Dimensional Data Visualization Using Dimension Reordering” by Peng et al. [112]: (A) a quality metric is computed on the data; (B) the quality metric is also calculated depending on the visual abstraction; (C) the best visual mapping (ordering) is decided based on the metric values. . . . 160

A.18 Pipeline for “Similarity Clustering of Dimensions for an Enhanced Visualization of Multidimensional Data” by Ankerst et al. [9]: (A) a quality metric is computed on the data; (B) the quality metric is also calculated depending on the visual abstraction; (C) the best visual mapping (ordering) is decided based on the metric values. . . . 160

A.19 Pipeline for “Quality Metrics for 2D Scatterplot Graphics: Automatically Reducing Visual Clutter” by Bertini and Santucci [24]: (A) a quality metric is computed on the data density and the screen density, and the two are compared; (B) projection and sampling are based on the metric values. . . . 160

A.20 Pipeline for “A Screen Space Quality Method for Data Abstraction” by Johansson and Cooper [80]: (A) sampled and original data tables are associated with a quality metric computed on the views of the sampled and original data; (B) the values are used to decide upon the sampling rate. . . . 160



A.21 Pipeline for “Enabling Automatic Clutter Reduction in Parallel Coordinate Plots” by Ellis and Dix [48]: (A) pixel occlusion is measured in the view space; the user can move a window (lens), and sampling and occlusion measurement are performed only within this window; (B) the values of the quality metric are used to decide upon the sampling rate. . . . 161

A.22 Pipeline for “Pixnostics: Towards Measuring the Value of Visualization” by Schneidewind et al. [120]: (A) a subset of dimensions is selected with standard mining techniques; (B) alternative mappings of the selected data are evaluated in screen space; (C) and (D) based on the quality value, the best subset and mapping are determined. The user can also manually fix the mapping of some data features to visual features. . . . 161

A.23 Hierarchical agglomerative grouping of the 296 interesting subspaces. The red line shows the threshold for 6 groups shown in the subspace group view. Each group is marked by a colored rectangle. The colors are maintained in Figure 5.14. . . . 162




List of Tables

3.1 Overview and classification of our quality measures. . . . 31

3.2 Overview of the data sets used to show the measures' properties. . . . 41

3.3 Overview of the analyzed measures with the references for additional details. . . . 55

3.4 Results of the regression analysis. . . . 60

4.1 Visualization techniques categorized by their layout dimensionality (i.e., the number of axes of the visualization). . . . 77

4.2 Quality metrics papers classified according to quality metrics factors (sorted by purpose). . . . 78

A.1 Dimension names for the Cars data set. . . . 146

A.2 Dimension names for the Olives data set [163]. . . . 146

A.3 Dimension names for the Parkinson’s Disease data set [95, 96]. . . . 147

A.4 Dimension names for the Wine data set [53]. . . . 147

A.5 Dimension names for the WDBC data set [131]. . . . 148




A Appendix

Contents

A.1 Original Data Dimensions for Used Data Sets . . . 145

A.2 Empirical Study . . . 149

A.2.1 General Questions Form . . . 149

A.2.2 Experiment Form . . . 152

A.2.3 Additional Experiment Results . . . 155

A.3 Quality Metrics Pipelines for the Literature Review . . . 156

A.4 Hierarchical Grouping of Interesting Subspaces . . . 162

A.1 Original Data Dimensions for Used Data Sets

The Cars data set was collected by another institute in Braunschweig and provided to our partners there; its original dimensions are enumerated in Table A.1.


146 Appendix A. Appendix<br />

Table A.1: Dimension names for the Cars data set.

original            renamed
TYPEOFMOTOR         dim0 (class)
MANUFACTURER        dim1
TYPE                dim2
PRICE               dim3
CYLINDERCAPACITY    dim4
POWER               dim5
RPM                 dim6
TORQUE              dim7
VMAX                dim8
ACCELERATION        dim9
FUELCONSUMPTION     dim10
CO2EMISSION         dim11
WEIGHT              dim12
LENGTH              dim13
WIDTH               dim14
HEIGHT              dim15
WHEELBASE           dim16
LOADCAPACITY        dim17
TRUNK               dim18
TOWINGCAPACITY      dim19
ROOFLOAD            dim20
TANKCAPACITY        dim21
TAXES               dim22

The Olives data set can be found at http://www2.chemie.uni-erlangen.de/publications/ANN-book/datasets/oliveoil/index.html and has the original dimensions enumerated in Table A.2.

Table A.2: Dimension names for the Olives data set [163].

original       renamed
palmitic       dim1
palmitoleic    dim2
stearic        dim3
oleic          dim4
linoleic       dim5
linolenic      dim6
arachidic      dim7
eicosenoic     dim8
area           dim9 (class)



The Parkinson’s Disease data set can be found at http://archive.ics.uci.edu/ml/datasets/Parkinsons and has the original dimensions enumerated in Table A.3.

Table A.3: Dimension names for the Parkinson’s Disease data set [95, 96].

original                                                              renamed
status: health status of the subject (1 = Parkinson’s, 0 = healthy)   dim1 (class)
MDVP:Fo(Hz): average vocal fundamental frequency                      dim2
MDVP:Fhi(Hz): maximum vocal fundamental frequency                     dim3
MDVP:Flo(Hz): minimum vocal fundamental frequency                     dim4
MDVP:Shimmer(dB): measure of variation in amplitude                   dim5
HNR: measure of ratio of noise to tonal components in the voice       dim6
RPDE: nonlinear dynamical complexity measure                          dim7
D2: nonlinear dynamical complexity measure                            dim8
DFA: signal fractal scaling exponent                                  dim9
spread1: nonlinear measure of fundamental frequency variation         dim10
spread2: nonlinear measure of fundamental frequency variation         dim11
PPE: nonlinear measure of fundamental frequency variation             dim12

The Wine data set can be found at http://archive.ics.uci.edu/ml/datasets/Wine and has the original dimensions enumerated in Table A.4.

Table A.4: Dimension names for the Wine data set [53].

original                        renamed
Alcohol                         dim1
Malic acid                      dim2
Ash                             dim3
Alcalinity of ash               dim4
Magnesium                       dim5
Total phenols                   dim6
Flavanoids                      dim7
Nonflavanoid phenols            dim8
Proanthocyanins                 dim9
Color intensity                 dim10
Hue                             dim11
OD280/OD315 of diluted wines    dim12
Proline                         dim13
cluster ID                      dim14 (class)



The Wisconsin Diagnostic Breast Cancer (WDBC) data set can be found at http://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic) and has the original dimensions enumerated in Table A.5. Reference [131] contains detailed descriptions of how these features are computed.

Table A.5: Dimension names for the WDBC data set [131].

original                                  renamed
diagnosis                                 dim1 (class)
ten features (mean, SE, worst of each):   dim2-dim31
  radius
  texture
  perimeter
  area
  smoothness (local variation in radius lengths)
  compactness (perimeter² / area - 1.0)
  concavity (severity of concave portions of the contour)
  concave points (number of concave portions of the contour)
  symmetry
  fractal dimension (“coastline approximation” - 1)

The mean, standard error, and “worst” or largest value (mean of the three largest values) of these features were computed for each image, resulting in 30 features. For instance, dim2 is Mean Radius, dim12 is Radius SE, and dim22 is Worst Radius.
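The renaming of the 30 WDBC features follows a simple arithmetic rule: the ten features appear in three consecutive blocks (mean, standard error, worst), so dimN with 2 ≤ N ≤ 31 belongs to statistic block (N - 2) div 10 and to feature (N - 2) mod 10. The Python sketch below makes this explicit. The helper name `wdbc_dimension_name` and the title-cased output format are illustrative assumptions chosen to match the examples in the text, not part of the data set distribution.

```python
# Hypothetical helper: translate a renamed WDBC dimension (dim1..dim31,
# as in Table A.5) back to a human-readable feature name.

FEATURES = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave points", "symmetry",
    "fractal dimension",
]
# Each statistic covers one block of ten consecutive dimensions.
STATISTICS = ["Mean", "SE", "Worst"]


def wdbc_dimension_name(dim: int) -> str:
    """Return the feature name for dimension number `dim` (1..31)."""
    if dim == 1:
        return "Diagnosis (class)"
    if not 2 <= dim <= 31:
        raise ValueError(f"WDBC has dimensions dim1..dim31, got dim{dim}")
    # dim2-dim11 = mean, dim12-dim21 = SE, dim22-dim31 = worst.
    stat_index, feature_index = divmod(dim - 2, len(FEATURES))
    feature = FEATURES[feature_index].title()
    statistic = STATISTICS[stat_index]
    # Naming follows the text: "Radius SE", but "Mean Radius" / "Worst Radius".
    return f"{feature} SE" if statistic == "SE" else f"{statistic} {feature}"


if __name__ == "__main__":
    for d in (2, 12, 22):
        print(f"dim{d}: {wdbc_dimension_name(d)}")
    # dim2: Mean Radius
    # dim12: Radius SE
    # dim22: Worst Radius
```

The same block arithmetic applies to any dimension in the range, e.g. dim31 resolves to the worst fractal dimension.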



A.2 Empirical Study

A.2.1 General Questions Form

On the next two pages we present the introductory form of our study. The participants were asked to fill in personal information on the first page, and the experiment task was explained on the second page through an example.

Please note that the study took place at the University of Konstanz; therefore, the original forms were in German.



Personal Questions

(* Please tick where applicable)

Field of study:

Number of semesters in this field:

Gender*: male female

Age:

How often have you worked with data and its analysis*?

(e.g., Excel spreadsheets, databases, etc.)

Constantly Often Sometimes Rarely Never

Software used:

How often have you worked with the graphical representation of data*?

(e.g., Excel charts, etc.)

Constantly Often Sometimes Rarely Never

Software used:

(* Please tick where applicable)



Please read the instructions carefully and then work through the following sheet without interruption.

Imagine you are a wine merchant with a large stock of wine bottles. Your wine bottles can be divided into three kinds of wine (aperitif, liqueur, and table wine). All wine bottles have undergone a series of standard analyses that provide information about their properties, such as alcohol content, color hue, etc. The results of these analyses are shown in 18 scatterplots, each plotting two properties against each other (e.g., X–Y). Each wine bottle is represented by a point in the plot, and the colors of the points stand for the three kinds of wine. Based on these plots, you have to determine which pair of properties is best suited for distinguishing the kinds of wine, as shown in the following example:

[Example scatterplot: Property X vs. Property Y, with points colored by kind A, B, and C]

Your task is now to select the plots that are well suited for distinguishing the kinds of wine!

Please assign the numbers 1 to 5 to the five best plots (1 stands for the best plot). Enter the numbers in the boxes next to the plots. The boxes of the other plots can be left empty.

While working on the next page, please remain quiet and concentrated.

Thank you very much for your participation!



A.2.2 Experiment Form

On the following pages we show two examples of the study forms given to the participants of the empirical study described in Section 3.2. Every participant was shown the same plots, but ordered by a different permutation.
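Presenting the same 18 plots in a different order to each participant is a standard way to counter position effects. The following Python sketch shows one way such orders could be generated; it assumes a seeded pseudo-random permutation per participant, and the function name `plot_orders` and its parameters are hypothetical. It is an illustration, not the procedure actually used to produce forms A and B.

```python
import math
import random


def plot_orders(num_participants: int, num_plots: int = 18, seed: int = 0) -> list[list[int]]:
    """Return one distinct random ordering of plot indices per participant.

    Hypothetical helper: the seed and the use of Python's `random` module
    are illustrative assumptions, not the study's actual procedure.
    """
    if num_participants > math.factorial(num_plots):
        raise ValueError("not enough distinct permutations for all participants")
    rng = random.Random(seed)  # seeded so that the forms can be regenerated
    seen = set()
    orders = []
    while len(orders) < num_participants:
        order = list(range(num_plots))
        rng.shuffle(order)  # uniform random permutation (Fisher-Yates shuffle)
        key = tuple(order)
        if key not in seen:  # ensure no two participants share the same order
            seen.add(key)
            orders.append(order)
    return orders


if __name__ == "__main__":
    # Print two example orders, numbering plots from 1 as on a paper form.
    for i, order in enumerate(plot_orders(2)):
        print(f"form {'AB'[i]}:", [p + 1 for p in order])
```

With 18 plots there are 18! possible orders, so the uniqueness check almost never rejects a shuffle; it mainly documents the requirement that every participant gets a different permutation.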



Form A: Assign the numbers 1 to 5 to the five best plots (1 for the best).

Figure A.1: Empirical study experiment form version A.



Form B: Assign the numbers 1 to 5 to the five best plots (1 for the best).

Figure A.2: Empirical study experiment form version B.



A.2.3 Additional Experiment Results

The following plots were never ranked by any participant among the five best plots (ranks 1 to 5) out of the 18 presented.
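Identifying such never-selected plots amounts to a set difference between all plot indices and the union of all ranked plots. The minimal Python sketch below assumes each completed form is encoded as a dictionary from rank (1 to 5) to a zero-based plot index; this encoding, the function name `never_selected`, and the toy responses are hypothetical and do not reproduce the study data.

```python
def never_selected(responses: list[dict[int, int]], num_plots: int = 18) -> list[int]:
    """Return the plot indices that no participant placed among ranks 1-5.

    Each response maps a rank (1 = best, ..., 5) to the index of the plot
    that received it; plots left blank on a form simply do not appear.
    """
    selected = set()
    for ranking in responses:
        # Reject malformed forms that use ranks outside 1..5.
        if not set(ranking) <= set(range(1, 6)):
            raise ValueError(f"ranks must be between 1 and 5, got {sorted(ranking)}")
        selected.update(ranking.values())
    return sorted(set(range(num_plots)) - selected)


if __name__ == "__main__":
    # Two made-up responses, for illustration only (not study data).
    toy = [
        {1: 0, 2: 3, 3: 5, 4: 7, 5: 9},
        {1: 3, 2: 0, 3: 11, 4: 12, 5: 2},
    ]
    print(never_selected(toy))  # [1, 4, 6, 8, 10, 13, 14, 15, 16, 17]
```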


Figure A.3: The eight projections, out of the 18 presented plots, that no user ever ranked on the scale 1 to 5 (the five best) in terms of class separability.
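Finding the never-ranked plots is a set difference over the completed forms. A minimal Python sketch, assuming each form is stored as a dictionary from a plot id (1 to 18) to the rank the participant gave it; the data layout, the function names and the two example forms are hypothetical:

```python
from collections import Counter

def never_ranked(forms, n_plots=18):
    """Plot ids (1..n_plots) that no participant placed among their five best."""
    ranked = set()
    for form in forms:
        ranked.update(form.keys())  # a form only lists the five plots it ranked
    return sorted(set(range(1, n_plots + 1)) - ranked)

def rank_histogram(forms, plot_id):
    """How often `plot_id` received rank 1, 2, 3, 4 and 5."""
    counts = Counter(form[plot_id] for form in forms if plot_id in form)
    return [counts[rank] for rank in range(1, 6)]

# Two hypothetical completed forms (plot id -> rank, 1 = best).
forms = [
    {3: 1, 7: 2, 1: 3, 12: 4, 5: 5},
    {7: 1, 3: 2, 9: 3, 1: 4, 14: 5},
]
print(never_ranked(forms))
print(rank_histogram(forms, 7))
```

With real data, `never_ranked` would return exactly the plots collected in a figure like A.3, and `rank_histogram` gives the per-plot rank distribution.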



A.3 Quality Metrics Pipelines for the Literature Review

Here we include the quality-metrics pipelines of all papers from the taxonomy presented in Section 4.1 and summarized in Table 4.2 that are not already used as examples in that section. They appear in the same order as the papers in the taxonomy table.

[Pipeline diagram with quality-metrics-driven automation steps A and B]

Figure A.4: Pipeline for “A Projection Pursuit Algorithm for Exploratory Data Analysis” by Friedman and Tukey [54]: (A) different 2D linear, but not axis-parallel, data projections are computed and evaluated by the quality metric; (B) the best projection direction is chosen by the quality metric, called the “usefulness” index, which measures the quality of a projection axis; the projection direction is varied so that the index is maximized.
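To make the “usefulness” idea concrete, here is a deliberately small sketch: 1D projections of 2D points, an index built as trimmed spread times a local density term with a linear kernel (our simplification in the spirit of Friedman and Tukey's index), and an exhaustive angle scan standing in for their iterative optimizer. The trim fraction, the radius and all names are assumptions:

```python
import math

def trimmed_std(values, trim=0.1):
    """Standard deviation after discarding the `trim` fraction at each tail."""
    xs = sorted(values)
    k = int(len(xs) * trim)
    if len(xs) - 2 * k >= 2:
        xs = xs[k:len(xs) - k]
    mean = sum(xs) / len(xs)
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))

def local_density(values, radius):
    """Sum of (radius - |xi - xj|) over all pairs that lie closer than `radius`."""
    total = 0.0
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            gap = abs(values[i] - values[j])
            if gap < radius:
                total += radius - gap
    return total

def usefulness(points, direction, radius=0.5):
    """Spread times local density of the points projected onto `direction`."""
    proj = [sum(p * d for p, d in zip(point, direction)) for point in points]
    return trimmed_std(proj) * local_density(proj, radius)

def projection_pursuit(points, steps=180, radius=0.5):
    """Scan unit directions in the plane and keep the one with the largest index."""
    best_dir, best_score = None, -math.inf
    for i in range(steps):
        theta = math.pi * i / steps  # d and -d give the same projection
        direction = (math.cos(theta), math.sin(theta))
        score = usefulness(points, direction, radius)
        if score > best_score:
            best_dir, best_score = direction, score
    return best_dir, best_score

points = [(-2 + 0.1 * i, 0.05 * i) for i in range(5)] + [(2 + 0.1 * i, -0.05 * i) for i in range(5)]
direction, score = projection_pursuit(points)
print(direction, score)
```

The product rewards directions that keep the data spread out overall while packing points into tight local groups, which is why two separated clusters tend to score well.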

[Pipeline diagram with quality-metrics-driven automation steps A, B and C]

Figure A.5: Pipeline for “A Rank-by-Feature Framework for Interactive Exploration of Multidimensional Data” by Seo and Shneiderman [126]: (A) projections are generated, and each 1D and 2D projection is evaluated and ranked by a quality metric selected by the user; (B) the best projections are presented; (C) the ranking scores are presented in a color-coded grid (“Score Overview”) and in a color-coded “Ordered List” for each projection. The user selects a view in the list or grid and can also change the dimension axes, after which the view adapts. Note that this is a highly interactive visualization of dimensions and quality-metric scores rather than a static projection of data records.
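The ranking step (A) can be sketched as scoring every pair of dimensions with a user-chosen criterion and sorting the result. Below, the criterion defaults to the absolute Pearson correlation, one plausible choice for 2D scatterplots; the function names and the grid layout are illustrative assumptions, not the framework's API:

```python
import itertools
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def rank_pairs(columns, score=lambda a, b: abs(pearson(a, b))):
    """Score every 2D projection (pair of dimensions) and sort best first."""
    scored = [((i, j), score(columns[i], columns[j]))
              for i, j in itertools.combinations(range(len(columns)), 2)]
    return sorted(scored, key=lambda item: item[1], reverse=True)

def score_overview(n_dims, ranked):
    """Lower-triangular grid of scores, like a color-coded score overview."""
    grid = [[None] * n_dims for _ in range(n_dims)]
    for (i, j), s in ranked:
        grid[j][i] = s
    return grid

columns = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 1, 3, 2]]
ranked = rank_pairs(columns)
print(ranked)
```

Passing a different `score` function swaps the ranking criterion without touching the rest, mirroring the user's choice of metric in (A).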



[Pipeline diagram with quality-metrics-driven automation steps A and B]

Figure A.6: Pipeline for “Finding and Visualizing Relevant Subspaces for Clustering High-Dimensional Astronomical Data Using Connected Morphological Operators” by Ferdosi et al. [52]: (A) projections are generated, and all projections above 3D are reduced with PCA; the user can change the smoothing parameter, which influences the number of projections; (B) each view is evaluated, and the user can select the view to inspect.

[Pipeline diagram with quality-metrics-driven automation steps A, B and C]

Figure A.7: Pipeline for “Graph-Theoretic Scagnostics” by Wilkinson et al. [151]: (A) projections are generated; (B) all 2D views are ranked by several metrics; (C) once the metrics have been computed, they are used to create a SPLOM whose rows and columns are the metrics, with the projections mapped as data points.
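Scagnostics measures are computed on graphs built over each 2D view, notably the minimum spanning tree (MST). As a small illustration, the sketch below derives a simplified “Outlying”-style score, the fraction of total MST length that lies in edges longer than a boxplot cut-off, for every axis-parallel view. The published measure is defined more carefully (for example, on outlying vertices only), so treat this as an assumption-laden approximation; all names are hypothetical:

```python
import itertools
import math
import statistics

def mst_edge_lengths(points):
    """Edge lengths of the Euclidean minimum spanning tree (Prim's algorithm)."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known link from each vertex to the tree
    best[0] = 0.0
    lengths = []
    for step in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if step > 0:
            lengths.append(best[u])
        for v in range(n):
            if not in_tree[v]:
                best[v] = min(best[v], math.dist(points[u], points[v]))
    return lengths

def outlying(points):
    """Share of MST length in unusually long edges (simplified 'Outlying')."""
    lengths = mst_edge_lengths(points)
    if len(lengths) < 2 or sum(lengths) == 0:
        return 0.0
    q1, _, q3 = statistics.quantiles(lengths, n=4)
    omega = q3 + 1.5 * (q3 - q1)  # boxplot-style cut-off for long edges
    return sum(e for e in lengths if e > omega) / sum(lengths)

def metric_table(columns):
    """One measure per axis-parallel 2D view; more measures would add columns."""
    return {(i, j): outlying(list(zip(columns[i], columns[j])))
            for i, j in itertools.combinations(range(len(columns)), 2)}

line = [(float(i), 0.0) for i in range(6)]
print(outlying(line), outlying(line + [(30.0, 0.0)]))
```

Collecting several such measures per view yields the table whose columns would span the metrics SPLOM described in (C).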

[Pipeline diagram with quality-metrics-driven automation steps A, B and C]

Figure A.8: Pipeline for “Selecting good views of high-dimensional data using class consistency” by Sips et al. [129]: (A) all 2D projections are ranked with the quality metric; (B) each view is associated with the quality value computed in (A); (C) the view transformation decides which scatterplots to highlight or fade out, depending on the quality values and the chosen threshold.
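One of the measures described for this approach is centroid-based: a view is class-consistent when points lie closer to their own class centroid than to any other. The sketch below implements that idea in plain Python, scores every axis-parallel view, and flags views below a threshold for fading; the threshold value, the names and the tie handling are our assumptions:

```python
import itertools
import math

def centroid_consistency(points, labels):
    """Fraction of points whose nearest class centroid belongs to their own class."""
    sums = {}
    for (x, y), label in zip(points, labels):
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    centroids = {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}
    hits = 0
    for point, label in zip(points, labels):
        # ties go to the class seen first, which is an arbitrary choice
        nearest = min(centroids, key=lambda c: math.dist(point, centroids[c]))
        hits += nearest == label
    return hits / len(points)

def rank_views(columns, labels, threshold=0.8):
    """Score all axis-parallel 2D views; the flag says highlight (True) or fade (False)."""
    views = []
    for i, j in itertools.combinations(range(len(columns)), 2):
        score = centroid_consistency(list(zip(columns[i], columns[j])), labels)
        views.append(((i, j), score, score >= threshold))
    return sorted(views, key=lambda view: view[1], reverse=True)

labels = ["a", "a", "b", "b"]
columns = [[0, 0.1, 5, 5.1], [0, 5, 0, 5], [1, 2, 1, 2]]
for view in rank_views(columns, labels):
    print(view)
```

Views that do not involve dimension 0 mix the two classes completely, so their centroids coincide and the score drops to chance level.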



[Pipeline diagram with quality-metrics-driven automation steps A, B and C and user interactions 1 to 3]

Figure A.9: Pipeline for “Coordinating computational and visual approaches for interactive feature selection and multivariate clustering” by Guo [59]: (A) all 2D projections are evaluated with the “minimum conditional entropy” (MCE); (B) the original dimensions are clustered according to their MCE values to find an ordering; (C) the matrix is ordered according to the dimension clustering. The user can 1) select, add to, or subtract from a variable subset that is analyzed further; 2) move the threshold bar for the connecting edges, whereupon clusters are automatically extracted and colored; 3) interact to link, brush, and select elements.
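The caption names the “minimum conditional entropy” without defining it. As a rough, hypothetical stand-in, the sketch below bins each pair of dimensions on a regular grid and takes the smaller of H(Y | X) and H(X | Y): low values mean one dimension largely determines the other, so the 2D projection shows structure. Guo's actual definition and grid construction may differ, and `bins=4` is an arbitrary choice:

```python
import itertools
import math
from collections import Counter

def bin_column(values, bins):
    """Equal-width binning of one dimension into `bins` intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # constant column: everything in bin 0
    return [min(int((v - lo) / width), bins - 1) for v in values]

def conditional_entropy(a, b):
    """H(B | A) in bits for two aligned sequences of discrete values."""
    n = len(a)
    joint = Counter(zip(a, b))
    marginal = Counter(a)
    return -sum(c / n * math.log2(c / marginal[x]) for (x, _), c in joint.items())

def min_conditional_entropy(xs, ys, bins=4):
    """Smaller of H(Y | X) and H(X | Y) on a bins x bins grid (lower = more structure)."""
    bx, by = bin_column(xs, bins), bin_column(ys, bins)
    return min(conditional_entropy(bx, by), conditional_entropy(by, bx))

def rank_by_mce(columns, bins=4):
    """All dimension pairs, most structured (lowest score) first."""
    pairs = itertools.combinations(range(len(columns)), 2)
    scored = [((i, j), min_conditional_entropy(columns[i], columns[j], bins)) for i, j in pairs]
    return sorted(scored, key=lambda item: item[1])

xs = list(range(8))
print(rank_by_mce([xs, [v * 2 for v in xs], [3, 1, 4, 1, 5, 9, 2, 6]]))
```

The sorted scores could then serve as pairwise distances for clustering and ordering the dimensions, as described in steps (B) and (C).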

[Pipeline diagram with quality-metrics-driven automation steps A, B and C]

Figure A.10: Pipeline for “Exploring High-D Spaces with Multiform Matrices and Small Multiples” by MacEachren et al. [98]: (A) potentially interesting subspaces of variables are selected automatically; the user can also select subspaces manually; (B) all 2D plots are ranked with a quality metric based on conditional entropy; (C) the matrix view is colored and ordered according to the quality-metric values. The user can select a dimension subset to be visualized with other visualization techniques.

[Pipeline diagram with quality-metrics-driven automation steps A and B]

Figure A.11: Pipeline for “Improving the Visual Analysis of High-dimensional Datasets Using Quality Measures” by Albuquerque et al. [8] for Jigsaw Maps: (A) the dimensions are mapped to 2D displays; (B) all 2D plots are ranked with a quality metric to select the best.

[Pipeline diagram with quality-metrics-driven automation steps A and B]

Figure A.12: Pipeline for “Improving the Visual Analysis of High-dimensional Datasets Using Quality Measures” by Albuquerque et al. [8] for RadViz: (A) all views are ranked with a quality metric; (B) the dimensions are ordered according to the quality values.
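A RadViz point is the value-weighted average of dimension anchors spaced evenly on a circle, so the same record lands in different places under different anchor orders; this is why ordering the dimensions by a quality value matters. A minimal sketch of the standard RadViz mapping, assuming values are already normalized to [0, 1]; the quality measure used to choose the order is not reproduced here:

```python
import math

def radviz_point(record, order):
    """Map one record (non-negative, normalized values) to a 2D RadViz position.

    Dimension order[k] is anchored at angle 2*pi*k/d on the unit circle; the
    point is the average of the anchors, weighted by the record's values.
    """
    d = len(order)
    total = sum(record[dim] for dim in order)
    if total == 0:
        return (0.0, 0.0)  # an all-zero record sits at the center
    x = y = 0.0
    for k, dim in enumerate(order):
        angle = 2 * math.pi * k / d
        x += record[dim] * math.cos(angle)
        y += record[dim] * math.sin(angle)
    return (x / total, y / total)

record = [1.0, 0.0, 0.0, 0.0]
print(radviz_point(record, [0, 1, 2, 3]))
print(radviz_point(record, [1, 0, 2, 3]))
```

Swapping two anchors moves the record from one side of the circle to another, which is the effect an order optimized for a quality metric exploits.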



[Pipeline diagram with quality-metrics-driven automation steps A, B and C]

Figure A.13: Pipeline for “Improving the Visual Analysis of High-dimensional Datasets Using Quality Measures” by Albuquerque et al. [8] for Table Lens: (A) the quality metric is computed on the data; (B) the user can select an area, marking dimensions and records, and the view is then transformed according to the user interaction; (C) colors are mapped according to the quality-metric values for outliers and correlation.

[Pipeline diagram with quality-metrics-driven automation steps A and B]

Figure A.14: Pipeline for “Pargnostics: Screen-Space Metrics for Parallel Coordinates” by Dasgupta and Kosara [43]: (A) all 2D views are evaluated according to the metrics; (B) the best pairs are selected to compute the best ordering of dimensions. The user can also influence this decision by selecting interesting plots.
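One of the pairwise metrics in this family is the number of line crossings between two adjacent axes. The sketch below counts crossings from the raw value order rather than in screen space (a simplification) and then searches all axis orders for the one with the fewest total crossings; the exhaustive search is only practical for a handful of dimensions, and the names are hypothetical:

```python
import itertools

def crossings(a, b):
    """Count record pairs whose line segments cross between axes a and b.

    Both axes are assumed to be scaled monotonically increasing, so two
    lines cross exactly when the records swap order between the axes.
    """
    return sum(1 for i, j in itertools.combinations(range(len(a)), 2)
               if (a[i] - a[j]) * (b[i] - b[j]) < 0)

def best_axis_order(columns):
    """Exhaustive search for the axis order with the fewest total crossings."""
    d = len(columns)
    cost = {(i, j): crossings(columns[i], columns[j])
            for i in range(d) for j in range(d) if i != j}

    def total(order):
        return sum(cost[order[k], order[k + 1]] for k in range(d - 1))

    best = min(itertools.permutations(range(d)), key=total)
    return list(best), total(best)

columns = [[1, 2, 3], [3, 2, 1], [1, 2, 3]]
print(best_axis_order(columns))
```

Precomputing the pairwise cost table first reflects the two stages of the pipeline: evaluate all pairs in (A), then combine the pair scores into an ordering in (B).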

[Pipeline diagram with quality-metrics-driven automation steps A and B]

Figure A.15: Pipeline for “Combining automated analysis and visualization techniques for effective exploration of high-dimensional data” by Tatu et al. [133] for HDM: (A) all 2D data tables are evaluated according to the 1D-HDM; (B) the best nD subspace visible in the 2D plot (via PCA) is created and evaluated by the 2D-HDM.

[Pipeline diagram with quality-metrics-driven automation steps A, B and C]

Figure A.16: Pipeline for “High-Dimensional Visual Analytics: Interactive Exploration Guided by Pairwise Views of Point Distributions” by Wilkinson et al. [152]: (A) projections are generated; (B) all 2D views are evaluated according to the quality metric; (C) a sorted and highlighted view is created using the metrics. The user can navigate through the ranked list and sort and highlight plots in this view and in the SPLOM view.



[Pipeline diagram with quality-metrics-driven automation steps A, B and C]

Figure A.17: Pipeline for “Clutter Reduction in Multi-Dimensional Data Visualization Using Dimension Reordering” by Peng et al. [112]: (A) the quality metric is computed on the data; (B) the quality metric is also computed with respect to the visual abstraction; (C) the best visual mapping (ordering) is decided based on the metric values.

[Pipeline diagram with quality-metrics-driven automation steps A, B and C]

Figure A.18: Pipeline for “Similarity Clustering of Dimensions for an Enhanced Visualization of Multidimensional Data” by Ankerst et al. [9]: (A) the quality metric is computed on the data; (B) the quality metric is also computed with respect to the visual abstraction; (C) the best visual mapping (ordering) is decided based on the metric values.
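The goal in this line of work is to place similar dimensions next to each other. The sketch below uses one simple similarity (Euclidean distance between min-max-normalized dimensions) and a greedy chain-building heuristic; both are assumptions chosen for brevity rather than the paper's exact measures or algorithm, and the result is not guaranteed to be optimal:

```python
import math

def normalise(column):
    """Min-max scale a dimension to [0, 1] (constant columns become all zeros)."""
    lo, hi = min(column), max(column)
    span = (hi - lo) or 1.0
    return [(v - lo) / span for v in column]

def dissimilarity(a, b):
    """Euclidean distance between two normalized dimensions."""
    return math.dist(normalise(a), normalise(b))

def greedy_order(columns):
    """Chain dimensions so that neighbours are similar (greedy, not optimal)."""
    d = len(columns)
    dist = [[dissimilarity(columns[i], columns[j]) for j in range(d)] for i in range(d)]
    # seed the chain with the most similar pair of dimensions
    first, second = min(((i, j) for i in range(d) for j in range(i + 1, d)),
                        key=lambda pair: dist[pair[0]][pair[1]])
    order, remaining = [first, second], set(range(d)) - {first, second}
    while remaining:
        head, tail = order[0], order[-1]
        near_head = min(remaining, key=lambda k: dist[head][k])
        near_tail = min(remaining, key=lambda k: dist[tail][k])
        # grow the chain at whichever end gets the closer neighbour
        if dist[head][near_head] < dist[tail][near_tail]:
            order.insert(0, near_head)
            remaining.remove(near_head)
        else:
            order.append(near_tail)
            remaining.remove(near_tail)
    return order

columns = [[1, 2, 3, 4], [10, 1, 7, 3], [2, 4, 6, 8.5], [9, 2, 8, 3]]
print(greedy_order(columns))
```

Dimensions 0 and 2 rise together, as do 1 and 3, so each pair ends up adjacent in the resulting axis order.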

[Pipeline diagram with quality-metrics-driven automation steps A and B]

Figure A.19: Pipeline for “Quality Metrics for 2D Scatterplot Graphics: Automatically Reducing Visual Clutter” by Bertini and Santucci [24]: (A) the quality metric is computed on the data density and on the screen density, and the two are compared; (B) projection and sampling are based on the metric values.
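Such metrics quantify overplotting: more data points fall into a screen region than there are pixels to show them. A minimal sketch of that idea bins points into screen pixels and reports the share of points hidden behind an already lit pixel. This is a simplified stand-in, not Bertini and Santucci's exact data-density and screen-density definitions; the bounds format and the function names are assumptions:

```python
from collections import Counter

def pixel_counts(points, width, height, bounds):
    """Number of data points falling on each pixel of a width x height plot."""
    (xmin, xmax), (ymin, ymax) = bounds
    counts = Counter()
    for x, y in points:
        px = min(int((x - xmin) / (xmax - xmin) * width), width - 1)
        py = min(int((y - ymin) / (ymax - ymin) * height), height - 1)
        counts[px, py] += 1
    return counts

def hidden_fraction(points, width, height, bounds):
    """Share of points that cannot be seen because their pixel is already lit."""
    counts = pixel_counts(points, width, height, bounds)
    return 1 - len(counts) / len(points)

points = [(0.0, 0.0), (0.01, 0.01), (1.0, 1.0)]
print(pixel_counts(points, 10, 10, ((0, 1), (0, 1))))
print(hidden_fraction(points, 10, 10, ((0, 1), (0, 1))))
```

A high hidden fraction signals that the plot under-represents the data, which is the condition that motivates the sampling decision in (B).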

[Pipeline diagram with quality-metrics-driven automation steps A and B]

Figure A.20: Pipeline for “A Screen Space Quality Method for Data Abstraction” by Johansson and Cooper [80]: (A) the sampled and the original data tables are associated with a quality metric computed on the views of the sampled and the original data; (B) the values are used to decide upon the sampling rate.
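A rough sketch of the decision in (B): draw random samples at increasing rates, compare the pixels covered by each sample with those covered by the full data, and keep the smallest rate whose coverage reaches a target. Pixel coverage is a stand-in quality measure chosen here for simplicity; the candidate rates, the target of 0.9, the unit-square coordinates and the names are all assumptions:

```python
import random

def covered_pixels(points, width, height):
    """Set of pixels lit by points given in unit-square coordinates."""
    return {(min(int(x * width), width - 1), min(int(y * height), height - 1))
            for x, y in points}

def choose_sampling_rate(points, width, height, target=0.9,
                         rates=(0.1, 0.25, 0.5, 0.75, 1.0), seed=0):
    """Smallest rate whose sample covers at least `target` of the full view's pixels."""
    full = covered_pixels(points, width, height)
    rng = random.Random(seed)  # fixed seed keeps the choice reproducible
    for rate in rates:
        sample = rng.sample(points, max(1, int(len(points) * rate)))
        quality = len(covered_pixels(sample, width, height)) / len(full)
        if quality >= target:
            return rate, quality
    return 1.0, 1.0  # fall back to showing all data

print(choose_sampling_rate([(0.5, 0.5)] * 100, 10, 10))
spread = [((i + 0.5) / 10, 0.05) for i in range(10)]
print(choose_sampling_rate(spread, 10, 10))
```

Heavily overplotted data can be sampled aggressively without visible loss, while data that already uses every pixel cannot.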



[Pipeline diagram with quality-metrics-driven automation steps A and B]

Figure A.21: Pipeline for “Enabling Automatic Clutter Reduction in Parallel Coordinate Plots” by Ellis and Dix [48]: (A) pixel occlusion is measured in the view space; the user can move a window (lens), and sampling and occlusion measurement are done only inside this window; (B) the values of the quality metric are used to decide upon the sampling rate.

[Pipeline diagram with quality-metrics-driven automation steps A, B, C and D]

Figure A.22: Pipeline for “Pixnostics: Towards Measuring the Value of Visualization” by Schneidewind et al. [120]: (A) a subset of dimensions is selected with standard mining techniques; (B) alternative mappings of the selected data are evaluated in screen space; (C) and (D) the best subset and mapping are determined based on the quality value. The user can decide to fix the mapping of some data features to visual features manually.



A.4 Hierarchical Grouping of Interesting Subspaces

[Figure: dendrogram titled "hierarchical agglomerative grouping", synthetic dataset. The leaves are subspaces labeled by their dimension letters (e.g. CFGIJK, CDGIK, BCDF, GH, CK); the other axis shows distance (similarity), from 0 to 30.]

Figure A.23: Hierarchical agglomerative grouping of the 296 interesting subspaces. The red line marks the threshold that yields the 6 groups shown in the subspace group view. Each group is marked by a colored rectangle. The same colors are used in Figure 5.14.
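The grouping in Figure A.23 merges similar subspaces bottom-up and then cuts the hierarchy where 6 groups remain. The minimal Python sketch below illustrates that idea under assumptions the caption does not state:

- Each subspace is treated as the set of its dimension letters.
- The dissimilarity of two subspaces is the Jaccard distance of those sets.
- Clusters are merged by average linkage.

The thesis may use a different, data-driven similarity. `jaccard_distance` and `group_subspaces` are illustrative names, not part of any existing tool.

```python
from itertools import combinations
from typing import FrozenSet, List

# A subspace is a set of dimension labels, e.g. frozenset("CFGI") == {C, F, G, I}.
Subspace = FrozenSet[str]


def jaccard_distance(a: Subspace, b: Subspace) -> float:
    """1 - |a ∩ b| / |a ∪ b|: 0.0 for identical dimension sets, 1.0 for disjoint ones."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def group_subspaces(subspaces: List[Subspace], n_groups: int) -> List[List[Subspace]]:
    """Average-linkage agglomerative grouping that stops when n_groups clusters remain.

    Stopping at n_groups corresponds to cutting the dendrogram at a threshold
    that leaves exactly that many groups.
    """
    if not 1 <= n_groups <= len(subspaces):
        raise ValueError("n_groups must be between 1 and len(subspaces)")
    clusters: List[List[Subspace]] = [[s] for s in subspaces]

    def average_linkage(c1: List[Subspace], c2: List[Subspace]) -> float:
        # mean pairwise distance between the members of two clusters
        total = sum(jaccard_distance(a, b) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_groups:
        # find the closest pair (i < j) and merge cluster j into cluster i
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]]),
        )
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters
```

This naive version recomputes all pairwise linkages after every merge, so its cost grows roughly cubically with the number of subspaces. For larger inputs, SciPy's `scipy.cluster.hierarchy.linkage` followed by `fcluster` with a distance threshold is the usual choice.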


Bibliography

[1] ggplot2. http://had.co.nz/ggplot2/.

[2] Protovis. http://vis.stanford.edu/protovis/.

[3] Tableau. http://www.tableausoftware.com/.

[4] C. C. Aggarwal, J. L. Wolf, P. S. Yu, C. Procopiuc, and J. S. Park. Fast algorithms for projected clustering. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD ’99), pages 61–72. ACM, 1999.

[5] C. C. Aggarwal and P. S. Yu. Redefining clustering for high-dimensional applications. IEEE Transactions on Knowledge and Data Engineering, 14(2):210–225, 2002.

[6] R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan. Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD ’98), volume 27, pages 94–105. ACM, 1998.

[7] R. Agrawal, T. Imielinski, and A. Swami. Mining Association Rules between Sets of Items in Large Databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD ’93), pages 207–216. ACM, 1993.

[8] G. Albuquerque, M. Eisemann, D. J. Lehmann, H. Theisel, and M. Magnor. Improving the Visual Analysis of High-dimensional Datasets Using Quality Measures. In Proceedings of the IEEE Symposium on Visual Analytics Science and Technology (VAST ’10), pages 19–26. IEEE CS Press, 2010.

[9] M. Ankerst, S. Berchtold, and D. A. Keim. Similarity clustering of dimensions for an enhanced visualization of multidimensional data. In Proceedings of the IEEE Symposium Information Visualization (InfoVis ’98), pages 52–60. IEEE CS Press, 1998.

[10] M. Ankerst, M. M. Breunig, H.-P. Kriegel, and J. Sander. OPTICS: ordering points to identify the clustering structure. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD ’99), pages 49–60. ACM, 1999.

[11] M. Ankerst, M. Ester, and H.-P. Kriegel. Towards an effective cooperation of the user and the computer for classification. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’00), pages 179–188, 2000.

[12] D. L. Applegate, R. E. Bixby, V. Chvatal, and W. J. Cook. The Traveling Salesman Problem: A Computational Study (Princeton Series in Applied Mathematics). Princeton University Press, 2007.

[13] D. Asimov. The Grand Tour: A Tool for Viewing Multidimensional Data. Journal on Scientific and Statistical Computing, 6(1):128–143, 1985.

[14] I. Assent, R. Krieger, E. Müller, and T. Seidl. VISA: Visual Subspace Clustering Analysis. ACM SIGKDD Explorations Newsletter - Special Issue on Visual Analytics, 9(2):5–12, 2007.

[15] R. A. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval. ACM Press/Addison-Wesley, 1999.

[16] C. Baumgartner, C. Plant, K. Kailing, H.-P. Kriegel, and P. Kröger. Subspace selection for clustering high-dimensional data. In Proceedings of the Fourth IEEE Conference on Data Mining (ICDM ’04), pages 11–18. IEEE CS Press, 2004.

[17] R. Becker and W. Cleveland. Brushing scatterplots. Technometrics, 29:127–142, 1987.

[18] R. A. Becker, W. S. Cleveland, and M.-J. Shyu. The visual design and control of trellis display. Journal of Computational and Graphical Statistics, 5(2):123–155, 1996.



[19] B. B. Bederson, J. D. Hollan, K. Perlin, J. Meyer, D. Bacon, and G. Furnas. Pad++: A Zoomable Graphical Sketchpad For Exploring Alternate Interface Physics. Journal of Visual Languages & Computing, 7(1):3–32, 1996.

[20] R. Bellman. Dynamic Programming. Princeton University Press, 1st edition, 1957.

[21] P. Berkhin. A Survey of Clustering Data Mining Techniques. Grouping Multidimensional Data, pages 25–71, 2006.

[22] J. Bertin. Semiology of graphics. University of Wisconsin Press, 1983.

[23] E. Bertini and D. Lalanne. Investigating and reflecting on the integration of automatic data analysis and visualization in knowledge discovery. ACM SIGKDD Explorations Newsletter, 11:9–18, 2010.

[24] E. Bertini and G. Santucci. Quality Metrics for 2D Scatterplot Graphics: Automatically Reducing Visual Clutter. In Proceedings Smart Graphics (SG), volume 3031, pages 77–89, 2004.

[25] E. Bertini and G. Santucci. Give chance a chance: modeling density to enhance scatter plot quality through random data sampling. Information Visualization, 5(2):95–110, 2006.

[26] E. Bertini and G. Santucci. Visual Quality Metrics. In Proceedings of the 2006 AVI workshop on BEyond time and errors: noveL evaluation methods for Information Visualization (BELIV), pages 1–5. ACM, 2006.

[27] E. Bertini, A. Tatu, and D. A. Keim. Quality Metrics in High-Dimensional Data Visualization: An Overview and Systematization. Proceedings of the IEEE Symposium on Information Visualization (InfoVis ’11), 17(12):2203–2212, 2011.

[28] K. Beyer, J. Goldstein, R. Ramakrishnan, and U. Shaft. When Is "Nearest Neighbor" Meaningful? In Proceedings of the 7th International Conference on Database Theory (ICDT ’99), pages 217–235, 1999.

[29] E. A. Bier, M. C. Stone, K. Pier, K. Fishkin, T. Baudel, M. Conway, W. Buxton, and T. DeRose. Toolglass and Magic Lenses: The See-Through Interface. In Conference Companion on Human Factors in Computing Systems (CHI ’94), pages 445–446. ACM, 1994.

[30] T. Boogaerts, L.-C. Tranchevent, G. A. Pavlopoulos, J. Aerts, and J. Vandewalle. Visualizing high dimensional datasets using parallel coordinates: Application to gene prioritization. In IEEE 12th International Conference on Bioinformatics & Bioengineering (BIBE ’12), pages 52–57. IEEE CS Press, 2012.

[31] I. Borg and P. Groenen. Modern Multidimensional Scaling: Theory and Applications. Springer, 2005.

[32] R. Brath. Metrics for effective information visualization. In Proceedings of the IEEE Symposium Information Visualization (InfoVis ’97), pages 108–111, 1997.

[33] S. Bremm, T. v. Landesberger, J. Bernard, and T. Schreck. Assisted descriptor selection based on visual comparative data analysis. Computer Graphics Forum, 30(3):891–900, 2011.

[34] S. Bremm, T. v. Landesberger, M. Heß, T. Schreck, P. Weil, and K. Hamacher. Interactive visual comparison of multiple trees. In Proceedings of IEEE Symposium on Visual Analytics Science and Technology (VAST ’11), pages 31–40. IEEE CS Press, 2011.

[35] N. Cao, D. Gotz, J. Sun, and H. Qu. DICON: Interactive Visual Analysis of Multidimensional Clusters. IEEE Transactions on Visualization and Computer Graphics (TVCG ’11), 17:2581–2590, 2011.

[36] S. K. Card, J. D. Mackinlay, and B. Shneiderman. Readings in information visualization: using vision to think. Morgan Kaufmann Publishers Inc., 1999.



[37] D. B. Carr, R. J. Littlefield, and W. L. Nicholson. Scatterplot Matrix Techniques for Large N. In Proceedings of the Seventeenth Symposium on the Interface of Computer Sciences and Statistics on Computer Science and Statistics, pages 297–306. Elsevier North-Holland, Inc., 1986.

[38] C.-H. Cheng, A. W. Fu, and Y. Zhang. Entropy-based subspace clustering for mining numerical data. In Proceedings of the fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’99), pages 84–93. ACM, 1999.

[39] E. H. Chi. A Taxonomy of Visualization Techniques Using the Data State Reference Model. In Proceedings of the IEEE Symposium on Information Visualization (InfoVis ’00), pages 69–75. IEEE CS Press, 2000.

[40] K. W. Church and P. Hanks. Word association norms, mutual information, and lexicography. Computational Linguistics, 16(1):22–29, 1990.

[41] T. Cox and M. Cox. Multidimensional Scaling. Chapman & Hall, 1994.

[42] Q. Cui, M. Ward, E. Rundensteiner, and J. Yang. Measuring Data Abstraction Quality in Multiresolution Visualizations. IEEE Transactions on Visualization and Computer Graphics (TVCG ’06), 12:709–716, 2006.

[43] A. Dasgupta and R. Kosara. Pargnostics: Screen-Space Metrics for Parallel Coordinates. IEEE Transactions on Visualization and Computer Graphics (TVCG ’10), 16:1017–1026, 2010.

[44] R. Duda, P. Hart, and D. Stork. Pattern Classification. Wiley-Interscience, 2nd edition, 2001.

[45] C. Dunne and B. Shneiderman. Improving graph drawing readability by incorporating readability metrics: A software tool for network analysts. Technical Report HCIL-2009-13, University of Maryland, 2009.

[46] S. G. Eick and G. J. Wills. High Interaction Graphics. European Journal of Operational Research, 81(3):445–459, 1995.

[47] M. Eisen, P. Spellman, P. Brown, and D. Botstein. Cluster analysis and display of genome-wide expression patterns. Proceedings of the National Academy of Sciences, 95(25):14863–14868, 1998.

[48] G. Ellis and A. Dix. Enabling Automatic Clutter Reduction in Parallel Coordinate Plots. IEEE Transactions on Visualization and Computer Graphics (TVCG ’06), 12:717–724, 2006.

[49] G. Ellis and A. Dix. A taxonomy of clutter reduction for information visualisation. IEEE Transactions on Visualization and Computer Graphics (TVCG ’07), 13:1216–1223, 2007.

[50] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the Second ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’96), pages 226–231. AAAI Press, 1996.

[51] U. Fayyad, G. Piatetsky-Shapiro, and P. Smyth. The KDD process for extracting useful knowledge from volumes of data. Communications of the ACM, 39:27–34, 1996.

[52] B. J. Ferdosi, H. Buddelmeijer, S. Trager, M. H. F. Wilkinson, and J. B. T. M. Roerdink. Finding and visualizing relevant subspaces for clustering high-dimensional astronomical data using connected morphological operators. In Proceedings of the IEEE Symposium on Visual Analytics Science and Technology (VAST ’10), pages 35–42. IEEE CS Press, 2010.

[53] A. Frank and A. Asuncion. University of California Irvine (UCI) Machine Learning Repository, 2010.

[54] J. H. Friedman and J. W. Tukey. A projection pursuit algorithm for exploratory data analysis. IEEE Transactions on Computers, 23:881–890, 1974.



[55] Y.-H. Fua, M. Ward, and E. Rundensteiner. Hierarchical parallel coordinates for exploration of large data sets. In Proceedings of the Conference on Visualization (VIS ’99), pages 43–50. IEEE CS Press, 1999.

[56] K. Fukunaga. Introduction to statistical pattern recognition. Academic Press Professional, Inc., 2nd edition, 1990.

[57] S. Guha, R. Rastogi, and K. Shim. CURE: an efficient clustering algorithm for large databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD ’98), pages 73–84. ACM, 1998.

[58] S. Günnemann, E. Müller, I. Färber, and T. Seidl. Detection of orthogonal concepts in subspaces of high dimensional data. In Proceedings of the 18th ACM conference on Information and knowledge management (CIKM ’09), pages 1317–1326, 2009.

[59] D. Guo. Coordinating computational and visual approaches for interactive feature selection and multivariate clustering. Information Visualization, 2(4):232–246, 2003.

[60] D. Guo, J. Chen, A. M. MacEachren, and K. Liao. A visualization system for space-time and multivariate patterns (VIS-STAMP). IEEE Transactions on Visualization and Computer Graphics (TVCG ’06), 12(6):1461–1474, 2006.

[61] I. Guyon and A. Elisseeff. An introduction to variable and feature selection. Journal of Machine Learning Research - Special Issue on Variable and Feature Selection, (3):1157–1182, 2003.

[62] M. Hahsler, K. Hornik, and C. Buchta. Getting things in order: An introduction to the R package seriation. Journal of Statistical Software, 25(3):1–34, 2008.

[63] J. Han and M. Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers Inc., 1st edition, 2000.

[64] J. Han and M. Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers Inc., 2nd edition, 2006.

[65] S. Haroz and K.-L. Ma. Natural visualization. In Proceedings of Eurographics Visualization Symposium, pages 43–50, 2006.

[66] P. E. Hart, N. Nilsson, and B. Raphael. A Formal Basis for the Heuristic Determination of Minimum Cost Paths. IEEE Transactions on Systems Science and Cybernetics, 4(2):100–107, 1968.

[67] C. G. Healey, K. S. Booth, and J. T. Enns. High-speed visual estimation using preattentive processing. ACM Transactions on Computer-Human Interaction (TOCHI ’96), 3(2):107–135, 1996.

[68] C. G. Healey and J. T. Enns. Building perceptual textures to visualize multidimensional datasets. In Proceedings of the Conference on Visualization (VIS ’98), pages 111–118. IEEE CS Press, 1998.

[69] A. Hinneburg, C. C. Aggarwal, and D. A. Keim. What is the nearest neighbor in high dimensional spaces? In Proceedings of the 26th International Conference on Very Large Data Bases (VLDB ’00), pages 506–515. Morgan Kaufmann Publishers Inc., 2000.

[70] A. Hinneburg and D. A. Keim. An Efficient Approach to Clustering in Large Multimedia Databases with Noise. In Proceedings 4th International Conference on Knowledge Discovery in Databases (KDD ’98), pages 58–65, 1998.

[71] A. Hinneburg and D. A. Keim. Optimal grid-clustering: Towards breaking the curse of dimensionality in high-dimensional clustering. In Proceedings of the 25th International Conference on Very Large Data Bases (VLDB ’99), pages 506–517. Morgan Kaufmann Publishers Inc., 1999.



[72] P. Hoffman, G. Grinstein, and D. Pinkney. Dimensional anchors: a graphic primitive for multidimensional multivariate information visualizations. In Proceedings Workshop on New Paradigms in Information Visualization and Manipulation (NPIVM ’99), pages 9–16.

[73] P. V. C. Hough. Method and means for recognizing complex patterns. US Patent, 3069654, 1962.

[74] P. J. Huber. Projection pursuit. The Annals of Statistics, 13(2):435–475, 1985.

[75] C. B. Hurley and R. W. Oldford. Pairwise display of high-dimensional information via Eulerian tours and Hamiltonian decompositions. Journal of Computational and Graphical Statistics, 19(4):861–886, 2010.

[76] S. Ingram, T. Munzner, V. Irvine, M. Tory, S. Bergner, and T. Möller. DimStiller: Workflows for dimensional analysis and reduction. In Proceedings of the IEEE Symposium on Visual Analytics Science and Technology (VAST ’10). IEEE CS Press, 2010.

[77] A. Inselberg. The plane with parallel coordinates. The Visual Computer, 1(4):69–91, 1985.

[78] A. Inselberg and B. Dimsdale. Parallel coordinates: a tool for visualizing multi-dimensional geometry. In Proceedings of the IEEE Conference on Visualization (VIS ’90). IEEE CS Press, 1990.

[79] H. Jänicke and M. Chen. A Salience-based Quality Metric for Visualization. Computer Graphics Forum (Proc. EuroVis), 29(3):1183–1192, 2010.

[80] J. Johansson and M. Cooper. A Screen Space Quality Method for Data Abstraction. Computer Graphics Forum (Proc. EuroVis), 27(3):1039–1046, 2008.

[81] J. Johansson, C. Forsell, M. Lind, and M. Cooper. Perceiving patterns in parallel coordinates: determining thresholds for identification of relationships. Information Visualization, 7(2):152–162, 2008.

[82] S. Johansson and J. Johansson. Interactive Dimensionality Reduction Through User-defined Combinations of Quality Metrics. IEEE Transactions on Visualization and Computer Graphics (TVCG ’09), 15:993–1000, 2009.

[83] I. T. Jolliffe. Principal Component Analysis. Springer, 2nd edition, 2002.

[84] K. Kailing, H.-P. Kriegel, P. Kröger, and S. Wanka. Ranking interesting subspaces for clustering high dimensional data. In Proceedings of the 7th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD ’03), pages 241–252, 2003.

[85] L. Kaufman and P. J. Rousseeuw. Finding Groups in Data: An Introduction to Cluster Analysis. Wiley-Interscience, 9th edition, 1990.

[86] D. A. Keim, M. Ankerst, and M. Sips. Visual Data-Mining Techniques, pages 813–825. Kolam Publishing, 2004.

[87] D. A. Keim, M. C. Hao, U. Dayal, and M. Hsu. Pixel bar charts: A visualization technique for very large multi-attribute data sets. Information Visualization, 1(1):20–34, 2002.

[88] D. A. Keim, F. Mansmann, J. Schneidewind, J. Thomas, and H. Ziegler. Visual analytics: Scope and challenges. In S. J. Simoff, M. H. Böhlen, and A. Mazeika, editors, Visual Data Mining: Theory, Techniques and Tools for Visual Analytics, pages 76–90. Springer-Verlag, 2008.

[89] Y. Koren and L. Carmel. Visualization of labeled data using linear transformations. Proceedings of the IEEE Symposium on Information Visualization (InfoVis ’03), 0:16, 2003.

[90] H.-P. Kriegel, P. Kröger, and A. Zimek. Clustering high-dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering. ACM Transactions on Knowledge Discovery from Data (TKDD ’09), 3(1):1–58, 2009.



[91] J. LeBlanc, M. O. Ward, and N. Wittels. Explor<strong>in</strong>g N-dimensional databases. In Proceed<strong>in</strong>gs<br />

<strong>of</strong> the IEEE Conference on <strong>Visual</strong>ization (VIS ’90). IEEE CS Press, 1990.<br />

[92] Y. K. Leung and M. D. Aerley. A review and taxonomy <strong>of</strong> distortion-oriented presentation<br />

techniques. ACM Transactions on Computer-Human Interaction, 1(2):126–160, 1994.<br />

[93] A. Lex, M. Streit, C. Partl, and D. Schmalstieg. Comparative analysis <strong>of</strong> multidimensional,<br />

quantitative data. IEEE Transactions on <strong>Visual</strong>ization and Computer Graphics (TVCG ’10),<br />

16(6):1027–1035, 2010.<br />

[94] J. Li, J.-B. Martens, and J. J. van Wijk. Judg<strong>in</strong>g correlation from scatterplots and parallel<br />

coord<strong>in</strong>ate plots. Information <strong>Visual</strong>ization, 9(1):13–30, 2008.<br />

[95] M. A. Little, P. E. McSharry, E. J. Hunter, and L. O. Ramig. Suitability <strong>of</strong> dysphonia<br />

measurements for telemonitor<strong>in</strong>g <strong>of</strong> park<strong>in</strong>son’s disease. In IEEE Transactions on Biomedical<br />

Eng<strong>in</strong>eer<strong>in</strong>g, pages 1015–1022, 2009.<br />

[96] M. A. Little, P. E. Mcsharry, S. J. Roberts, D. A. E. Costello, and I. M. Moroz. Exploit<strong>in</strong>g<br />

nonl<strong>in</strong>ear recurrence and fractal scal<strong>in</strong>g properties for voice disorder detection. BioMedical<br />

Eng<strong>in</strong>eer<strong>in</strong>g OnL<strong>in</strong>e, 6(1):23, 2007.<br />

[97] H. Liu and H. Motoda. Computational Methods <strong>of</strong> Feature Selection. Chapman & Hall/CRC,<br />

2008. edited by Huan Liu and Hiroshi Motoda.; Includes bibliographical references and <strong>in</strong>dex.<br />

[98] A. MacEachren, X. Dai, F. Hardisty, D. Guo, and G. Lengerich. Explor<strong>in</strong>g high-D spaces<br />

with multiform matrices and small multiples. In Proceed<strong>in</strong>gs <strong>of</strong> the IEEE Symposium on<br />

Information <strong>Visual</strong>ization (InfoVis ’03), pages 31–38. IEEE CS Press, 2003.<br />

[99] J. Mack<strong>in</strong>lay. Automat<strong>in</strong>g the design <strong>of</strong> graphical presentations <strong>of</strong> relational <strong>in</strong>formation.<br />

ACM Transactions on Graphics, 5(2):110–141, 1986.<br />

[100] N. Miller, B. Hetzler, G. Nakamura, and P. Whitney. The need for metrics <strong>in</strong> visual <strong>in</strong>formation<br />

analysis. In Proceed<strong>in</strong>gs <strong>of</strong> the Workshop on New Paradigms <strong>in</strong> Information <strong>Visual</strong>ization<br />

and Manipulation. ACM, 1997.<br />

[101] E. Müller, I. Assent, S. Günnemann, R. Krieger, and T. Seidl. Relevant subspace cluster<strong>in</strong>g:<br />

M<strong>in</strong><strong>in</strong>g the most <strong>in</strong>terest<strong>in</strong>g non-redundant concepts <strong>in</strong> high dimensional data. In Proceed<strong>in</strong>gs<br />

<strong>of</strong> the IEEE International Conference on <strong>Data</strong> M<strong>in</strong><strong>in</strong>g (ICDM ’09), pages 377–386, 2009.<br />

[102] E. Müller, S. Günnemann, I. Assent, and T. Seidl. Evaluat<strong>in</strong>g cluster<strong>in</strong>g <strong>in</strong> subspace projections<br />

<strong>of</strong> high dimensional data. In Proceed<strong>in</strong>gs <strong>of</strong> the International Conference on Very<br />

Large <strong>Data</strong> Bases (VLDB ’09), volume 2, pages 1270–1281, 2009.<br />

[103] E. Müller, S. Günnemann, I. Färber, and T. Seidl. Discover<strong>in</strong>g multiple cluster<strong>in</strong>g solutions:<br />

Group<strong>in</strong>g objects <strong>in</strong> di erent views <strong>of</strong> the data. In Proceed<strong>in</strong>gs <strong>of</strong> the 10th IEEE Conference<br />

on <strong>Data</strong> M<strong>in</strong><strong>in</strong>g (ICDM ’10), page 1220, 2010.<br />

[104] E. Müller, S. Günnemann, I. Färber, and T. Seidl. Discovering multiple clustering solutions: Grouping objects in different views of the data. In Tutorial at the 16th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD ’12), 2012.

[105] T. Munzner. Visualization (Chapter 27). In Fundamentals of Computer Graphics, pages 675–707. AK Peters, 3rd edition, 2009.

[106] E. Müller, I. Assent, S. Günnemann, T. Jansen, and T. Seidl. OpenSubspace: An open source framework for evaluation and exploration of subspace clustering algorithms in WEKA. In Proceedings of the 1st Open Source in Data Mining Workshop (OSDM ’09) in conjunction with the 13th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD ’09), pages 2–13, 2009.

[107] D. Niu, J. G. Dy, and M. I. Jordan. Multiple non-redundant spectral clustering views. In Proceedings of the 27th International Conference on Machine Learning (ICML), pages 831–838. Omnipress, 2010.


Bibliography 169<br />

[108] C. North. Toward measuring visualization insight. IEEE Computer Graphics and Applications, 26(3):6–9, 2006.

[109] D. Oelke, H. Janetzko, S. Simon, K. Neuhaus, and D. A. Keim. Visual Boosting in Pixel-based Visualizations. Computer Graphics Forum (Proc. EuroVis), 30(3):871–880, 2011.

[110] L. Parsons, E. Haque, and H. Liu. Subspace Clustering for High Dimensional Data: A Review. ACM SIGKDD Explorations Newsletter - Special Issue on Learning from Imbalanced Datasets, 6(1):90–105, 2004.

[111] F. Paulovich, M. Oliveira, and R. Minghim. The Projection Explorer: A Flexible Tool for Projection-based Multidimensional Visualization. In Proceedings of the XX Brazilian Symposium on Computer Graphics and Image Processing (SIBGRAPI ’07), pages 27–36, Oct. 2007.

[112] W. Peng, M. O. Ward, and E. A. Rundensteiner. Clutter Reduction in Multi-Dimensional Data Visualization Using Dimension Reordering. In Proceedings of the IEEE Symposium on Information Visualization (InfoVis ’04), pages 89–96. IEEE CS Press, 2004.

[113] C. Plaisant, J.-D. Fekete, and G. Grinstein. Promoting insight-based evaluation of visualizations: From contest to benchmark repository. IEEE Transactions on Visualization and Computer Graphics (TVCG ’08), 14(1):120–134, 2008.

[114] W. M. Rand. Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association, 66(336):846–850, 1971.

[115] R. Rao and S. K. Card. The table lens: merging graphical and symbolic representations in an interactive focus + context visualization for tabular information. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’94). ACM, 1994.

[116] R. A. Rensink and G. Baldridge. The perception of correlation in scatterplots. Computer Graphics Forum (Proc. EuroVis), 29(3):1203–1210, 2010.

[117] D. J. Rogers and T. T. Tanimoto. A Computer Program for Classifying Plants. Science, 132(3434):1115–1118, 1960.

[118] R. Rosenholtz, Y. Li, J. Mansfield, and Z. Jin. Feature congestion: a measure of display clutter. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’05), pages 761–770. ACM, 2005.

[119] M. Schaefer, L. Zhang, T. Schreck, A. Tatu, J. A. Lee, M. Verleysen, and D. A. Keim. Improving projection-based data analysis by feature space transformations. In Proceedings of SPIE 8654, Visualization and Data Analysis (VDA ’13), volume 8654, pages 86540H–86540H–15, 2013.

[120] J. Schneidewind, M. Sips, and D. A. Keim. Pixnostics: Towards measuring the value of visualization. In Proceedings of the IEEE Symposium on Visual Analytics Science and Technology (VAST ’06), pages 199–206. IEEE CS Press, 2006.

[121] T. Schreck, T. von Landesberger, and S. Bremm. Techniques for precision-based visual analysis of projected data. Palgrave Macmillan Information Visualization, 9(3):181–193, 2010.

[122] M. Sedlmair, A. Tatu, T. Munzner, and M. Tory. A taxonomy of visual cluster separation factors. Computer Graphics Forum (Proc. EuroVis), 31(3):1335–1344, 2012.

[123] E. Segel and J. Heer. Narrative visualization: Telling stories with data. IEEE Transactions on Visualization and Computer Graphics (TVCG ’10), 16:1139–1148, 2010.

[124] J. Seo and B. Shneiderman. Interactively exploring hierarchical clustering results. Computer, 35(7):80–86, 2002.


[125] J. Seo and B. Shneiderman. A rank-by-feature framework for unsupervised multidimensional data exploration using low dimensional projections. In Proceedings of the IEEE Symposium on Information Visualization (InfoVis ’04), pages 65–72. IEEE CS Press, 2004.

[126] J. Seo and B. Shneiderman. A rank-by-feature framework for interactive exploration of multidimensional data. Information Visualization, 4(2):96–113, 2005.

[127] B. Shneiderman. The Eyes Have It: A Task by Data Type Taxonomy for Information Visualizations. In Proceedings of the IEEE Symposium on Visual Languages (VL), pages 336–343. IEEE CS Press, 1996.

[128] J. H. Siegel, E. J. Farrell, R. M. Goldwyn, and H. P. Friedman. The surgical implication of physiologic patterns in myocardial infarction shock. Surgery, 72:126–141, 1972.

[129] M. Sips, B. Neubert, J. P. Lewis, and P. Hanrahan. Selecting good views of high-dimensional data using class consistency. Computer Graphics Forum (Proc. EuroVis), 28(3):831–838, 2009.

[130] A. Strauss and J. M. Corbin. Basics of Qualitative Research: Techniques and Procedures for Developing Grounded Theory. SAGE Publications, 1998.

[131] W. Street, W. Wolberg, and O. Mangasarian. Nuclear feature extraction for breast tumor diagnosis. IS&T / SPIE International Symposium on Electronic Imaging: Science and Technology, 1905:861–870, 1993.

[132] A. Tatu, G. Albuquerque, M. Eisemann, P. Bak, H. Theisel, M. Magnor, and D. A. Keim. Automated Visual Analysis Methods for an Effective Exploration of High-Dimensional Data. IEEE Transactions on Visualization and Computer Graphics (TVCG ’11), 17(5):584–597, 2011.

[133] A. Tatu, G. Albuquerque, M. Eisemann, J. Schneidewind, H. Theisel, M. Magnor, and D. Keim. Combining automated analysis and visualization techniques for effective exploration of high dimensional data. In Proceedings of the IEEE Symposium on Visual Analytics Science and Technology (VAST ’09), pages 59–66, 2009.

[134] A. Tatu, P. Bak, E. Bertini, D. A. Keim, and J. Schneidewind. Visual quality metrics and human perception: an initial study on 2D projections of large multidimensional data. In Proceedings of the Working Conference on Advanced Visual Interfaces (AVI), pages 49–56. ACM, 2010.

[135] A. Tatu, F. Maaß, I. Färber, E. Bertini, T. Schreck, T. Seidl, and D. Keim. Subspace Search and Visualization to Make Sense of Alternative Clusterings in High-Dimensional Data. In Proceedings of the IEEE Symposium on Visual Analytics Science and Technology (VAST ’12), pages 63–72, 2012.

[136] A. Tatu, L. Zhang, E. Bertini, T. Schreck, D. A. Keim, S. Bremm, and T. von Landesberger. ClustNails: Visual Analysis of Subspace Clusters. Tsinghua Science and Technology, Special Issue on Visualization and Computer Graphics, 17(4):419–428, 2012.

[137] J. J. Thomas and K. A. Cook. Illuminating the Path: The Research and Development Agenda for Visual Analytics. National Visualization and Analytics Center, 2005.

[138] M. Tory and T. Möller. Rethinking Visualization: A High-Level Taxonomy. In Proceedings of the IEEE Symposium on Information Visualization (InfoVis ’04), pages 151–158. IEEE CS Press, 2004.

[139] E. R. Tufte. The visual display of quantitative information. Graphics Press, 1986.

[140] J. Tukey and P. Tukey. Computer graphics and exploratory data analysis: An introduction. In Proceedings of the Annual Conference and Exposition: Computer Graphics, 3:773–785, 1985.


[141] S. Vadapalli and K. Karlapalem. Heidi matrix: nearest neighbor driven high dimensional data visualization. In Proceedings of the ACM SIGKDD Workshop on Visual Analytics and Knowledge Discovery, pages 83–92, 2009.

[142] S. van den Elzen and J. J. van Wijk. BaobabView: Interactive Construction and Analysis of Decision Trees. In Proceedings of the IEEE Symposium on Visual Analytics Science and Technology (VAST ’11), pages 151–160. IEEE CS Press, 2011.

[143] L. van der Maaten and G. Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9:2579–2605, 2008.

[144] J. Ward. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58:236–244, 1963.

[145] M. Ward, G. Grinstein, and D. Keim. Interactive Data Visualization: Foundations, Techniques, and Applications. Taylor & Francis, 2010.

[146] M. O. Ward. XmdvTool: Integrating multiple methods for visualizing multivariate data. In Proceedings of the IEEE Symposium on Information Visualization (InfoVis ’94), pages 326–333. IEEE CS Press, 1994.

[147] M. O. Ward. A taxonomy of glyph placement strategies for multidimensional data visualization. Information Visualization, 1(3/4):194–210, 2002.

[148] C. Ware. Information Visualization: Perception for Design. Morgan Kaufmann Publishers Inc., 2004.

[149] C. Ware, H. Purchase, L. Colpoys, and M. McGill. Cognitive measurements of graph aesthetics. Information Visualization, 1:103–110, 2002.

[150] M. Wattenberg. A note on space-filling visualizations and space-filling curves. In Proceedings of the IEEE Symposium on Information Visualization (InfoVis ’05). IEEE CS Press, 2005.

[151] L. Wilkinson, A. Anand, and R. Grossman. Graph-theoretic scagnostics. In Proceedings of the IEEE Symposium on Information Visualization (InfoVis ’05), pages 157–164. IEEE CS Press, 2005.

[152] L. Wilkinson, A. Anand, and R. Grossman. High-dimensional visual analytics: Interactive exploration guided by pairwise views of point distributions. IEEE Transactions on Visualization and Computer Graphics (TVCG ’06), 12:1363–1372, 2006.

[153] A. Wismueller, M. Verleysen, M. Aupetit, and J. A. Lee. Recent Advances in Nonlinear Dimensionality Reduction, Manifold and Topological Learning. In 18th European Symposium on Artificial Neural Networks - Computational Intelligence and Machine Learning (ESANN), pages 71–80, 2010.

[154] I. H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques. The Morgan Kaufmann Series in Data Management Systems. Morgan Kaufmann Publishers, 2nd edition, 2005.

[155] R. Xu and D. C. Wunsch II. Survey of clustering algorithms. IEEE Transactions on Neural Networks, 16(3):645–678, 2005.

[156] J. Yang, D. Hubball, M. O. Ward, E. A. Rundensteiner, and W. Ribarsky. Value and relation display: Interactive visual exploration of large data sets with hundreds of dimensions. IEEE Transactions on Visualization and Computer Graphics (TVCG ’07), 13:494–507, 2007.

[157] J. Yang, A. Patro, S. Huang, N. Mehta, M. O. Ward, and E. A. Rundensteiner. Value and Relation Display for Interactive Exploration of High Dimensional Datasets. In Proceedings of the IEEE Symposium on Information Visualization (InfoVis ’04), pages 73–80. IEEE CS Press, 2004.


[158] J. Yang, W. Peng, M. O. Ward, and E. A. Rundensteiner. Interactive Hierarchical Dimension Ordering, Spacing and Filtering for Exploration of High Dimensional Datasets. In Proceedings of the IEEE Symposium on Information Visualization (InfoVis ’03). IEEE CS Press, 2003.

[159] J. Yang, M. O. Ward, E. A. Rundensteiner, and S. Huang. Visual hierarchical dimension reduction for exploration of high dimensional datasets. In Proceedings of the Symposium on Data Visualization (VISSYM), pages 19–28. Eurographics Association, 2003.

[160] J. S. Yi, Y. a. Kang, J. Stasko, and J. Jacko. Toward a deeper understanding of the role of interaction in information visualization. IEEE Transactions on Visualization and Computer Graphics (TVCG ’07), 13:1224–1231, 2007.

[161] X. Yuan, Z. Wang, and C. Guo. MDS-tree and MDS-matrix for high dimensional data visualization. In Proceedings of the IEEE Symposium on Information Visualization (InfoVis ’11), 2011. Poster abstract.

[162] T. Zhang, R. Ramakrishnan, and M. Livny. BIRCH: an efficient data clustering method for very large databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD ’96), pages 103–114, New York, NY, USA, 1996. ACM.

[163] J. Zupan, M. Novic, X. Li, and J. Gasteiger. Classification of multicomponent analytical data of olive oils using different neural networks. In Analytica Chimica Acta, volume 292, pages 219–234, 1994.
