MACHINE LEARNING TECHNIQUES - LASA

Figure 3-3: Example of distance measurement in hierarchical clustering methods

It is clear that the number and type of clusters will strongly depend on the choice of the distance metric and on the method used to merge the clusters. A typical measure of distance between two N-dimensional data points $x, y$ takes the general form:

$$d(x, y) = \sqrt[p]{\sum_{i=1}^{N} \left| x_i - y_i \right|^{p}} \qquad (3.1)$$

The 1-norm distance, i.e. p = 1, is sometimes referred to as the Manhattan distance, because it is the distance a car would drive in a city laid out in square blocks (assuming there are no one-way streets). The 2-norm distance, i.e. p = 2, is the classical Euclidean distance.
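As a concrete illustration, Eq. (3.1) can be implemented directly. The sketch below is in Python with NumPy; the function name and the test points are illustrative choices, not taken from the text.

```python
import numpy as np

def minkowski_distance(x, y, p=2):
    """Minkowski (p-norm) distance of Eq. (3.1): the p-th root of
    the sum over dimensions of |x_i - y_i|^p."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

x, y = [0.0, 0.0], [3.0, 4.0]
print(minkowski_distance(x, y, p=1))  # 1-norm (Manhattan): 7.0
print(minkowski_distance(x, y, p=2))  # 2-norm (Euclidean): 5.0
```

For p = 1 the metric sums the absolute coordinate differences (hence 7.0 above), while for p = 2 it reduces to the familiar Euclidean length (5.0, the 3-4-5 triangle).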

Figure 3-4 shows examples of data sets on which such a nearest-neighbor technique would fail. Failure to converge to a correct solution might occur, for instance, when data points within the same cluster lie further apart than the two clusters do from one another. An even worse situation occurs when one cluster contains the other, as shown in Figure 3-4, right. In other words, such a simple clustering technique works well only when the clusters are linearly separable. A solution in such a situation is to change the coordinate system, e.g. by using polar coordinates. However, determining the appropriate coordinate system remains a challenge in itself.

Figure 3-4: Example of pairs of clusters, easy to see but awkward to extract for clustering algorithms
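To make the polar-coordinate remedy concrete, the following Python/NumPy sketch generates two concentric ring-shaped clusters like those of Figure 3-4, right. The ring-generating helper, the radii, and the radius threshold of 2.0 are illustrative assumptions, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def ring(n, radius, noise=0.1):
    """Sample n points scattered around a circle of the given radius
    (an illustrative data generator, not from the text)."""
    theta = rng.uniform(0.0, 2.0 * np.pi, n)
    r = radius + noise * rng.standard_normal(n)
    return np.column_stack((r * np.cos(theta), r * np.sin(theta)))

# Two concentric clusters: not linearly separable in Cartesian coordinates.
inner, outer = ring(100, radius=1.0), ring(100, radius=3.0)
points = np.vstack((inner, outer))

# Change of coordinate system: (x, y) -> (r, theta).
# In polar coordinates the rings become linearly separable:
# a single threshold on the radius r splits them.
r = np.hypot(points[:, 0], points[:, 1])
labels = (r > 2.0).astype(int)  # 0 = inner ring, 1 = outer ring

print("inner ring recovered:", np.all(labels[:100] == 0))
print("outer ring recovered:", np.all(labels[100:] == 1))
```

After the change of coordinates, the two clusters are separated by a single linear boundary (a constant radius), which is precisely the condition under which the simple technique above succeeds.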

© A.G.Billard 2004 – Last Update March 2011
