
MACHINE LEARNING TECHNIQUES - LASA



The CURE algorithm has several features of general significance. It takes special care of outliers, and it uses two devices to achieve scalability. A major feature of CURE is that it represents a cluster by a fixed number c of points scattered around it. The distance between two clusters used in the agglomerative process is the minimum distance between any two of their scattered representatives. CURE therefore takes a middle-ground approach between the graph (all-points) methods and the geometric (one-centroid) methods: single-link and average-link closeness is replaced by closeness between representatives. Selecting representatives scattered around a cluster makes it possible to cover non-spherical shapes. As before, agglomeration continues until the requested number k of clusters is reached. CURE employs one additional device: the originally selected scattered points are shrunk toward the geometric centroid of the cluster by a user-specified factor α. Shrinkage suppresses the effect of outliers, since outliers tend to lie farther from the cluster centroid than the other scattered representatives and are therefore moved the most. CURE is capable of finding clusters of different shapes and sizes, and it is insensitive to outliers. Since CURE uses sampling, estimating its complexity is not straightforward.
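The representative-based distance described above can be illustrated with a minimal sketch. It is not a full CURE implementation: sampling, partitioning and the merge loop are left out. The function names, the farthest-point rule used to pick scattered points, and the convention that α = 0 means no shrinkage and α = 1 collapses every representative onto the centroid are assumptions made for this example.

```python
import math

def centroid(points):
    """Geometric centroid (coordinate-wise mean) of a list of points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def scattered_points(points, c):
    """Pick up to c well-scattered points (assumed rule: farthest-point selection)."""
    unique = list(dict.fromkeys(tuple(p) for p in points))  # drop duplicates, keep order
    mean = centroid(unique)
    # Start with the point farthest from the centroid.
    chosen = [max(unique, key=lambda p: math.dist(p, mean))]
    while len(chosen) < min(c, len(unique)):
        # Add the point whose nearest already-chosen point is farthest away.
        candidates = [p for p in unique if p not in chosen]
        chosen.append(max(candidates,
                          key=lambda p: min(math.dist(p, q) for q in chosen)))
    return chosen

def shrink(reps, mean, alpha):
    """Move each representative toward the centroid by the fraction alpha."""
    return [tuple(r + alpha * (m - r) for r, m in zip(rep, mean)) for rep in reps]

def representatives(points, c, alpha):
    """Scattered points of a cluster after shrinkage."""
    return shrink(scattered_points(points, c), centroid(points), alpha)

def cluster_distance(reps_a, reps_b):
    """Minimum distance between any two representatives of the two clusters."""
    return min(math.dist(p, q) for p in reps_a for q in reps_b)

# Two square clusters, ten units apart along the x axis.
cluster_a = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
cluster_b = [(10.0, 0.0), (12.0, 0.0), (10.0, 2.0), (12.0, 2.0)]

d_no_shrink = cluster_distance(representatives(cluster_a, 4, 0.0),
                               representatives(cluster_b, 4, 0.0))
d_half = cluster_distance(representatives(cluster_a, 4, 0.5),
                          representatives(cluster_b, 4, 0.5))
d_centroid = cluster_distance(representatives(cluster_a, 4, 1.0),
                              representatives(cluster_b, 4, 1.0))
```

With α = 0 the distance behaves like an all-points (single-link) distance restricted to the representatives, and with α = 1 it reduces to the distance between centroids, which is the middle-ground behaviour the text describes.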

Figure 3-7: Agglomeration with CURE. Three clusters, each with three representatives, are shown before and after the merge and shrinkage. The two closest representatives are connected by an arrow.

Summary:<br />

Advantages of hierarchical clustering include:<br />

• Embedded flexibility regarding the level of granularity<br />

• Ease of handling any form of similarity or distance

• Consequently, applicability to any attribute types<br />
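The first two advantages can be made concrete with a small sketch. It assumes a naive single-link agglomeration, chosen only for brevity, that accepts an arbitrary user-supplied distance function and stops at a requested number k of clusters. The names `agglomerate` and `hamming` and the toy word list are hypothetical, not taken from the notes.

```python
def agglomerate(items, distance, k):
    """Merge the two closest clusters (single link) until k clusters remain."""
    clusters = [[x] for x in items]  # start with one cluster per item
    while len(clusters) > k:
        best = None  # (distance, index_i, index_j) of the closest pair so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(distance(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge cluster j into cluster i
        del clusters[j]
    return clusters

def hamming(a, b):
    """Number of mismatching positions; suits equal-length categorical data."""
    return sum(x != y for x, y in zip(a, b))

words = ["cat", "cot", "cut", "dog", "dig"]

coarse = agglomerate(words, hamming, 2)  # two clusters
fine = agglomerate(words, hamming, 5)    # every word on its own
```

Changing k changes the level of granularity without touching the algorithm, and replacing `hamming` with any other pairwise function lets the same procedure run on other attribute types.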

© A.G.Billard 2004 – Last Update March 2011
