MACHINE LEARNING TECHNIQUES - LASA
The CURE algorithm has a number of features of general significance. It takes special
care of outliers. It also uses two devices to achieve scalability: data sampling and data
partitioning. A major feature of CURE is that it represents a cluster by a fixed number c of
points scattered around it. The distance between two clusters used in the agglomerative
process is the minimum distance between any pair of their scattered representatives, one
taken from each cluster. CURE therefore takes a middle-ground approach between the graph
(all-points) methods and the geometric (one-centroid) methods. Single-link and average-link
closeness are replaced by the closeness of the representatives. Selecting representatives
scattered around a cluster makes it possible to cover non-spherical shapes. As before,
agglomeration continues until the requested number k of clusters is reached. CURE employs
one additional device: the originally selected scattered points are shrunk toward the
geometric centroid of the cluster by a user-specified factor α. Shrinkage suppresses the
effect of outliers, since outliers tend to lie farther from the cluster centroid than the other
scattered representatives and are therefore moved the most. CURE is capable of finding
clusters of different shapes and sizes, and it is insensitive to outliers. Because CURE uses
sampling, estimating its complexity is not straightforward.
Figure 3-7: Agglomeration with CURE. Three clusters, each with three representatives, are
shown before and after the merge and shrinkage. The two closest representatives are
connected by an arrow.
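The representative-based distance and the shrinkage step described above can be illustrated with a minimal Python sketch. It is not the reference implementation of CURE: it leaves out sampling, partitioning and explicit outlier removal, recomputes the representatives from scratch at every merge, and assumes Euclidean distance. The function names (scattered_points, shrink, cluster_distance, cure) and the farthest-point heuristic used to pick scattered points are assumptions made for this illustration.

```python
import math

def centroid(points):
    """Geometric centroid of a list of equal-length coordinate tuples."""
    n, dim = len(points), len(points[0])
    return tuple(sum(p[d] for p in points) / n for d in range(dim))

def scattered_points(points, c):
    """Pick up to c well-scattered points (farthest-point heuristic, an assumption)."""
    center = centroid(points)
    # Start with the point farthest from the centroid.
    chosen = [max(points, key=lambda p: math.dist(p, center))]
    for _ in range(c - 1):
        remaining = [p for p in points if p not in chosen]
        if not remaining:
            break
        # Next: the point whose nearest already-chosen point is farthest away.
        chosen.append(max(remaining,
                          key=lambda p: min(math.dist(p, q) for q in chosen)))
    return chosen

def shrink(reps, center, alpha):
    """Move each representative toward the centroid by the factor alpha.
    alpha = 0 leaves the points unchanged, alpha = 1 collapses them onto the centroid."""
    return [tuple(r[d] + alpha * (center[d] - r[d]) for d in range(len(r)))
            for r in reps]

def representatives(points, c, alpha):
    """Scattered points of a cluster after shrinkage."""
    return shrink(scattered_points(points, c), centroid(points), alpha)

def cluster_distance(reps_a, reps_b):
    """Minimum distance between representatives of two clusters."""
    return min(math.dist(a, b) for a in reps_a for b in reps_b)

def cure(points, k, c=3, alpha=0.5):
    """Naive agglomeration: merge the two closest clusters until k remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        reps = [representatives(cl, c, alpha) for cl in clusters]
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: cluster_distance(reps[ij[0]], reps[ij[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [cl for idx, cl in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters

# Two well-separated square groups of four points each.
data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
for cluster in cure(data, k=2):
    print(sorted(cluster))
```

Because the shift applied to a representative is α times its distance to the centroid, the representatives lying farthest out are pulled in the most, which is how shrinkage dampens the influence of outliers on the inter-cluster distance.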
Summary:<br />
Advantages of hierarchical clustering include:<br />
• Embedded flexibility regarding the level of granularity<br />
• Ease of handling any form of similarity or distance
• Consequently, applicability to any attribute type
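The first two advantages can be made concrete with a small Python sketch. It assumes naive single-link agglomeration, and it uses a Hamming dissimilarity on equal-length strings purely as an example of a non-numeric attribute. The names agglomerate and hamming are chosen for this illustration only. The stopping parameter k sets the level of granularity, and the dissimilarity is passed in as an arbitrary function.

```python
def agglomerate(items, dissimilarity, k):
    """Naive single-link agglomeration down to k clusters.
    Works with any pairwise dissimilarity function."""
    clusters = [[x] for x in items]
    while len(clusters) > k:
        def link(ij):
            # Single link: smallest dissimilarity between members of the two clusters.
            i, j = ij
            return min(dissimilarity(a, b)
                       for a in clusters[i] for b in clusters[j])
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=link)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

def hamming(a, b):
    """Number of differing positions (assumes strings of equal length)."""
    return sum(x != y for x, y in zip(a, b))

words = ["cat", "cot", "cut", "dog", "dig"]
print(agglomerate(words, hamming, k=3))  # finer granularity
print(agglomerate(words, hamming, k=2))  # coarser granularity
```

The same routine would accept numeric vectors with a Euclidean distance, or mixed records with a hand-written dissimilarity, without any change to the merging logic.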
© A.G.Billard 2004 – Last Update March 2011