MACHINE LEARNING TECHNIQUES - LASA


The CURE algorithm has a number of features of general significance. It takes special care of outliers, and it uses two devices to achieve scalability. A major feature of CURE is that it represents a cluster by a fixed number c of points scattered around it. The distance between two clusters used in the agglomerative process is the minimum of the distances between their scattered representatives. CURE therefore takes a middle-ground approach between the graph (all-points) methods and the geometric (one-centroid) methods: single-link and average-link closeness is replaced by closeness between representatives. Selecting representatives scattered around a cluster makes it possible to cover non-spherical shapes. As before, agglomeration continues until the requested number k of clusters is reached. CURE employs one additional device: the originally selected scattered points are shrunk toward the geometric centroid of the cluster by a user-specified factor α. Shrinkage suppresses the effect of outliers, since outliers tend to lie further from the cluster centroid than the other scattered representatives. (A minimal code sketch of the shrinkage and of the representative-based cluster distance is given after the summary below.) CURE is capable of finding clusters of different shapes and sizes, and it is insensitive to outliers. Since CURE uses sampling, estimation of its complexity is not straightforward.

Figure 3-7: Agglomeration with CURE. Three clusters, each with three representatives, are shown before and after the merge and shrinkage. The two closest representatives are connected by an arrow.

Summary: Advantages of hierarchical clustering include:

• Embedded flexibility regarding the level of granularity
• Ease of handling any form of similarity or distance
• Consequently, applicability to any attribute types


Disadvantages of hierarchical clustering are related to:

• Vagueness of the termination criteria
• The fact that most hierarchical algorithms do not revisit intermediate clusters once they have been constructed in order to improve them
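As a complement to the CURE discussion above, the following sketch illustrates its two characteristic devices: shrinking the c scattered representatives toward the cluster centroid by a factor α, and measuring the distance between two clusters as the minimum distance between their representatives. This is a minimal illustration assuming NumPy; the function names are chosen here for illustration and are not part of any reference implementation.

import numpy as np

def shrink_representatives(reps, alpha):
    # Move each of the c scattered representatives (rows of reps) a fraction
    # alpha toward the geometric centroid of the cluster; this damps the pull
    # of outliers, which lie further from the centroid than the others.
    centroid = reps.mean(axis=0)
    return reps + alpha * (centroid - reps)

def cure_cluster_distance(reps_a, reps_b):
    # Inter-cluster distance used during agglomeration: the minimum of the
    # pairwise Euclidean distances between the two sets of representatives.
    diffs = reps_a[:, None, :] - reps_b[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=2)).min()

At each agglomeration step, the two clusters whose shrunk representatives are closest under cure_cluster_distance would be merged, until only k clusters remain.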

Exercise:

Determine the optimal distance metric for two examples, one in which the data are linearly separable and one in which they are not.

3.1.2 K-means clustering

K-Means clustering generates a number K of disjoint, flat (non-hierarchical) clusters C_k, k = 1, ..., K, so as to minimize the sum-of-squares criterion

J(\mu_1, \ldots, \mu_K) = \sum_{k=1}^{K} \sum_{i \in C_k} \| x_i - \mu_k \|^2    (3.2)

where x_i is a vector representing the i-th data point and \mu_k is the geometric centroid of the data points associated with cluster C_k. K-means is well suited to generating globular clusters. The K-Means method is numerical, unsupervised, non-deterministic and iterative. In general, the algorithm is not guaranteed to converge to a global minimum of J.
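As a small illustration of the criterion in Eq. (3.2), the following sketch evaluates J for a given set of centroids and hard assignments (assuming NumPy; the function name and argument layout are illustrative only):

import numpy as np

def sum_of_squares_criterion(X, mu, labels):
    # X: (N, d) data matrix, mu: (K, d) centroids, labels: (N,) index k of the
    # cluster C_k each point is assigned to.
    J = 0.0
    for k in range(mu.shape[0]):
        members = X[labels == k]                 # points assigned to cluster C_k
        J += np.sum((members - mu[k]) ** 2)      # squared distances to mu_k
    return J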

K-Means Algorithm:

1 Initialization: Pick K arbitrary centroids \mu_1, \ldots, \mu_K and set them to random values.

2 Calculate the distance d(x_i, \mu_k) from each data point i to each centroid k.

3 Assignment Step: Assign the responsibility r_i^k of each data point i to its "closest" centroid k_i (E-step). If a tie happens (i.e. two centroids are equidistant from a data point), assign the data point to the winning centroid with the smallest index.

k_i = \arg\min_k \{ d(x_i, \mu_k) \}    (3.3)

r_i^k = \begin{cases} 1 & \text{if } k = k_i \\ 0 & \text{otherwise} \end{cases}

4 Update Step: Adjust the centroids to be the means of all data points assigned to them (M-step):

\mu_k = \frac{\sum_i r_i^k x_i}{\sum_i r_i^k}    (3.4)
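The four steps above can be condensed into the following sketch (assuming NumPy; the function name, the choice of Euclidean distance for d, and the initialization from randomly chosen data points are illustrative assumptions rather than part of the original text):

import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    # Plain K-Means on an (N, d) data matrix X: alternate the assignment
    # (E) step and the update (M) step until the centroids stop moving.
    rng = np.random.default_rng(seed)
    # Step 1 (initialization): K randomly chosen data points serve as the
    # arbitrary starting centroids mu_1, ..., mu_K.
    mu = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    for _ in range(n_iter):
        # Steps 2-3 (assignment / E-step): distance of every point to every
        # centroid; argmin implements Eq. (3.3) and breaks ties in favour of
        # the centroid with the smallest index.
        d = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Step 4 (update / M-step): each centroid becomes the mean of the
        # points assigned to it, Eq. (3.4); an empty cluster keeps its centroid.
        new_mu = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                           else mu[k] for k in range(K)])
        if np.allclose(new_mu, mu):
            break
        mu = new_mu
    return mu, labels

The result is only a local minimum of J; in practice the procedure is typically run from several random initializations and the solution with the lowest value of the criterion is kept.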

© A.G.Billard 2004 – Last Update March 2011
