MACHINE LEARNING TECHNIQUES - LASA


Update step. The model parameters, i.e. the means, are adjusted to match the sample means of the data points that they are responsible for:

\mu^k = \frac{\sum_i r_i^k \, x_i}{\sum_i r_i^k}

The update algorithm of the soft K-means is identical to that of the hard K-means, apart from the fact that the responsibilities to a particular cluster are now real numbers varying between 0 and 1.

Figure 3-10: Soft K-means algorithm with a small (left), medium (center) and large (right) σ. [DEMOS\CLUSTERING\SOFT-KMEANS-SIGMA.ML]

Figure 3-11: Iterations of the soft K-means algorithm from the random initialization (left) to convergence (right). Computed with σ = 10. [DEMOS\CLUSTERING\SOFT-KMEANS-ITERATIONS.ML]

© A.G.Billard 2004 – Last Update March 2011
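The two soft K-means steps above can be sketched in a few lines of NumPy. This is a sketch under assumptions not fixed by the notes: the stiffness is taken as 1/σ, d is the squared Euclidean distance, and a simple farthest-point initialization replaces the random one described in the figures.

```python
import numpy as np

def soft_kmeans(X, K, sigma=1.0, n_iter=50, seed=0):
    """Sketch of soft K-means; sigma plays the role of the stiffness parameter."""
    rng = np.random.default_rng(seed)
    # Farthest-point initialization (an assumption; the notes use random init).
    mu = [X[rng.integers(len(X))]]
    for _ in range(K - 1):
        d = np.min([((X - m) ** 2).sum(-1) for m in mu], axis=0)
        mu.append(X[np.argmax(d)])
    mu = np.array(mu)
    for _ in range(n_iter):
        # Assignment step: responsibilities r_ik in (0, 1), summing to 1 over k.
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)   # (n, K) squared distances
        logits = -d2 / sigma
        logits -= logits.max(axis=1, keepdims=True)            # numerical stability
        r = np.exp(logits)
        r /= r.sum(axis=1, keepdims=True)
        # Update step: mu_k = sum_i r_ik x_i / sum_i r_ik  (the formula above)
        mu = (r.T @ X) / r.sum(axis=0)[:, None]
    return mu, r
```

With a small σ the responsibilities become nearly 0/1 and the algorithm behaves like hard K-means; with a large σ the clusters blur into one another, as Figure 3-10 illustrates.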

3.1.4 Clustering with Mixtures of Gaussians

An extension of the soft K-means algorithm consists of fitting the data with a Mixture of Gaussians (not to be confused with the Gaussian Mixture Model (GMM), which we will review later on). Instead of simply attaching a responsibility factor to each cluster, one attaches a probability density measuring how well each cluster represents the distribution of the data. The method is bound to converge to a state that maximizes the likelihood of each point belonging to each distribution.

Soft-clustering methods are part of model-based approaches to clustering. In clustering with a mixture of Gaussians, the model is naturally a Gaussian. Other model-based methods use, for instance, the Poisson or the Normal distribution. The main advantages of model-based clustering are:

• It can make use of well-studied statistical inference techniques;
• Its flexibility in choosing the component distribution;
• It obtains a density estimation for each cluster;
• It is a “soft” means of classification.

Clustering with mixtures of Gaussians places K distributions, whose barycentres are located on the cluster means, as in Figure 3-12.

Figure 3-12: Examples of clustering with Mixtures of Gaussians (the grey circles represent the first and second variances of the distributions). [DEMOS\CLUSTERING\GMM-CLUSTERING-SIMPLE.ML]

Algorithm

Assignment step (E-step): The responsibilities are

r_i^k = \frac{\alpha_k \, \frac{1}{(2\pi\sigma_k)^N} \, e^{-\frac{1}{2\sigma_k} d(\mu_k, x_i)}}{\sum_{k'} \alpha_{k'} \, \frac{1}{(2\pi\sigma_{k'})^N} \, e^{-\frac{1}{2\sigma_{k'}} d(\mu_{k'}, x_i)}}    (3.6)
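A minimal sketch of the E-step of Eq. (3.6), assuming isotropic Gaussian components with mixing weights α_k and variances σ_k, and taking d to be the squared Euclidean distance. Working in log space before normalizing is an implementation choice for numerical stability, not something prescribed by the notes.

```python
import numpy as np

def responsibilities(X, mu, sigma, alpha):
    """E-step of Eq. (3.6): r_ik for each point x_i and cluster k (a sketch)."""
    N = X.shape[1]  # dimensionality of the data, as in Eq. (3.6)
    # d(mu_k, x_i): squared Euclidean distance (an assumption), shape (n, K).
    d = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
    # log of: alpha_k * (2*pi*sigma_k)^(-N) * exp(-d / (2*sigma_k))
    log_w = (np.log(alpha)[None, :]
             - N * np.log(2 * np.pi * sigma)[None, :]
             - d / (2 * sigma[None, :]))
    log_w -= log_w.max(axis=1, keepdims=True)   # stabilize before exponentiating
    r = np.exp(log_w)
    return r / r.sum(axis=1, keepdims=True)     # normalize over the K clusters
```

Each row of the returned matrix sums to one, so r_ik can be read as the probability that cluster k generated point x_i.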

