MACHINE LEARNING TECHNIQUES - LASA
Update step. The model parameters, i.e. the means, are adjusted to match the sample means of the data points that they are responsible for:

$$\mu^{k} = \frac{\sum_i r_i^{k} \cdot x^{i}}{\sum_i r_i^{k}}$$

The update algorithm of soft K-means is identical to that of hard K-means, except that the responsibilities to a particular cluster are now real numbers varying between 0 and 1.

Figure 3-10: Soft K-means algorithm with a small (left), medium (center) and large (right) σ. [DEMOS\CLUSTERING\SOFT-KMEANS-SIGMA.ML]

Figure 3-11: Iterations of the Soft K-means algorithm from the random initialization (left) to convergence (right). Computed with σ = 10. [DEMOS\CLUSTERING\SOFT-KMEANS-ITERATIONS.ML]
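The assignment and update steps translate almost line-for-line into code. Below is a minimal NumPy sketch of the full soft K-means loop; the function name, the responsibility rule r_i^k ∝ exp(−d(μ^k, x^i)/σ²) with squared Euclidean distance d, and the initialization on randomly chosen data points are assumptions for illustration, not prescribed by these notes.

```python
import numpy as np

def soft_kmeans(X, K, sigma, n_iter=50, seed=0):
    """Soft K-means: alternate soft assignments and weighted mean updates.

    X : (M, N) data matrix, one sample per row; K : number of clusters;
    sigma : stiffness parameter (small sigma approaches hard K-means).
    """
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=K, replace=False)]  # init means on K random points

    for _ in range(n_iter):
        # Assignment step: squared Euclidean distance of each point to each mean.
        d = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)   # (M, K)
        logits = -d / sigma**2
        logits -= logits.max(axis=1, keepdims=True)               # numerical stability
        r = np.exp(logits)
        r /= r.sum(axis=1, keepdims=True)   # responsibilities in (0,1), rows sum to 1
        # Update step (the equation above): responsibility-weighted sample means.
        mu = (r.T @ X) / r.sum(axis=0)[:, None]

    return mu, r
```

With a small σ the responsibilities saturate to 0 or 1 and the loop reproduces hard K-means; a large σ blurs the assignments, matching the behaviour illustrated in Figure 3-10.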
3.1.4 Clustering with Mixtures of Gaussians

An extension of the soft K-means algorithm consists of fitting the data with a mixture of Gaussians (not to be confused with the Gaussian Mixture Model (GMM), which we will review later on). Instead of simply attaching a responsibility factor to each cluster, one attaches a probability density measuring how well each cluster represents the distribution of the data. The method is bound to converge to a state that maximizes the likelihood of each point belonging to each distribution.

Soft-clustering methods are part of model-based approaches to clustering. In clustering with a mixture of Gaussians, the model is naturally a Gaussian. Other model-based methods use, for instance, the Poisson or the Normal distributions. The main advantages of model-based clustering are:

• It can make use of well-studied statistical inference techniques;
• Its flexibility in choosing the component distribution;
• It obtains a density estimation for each cluster;
• It is a “soft” means of classification.

Clustering with a mixture of Gaussians places K distributions whose barycentres are located on the cluster means, as in Figure 3-13.

Figure 3-12: Examples of clustering with Mixtures of Gaussians (the grey circles represent the first and second variances of the distributions). [DEMOS\CLUSTERING\GMM-CLUSTERING-SIMPLE.ML]

Algorithm

Assignment step (E-step): The responsibilities are

$$r_i^{k} = \frac{\alpha_k \, \frac{1}{(2\pi\sigma_k)^{N}} \, e^{-\frac{1}{\sigma_k^{2}} d\left(\mu^{k}, x^{i}\right)}}{\sum_{k'} \alpha_{k'} \, \frac{1}{(2\pi\sigma_{k'})^{N}} \, e^{-\frac{1}{\sigma_{k'}^{2}} d\left(\mu^{k'}, x^{i}\right)}} \qquad (3.6)$$
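As a sketch of how Eq. (3.6) can be evaluated in practice, the NumPy function below computes the responsibilities of all K clusters for all M points at once. It follows the normalization (2πσ_k)^N and the exponent −d(μ^k, x^i)/σ_k² as written above; the function name and the log-space evaluation (to avoid numerical underflow) are illustrative additions.

```python
import numpy as np

def gmm_e_step(X, mu, sigma, alpha):
    """Responsibilities of Eq. (3.6) for a mixture of isotropic Gaussians.

    X : (M, N) data; mu : (K, N) means; sigma : (K,) widths;
    alpha : (K,) mixing coefficients summing to 1.
    """
    M, N = X.shape
    # Squared Euclidean distance d(mu^k, x^i) of every point to every mean.
    d = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)           # (M, K)
    # log of alpha_k * (2*pi*sigma_k)^(-N) * exp(-d / sigma_k^2), per point and cluster.
    log_p = np.log(alpha) - N * np.log(2 * np.pi * sigma) - d / sigma**2
    log_p -= log_p.max(axis=1, keepdims=True)                         # numerical stability
    r = np.exp(log_p)
    return r / r.sum(axis=1, keepdims=True)                           # rows sum to 1
```

Each row of the returned matrix distributes one unit of responsibility over the K clusters, exactly as in soft K-means; the mixing coefficients α_k and the per-cluster widths σ_k are what the subsequent update step (M-step) re-estimates.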