
The $\alpha_k$ are the so-called mixing coefficients. Their sum is 1, i.e. $\sum_{k=1}^{K} \alpha_k = 1$. These coefficients are usually estimated together with the other parameters of the Gaussians (i.e. the means and covariance matrices). In some cases, they can however be set to a constant, for instance to give equal weight to each Gaussian (in this case, $\alpha_k = \frac{1}{K} \;\; \forall k = 1, \dots, K$). When the coefficients are estimated, they end up representing a measure of the proportion of data points that belong most to that particular Gaussian (this is similar to the definition of the $\alpha$ seen in the case of Mixture of Gaussians in the previous section). In a probabilistic sense, these coefficients represent the prior probability with which each Gaussian may have generated the data and can hence be written

$$\alpha_k = p(k) = \frac{1}{M} \sum_{j=1}^{M} p(k \mid x^j).$$
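As an illustration, here is a minimal sketch (in Python with NumPy, not part of the original notes) of how the mixing coefficients could be recovered from a matrix of posterior probabilities $p(k \mid x^j)$; the array name `resp` and its layout (M datapoints by K components) are assumptions made for the example.

```python
import numpy as np

def mixing_coefficients(resp):
    """Estimate the mixing coefficients alpha_k from the responsibilities.

    resp : array of shape (M, K), with resp[j, k] = p(k | x^j).
    Returns an array of shape (K,) whose entries sum to 1.
    """
    M = resp.shape[0]
    # alpha_k = (1/M) * sum_j p(k | x^j)
    return resp.sum(axis=0) / M

# Example: three datapoints, two Gaussians
resp = np.array([[0.9, 0.1],
                 [0.2, 0.8],
                 [0.4, 0.6]])
print(mixing_coefficients(resp))  # -> [0.5  0.5]
```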

Learning of a GMM requires determining the means, covariance matrices and prior probabilities of the K Gaussians. The most popular method relies on Expectation-Maximization (E-M). We advise the reader to take a detour here and read the tutorial by Bilmes provided in the annexes of these lecture notes for a full derivation of the GMM parameter estimation through E-M. We briefly summarize the principle next.

We want to maximize the likelihood of the model's parameters $\Theta = \{\alpha_1, \dots, \alpha_K, \mu_1, \dots, \mu_K, \Sigma_1, \dots, \Sigma_K\}$ given the data, that is:

$$\max_{\Theta} L(\Theta \mid X) = \max_{\Theta} p(X \mid \Theta) \quad (3.15)$$

Assuming that the set of M datapoints $X = \{x^j\}_{j=1}^{M}$ is identically and independently distributed (iid), we get:

$$\max_{\Theta} p(X \mid \Theta) = \max_{\Theta} \prod_{j=1}^{M} \sum_{k=1}^{K} \alpha_k \cdot p(x^j \mid \mu_k, \Sigma_k) \quad (3.16)$$
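As a sketch of how the likelihood in (3.16) could be evaluated numerically (using NumPy and SciPy; the function and variable names are illustrative assumptions, not from these notes):

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_likelihood(X, alphas, means, covs):
    """Evaluate p(X | Theta) = prod_j sum_k alpha_k * N(x^j | mu_k, Sigma_k).

    X      : (M, d) array of datapoints
    alphas : (K,) mixing coefficients
    means  : (K, d) component means
    covs   : (K, d, d) component covariance matrices
    """
    M, K = X.shape[0], len(alphas)
    per_point = np.zeros(M)          # sum_k alpha_k * N(x^j | mu_k, Sigma_k), one value per datapoint
    for k in range(K):
        per_point += alphas[k] * multivariate_normal.pdf(X, means[k], covs[k])
    return np.prod(per_point)        # note: this product underflows quickly for large M
```

The product over M terms underflows quickly in practice, which is one more reason to work with the log-likelihood introduced next.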

Taking the log of the likelihood is often a good approach, as it simplifies the computation. Using the fact that the optimum $x^*$ of a function $f(x)$ is also an optimum of $\log f(x)$, one can compute:

$$\max_{\Theta} p(X \mid \Theta) = \max_{\Theta} \log p(X \mid \Theta)$$

$$\max_{\Theta} \log \prod_{j=1}^{M} \sum_{k=1}^{K} \alpha_k \cdot p(x^j \mid \mu_k, \Sigma_k) = \max_{\Theta} \sum_{j=1}^{M} \log\left(\sum_{k=1}^{K} \alpha_k \cdot p(x^j \mid \mu_k, \Sigma_k)\right) \quad (3.17)$$
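A corresponding sketch of the log-likelihood in (3.17), using the standard log-sum-exp trick for numerical stability (again with illustrative, assumed variable names):

```python
import numpy as np
from scipy.stats import multivariate_normal
from scipy.special import logsumexp

def gmm_log_likelihood(X, alphas, means, covs):
    """Evaluate log p(X | Theta) = sum_j log( sum_k alpha_k * N(x^j | mu_k, Sigma_k) )."""
    M, K = X.shape[0], len(alphas)
    # log_probs[j, k] = log alpha_k + log N(x^j | mu_k, Sigma_k)
    log_probs = np.zeros((M, K))
    for k in range(K):
        log_probs[:, k] = np.log(alphas[k]) + multivariate_normal.logpdf(X, means[k], covs[k])
    # log of the inner sum over k for each datapoint, then sum over the M datapoints
    return logsumexp(log_probs, axis=1).sum()
```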

The log of a sum is difficult to compute, and one is led to proceed iteratively by calculating an approximation at each step (E-M); see Section 9.4.2.2. The final update procedure runs as follows:

Initialization of the parameters: Initialize all parameters to a value to start with. The priors $p_1, \dots, p_K$ can for instance be initialized with a uniform prior, while the means can be initialized by running K-Means first. The complete set of parameters is then given by $\Theta = \{\alpha_1, \dots, \alpha_K, \mu_1, \dots, \mu_K, \Sigma_1, \dots, \Sigma_K\}$.
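To make the initialization step and the subsequent E-M iterations concrete, here is a minimal sketch of one possible implementation (assuming scikit-learn's KMeans is available for initializing the means; all names are illustrative and the update formulas are the standard GMM E-M updates, not copied from these notes):

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.cluster import KMeans

def init_gmm(X, K):
    """Initialize with uniform priors, K-Means means and broad covariances."""
    M, d = X.shape
    alphas = np.full(K, 1.0 / K)                                   # uniform prior over the K Gaussians
    means = KMeans(n_clusters=K, n_init=10).fit(X).cluster_centers_
    covs = np.array([np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(K)])
    return alphas, means, covs

def em_step(X, alphas, means, covs):
    """One Expectation-Maximization update of all GMM parameters."""
    M, d = X.shape
    K = len(alphas)
    # E-step: responsibilities p(k | x^j)
    resp = np.zeros((M, K))
    for k in range(K):
        resp[:, k] = alphas[k] * multivariate_normal.pdf(X, means[k], covs[k])
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: re-estimate priors, means and covariance matrices
    Nk = resp.sum(axis=0)
    alphas = Nk / M
    means = (resp.T @ X) / Nk[:, None]
    covs = np.zeros((K, d, d))
    for k in range(K):
        diff = X - means[k]
        covs[k] = (resp[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(d)
    return alphas, means, covs
```

Iterating `em_step` until the log-likelihood (3.17) stops increasing yields the estimated parameters.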
