MACHINE LEARNING TECHNIQUES - LASA


Algorithm:

If one further assumes that the noise follows an isotropic Gaussian distribution of the form N(0, σ_ε² I), i.e. that its variance σ_ε² is constant along all dimensions, then the conditional probability of the observables X given the latent variables, p(x | z), is given by:

p(x \mid z) = \mathcal{N}\!\left( Wz + \mu,\ \sigma_\varepsilon^{2} I \right)    (2.12)
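As an illustration, here is a minimal sketch of sampling from the generative model of Eq. (2.12) with NumPy; the dimensions N and q, the loading matrix W, the mean µ and the noise level sigma_eps are arbitrary values chosen only for the example.

import numpy as np

# Minimal sketch of the generative model of Eq. (2.12): x = W z + mu + eps,
# with z ~ N(0, I) and eps ~ N(0, sigma_eps^2 I). All values below are illustrative.
rng = np.random.default_rng(0)
N, q, M = 5, 2, 1000                               # observed dim, latent dim, sample count
W = rng.standard_normal((N, q))                    # loading matrix
mu = rng.standard_normal(N)                        # mean of the observables
sigma_eps = 0.1                                    # isotropic noise standard deviation

Z = rng.standard_normal((M, q))                    # latent variables z ~ N(0, I)
noise = sigma_eps * rng.standard_normal((M, N))    # isotropic Gaussian noise
X = Z @ W.T + mu + noise                           # M samples of x, one per row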

The marginal distribution can then be computed by integrating out the latent variable z, and one obtains:

p(x) = \mathcal{N}\!\left( \mu,\ W W^{T} + \sigma_\varepsilon^{2} I \right)    (2.13)

If we set B = W W^{T} + \sigma_\varepsilon^{2} I, one can then compute the log-likelihood:

\mathcal{L}(B, \sigma_\varepsilon, \mu) = -\frac{M}{2}\left\{ N \ln(2\pi) + \ln|B| + \mathrm{tr}\!\left(B^{-1} C\right) \right\}    (2.14)

where C = \frac{1}{M}\sum_{i=1}^{M} (x^{i} - \mu)(x^{i} - \mu)^{T} is the covariance matrix of the complete set of M datapoints X = \{x^{1}, \dots, x^{M}\}.
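The log-likelihood of Eq. (2.14) can be evaluated directly. The sketch below assumes NumPy and a data matrix X with one datapoint per row; the function name ppca_log_likelihood is illustrative and not from the original notes.

import numpy as np

def ppca_log_likelihood(X, W, mu, sigma_eps):
    # Eq. (2.14): L = -M/2 { N ln(2 pi) + ln|B| + tr(B^{-1} C) },
    # with B = W W^T + sigma_eps^2 I and C the sample covariance of the M datapoints.
    M, N = X.shape
    B = W @ W.T + sigma_eps**2 * np.eye(N)
    C = (X - mu).T @ (X - mu) / M                  # covariance matrix C of Eq. (2.14)
    _, logdet = np.linalg.slogdet(B)               # numerically stable ln|B|
    return -0.5 * M * (N * np.log(2 * np.pi) + logdet + np.trace(np.linalg.solve(B, C)))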

The parameters B, µ and σ_ε can then be computed through maximum likelihood, i.e. by maximizing the quantity \mathcal{L}(B, \sigma_\varepsilon, \mu) using expectation-maximization. Unsurprisingly, the maximum-likelihood estimate of µ turns out to be the mean of the dataset.

The maximum-likelihood estimates of B and σ_ε are then:

B^{*} = W_{q}\left( \Lambda_{q} - \sigma_\varepsilon^{2} I \right)^{\frac{1}{2}} R

\left( \sigma_\varepsilon^{*} \right)^{2} = \frac{1}{N-q}\sum_{j=q+1}^{N} \lambda_{j}    (this is also called the residual)

where W_q is the matrix of the first q eigenvectors of C, the λ_j are the associated eigenvalues (Λ_q being the diagonal matrix of the q largest ones), and R is an arbitrary rotation matrix.
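Since the maximum-likelihood solution only involves an eigendecomposition of C, it can also be sketched in closed form, without running expectation-maximization. The snippet below is an illustrative implementation under the assumptions that the rotation R is taken as the identity and that X stores one datapoint per row.

import numpy as np

def ppca_ml_estimates(X, q):
    # Closed-form ML estimates from the equations above (with R = I):
    # mu* is the dataset mean, (sigma_eps*)^2 the mean of the N - q discarded
    # eigenvalues of C (the residual), and B* = W_q (Lambda_q - sigma_eps^2 I)^{1/2}.
    M, N = X.shape
    mu = X.mean(axis=0)
    C = (X - mu).T @ (X - mu) / M                        # sample covariance
    eigvals, eigvecs = np.linalg.eigh(C)                 # ascending order for symmetric C
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # re-sort in descending order
    sigma2 = eigvals[q:].mean()                          # residual variance (sigma_eps*)^2
    W_q = eigvecs[:, :q]                                 # first q eigenvectors of C
    Lambda_q = np.diag(eigvals[:q])                      # associated eigenvalues
    B_star = W_q @ np.sqrt(Lambda_q - sigma2 * np.eye(q))
    return mu, B_star, sigma2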

As in PCA, the dimension N of the original dataset, i.e. the observable X, is reduced by fixing the dimension q < N of the latent variable. The conditional distribution of the latent variable given the observable is:

p(z \mid x) = \mathcal{N}\!\left( B^{-1} W^{T}(x - \mu),\ \sigma_\varepsilon^{2} B^{-1} \right)    (2.15)

where B = W^{T} W + \sigma_\varepsilon^{2} I.
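A minimal sketch of Eq. (2.15), again assuming NumPy; ppca_posterior is an illustrative name and x is a single datapoint.

import numpy as np

def ppca_posterior(x, W, mu, sigma_eps):
    # Eq. (2.15): p(z | x) = N( B^{-1} W^T (x - mu), sigma_eps^2 B^{-1} ),
    # with B = W^T W + sigma_eps^2 I (note: this B is q x q, unlike the one in Eq. 2.14).
    q = W.shape[1]
    B = W.T @ W + sigma_eps**2 * np.eye(q)
    mean = np.linalg.solve(B, W.T @ (x - mu))      # posterior mean of the latent variable
    cov = sigma_eps**2 * np.linalg.inv(B)          # posterior covariance
    return mean, cov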

Finally, note that in the absence of noise one recovers standard PCA. Simply observe that \left( W^{T} W \right)^{-1} W^{T}(x - \mu) is an orthogonal projection of the zero-mean dataset, and hence, if one sets A = \left( W^{T} W \right)^{-1} W^{T}, one recovers the standard PCA transformation.
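This limit can also be checked numerically: as sigma_eps goes to zero, the posterior mean of Eq. (2.15) converges to the projection A(x - µ). The values of W, µ and x below are arbitrary and only serve the illustration.

import numpy as np

# Check of the noise-free limit: the posterior mean of Eq. (2.15) tends to
# A (x - mu) with A = (W^T W)^{-1} W^T as sigma_eps -> 0. Illustrative values.
rng = np.random.default_rng(1)
N, q = 5, 2
W = rng.standard_normal((N, q))
mu = rng.standard_normal(N)
x = rng.standard_normal(N)

A = np.linalg.solve(W.T @ W, W.T)                  # A = (W^T W)^{-1} W^T
for sigma_eps in (1.0, 1e-3, 1e-6):
    B = W.T @ W + sigma_eps**2 * np.eye(q)
    z_mean = np.linalg.solve(B, W.T @ (x - mu))    # posterior mean of Eq. (2.15)
    print(sigma_eps, np.linalg.norm(z_mean - A @ (x - mu)))  # gap shrinks with the noise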

