MACHINE LEARNING TECHNIQUES - LASA
Algorithm:
If one further assumes that the noise follows an isotropic Gaussian distribution of the form $\mathcal{N}(0, \sigma_\varepsilon^2 I)$, i.e. that its variance $\sigma_\varepsilon^2$ is constant along all dimensions, then the conditional probability $p(x \mid z)$ of the observables $X$ given the latent variables is:

$$p(x \mid z) = \mathcal{N}\left(Wz + \mu,\ \sigma_\varepsilon^2 I\right) \qquad (2.12)$$
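The noise model of Eq. (2.12) can be illustrated by sampling. The sketch below is a minimal NumPy illustration that assumes arbitrary, hypothetical values for $W$, $\mu$, $z$ and $\sigma_\varepsilon$. The helper name `sample_x_given_z` is ours and does not come from the text. For a fixed $z$, the empirical mean of the samples should approach $Wz + \mu$, and their covariance should approach $\sigma_\varepsilon^2 I$.

```python
import numpy as np

def sample_x_given_z(z, W, mu, sigma_eps, n_samples, rng):
    """Draw samples from p(x|z) = N(Wz + mu, sigma_eps^2 I), Eq. (2.12).

    Hypothetical helper for illustration only.
    """
    N = W.shape[0]
    mean = W @ z + mu
    # Isotropic noise: the same standard deviation sigma_eps along every dimension.
    eps = sigma_eps * rng.standard_normal((n_samples, N))
    return mean + eps

rng = np.random.default_rng(0)
N, q = 3, 2
W = rng.standard_normal((N, q))          # hypothetical loading matrix (N x q)
mu = np.array([1.0, -2.0, 0.5])          # hypothetical mean
z = np.array([0.3, -1.0])                # one fixed latent point
X = sample_x_given_z(z, W, mu, sigma_eps=0.1, n_samples=20000, rng=rng)

print(X.mean(axis=0) - (W @ z + mu))     # entries close to zero
print(np.cov(X, rowvar=False))           # close to 0.1**2 * I
```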
The marginal distribution is then computed by integrating out the latent variable, $p(x) = \int p(x \mid z)\, p(z)\, dz$. This gives:

$$p(x) = \mathcal{N}\left(\mu,\ WW^T + \sigma_\varepsilon^2 I\right) \qquad (2.13)$$

If we set $B = WW^T + \sigma_\varepsilon^2 I$, one can then compute the log-likelihood:
$$L(B, \sigma_\varepsilon, \mu) = -\frac{M}{2}\left\{ N \ln(2\pi) + \ln|B| + \mathrm{tr}\left(B^{-1} C\right) \right\} \qquad (2.14)$$

where $C = \frac{1}{M}\sum_{i=1}^{M} (x^i - \mu)(x^i - \mu)^T$ is the covariance matrix of the complete set of $M$ datapoints $X = \{x^1, \ldots, x^M\}$.
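As a sanity check of Eq. (2.14), one can compare the closed-form log-likelihood with a direct sum of per-point Gaussian log-densities under the marginal (2.13). The two agree because $\sum_i (x^i-\mu)^T B^{-1}(x^i-\mu) = M\,\mathrm{tr}(B^{-1}C)$. The sketch below is a minimal NumPy illustration. It uses hypothetical helper names and synthetic data drawn from the generative model $x = Wz + \mu + \varepsilon$ with $z \sim \mathcal{N}(0, I)$.

```python
import numpy as np

def ppca_log_likelihood(X, W, mu, sigma_eps):
    """Eq. (2.14): L = -(M/2) { N ln(2 pi) + ln|B| + tr(B^{-1} C) }, B = W W^T + sigma_eps^2 I."""
    M, N = X.shape
    B = W @ W.T + sigma_eps**2 * np.eye(N)
    D = X - mu
    C = D.T @ D / M                       # covariance of the data around mu (divides by M)
    _, logdet_B = np.linalg.slogdet(B)    # numerically safer than log(det(B))
    return -0.5 * M * (N * np.log(2 * np.pi) + logdet_B + np.trace(np.linalg.solve(B, C)))

def sum_of_gaussian_log_densities(X, mean, cov):
    """Direct evaluation of sum_i ln N(x^i; mean, cov)."""
    N = X.shape[1]
    D = X - mean
    _, logdet = np.linalg.slogdet(cov)
    # Row-wise Mahalanobis terms D_i^T cov^{-1} D_i.
    mahal = np.einsum("ij,ij->i", D, np.linalg.solve(cov, D.T).T)
    return np.sum(-0.5 * (N * np.log(2 * np.pi) + logdet + mahal))

rng = np.random.default_rng(1)
N, q, M = 4, 2, 500
W = rng.standard_normal((N, q))           # hypothetical parameters
mu = rng.standard_normal(N)
sigma_eps = 0.3
Z = rng.standard_normal((M, q))           # z ~ N(0, I)
X = Z @ W.T + mu + sigma_eps * rng.standard_normal((M, N))

L1 = ppca_log_likelihood(X, W, mu, sigma_eps)
L2 = sum_of_gaussian_log_densities(X, mu, W @ W.T + sigma_eps**2 * np.eye(N))
print(np.isclose(L1, L2))                 # True
```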
The parameters $B$, $\mu$ and $\sigma_\varepsilon$ can then be computed through maximum likelihood, i.e. by maximizing the quantity $L(B, \sigma_\varepsilon, \mu)$ using expectation-maximization. Unsurprisingly, the maximum-likelihood estimate of $\mu$ turns out to be the mean of the dataset. The maximum-likelihood estimates of $W$ (and hence of $B$) and of $\sigma_\varepsilon^2$ are then:

$$W^* = W_q \left(\Lambda_q - \sigma_\varepsilon^{*2} I\right)^{1/2} R$$

$$\sigma_\varepsilon^{*2} = \frac{1}{N-q} \sum_{j=q+1}^{N} \lambda_j \qquad \text{(this is also called the residual)}$$

In these expressions:

- $W_q$ is the matrix of the $q$ leading eigenvectors of $C$.
- $\Lambda_q$ is the diagonal matrix of the associated eigenvalues.
- The $\lambda_j$ are the eigenvalues of $C$, sorted in decreasing order.
- $R$ is an arbitrary $q \times q$ orthogonal (rotation) matrix.
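The closed-form estimates above translate almost directly into code. The sketch below is not presented as the author's implementation; it uses hypothetical names and assumes $R = I$. Any rotation gives the same likelihood, since $W R R^T W^T = W W^T$. One quick check is that the fitted $B = W^* W^{*T} + \sigma_\varepsilon^{*2} I$ keeps the $q$ leading eigenvalues of $C$ and replaces the remaining $N-q$ eigenvalues by the residual $\sigma_\varepsilon^{*2}$.

```python
import numpy as np

def ppca_fit(X, q):
    """Closed-form maximum-likelihood PPCA, assuming the rotation R is the identity."""
    M, N = X.shape
    mu = X.mean(axis=0)                           # ML estimate of mu: the data mean
    D = X - mu
    C = D.T @ D / M
    eigvals, eigvecs = np.linalg.eigh(C)          # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]             # sort in decreasing order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    sigma2 = eigvals[q:].mean()                   # residual: mean of the N - q discarded eigenvalues
    # W_q (Lambda_q - sigma^2 I)^{1/2}: scale column j of W_q by sqrt(lambda_j - sigma^2).
    W = eigvecs[:, :q] * np.sqrt(eigvals[:q] - sigma2)
    return W, mu, sigma2, eigvals

rng = np.random.default_rng(4)
N, q, M = 5, 2, 2000
W_true = rng.standard_normal((N, q))              # hypothetical ground truth
mu_true = rng.standard_normal(N)
X = rng.standard_normal((M, q)) @ W_true.T + mu_true + 0.2 * rng.standard_normal((M, N))

W, mu, sigma2, eigvals = ppca_fit(X, q)
print(sigma2)   # expected to be close to the true noise variance 0.2**2 = 0.04

# Spectrum of the fitted B: q leading eigenvalues of C, then sigma2 repeated N - q times.
B = W @ W.T + sigma2 * np.eye(N)
print(np.sort(np.linalg.eigvalsh(B))[::-1])
```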
As in PCA, the dimension N of the original dataset, i.e. the observable X, is reduced by fixing the<br />
dimension q< N of the latent variable. The conditional distribution of the latent variable given<br />
the observable is:<br />
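$$p(z \mid x) = \mathcal{N}\left(B^{-1} W^T (x - \mu),\ \sigma_\varepsilon^2 B^{-1}\right) \qquad (2.15)$$

In this expression, $B = W^T W + \sigma_\varepsilon^2 I$ is a $q \times q$ matrix. It is not the $N \times N$ matrix $B$ that appears in the log-likelihood (2.14).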
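Eq. (2.15) can be sketched as shown below, assuming given (hypothetical) values of $W$, $\mu$ and $\sigma_\varepsilon^2$. As a cross-check, the same posterior can be obtained by standard Gaussian conditioning on the joint distribution of $(z, x)$. That route uses the $N \times N$ matrix $WW^T + \sigma_\varepsilon^2 I$ instead of the $q \times q$ one. The two forms agree by the matrix inversion (push-through) identity.

```python
import numpy as np

def ppca_posterior(x, W, mu, sigma2):
    """p(z|x) from Eq. (2.15), using the q x q matrix B = W^T W + sigma2 I."""
    q = W.shape[1]
    B = W.T @ W + sigma2 * np.eye(q)
    mean = np.linalg.solve(B, W.T @ (x - mu))
    cov = sigma2 * np.linalg.inv(B)
    return mean, cov

def posterior_by_conditioning(x, W, mu, sigma2):
    """Same posterior via Gaussian conditioning, using the N x N matrix W W^T + sigma2 I."""
    N, q = W.shape
    S = W @ W.T + sigma2 * np.eye(N)       # marginal covariance of x, Eq. (2.13)
    K = np.linalg.solve(S, W).T            # W^T S^{-1} (S is symmetric)
    return K @ (x - mu), np.eye(q) - K @ W

rng = np.random.default_rng(2)
N, q = 6, 2
W = rng.standard_normal((N, q))            # hypothetical parameters
mu = rng.standard_normal(N)
sigma2 = 0.25
x = rng.standard_normal(N)

m1, P1 = ppca_posterior(x, W, mu, sigma2)
m2, P2 = posterior_by_conditioning(x, W, mu, sigma2)
print(np.allclose(m1, m2), np.allclose(P1, P2))   # True True
```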
Finally, note that standard PCA is recovered in the absence of noise ($\sigma_\varepsilon \to 0$). In that limit, the mean of $p(z \mid x)$ becomes $\left(W^T W\right)^{-1} W^T (x - \mu)$. This is an orthogonal projection of the zero-mean dataset onto the latent space. Hence, setting $A = \left(W^T W\right)^{-1} W^T$ recovers the standard PCA transformation.
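The noise-free limit can be illustrated numerically. The sketch below uses random, hypothetical $W$, $\mu$ and $x$, and the helper `posterior_mean` is ours. It checks three things:

- As $\sigma_\varepsilon^2$ shrinks, the posterior mean of Eq. (2.15) approaches $A(x-\mu)$ with $A = (W^TW)^{-1}W^T$.
- The reconstruction residual is orthogonal to the columns of $W$.
- When the columns of $W$ are orthonormal, $A$ reduces to $W^T$.

```python
import numpy as np

def posterior_mean(x, W, mu, sigma2):
    """Mean of p(z|x) in Eq. (2.15): (W^T W + sigma2 I)^{-1} W^T (x - mu)."""
    q = W.shape[1]
    return np.linalg.solve(W.T @ W + sigma2 * np.eye(q), W.T @ (x - mu))

rng = np.random.default_rng(3)
N, q = 5, 2
W = rng.standard_normal((N, q))      # hypothetical loading matrix
mu = rng.standard_normal(N)
x = rng.standard_normal(N)
d = x - mu                           # zero-mean observation

A = np.linalg.solve(W.T @ W, W.T)    # A = (W^T W)^{-1} W^T
gaps = [np.linalg.norm(posterior_mean(x, W, mu, s2) - A @ d) for s2 in (1.0, 1e-2, 1e-6)]
print(gaps)                          # decreasing towards 0 as sigma2 -> 0

# The residual d - W A d is orthogonal to the columns of W (orthogonal projection).
print(W.T @ (d - W @ (A @ d)))       # entries ~ 0

# With orthonormal columns, A reduces to the usual PCA projection W^T.
Q, _ = np.linalg.qr(W)
print(np.allclose(np.linalg.solve(Q.T @ Q, Q.T), Q.T))  # True
```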
© A.G.Billard 2004 – Last Update March 2011