2.1.5 Probabilistic PCA
Until now, all the variants of PCA we have seen were deterministic in nature. However, by computing the mean and covariance matrix of the data, one implicitly assumed that the data followed a distribution parameterized by these two quantities. The Gaussian distribution is one such distribution.
Here we will see how the standard PCA procedure can be extended with a probabilistic model. This reformulation provides a first step toward a series of methods based on so-called latent variables, which we will see later on.
Latent variables correspond to unobserved variables. They offer a lower-dimensional representation of the data and their dependencies. Fewer dimensions result in more parsimonious models. Probabilistic PCA (PPCA) is thus PCA performed through projection onto a latent space.
Formalism:
Assume an N-dimensional data set X ∈ ℝ^N. X corresponds to the observations. Probabilistic PCA starts with the assumption that the data X were generated by a Gaussian latent variable model of the form:

x = Wz + µ + ε    (2.11)

where z ∈ ℝ^q is the q-dimensional latent variable, W is an N×q matrix, µ ∈ ℝ^N is a vector of parameters, and ε ∈ ℝ^N is the noise, which follows a zero-mean Gaussian distribution ε ~ N(0, Σ_ε).
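As a concrete illustration of the generative model in Eq. (2.11), the following sketch in Python/NumPy draws observations x = Wz + µ + ε. It assumes, as is standard in PPCA (though not stated above), a unit-Gaussian prior z ~ N(0, I) and isotropic noise Σ_ε = σ²I; all dimensions and parameter values are arbitrary examples.

    import numpy as np

    rng = np.random.default_rng(0)

    N, q = 3, 2        # observed and latent dimensions (arbitrary example values)
    n_samples = 500

    # Model parameters, chosen arbitrarily for illustration
    W = rng.standard_normal((N, q))    # N x q map from latent to observed space
    mu = np.array([1.0, -2.0, 0.5])    # offset vector in observed space
    sigma2 = 0.1                       # isotropic noise variance: Sigma_eps = sigma2 * I

    # Generative model of Eq. (2.11): x = W z + mu + eps
    z = rng.standard_normal((n_samples, q))                       # z ~ N(0, I)
    eps = np.sqrt(sigma2) * rng.standard_normal((n_samples, N))   # eps ~ N(0, sigma2 * I)
    X = z @ W.T + mu + eps                                        # one observation per row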
Probabilistic PCA hence differs from PCA by assuming that a) the linear transformation through the matrix W goes from the latent variables to the observables; b) the transformation is no longer deterministic and is affected by random noise.
Note that the noise is a random variable with zero mean and fixed covariance. If one further assumes that the covariance matrix of the noise Σ_ε is diagonal, i.e. that the noise along each dimension is uncorrelated with the noise along the other dimensions, then the observables are conditionally independent given the latent variables. In other words, the latent variables z encapsulate the correlations across the variables. Such conditional independence of the observables is advantageous for further processing, e.g. when estimating the likelihood of the model given the data: one can then simply take the product of the likelihoods of the data for each dimension separately.
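To make the factorization concrete, here is a minimal sketch (continuing the NumPy example above; the function name and arguments are illustrative, not part of the original text) that evaluates the conditional log-likelihood log p(x | z) for a diagonal noise covariance as a sum of one-dimensional Gaussian log-densities, one term per observed dimension.

    import numpy as np

    def log_likelihood_given_latent(x, z, W, mu, noise_var_diag):
        """log p(x | z) for diagonal Sigma_eps, as a sum of 1-D Gaussian terms.

        x: (N,) observation; z: (q,) latent point;
        noise_var_diag: (N,) diagonal entries of Sigma_eps.
        """
        mean = W @ z + mu   # conditional mean of x given z, from Eq. (2.11)
        # With a diagonal noise covariance, the dimensions of x are conditionally
        # independent given z: the joint density is a product of one-dimensional
        # Gaussians, i.e. a sum in log space.
        per_dim = -0.5 * (np.log(2 * np.pi * noise_var_diag)
                          + (x - mean) ** 2 / noise_var_diag)
        return per_dim.sum()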
Probabilistic PCA then consists in estimating the density of the latent variable z; PPCA does so through maximum likelihood.
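For the special case of isotropic noise, Σ_ε = σ²I, the maximum-likelihood solution is known in closed form (the classical Tipping and Bishop result, cited here as background rather than derived in this section): µ is the sample mean, σ² is the average of the N − q smallest eigenvalues of the sample covariance, and the columns of W are the top q eigenvectors scaled by √(λ_i − σ²), up to an arbitrary rotation of the latent space. A minimal NumPy sketch under these assumptions:

    import numpy as np

    def fit_ppca(X, q):
        """Closed-form maximum-likelihood PPCA fit, assuming isotropic noise,
        q < N, and lambda_i > sigma2 for the retained directions.
        X: (n_samples, N) data matrix; q: latent dimension."""
        mu_ml = X.mean(axis=0)
        S = np.cov(X, rowvar=False)            # sample covariance, N x N
        eigvals, eigvecs = np.linalg.eigh(S)   # eigenvalues in ascending order
        eigvals = eigvals[::-1]                # sort descending
        eigvecs = eigvecs[:, ::-1]
        # ML noise variance: average variance in the N - q discarded directions
        sigma2_ml = eigvals[q:].mean()
        # ML weights: top-q eigenvectors scaled by sqrt(lambda_i - sigma2)
        W_ml = eigvecs[:, :q] * np.sqrt(eigvals[:q] - sigma2_ml)
        return W_ml, mu_ml, sigma2_ml

Applied to data generated by the earlier sampling sketch, fit_ppca(X, 2) should approximately recover σ² and a W spanning the same subspace as the true one; W itself is identifiable only up to a latent-space rotation.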