
2.1.5 Probabilistic PCA

Until now, all variants of PCA we have seen have been deterministic in nature. However, implicitly, through the computation of the mean and covariance matrix of the data, one assumed that the data followed a distribution that could be parameterized through these two quantities. The distribution parameterized by a mean and a covariance is the Gaussian distribution.

Here we will see how the standard PCA procedure can be extended with a probabilistic model. This reformulation provides a first step toward a series of methods based on so-called latent variables, which we will see later on.

Latent variables correspond to unobserved variables. They offer a lower-dimensional representation of the data and their dependencies. Fewer dimensions result in more parsimonious models. Probabilistic PCA (PPCA) is then PCA through projection onto a latent space.

Formalism:

Assume an N-dimensional data set X ∈ ℝ^N. X corresponds to the observations. Probabilistic PCA starts with the assumption that the data X were generated by a Gaussian latent variable model of the form:

x = Wz + µ + ε    (2.11)

where z ∈ ℝ^q are the q-dimensional latent variables, W is an N×q matrix of parameters, µ ∈ ℝ^N is a vector of parameters, and ε ∈ ℝ^N is the noise, following a zero-mean Gaussian distribution ε ~ N(0, Σ).
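To make the generative model concrete, the following is a minimal sketch in Python/NumPy that draws samples from equation (2.11). It assumes the standard PPCA prior z ~ N(0, I); the particular values of N, q, W, µ and Σ are illustrative choices, not taken from the text.

import numpy as np

rng = np.random.default_rng(0)

N, q = 5, 2                                 # observed / latent dimensionality (illustrative)
W = rng.standard_normal((N, q))             # N x q matrix of parameters
mu = rng.standard_normal(N)                 # offset vector in R^N
Sigma = np.diag(rng.uniform(0.05, 0.2, N))  # diagonal noise covariance (illustrative)

def sample_x(n_samples):
    # x = W z + mu + eps, with z ~ N(0, I) and eps ~ N(0, Sigma)
    z = rng.standard_normal((n_samples, q))
    eps = rng.multivariate_normal(np.zeros(N), Sigma, size=n_samples)
    return z @ W.T + mu + eps

X = sample_x(1000)                          # 1000 observations, each a point in R^N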

Probabilistic PCA hence differs from PCA by assuming that a) the linear transformation through the matrix W goes from the latent variables to the observables; b) the transformation is no longer deterministic and is affected by random noise.

Note that the noise is a random variable with zero mean and fixed covariance. If one further assumes that the covariance matrix of the noise Σ is diagonal, i.e. that the noise along each dimension is uncorrelated with the noise along the other dimensions, this leads to conditional independence of the observables given the latent variables. In other words, the latent variables z encapsulate the correlations across the variables. Such conditional independence of the observables is advantageous for further processing, e.g. to proceed to an estimation of the likelihood of the model given the data. In this case, one can simply take the product of the likelihoods of the data for each dimension separately.
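As a quick sanity check of this factorization, the sketch below (reusing rng, W, mu and Sigma from the previous snippet) compares the multivariate Gaussian log-likelihood of x given z with the sum of the N per-dimension log-likelihoods; with a diagonal Σ the two coincide.

from scipy.stats import multivariate_normal, norm

z = rng.standard_normal(q)
x = W @ z + mu + rng.multivariate_normal(np.zeros(N), Sigma)

mean = W @ z + mu                    # conditional mean of x given z
var = np.diag(Sigma)                 # per-dimension noise variances

log_joint = multivariate_normal(mean, Sigma).logpdf(x)
log_per_dim = norm(mean, np.sqrt(var)).logpdf(x).sum()  # product of likelihoods = sum of logs

assert np.isclose(log_joint, log_per_dim)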

Probabilistic PCA then consists in estimating the density of the latent variable z. PPCA does so through maximum likelihood.
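In the special case of isotropic noise, Σ = σ²I, the maximum-likelihood solution is known in closed form (Tipping & Bishop, 1999): σ² is estimated as the average of the N − q discarded eigenvalues of the sample covariance, and W from its top q eigenvectors, up to an arbitrary rotation. A minimal sketch, assuming that isotropic case:

def fit_ppca(X, q):
    # Closed-form ML fit of PPCA with isotropic noise (Tipping & Bishop, 1999).
    # X: (n_samples, N) data matrix; q: latent dimensionality.
    mu = X.mean(axis=0)
    S = np.cov(X, rowvar=False)                 # sample covariance, N x N
    lam, U = np.linalg.eigh(S)                  # eigenvalues in ascending order
    lam, U = lam[::-1], U[:, ::-1]              # reorder to descending
    sigma2 = lam[q:].mean()                     # average of the discarded eigenvalues
    W = U[:, :q] @ np.diag(np.sqrt(lam[:q] - sigma2))  # W_ML, up to a rotation R
    return W, mu, sigma2

W_ml, mu_ml, sigma2_ml = fit_ppca(X, q=2)       # X from the sampling sketch above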


