
5.5 Kernel ICA

Adapted from Kernel Independent Component Analysis, F. Bach and M. I. Jordan, Journal of Machine Learning Research 3 (2002) 1-48.

In Section 2.3, we covered the linear version of Independent Component Analysis. Here we revisit ICA and extend it to a non-linear transformation through kernel ICA. To this end, we first show that ICA can also be solved by minimization of mutual information, and then extend this observation to the non-linear case.

Linear ICA

ICA assumes that the set of $M$ observations $X = \{x^i\}_{i=1,\dots,M}$, with $x^i \in \mathbb{R}^N$, was generated by a set of statistically independent sources $S = \{s_1, \dots, s_q\}$, with $q \leq N$, through a linear transformation $A$:

$$A: \mathbb{R}^q \to \mathbb{R}^N, \qquad s \mapsto x = A s,$$

where $A$ is an unknown $N \times q$ mixing matrix.

ICA then consists in estimating both $A$ and $S$ knowing only $X$.
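To make the setting concrete, here is a minimal sketch of this generative model and of the estimation task, assuming numpy and scikit-learn are available; scikit-learn's FastICA implements the non-gaussianity-based approach of Section 2.3, and the chosen source distributions and variable names are purely illustrative:

```python
# Sketch of the linear ICA generative model x = A s and its estimation from X alone.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)

# q = 2 statistically independent, non-gaussian sources, M = 2000 samples, N = 3 observed dimensions.
M, q, N = 2000, 2, 3
S = np.column_stack([
    rng.uniform(-1.0, 1.0, M),   # sub-gaussian (uniform) source
    rng.laplace(0.0, 1.0, M),    # super-gaussian (heavy-tailed) source
])

# Unknown N x q mixing matrix A; row-wise, each observation is x = A s.
A = rng.normal(size=(N, q))
X = S @ A.T

# Estimate both the sources and the mixing matrix knowing only X.
ica = FastICA(n_components=q, random_state=0)
S_hat = ica.fit_transform(X)     # estimated sources, shape (M, q)
A_hat = ica.mixing_              # estimated N x q mixing matrix
```

As always with ICA, the recovered sources and mixing matrix are only defined up to a permutation, sign, and scaling of the columns.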

ICA bears an important similarity with Probabilistic PCA, see Section 2.1.5. The sources $S$ are latent random variables, i.e. each source $s_1, \dots, s_q$ was generated by an independent random process and hence has an associated distribution $p_{s_i}$. ICA differs from PPCA in that it requires that the sources $s_i$ be statistically independent. PCA, in contrast, requires solely that the projections be uncorrelated. Statistical independence is a stronger constraint than uncorrelatedness, see definitions in Annexes, Sections 9.2.7 and 9.2.8.
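A classical way to see the difference is a pair of variables that are uncorrelated yet clearly dependent; the short sketch below (plain numpy, illustrative values) makes the point numerically:

```python
# Uncorrelated does not imply independent.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 100_000)
y = x ** 2                      # y is a deterministic function of x, hence fully dependent on x

# The correlation is (numerically) zero, since E[x y] = E[x^3] = 0 for a symmetric x:
print(np.corrcoef(x, y)[0, 1])                          # ~ 0

# Independence would require E[f(x) g(y)] = E[f(x)] E[g(y)] for all f, g;
# it already fails for f(x) = x^2 and g(y) = y:
print(np.mean(x**2 * y) - np.mean(x**2) * np.mean(y))   # clearly nonzero (~ 0.09)
```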

Solving ICA exactly would require a multi-dimensional density estimation to estimate each of the densities $p_{s_1}, \dots, p_{s_q}$. This is impractical, and hence ICA is usually solved by approximation techniques. In Section 2.3, we saw one method to solve ICA using a measure of non-gaussianity. The intuition behind this approach was that the distribution of a mixture of independent sources is closer to a Gaussian distribution than the distribution of each source taken individually.

This, of course, assumed that the distribution of each source is non-gaussian. This idea is illustrated in Figure 5-3. A measure of non-gaussianity was proposed based on the negentropy $J(y) = H(y_{Gauss}) - H(y)$. The negentropy measures by how much the entropy $H(y)$ of the current estimate $\tilde{y}$ of the sources $s$ differs from the entropy $H(y_{Gauss})$ of a Gaussian distribution with the same mean and covariance as that of the distribution of $y$. In information theory, the entropy is a measure of the uncertainty attached to the information contained in the observation of a given variable: the higher the entropy, the more uncertain the outcome, see Section 9.3. The notion of entropy can be extended to joint and conditional distributions. When observing two variables, the difference between the sum of their individual entropies and their joint entropy measures the information conveyed by the observation of one variable about the other. This quantity is precisely the mutual information, and, unsurprisingly, ICA can hence also be formulated in terms of mutual information, as we will see next.
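As a rough numerical illustration of the two points above, one can use the standard approximation of the negentropy, $J(y) \approx \left(E[G(y)] - E[G(\nu)]\right)^2$ with $G(u) = \log\cosh(u)$ and $\nu$ a standard Gaussian variable (this is the contrast function used by FastICA, not the exact entropy difference). In the sketch below (plain numpy, illustrative sources), each non-gaussian source has a clearly positive approximate negentropy, while their linear mixture scores much closer to zero, i.e. it is closer to a Gaussian:

```python
# Approximate negentropy via the log-cosh contrast, compared for sources vs. their mixture.
import numpy as np

rng = np.random.default_rng(0)

def negentropy_approx(y, n_ref=1_000_000):
    """Approximate negentropy J(y) of a 1-D sample using G(u) = log cosh(u)."""
    y = (y - y.mean()) / y.std()                  # standardize to zero mean, unit variance
    G = lambda u: np.log(np.cosh(u))
    ref = G(rng.standard_normal(n_ref)).mean()    # Monte-Carlo estimate of E[G(nu)], nu ~ N(0, 1)
    return (G(y).mean() - ref) ** 2

M = 500_000
sources = rng.uniform(-1.0, 1.0, size=(M, 4))     # four independent, non-gaussian (uniform) sources
mixture = sources.sum(axis=1)                     # one linear mixture with equal weights

for k in range(4):
    print(negentropy_approx(sources[:, k]))       # each source: clearly above zero
print(negentropy_approx(mixture))                 # mixture: much closer to zero, i.e. more gaussian
```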

