MACHINE LEARNING TECHNIQUES - LASA
5.5 Kernel ICA
Adapted from: F. Bach and M. I. Jordan, Kernel Independent Component Analysis, Journal of Machine Learning Research 3 (2002) 1-48.
In Section 2.3, we covered the linear version of Independent Component Analysis. Here we first revisit ICA and then extend it to a non-linear transformation through kernel ICA. To this end, we first show that ICA can also be solved by minimization of mutual information, and then extend this observation to the non-linear case.
Linear ICA
ICA assumes that the set of $M$ observations $X = \{x^j\}_{j=1,\dots,M}$, $x^j \in \mathbb{R}^N$, was generated by a set of statistically independent sources $S = \{s_1, \dots, s_q\}$, with $q \leq N$, through a linear transformation $A$:

$$A: \mathbb{R}^q \rightarrow \mathbb{R}^N, \quad s \mapsto x = As$$

where $A$ is an unknown $N \times q$ mixing matrix. ICA then consists in estimating both $A$ and $S$ knowing only $X$.
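To make the generative model concrete, the sketch below (assuming NumPy and scikit-learn, which these notes do not otherwise use) mixes two independent non-gaussian sources with a random $3 \times 2$ matrix $A$ and recovers them from the observations alone; scikit-learn's FastICA stands in here for the estimation procedures discussed in these notes, and all variable names are illustrative.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)

# q = 2 statistically independent, non-gaussian sources; M = 2000 samples.
M, q, N = 2000, 2, 3
t = np.linspace(0, 8, M)
S = np.column_stack([np.sign(np.sin(3 * t)),  # square wave
                     rng.laplace(size=M)])    # heavy-tailed noise

# Unknown N x q mixing matrix A; observations x = A s (one sample per row).
A = rng.normal(size=(N, q))
X = S @ A.T

# Estimate both the sources S and the mixing matrix A knowing only X.
ica = FastICA(n_components=q, whiten="unit-variance", random_state=0)
S_hat = ica.fit_transform(X)  # recovered sources
A_hat = ica.mixing_           # estimated N x q mixing matrix
```

Note that the sources are recovered only up to a permutation and rescaling of the components: these indeterminacies are inherent to the model, since any rescaling of $s_i$ can be absorbed into the corresponding column of $A$.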
ICA bears important similarity with Probabilistic PCA, see Section 2.1.5. The sources $S$ are latent random variables, i.e. each source $s_1, \dots, s_q$ was generated by an independent random process and hence has an associated distribution $p_{s_i}$. ICA differs from PPCA in that it requires the sources $s_i$ to be statistically independent. PCA, in contrast, requires solely that the projections be uncorrelated. Statistical independence is a stronger constraint than uncorrelatedness; see the definitions in the Annexes, Sections 9.2.7 and 9.2.8.
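To illustrate the distinction, the following sketch (assuming NumPy; the example is ours, not from the cited paper) builds two variables that are uncorrelated yet strongly dependent: for a zero-mean, symmetric $x$, the variable $y = x^2$ has zero correlation with $x$ while being a deterministic function of it.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=100_000)  # zero mean, symmetric density
y = x ** 2                    # a deterministic function of x

# Uncorrelated: E[x y] = E[x^3] = 0 for a symmetric distribution.
print(np.corrcoef(x, y)[0, 1])           # ~ 0.0

# ... yet clearly dependent: |x| determines y exactly.
print(np.corrcoef(np.abs(x), y)[0, 1])   # ~ 0.94, far from zero
```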
Solving ICA exactly would require a multi-dimensional density estimation to estimate each of the densities $p_{s_1}, \dots, p_{s_q}$. This is impractical, and hence ICA is usually solved by approximation techniques. In Section 2.3, we saw one method to solve ICA using a measure of non-gaussianity.
The intuition behind this approach was that the distribution of a mixture of independent sources is closer to a Gaussian distribution than the distribution of each source taken individually, a consequence of the central limit theorem. This, of course, assumed that the distribution of each source is non-gaussian. This idea is illustrated in Figure 5-3.
A measure of non-gaussianity was proposed based on the negentropy $J(y) = H(y_{\mathrm{Gauss}}) - H(y)$. The negentropy measures by how much the entropy $H(y)$ of the current estimate $y = \tilde{s}$ of the distribution of the sources differs from the entropy $H(y_{\mathrm{Gauss}})$ of a Gaussian distribution with the same mean and covariance as the distribution of $y$. In information theory, the entropy is a measure of the uncertainty attached to the information contained in the observation of a given variable: the higher the entropy, the more uncertain the event, see Section 9.3. The notion of entropy can be extended to joint and conditional distributions. When observing two variables, the mutual information, built from these entropies, measures how much information the observation of one variable conveys about the other. It is hence tightly linked to the notion of information. Unsurprisingly, ICA can hence also be formulated in terms of mutual information, as we will see next.
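Computing the negentropy directly requires the density of $y$, which is exactly what we wanted to avoid. The sketch below (assuming NumPy) therefore uses the standard log-cosh approximation to negentropy from the FastICA literature, $J(y) \approx \left(E[G(y)] - E[G(\nu)]\right)^2$ with $G(u) = \log\cosh(u)$ and $\nu$ a standard normal reference; this contrast is not derived in these notes, so take it as an illustrative stand-in for $J$.

```python
import numpy as np

def negentropy_logcosh(y, n_ref=100_000, seed=0):
    """Log-cosh approximation of the negentropy J(y).

    y is standardised to zero mean and unit variance, then J(y) is
    approximated (up to a positive constant) by
        (E[G(y)] - E[G(v)])^2,  with  G(u) = log cosh(u)
    and v a standard normal sample. The value is ~0 when y is close
    to gaussian and grows as y departs from gaussianity.
    """
    y = (y - y.mean()) / y.std()
    v = np.random.default_rng(seed).normal(size=n_ref)
    G = lambda u: np.log(np.cosh(u))
    return (G(y).mean() - G(v).mean()) ** 2

rng = np.random.default_rng(1)
print(negentropy_logcosh(rng.normal(size=50_000)))   # ~ 0: gaussian source
print(negentropy_logcosh(rng.laplace(size=50_000)))  # > 0: super-gaussian
```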
© A.G.Billard 2004 – Last Update March 2011