$x = A \cdot s + \varepsilon$

where $A$ is a $q \times N$ ‘mixing’ matrix, $s_1, \ldots, s_q$ the set of independent components, and $\varepsilon$ a $q$-dimensional random noise vector¹. The independent components are latent variables, meaning that they cannot be directly observed.
This formulation reduces the ICA problem to the ordinary estimation of a latent variable model. Because estimating the latent variables in the noisy case can be very difficult, the majority of ICA research has concentrated on the noise-free ICA model, where:
$x = A \cdot s$   (2.22)

$x_i = \sum_{j=1}^{q} a_{ij}\, s_j$   (2.23)
The matrix $W$ is then the inverse of $A$, i.e. $W = A^{-1}$ (this is possible only if $A$ is invertible; to this end, one can ensure that $A$ is full rank by reducing its dimension to that of the independent components, see below).
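To make the noise-free model concrete, here is a minimal sketch in Python (assuming NumPy; the sources, the sample size, and the square mixing matrix $A$ below are arbitrary illustrative choices, not part of the course material). It mixes two independent, zero-mean, non-Gaussian sources according to (2.22) and recovers them exactly with $W = A^{-1}$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two independent, non-Gaussian sources: s_1 uniform, s_2 Laplacian,
# centered to zero mean (cf. the hypotheses below).
n = 1000
s = np.vstack([rng.uniform(-1.0, 1.0, n),
               rng.laplace(0.0, 1.0, n)])
s -= s.mean(axis=1, keepdims=True)

# An arbitrary full-rank 2x2 mixing matrix A (illustrative choice).
A = np.array([[1.0, 0.5],
              [0.3, 2.0]])

# Noise-free model (2.22): the observations are linear mixtures.
x = A @ s

# With A known and invertible, W = A^{-1} recovers the sources exactly.
W = np.linalg.inv(A)
s_rec = W @ x
print(np.allclose(s_rec, s))   # -> True
```

In practice $A$ is of course unknown; the whole point of ICA is to estimate $W$ from the observations $x$ alone, under the hypotheses listed next.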
Hypotheses of ICA:
The starting point for ICA is the following set of assumptions:
• Without loss of generality, we can assume that both the mixture variables $x$ and the independent components $s$ have zero mean. Observe that the observable variables $x$ can always be centered by subtracting the sample mean, i.e. $x = x' - E\{x'\}$. Consequently, the independent components also have zero mean, since $E\{s\} = A^{-1} E\{x\}$ (this centering step is illustrated in the sketch following this list).
• The components $s_i$ are statistically independent. Statistical independence is rigorously defined in Section 9.2.7. It is a stronger constraint than uncorrelatedness (which is ensured through PCA). Hence, ICA decomposition usually results in estimates different from those found through PCA decomposition.
• As discussed before, we must also assume that the independent components have non-Gaussian distributions. Usually, assuming that the data follow a Gaussian distribution is handy, and this is done in many other techniques we will see in this course (see e.g. PPCA or GMM). Gaussian distributions are so-called parametric distributions, that is, the distribution is fully determined once its parameters have been defined. Hence, assuming a Gaussian distribution simplifies the estimation of the density, as one must solely estimate the parameters of the Gaussian (or mixture of Gaussians, as in GMM). Since ICA does not make any assumption regarding the form of the distribution of the independent components, it looks as if ICA would have to estimate the full density of $s$. This is a very difficult problem. ICA overcomes this difficulty by not estimating the density of $s$ explicitly; rather, the density of $s$ can be recovered by sampling through $x$ and using the inverse transformation.
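These hypotheses translate directly into preprocessing and estimation steps. The following sketch (again assuming NumPy, plus scikit-learn's FastICA as one off-the-shelf ICA algorithm; the sources and mixing matrix are the same illustrative choices as above) centers the mixtures and estimates the sources. Note that any ICA algorithm recovers $s$ only up to permutation, sign, and scaling:

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(1)

# Mixtures of two independent, zero-mean, non-Gaussian sources.
n = 2000
s = np.vstack([rng.uniform(-1.0, 1.0, n),
               rng.laplace(0.0, 1.0, n)])
s -= s.mean(axis=1, keepdims=True)
A = np.array([[1.0, 0.5],
              [0.3, 2.0]])
x = A @ s                                  # observations, shape (2, n)

# Hypothesis 1: center the observations; since E{s} = A^{-1} E{x},
# the sources are then implicitly centered as well.
x_c = x - x.mean(axis=1, keepdims=True)

# Hypothesis 2: independence is stronger than uncorrelatedness; the
# mixtures themselves are correlated (off-diagonal covariance terms).
print(np.cov(x_c))

# FastICA exploits Hypothesis 3 (non-Gaussianity) to estimate the
# unmixing matrix W; scikit-learn expects samples as rows, hence .T.
ica = FastICA(n_components=2, whiten="unit-variance", random_state=0)
s_est = ica.fit_transform(x_c.T).T         # recovered up to permutation,
                                           # sign, and scale
```

A Gaussian source would break this sketch: any rotation of two Gaussian components is equally uncorrelated and equally Gaussian, so the mixing matrix would not be identifiable.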
¹ Note the similarity between this model and that introduced in PPCA (see Section 2.1.5); the difference here lies in the optimization criterion, whereby ICA optimizes for statistical independence and PCA optimizes for maximal variance.