
x = A ⋅ s + ε

where A is a q × N 'mixing' matrix, s_1, ..., s_q the set of independent components and ε a q-dimensional random noise vector¹. The independent components are latent variables, meaning that they cannot be directly observed.

This formulation reduces the ICA problem to ordinary estimation of a latent variable model. Because the estimation of the latent variables in the noisy case can be very tricky, the majority of ICA research has concentrated on the noise-free ICA model, where:

x = A ⋅ s                                   (2.22)

x_i = ∑_{j=1}^{q} a_{ij} s_j                (2.23)

The matrix W is then the inverse of A, i.e. W = A⁻¹ (this is possible only if A is invertible; to this end, one can ensure that A is full rank by reducing its dimension to that of the independent components, see below).

Hypotheses of ICA:

The starting point for ICA is the following set of assumptions:

• Without loss of generality, we can assume that both the mixture variables x and the independent components s have zero mean. Observe that the observable variables x can always be centered by subtracting the sample mean, i.e. x = x' − E{x'}. Consequently, the independent components also have zero mean, since E{s} = A⁻¹ E{x}.

• The components s_i are statistically independent. Statistical independence is rigorously defined in Section 9.2.7. It is a stronger constraint than uncorrelatedness (which is ensured through PCA). Hence, the ICA decomposition usually results in estimates different from those found through PCA decomposition (see the sketch following this list).

• As discussed before, we must also assume that the independent components have non-Gaussian distributions. Usually, assuming that the data follow a Gaussian distribution is handy and is done in many other techniques we will see in this course (see e.g. PPCA or GMM). Gaussian distributions are so-called parametric distributions, that is, the distribution is fully determined once its parameters have been defined. Hence, assuming a Gaussian distribution simplifies the estimation of the density, as one must solely estimate the parameters of the Gaussian (or mixture of Gaussians, as in GMM). Since ICA does not make any assumption regarding the form of the distribution of the independent components, it looks as if ICA would have to estimate the full density of s. This is a very difficult problem. ICA overcomes this difficulty by not estimating explicitly the density of s. Rather, the density of s can be recovered by sampling through x and using the inverse transformation.
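
As referenced in the list above, the following sketch makes the assumptions concrete; it is written in Python and assumes NumPy and scikit-learn's FastICA and PCA are available, with illustrative signals and mixing matrix only. It centers the observed mixtures by subtracting the sample mean and then contrasts the PCA and ICA decompositions of the same non-Gaussian mixtures.

import numpy as np
from sklearn.decomposition import FastICA, PCA

rng = np.random.default_rng(0)
n_samples = 2000

# Independent, non-Gaussian sources (hypothetical choice: square wave + heavy-tailed noise)
s = np.vstack([np.sign(np.sin(np.linspace(0, 40, n_samples))),
               rng.laplace(size=n_samples)])

# Hypothetical mixing matrix and observed mixtures x = A . s
A = np.array([[1.0, 0.6],
              [0.4, 1.2]])
x = A @ s

# Centering: subtract the sample mean from each mixture variable
x_centered = x - x.mean(axis=1, keepdims=True)

# PCA yields uncorrelated components, ICA yields (approximately) independent ones
X = x_centered.T                      # scikit-learn expects samples in rows
s_pca = PCA(n_components=2).fit_transform(X)
s_ica = FastICA(n_components=2, random_state=0).fit_transform(X)

# Compare each estimate with the true sources via absolute correlations
for name, est in [("PCA", s_pca), ("ICA", s_ica)]:
    corr = np.corrcoef(np.vstack([s, est.T]))[:2, 2:]
    print(name, np.round(np.abs(corr), 2))

Typically the ICA estimates each correlate strongly with one source (up to sign and permutation), whereas the PCA components mix the two: uncorrelatedness alone does not recover the independent components.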

¹ Note the similarity between this model and that introduced in PPCA, see Section 2.1.5; the difference here lies in the optimization method, whereby ICA optimizes for statistical independence and PCA optimizes for maximal variance.

