MACHINE LEARNING TECHNIQUES - LASA
Denoting by g the derivative of each of the above two functions, we have:

$$g_1(u) = \tanh(a_1 u)$$

$$g_2(u) = u \exp\left(-\frac{u^2}{2}\right)$$

where $1 \le a_1 \le 2$ is some suitable constant, often taken as $a_1 = 1$. These functions are monotonic and hence particularly well suited for performing gradient descent.
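As a minimal sketch, the two functions g can be written in NumPy together with their derivatives g', which the FastICA fixed-point update also evaluates. The names `g1`, `g1_prime`, `g2` and `g2_prime` are illustrative choices, not notation from these notes.

```python
import numpy as np

def g1(u, a1=1.0):
    """g1(u) = tanh(a1 * u), with a1 typically chosen in [1, 2]."""
    return np.tanh(a1 * u)

def g1_prime(u, a1=1.0):
    """Derivative of g1: a1 * (1 - tanh(a1 * u)^2)."""
    return a1 * (1.0 - np.tanh(a1 * u) ** 2)

def g2(u):
    """g2(u) = u * exp(-u^2 / 2)."""
    return u * np.exp(-u ** 2 / 2.0)

def g2_prime(u):
    """Derivative of g2: (1 - u^2) * exp(-u^2 / 2)."""
    return (1.0 - u ** 2) * np.exp(-u ** 2 / 2.0)
```

Both functions accept scalars or arrays, so they can be applied directly to a vector of projections.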
The basic form of the FastICA algorithm is as follows:
1. Choose an initial (e.g. random) weight vector $w$.
2. Compute the quantity $w^+ = E\{x\, g(w^T x)\} - E\{g'(w^T x)\}\, w$.
3. Proceed to a normalization of the weight vector: $w = w^+ / \|w^+\|$.
4. If the weights have not converged, i.e. $w(t) \cdot w(t-1) \neq 1$, go back to step 2.
Note that it is not necessary that the vector converge to a single point, since $w$ and $-w$ define the same direction. Recall also that it is here assumed that the data have been whitened.
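The one-unit iteration can be sketched in NumPy as follows. This is an illustrative sketch, not a reference implementation. It assumes the tanh nonlinearity, replaces expectations by sample means, and tests convergence on the absolute value of the dot product so that $w$ and $-w$ count as the same direction. The helper `whiten`, the function name `fastica_one_unit` and the parameters `tol`, `max_iter` and `seed` are hypothetical names introduced here.

```python
import numpy as np

def whiten(X):
    """Center X (n_features, n_samples) and give it identity covariance.

    Hypothetical helper: eigendecomposition-based (symmetric) whitening.
    """
    Xc = X - X.mean(axis=1, keepdims=True)
    cov = Xc @ Xc.T / Xc.shape[1]
    eigval, eigvec = np.linalg.eigh(cov)
    # E D^{-1/2} E^T applied to the centered data
    return (eigvec / np.sqrt(eigval)) @ eigvec.T @ Xc

def fastica_one_unit(Z, a1=1.0, tol=1e-8, max_iter=200, seed=0):
    """Estimate one direction w from whitened data Z (n_features, n_samples)."""
    rng = np.random.default_rng(seed)
    # Step 1: random initial weight vector, scaled to unit norm
    w = rng.standard_normal(Z.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(max_iter):
        y = w @ Z                         # projections w^T x for every sample
        gy = np.tanh(a1 * y)              # g(w^T x)
        gpy = a1 * (1.0 - gy ** 2)        # g'(w^T x)
        # Step 2: w+ = E{x g(w^T x)} - E{g'(w^T x)} w, with sample means
        w_new = (Z * gy).mean(axis=1) - gpy.mean() * w
        # Step 3: normalization
        w_new /= np.linalg.norm(w_new)
        # Step 4: converged when old and new directions agree up to sign
        converged = abs(w_new @ w) > 1.0 - tol
        w = w_new
        if converged:
            break
    return w
```

A typical call would whiten the observed mixtures first, `Z = whiten(X)`, then compute `w = fastica_one_unit(Z)`. The estimated component is `w @ Z`.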
2.3.6.3 FastICA for several units
The one-unit algorithm of the preceding subsection estimates just one of the independent components, or one projection pursuit direction. To estimate several independent components, we need to run the one-unit FastICA algorithm using several units (e.g. neurons) with weight vectors $w_1, \dots, w_q$.
To prevent different vectors from converging to the same maxima we must decorrelate the outputs $w_1^T x, \dots, w_q^T x$ at each iteration. We present here three methods for achieving this.
A simple way of achieving decorrelation is a deflation scheme based on a Gram-Schmidt-like decorrelation. This means that we estimate the independent components one by one. When we have estimated p independent components, or p vectors $w_1, \dots, w_p$, we run the one-unit fixed-point algorithm for $w_{p+1}$, and after every iteration step subtract from $w_{p+1}$ the "projections" $(w_{p+1}^T w_j)\, w_j$, $j = 1, \dots, p$, of the previously estimated p vectors, and then renormalize $w_{p+1}$:
1. Let $w_{p+1} = w_{p+1} - \sum_{j=1}^{p} (w_{p+1}^T w_j)\, w_j$
2. Let $w_{p+1} = w_{p+1} / \sqrt{w_{p+1}^T w_{p+1}}$    (2.32)
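The two deflation steps, and their place inside the one-unit iteration, can be sketched as below. This is a sketch under stated assumptions rather than a definitive implementation: the data `Z` are assumed already whitened, the tanh nonlinearity is used, expectations are replaced by sample means, and the names `deflate` and `fastica_deflation` with their parameters are hypothetical.

```python
import numpy as np

def deflate(w, W_prev):
    """Apply the two deflation steps to w.

    W_prev has shape (p, n); its rows are the previously estimated
    unit-norm, mutually orthogonal vectors w_1, ..., w_p.
    """
    # Step 1: subtract the projections sum_j (w^T w_j) w_j
    w = w - W_prev.T @ (W_prev @ w)
    # Step 2: renormalize, w / sqrt(w^T w)
    return w / np.sqrt(w @ w)

def fastica_deflation(Z, q, a1=1.0, tol=1e-8, max_iter=200, seed=0):
    """Estimate q directions one by one from whitened data Z (n, n_samples)."""
    rng = np.random.default_rng(seed)
    n = Z.shape[0]
    W = np.zeros((0, n))                  # no vectors estimated yet
    for _ in range(q):
        w = deflate(rng.standard_normal(n), W)
        for _ in range(max_iter):
            y = w @ Z
            gy = np.tanh(a1 * y)
            # One-unit fixed-point update with sample means
            w_new = (Z * gy).mean(axis=1) - (a1 * (1.0 - gy ** 2)).mean() * w
            # Decorrelate from w_1..w_p after every iteration step
            w_new = deflate(w_new, W)
            done = abs(w_new @ w) > 1.0 - tol
            w = w_new
            if done:
                break
        W = np.vstack([W, w])
    return W
```

The rows of the returned matrix are orthonormal by construction, so `W @ Z` gives q mutually decorrelated outputs on whitened data.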
© A.G.Billard 2004 – Last Update March 2011