
Denote by $g$ the derivative of the above two functions; we have:

$$g_1(u) = \tanh(a_1 u)$$

$$g_2(u) = u \, e^{-u^2/2}$$

where $1 \le a_1 \le 2$ is some suitable constant, often taken as $a_1 = 1$. These functions are monotonic and hence particularly well suited for performing gradient descent.
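As an illustration, here is a minimal Python (NumPy) sketch of these two nonlinearities together with their derivatives, which the fixed-point update of the next paragraph also needs; the function names are ours:

```python
import numpy as np

# g1(u) = tanh(a1 * u), with the constant 1 <= a1 <= 2 (often a1 = 1)
def g1(u, a1=1.0):
    return np.tanh(a1 * u)

# g2(u) = u * exp(-u^2 / 2)
def g2(u):
    return u * np.exp(-u**2 / 2.0)

# Derivatives, used inside the fixed-point update below
def g1_prime(u, a1=1.0):
    return a1 * (1.0 - np.tanh(a1 * u) ** 2)

def g2_prime(u):
    return (1.0 - u**2) * np.exp(-u**2 / 2.0)
```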

The basic form of the FastICA algorithm is as follows:

1. Choose an initial (e.g. random) weight vector $w$.

2. Compute the quantity $w^+ = E\left\{ x \, g\left(w^T x\right) \right\} - E\left\{ g'\left(w^T x\right) \right\} w$.

3. Proceed to a normalization of the weight vector: $w = \dfrac{w^+}{\left\| w^+ \right\|}$.

4. If the weights have not converged, i.e. $w(t)^T \cdot w(t-1) \ne 1$, go back to step 2.

Note that it is not necessary that the vector converge to a single point, since $w$ and $-w$ define the same direction. Recall also that it is here assumed that the data have been whitened.
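As a concrete illustration, here is a minimal sketch of this one-unit iteration in Python, assuming the data matrix X (one sample per column) has already been centered and whitened; the function name and defaults are our own:

```python
import numpy as np

def fastica_one_unit(X, g=np.tanh,
                     g_prime=lambda u: 1.0 - np.tanh(u)**2,
                     max_iter=200, tol=1e-6):
    """One-unit FastICA on whitened data X of shape (dim, n_samples)."""
    dim, n = X.shape
    # 1. Initial random weight vector, normalized
    w = np.random.randn(dim)
    w /= np.linalg.norm(w)
    for _ in range(max_iter):
        wx = w @ X                                    # projections w^T x
        # 2. Fixed-point update: w+ = E{x g(w^T x)} - E{g'(w^T x)} w
        w_new = (X * g(wx)).mean(axis=1) - g_prime(wx).mean() * w
        # 3. Normalize the weight vector
        w_new /= np.linalg.norm(w_new)
        # 4. Convergence test: |w(t)^T w(t-1)| close to 1; the absolute
        #    value accounts for w and -w defining the same direction
        if abs(w_new @ w) > 1.0 - tol:
            return w_new
        w = w_new
    return w
```

Each run returns one projection pursuit direction; different random initializations may converge to different independent components.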

2.3.6.3 FastICA for several units<br />

The one-unit algorithm of the preceding subsection estimates just one of the independent components, or one projection pursuit direction. To estimate several independent components, we need to run the one-unit FastICA algorithm using several units (e.g. neurons) with weight vectors $w_1, \ldots, w_q$.

To prevent different vectors from converging to the same maxima we must decorrelate the outputs $w_1^T x, \ldots, w_q^T x$ at each iteration. We present here three methods for achieving this.

A simple way of achieving decorrelation is a deflation scheme based on a Gram-Schmidt-like decorrelation. This means that we estimate the independent components one by one. When we have estimated $p$ independent components, or $p$ vectors $w_1, \ldots, w_p$, we run the one-unit fixed-point algorithm for $w_{p+1}$, and after every iteration step subtract from $w_{p+1}$ the "projections" $\left(w_{p+1}^T w_j\right) w_j$, $j = 1, \ldots, p$, of the previously estimated $p$ vectors, and then renormalize $w_{p+1}$:

1. Let $w_{p+1} = w_{p+1} - \sum_{j=1}^{p} \left(w_{p+1}^T w_j\right) w_j$

2. Let $w_{p+1} = w_{p+1} / \sqrt{w_{p+1}^T w_{p+1}}$  (2.32)
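Combining the deflation steps of Eq. (2.32) with the one-unit update above gives the following Python sketch, again under the assumption of whitened data with one sample per column; the function name and defaults are ours:

```python
import numpy as np

def fastica_deflation(X, q, g=np.tanh,
                      g_prime=lambda u: 1.0 - np.tanh(u)**2,
                      max_iter=200, tol=1e-6):
    """Estimate q independent components one by one on whitened data X
    of shape (dim, n_samples), using Gram-Schmidt-like deflation."""
    dim, n = X.shape
    W = np.zeros((q, dim))          # rows are the estimated w_1, ..., w_q
    for p in range(q):
        w = np.random.randn(dim)
        w /= np.linalg.norm(w)
        for _ in range(max_iter):
            wx = w @ X
            # One-unit fixed-point update
            w_new = (X * g(wx)).mean(axis=1) - g_prime(wx).mean() * w
            # Step 1 of Eq. (2.32): subtract the projections onto the
            # p previously estimated vectors
            w_new -= W[:p].T @ (W[:p] @ w_new)
            # Step 2 of Eq. (2.32): renormalize
            w_new /= np.linalg.norm(w_new)
            if abs(w_new @ w) > 1.0 - tol:
                break
            w = w_new
        W[p] = w_new
    return W
```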

