
J = E\left\{ (y_1 y_2) + \tfrac{1}{2}\lambda_1\left(1 - y_1^2\right) + \tfrac{1}{2}\lambda_2\left(1 - y_2^2\right) \right\} \qquad (6.49)

where the constraints have been implemented using Lagrange multipliers λ_1 and λ_2. Using gradient ascent with respect to w_i and gradient descent with respect to λ_i gives the learning rules:

\Delta w_{1j} \sim x_{1j}\,(y_2 - \lambda_1 y_1), \qquad \Delta\lambda_1 \sim -\left(1 - (y_1)^2\right)

\Delta w_{2j} \sim x_{2j}\,(y_1 - \lambda_2 y_2), \qquad \Delta\lambda_2 \sim -\left(1 - (y_2)^2\right) \qquad (6.50)

where w_1j is the j-th element of the weight vector w_1, etc.
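As a concrete illustration of how the rules in (6.50) can be run online, here is a minimal Python/NumPy sketch. The function name, the learning rate eta, the number of passes, and the random initialisation are assumptions made for this example; the text itself only fixes the proportionalities in (6.50).

import numpy as np

def linear_neural_cca(X1, X2, eta=1e-3, epochs=100, seed=0):
    """Online linear neural CCA using the learning rules of Eq. (6.50).

    X1: (N, d1) samples of the first data set, assumed zero-mean.
    X2: (N, d2) samples of the second data set, assumed zero-mean.
    Returns one pair of weight vectors (w1, w2).
    """
    rng = np.random.default_rng(seed)
    w1 = rng.normal(scale=0.1, size=X1.shape[1])
    w2 = rng.normal(scale=0.1, size=X2.shape[1])
    lam1, lam2 = 1.0, 1.0                       # Lagrange multipliers

    for _ in range(epochs):
        for x1, x2 in zip(X1, X2):
            y1 = w1 @ x1                        # y_1 = w_1 . x_1
            y2 = w2 @ x2                        # y_2 = w_2 . x_2
            # Gradient ascent on J w.r.t. the weights, Eq. (6.50):
            #   dw_1j ~ x_1j (y_2 - lambda_1 y_1),  dw_2j ~ x_2j (y_1 - lambda_2 y_2)
            w1 += eta * x1 * (y2 - lam1 * y1)
            w2 += eta * x2 * (y1 - lam2 * y2)
            # Gradient descent on J w.r.t. the multipliers, Eq. (6.50):
            #   dlambda_i ~ -(1 - y_i^2)
            lam1 += eta * (y1 ** 2 - 1.0)
            lam2 += eta * (y2 ** 2 - 1.0)
    return w1, w2

After training, w1 @ x1 and w2 @ x2 give approximately unit-variance, maximally correlated projections; in practice the data are centred beforehand and the learning rates for the weights and the multipliers may need separate tuning, but these are implementation choices rather than part of the derivation.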

However, just as a neural implementation of Principal Component Analysis (PCA) may be very interesting but not a generally useful method for finding principal components, a neural implementation of CCA may be only a curiosity. It becomes interesting only when extended to nonlinear CCA, which is described next.

6.7.2.1 Non-linear Canonical Correlation Analysis

To determine whether a neural network as described in the previous section can extract nonlinear correlations between two data sets, x_1 and x_2, and to test whether such correlations are greater than the maximum linear correlations, one must introduce a nonlinearity into the network. This takes the usual form of a tanh() function, which is also found in the classical approach to nonlinear ANN approximation (e.g. in feedforward neural networks trained with backpropagation).

The outputs y_1 and y_2 of the network become:

y_1 = \sum_j w_{1j} \tanh\left(v_{1j} x_{1j}\right) = \mathbf{w}_1 \mathbf{f}_1

y_2 = \sum_j w_{2j} \tanh\left(v_{2j} x_{2j}\right) = \mathbf{w}_2 \mathbf{f}_2
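For illustration, the sketch below evaluates this nonlinear forward pass, assuming one tanh hidden unit per input component with element-wise weights v_1j and v_2j; the function name and argument shapes are choices made for the example, not notation fixed by the text.

import numpy as np

def nonlinear_outputs(x1, x2, w1, v1, w2, v2):
    """Nonlinear CCA outputs y_i = sum_j w_ij * tanh(v_ij * x_ij) = w_i . f_i."""
    f1 = np.tanh(v1 * x1)      # hidden activations f_1j = tanh(v_1j * x_1j)
    f2 = np.tanh(v2 * x2)      # hidden activations f_2j = tanh(v_2j * x_2j)
    y1 = w1 @ f1               # y_1 = w_1 . f_1
    y2 = w2 @ f2               # y_2 = w_2 . f_2
    return y1, y2, f1, f2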

To maximise the correlation between y_1 and y_2, we again use the objective function

J_1 = E\left\{ (y_1 y_2) + \tfrac{1}{2}\lambda_1\left(1 - y_1^2\right) + \tfrac{1}{2}\lambda_2\left(1 - y_2^2\right) \right\}

whose derivatives give us

