$$ J = E\left\{\, y_1 y_2 + \tfrac{1}{2}\lambda_1\left(1 - y_1^2\right) + \tfrac{1}{2}\lambda_2\left(1 - y_2^2\right) \right\} \qquad (6.49) $$
where the constraints have been implemented using Lagrange multipliers $\lambda_1$ and $\lambda_2$. Using gradient ascent with respect to $\mathbf{w}_i$ and gradient descent with respect to $\lambda_i$ gives the learning rules:
$$ \begin{aligned} \Delta w_{1j} &\sim x_{1j}\left(y_2 - \lambda_1 y_1\right), &\qquad \Delta\lambda_1 &\sim -\left(1 - y_1^2\right),\\ \Delta w_{2j} &\sim x_{2j}\left(y_1 - \lambda_2 y_2\right), &\qquad \Delta\lambda_2 &\sim -\left(1 - y_2^2\right) \end{aligned} \qquad (6.50) $$
where $w_{1j}$ is the $j$-th element of the weight vector $\mathbf{w}_1$, etc.
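To make these update rules concrete, the following is a minimal sketch of an online implementation of (6.50) in Python. The learning rates, the weight initialisation, and the training loop structure are illustrative assumptions; only the update equations themselves come from the text.

```python
import numpy as np

def linear_neural_cca(x1, x2, eta=0.01, eta_lam=0.01, n_epochs=200, seed=0):
    """Online learning rules (6.50) for linear neural CCA.

    x1: (n_samples, d1) array; x2: (n_samples, d2) array (paired rows).
    Returns the weight vectors w1, w2 and the Lagrange multipliers.
    """
    rng = np.random.default_rng(seed)
    w1 = rng.normal(scale=0.1, size=x1.shape[1])
    w2 = rng.normal(scale=0.1, size=x2.shape[1])
    lam1, lam2 = 1.0, 1.0
    for _ in range(n_epochs):
        for a, b in zip(x1, x2):
            y1, y2 = w1 @ a, w2 @ b
            # Gradient ascent on the weights:
            #   Delta w_1j ~ x_1j (y_2 - lam_1 y_1), and symmetrically for w_2.
            w1 += eta * a * (y2 - lam1 * y1)
            w2 += eta * b * (y1 - lam2 * y2)
            # Gradient descent on the multipliers:
            #   Delta lam_i ~ -(1 - y_i^2), enforcing E[y_i^2] = 1.
            lam1 -= eta_lam * (1.0 - y1 ** 2)
            lam2 -= eta_lam * (1.0 - y2 ** 2)
    return w1, w2, lam1, lam2
```

After convergence the multiplier updates should drive $E[y_i^2]$ towards 1, so that $E[y_1 y_2]$ approximates the first canonical correlation between the two data sets.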
However, just as a neural implementation of Principal Component Analysis (PCA) may be very interesting but not a generally useful method of finding principal components, so a neural implementation of CCA may be only a curiosity. It becomes interesting only in the light of performing nonlinear CCA, which is described next.
6.7.2.1 Non-linear Canonical Correlation Analysis
To determine whether a neural network as described in the previous section can extract nonlinear correlations between two data sets, $x_1$ and $x_2$, and further to test whether such correlations are greater than the maximum linear correlations, one must introduce a nonlinearity into the network. This takes the usual form of a $\tanh()$ function, which is also found in the classical approach to non-linear ANN approximation (e.g., in feedforward NNs trained with backpropagation).
The outputs $y_1$ and $y_2$ of the network become:

$$ y_1 = \sum_j w_{1j} \tanh\!\left(v_{1j} x_{1j}\right) = \mathbf{w}_1^{T}\mathbf{f}_1, \qquad y_2 = \sum_j w_{2j} \tanh\!\left(v_{2j} x_{2j}\right) = \mathbf{w}_2^{T}\mathbf{f}_2 $$

where $\mathbf{f}_i$ denotes the vector with elements $f_{ij} = \tanh\!\left(v_{ij} x_{ij}\right)$.
To maximise the correlation between $y_1$ and $y_2$, we again use the objective function

$$ J_1 = E\left\{\, y_1 y_2 + \tfrac{1}{2}\lambda_1\left(1 - y_1^2\right) + \tfrac{1}{2}\lambda_2\left(1 - y_2^2\right) \right\} $$

whose derivatives give us the corresponding learning rules.
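As with the linear case, the expectation in $J_1$ can be estimated by a sample mean; the following monitoring helper is a hedged sketch under that assumption, not part of the original derivation.

```python
import numpy as np

def objective_j1(y1, y2, lam1, lam2):
    """Sample estimate of J1 = E{ y1 y2 + 0.5 lam1 (1 - y1^2) + 0.5 lam2 (1 - y2^2) }."""
    return np.mean(y1 * y2
                   + 0.5 * lam1 * (1.0 - y1 ** 2)
                   + 0.5 * lam2 * (1.0 - y2 ** 2))
```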