One can further add normalization constraints on the denominator, i.e.

$$
\alpha_x^T K_x^2 \alpha_x \;=\; \alpha_y^T K_y^2 \alpha_y \;=\; 1 .
$$
Then, as done for linear CCA (see Section 2.2), one can express this optimization as a
generalized eigenvalue problem of the form:

$$
\begin{pmatrix} 0 & K_x K_y \\ K_y K_x & 0 \end{pmatrix}
\begin{pmatrix} \alpha_x \\ \alpha_y \end{pmatrix}
\;=\; \rho
\begin{pmatrix} K_x^2 & 0 \\ 0 & K_y^2 \end{pmatrix}
\begin{pmatrix} \alpha_x \\ \alpha_y \end{pmatrix}
\qquad (5.17)
$$
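To make the block structure of (5.17) concrete, the sketch below assembles both sides of the problem from two centered Gram matrices. It is only an illustrative sketch in Python/NumPy: the Gaussian kernel, the toy data, and all names (gram_gauss, center_gram, Kx, Ky, A, B) are assumptions made for illustration, not part of the original text.

```python
import numpy as np

def gram_gauss(Z, sigma=1.0):
    """Gaussian (RBF) Gram matrix of an M x d data matrix Z (illustrative kernel choice)."""
    sq_dists = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def center_gram(K):
    """Center the Gram matrix, i.e. enforce zero-mean features in feature space."""
    M = K.shape[0]
    H = np.eye(M) - np.ones((M, M)) / M
    return H @ K @ H

# Toy paired samples (M observations of x and of y)
rng = np.random.default_rng(0)
M = 50
X = rng.normal(size=(M, 3))
Y = rng.normal(size=(M, 2))

Kx = center_gram(gram_gauss(X))
Ky = center_gram(gram_gauss(Y))

# Block matrices of the generalized eigenvalue problem (5.17):  A alpha = rho B alpha
A = np.block([[np.zeros((M, M)), Kx @ Ky],
              [Ky @ Kx,          np.zeros((M, M))]])
B = np.block([[Kx @ Kx,          np.zeros((M, M))],
              [np.zeros((M, M)), Ky @ Ky]])

# B is rank-deficient (the Gram matrices are centered), so solving (5.17) directly
# is ill-posed; the regularized formulation (5.18) below addresses this.
```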
A first difficulty that arises when solving the above problem is that its solution is almost
always $\rho = 1$, irrespective of the kernel. This is due to the fact that the intersection between the
spaces spanned by the columns of $K_x$ and $K_y$ is usually non-zero⁴. In this case, there exist two
vectors $\alpha_x$ and $\alpha_y$ such that $K_x \alpha_x = K_y \alpha_y$, which are a solution of (5.17). Solving
(5.17) will hence yield different projections depending on the kernel, but all of these will have
maximal correlation. This hence cannot serve as a means to determine which non-linear
transformation (i.e. which kernel $K$) is most appropriate.
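To see why such a pair attains the maximum, write the correlation maximized by (5.17) in its ratio form (consistent with the normalization constraints above); whenever $K_x\alpha_x = K_y\alpha_y \neq 0$,

$$
\rho \;=\; \frac{\alpha_x^T K_x K_y \alpha_y}{\sqrt{\alpha_x^T K_x^2 \alpha_x}\,\sqrt{\alpha_y^T K_y^2 \alpha_y}}
\;=\; \frac{\left(K_x\alpha_x\right)^T \left(K_y\alpha_y\right)}{\lVert K_x\alpha_x\rVert\,\lVert K_y\alpha_y\rVert}
\;=\; 1 .
$$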
A second difficulty when solving (5.17) is that it requires inverting the Gram matrices. These may
not be invertible; in particular, they are not full rank because of the constraint of zero
mean in feature space.
To counter this effect, one may use a regularization parameter (also called ridge parameter) $\kappa$
on the norm of the transformations yielded by $K_x$, $K_y$ and end up with the following regularized
problem:
$$
\underbrace{\begin{pmatrix} 0 & K_x K_y \\ K_y K_x & 0 \end{pmatrix}}_{A}
\begin{pmatrix} \alpha_x \\ \alpha_y \end{pmatrix}
\;=\; \rho\,
\underbrace{\begin{pmatrix} \left(K_x + \tfrac{M\kappa}{2} I\right)^2 & 0 \\[4pt] 0 & \left(K_y + \tfrac{M\kappa}{2} I\right)^2 \end{pmatrix}}_{B}
\begin{pmatrix} \alpha_x \\ \alpha_y \end{pmatrix}
\;\;\Leftrightarrow\;\; A\alpha = \rho B\alpha
\qquad (5.18)
$$
Thanks to the regularization term, when $M$ is very large, the right-hand-side matrix $B$ becomes
positive definite and hence invertible (this is usually the case, as the number of datapoints $M \gg N$, the
dimension of the dataset; if this is not the case, then one makes $\kappa$ very large). In this case, the matrix $B$
can be decomposed into $B = C^T C$. Substituting, this yields a classical eigenvalue problem of the form:

$$
\left(C^T\right)^{-1} A\, C^{-1} \beta \;=\; \rho\, \beta, \qquad \text{with } \beta = C\alpha .
$$
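Continuing the illustrative sketch above (it reuses np, M, Kx, Ky and A defined there), the following lines follow the recipe just described under assumed parameter values: build the ridge-regularized right-hand side of (5.18), factor it as $B = C^T C$ with a Cholesky decomposition, and solve the resulting symmetric eigenvalue problem for $\beta = C\alpha$.

```python
from scipy.linalg import cholesky, eigh, solve_triangular

kappa = 0.1  # ridge / regularization parameter (arbitrary illustrative value)

# Right-hand side of (5.18): block-diagonal with (K + M*kappa/2 * I)^2 on the diagonal
Rx = Kx + 0.5 * M * kappa * np.eye(M)
Ry = Ky + 0.5 * M * kappa * np.eye(M)
B_reg = np.block([[Rx @ Rx,          np.zeros((M, M))],
                  [np.zeros((M, M)), Ry @ Ry]])

# B_reg is positive definite, so B_reg = C^T C with C upper triangular (Cholesky)
C = cholesky(B_reg)

# Classical symmetric eigenproblem (C^T)^{-1} A C^{-1} beta = rho beta, with beta = C alpha
C_inv = solve_triangular(C, np.eye(2 * M))   # C^{-1}
S = C_inv.T @ A @ C_inv                      # symmetric, since A is symmetric
rho, betas = eigh(S)

# Recover alpha = C^{-1} beta and order by decreasing canonical correlation
order = np.argsort(rho)[::-1]
alphas = C_inv @ betas[:, order]
print("largest regularized kernel canonical correlations:", rho[order][:3])
```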
Note that the solution to this eigenvalue problem neither shows the geometry of the kernel
canonical vectors nor gives an optimal correlation of the variates. On the other hand, the
⁴ $K_x$ and $K_y$ are centered and are hence of rank at most $M-1$. The dimension of the
intersection between the spaces spanned by the column vectors of these matrices satisfies
$\dim\left(C(K_x) \cap C(K_y)\right) \;\geq\; \dim\left(C(K_x)\right) + \dim\left(C(K_y)\right) - M$. If $M > 2$, i.e. if we have more than two
datapoints, then the spaces spanned by the two canonical bases intersect: with both ranks equal to
$M-1$, the bound gives an intersection of dimension at least $M-2 > 0$.