
One can further add normalization constraints on the denominator, i.e.

$$
\alpha_x^T K_x^2 \alpha_x = \alpha_y^T K_y^2 \alpha_y = 1.
$$

Then, as done for linear CCA (see Section 2.2), one can express this optimization as a generalized eigenvalue problem of the form:

$$
\begin{pmatrix} 0 & K_x K_y \\ K_y K_x & 0 \end{pmatrix}
\begin{pmatrix} \alpha_x \\ \alpha_y \end{pmatrix}
= \rho
\begin{pmatrix} K_x^2 & 0 \\ 0 & K_y^2 \end{pmatrix}
\begin{pmatrix} \alpha_x \\ \alpha_y \end{pmatrix}
\tag{5.17}
$$

A first difficulty that arises when solving the above problem is that its solution is almost always $\rho = 1$, irrespective of the kernel. This is due to the fact that the intersection between the spaces spanned by the columns of $K_x$ and $K_y$ is usually non-zero$^4$. In this case, there exist two vectors $\alpha_x$ and $\alpha_y$ such that $K_x \alpha_x = K_y \alpha_y$, which are solutions of (5.17). Solving (5.17) will hence yield different projections depending on the kernel, but all of these will have maximal correlation. This therefore cannot serve as a means to determine which non-linear transformation $K$ is most appropriate.

A second difficulty when solving (5.17) is that it requires inverting the Gram matrices. These may not always be invertible; in particular, they are not full rank because of the constraint of zero mean in feature space.
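This rank deficiency is a direct consequence of centering. The snippet below is a minimal numerical sketch (not part of the original text; the data and kernel width are hypothetical) showing that a Gaussian Gram matrix loses at least one rank once it is centered in feature space:

```python
import numpy as np

# Hypothetical data: M = 50 points in N = 3 dimensions.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))

# Gaussian Gram matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2)).
sigma = 1.0
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq_dists / (2.0 * sigma**2))

# Centering in feature space: Kc = H K H, with H = I - (1/M) 1 1^T.
M = K.shape[0]
H = np.eye(M) - np.ones((M, M)) / M
Kc = H @ K @ H

print(np.linalg.matrix_rank(K))   # typically M (here 50)
print(np.linalg.matrix_rank(Kc))  # at most M - 1, hence Kc is singular
```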

To counter this effect, one may use a regularization parameter (also called ridge parameter) $\kappa$ on the norm of the transformations yielded by $K_x$ and $K_y$, and end up with the following regularized problem:

$$
\underbrace{\begin{pmatrix} 0 & K_x K_y \\ K_y K_x & 0 \end{pmatrix}}_{A}
\begin{pmatrix} \alpha_x \\ \alpha_y \end{pmatrix}
= \rho\,
\underbrace{\begin{pmatrix} \left(K_x + \frac{M\kappa}{2} I\right)^2 & 0 \\ 0 & \left(K_y + \frac{M\kappa}{2} I\right)^2 \end{pmatrix}}_{B}
\begin{pmatrix} \alpha_x \\ \alpha_y \end{pmatrix}
\;\;\Leftrightarrow\;\; A\alpha = \rho B\alpha
\tag{5.18}
$$
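For illustration, here is a minimal sketch (not from the original notes; the function and variable names are mine, and the Gram matrices are assumed to be centered) that assembles the block matrices $A$ and $B$ of (5.18) and solves the regularized generalized eigenvalue problem:

```python
import numpy as np
from scipy.linalg import eigh

def kernel_cca(Kx, Ky, kappa=0.1):
    """Regularized kernel CCA as in (5.18): A alpha = rho B alpha.

    Kx, Ky : (M, M) centered Gram matrices of the two views.
    kappa  : ridge parameter.
    Returns the canonical correlations rho and the coefficients (alpha_x, alpha_y).
    """
    M = Kx.shape[0]
    I = np.eye(M)
    Z = np.zeros((M, M))

    # Left-hand side: off-diagonal blocks Kx Ky and Ky Kx.
    A = np.block([[Z, Kx @ Ky],
                  [Ky @ Kx, Z]])
    # Right-hand side: regularized diagonal blocks (K + (M kappa / 2) I)^2.
    Rx = Kx + (M * kappa / 2.0) * I
    Ry = Ky + (M * kappa / 2.0) * I
    B = np.block([[Rx @ Rx, Z],
                  [Z, Ry @ Ry]])

    # A is symmetric and B is positive definite for kappa > 0, so the
    # generalized symmetric eigensolver applies. Eigenvalues come in
    # +/- rho pairs; sort so the largest correlations come first.
    rho, V = eigh(A, B)
    order = np.argsort(rho)[::-1]
    rho, V = rho[order], V[:, order]
    alpha_x, alpha_y = V[:M], V[M:]
    return rho, alpha_x, alpha_y
```

Because the eigenvalues of (5.18) appear in $\pm\rho$ pairs, only the positive ones correspond to distinct canonical correlations.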

Thanks to the regularization term, the right-hand-side matrix $B$ becomes positive definite when $M$ is very large (which is usually the case, since the number of datapoints $M$ greatly exceeds the dimension $N$ of the dataset; if this is not the case, one makes $\kappa$ very large). In this case, the matrix $B$ can be decomposed into $B = C^T C$. Substituting, this yields a classical eigenvalue problem of the form:

$$
\left(C^T\right)^{-1} A\, C^{-1} \beta = \rho \beta, \qquad \text{with } \beta = C\alpha.
$$
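One possible way to carry out this reduction numerically is sketched below (not part of the original text; it assumes $B$ is positive definite and takes $C$ as the transpose of the lower-triangular Cholesky factor, so that $B = C^T C$):

```python
import numpy as np

def to_standard_eigenproblem(A, B):
    """Reduce A alpha = rho B alpha to a standard symmetric eigenproblem.

    With B = C^T C, where C is the transpose of the Cholesky factor of B,
    the problem becomes (C^T)^{-1} A C^{-1} beta = rho beta, beta = C alpha.
    """
    L = np.linalg.cholesky(B)         # B = L L^T, so take C = L^T
    C = L.T
    # (C^T)^{-1} A C^{-1} = L^{-1} A L^{-T}, computed via two linear solves.
    T = np.linalg.solve(L, np.linalg.solve(L, A.T).T)
    rho, beta = np.linalg.eigh(T)     # T is symmetric when A is
    alpha = np.linalg.solve(C, beta)  # recover alpha = C^{-1} beta, column-wise
    return rho, alpha
```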

Note that the solution to this eigenvalue problem neither shows the geometry of the kernel canonical vectors nor gives an optimal correlation of the variates. On the other hand, the

$^4$ $K_x$ and $K_y$ are centered and are hence of rank at most $M - 1$. The dimension of the intersection between the spaces spanned by the column vectors of these matrices satisfies $\dim\left(C(K_x) \cap C(K_y)\right) \geq \dim\left(C(K_x)\right) + \dim\left(C(K_y)\right) - M$. If $M > 2$, i.e. if we have more than two datapoints, then the spaces spanned by the two canonical bases intersect.

