MACHINE LEARNING TECHNIQUES - LASA


5.4 Kernel CCA

The linear version of CCA was treated in Section 2.2. Here we consider its extension to non-linear projections to find each pair of eigenvectors. We start with a brief recall of CCA.

Consider a pair of multivariate datasets $X = \{x^i \in \mathbb{R}^N\}_{i=1}^{M}$ and $Y = \{y^i \in \mathbb{R}^q\}_{i=1}^{M}$, of which we measure a sample of $M$ instances $(x^i, y^i)$. CCA consists in determining a set of projection vectors $w_x$ and $w_y$ for $X$ and $Y$ such that the correlation $\rho$ between the projections $X' = w_x^T X$ and $Y' = w_y^T Y$ (the canonical variates) is maximized:

$$\rho = \max_{w_x, w_y} \operatorname{corr}(X', Y') = \max_{w_x, w_y} \frac{w_x^T E\{XY^T\}\, w_y}{\|w_x^T X\|\,\|w_y^T Y\|} = \max_{w_x, w_y} \frac{w_x^T C_{xy} w_y}{\sqrt{w_x^T C_{xx} w_x \; w_y^T C_{yy} w_y}} \qquad (5.15)$$

where $C_{xy}$, $C_{xx}$ and $C_{yy}$ are respectively the inter-set and within-set covariance matrices; $C_{xy}$ is $N \times q$, $C_{xx}$ is $N \times N$ and $C_{yy}$ is $q \times q$.
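For illustration, here is a minimal numerical sketch of linear CCA as recalled above (not part of the original notes): the maximization in Eq. (5.15) can be solved as the generalized eigenvalue problem $C_{xy} C_{yy}^{-1} C_{yx}\, w_x = \rho^2 C_{xx} w_x$. The function name, the use of numpy/scipy, and the assumption that $C_{xx}$ and $C_{yy}$ are invertible are all choices made for this example.

import numpy as np
from scipy.linalg import eigh

def linear_cca(X, Y):
    """Sketch of linear CCA, Eq. (5.15).
    X: N x M data matrix, Y: q x M data matrix (columns are the M paired samples)."""
    M = X.shape[1]
    Xc = X - X.mean(axis=1, keepdims=True)     # center each variable
    Yc = Y - Y.mean(axis=1, keepdims=True)
    Cxx = Xc @ Xc.T / (M - 1)                  # within-set covariance (N x N)
    Cyy = Yc @ Yc.T / (M - 1)                  # within-set covariance (q x q)
    Cxy = Xc @ Yc.T / (M - 1)                  # inter-set covariance  (N x q)
    # Generalized eigenproblem: Cxy Cyy^{-1} Cyx w_x = rho^2 Cxx w_x
    A = Cxy @ np.linalg.solve(Cyy, Cxy.T)
    vals, vecs = eigh(A, Cxx)                  # eigenvalues in ascending order
    wx = vecs[:, -1]                           # direction with largest rho^2
    wy = np.linalg.solve(Cyy, Cxy.T @ wx)      # corresponding w_y (up to scale)
    rho = np.sqrt(max(vals[-1], 0.0))
    return wx, wy, rho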

Non-linear Case:

Kernel CCA extends this notion to non-linear projections. As for kernel PCA, let us assume that both sets of data have been projected into a feature space through non-linear maps $\phi_x$, $\phi_y$, such that we have now the two sets $\{\phi_x(x^i)\}_{i=1}^{M}$ and $\{\phi_y(y^i)\}_{i=1}^{M}$. Let us further assume that the data are centered in feature space, i.e. $\sum_{i=1}^{M} \phi_x(x^i) = 0$ and $\sum_{i=1}^{M} \phi_y(y^i) = 0$ (if the data are not centered in feature space, one can find a Gram matrix that ensures that these are centered, as done for kernel PCA, see exercise session). Kernel canonical correlation analysis aims at maximizing the correlation between the data in their corresponding projection space. Similarly to kernel PCA, we can construct the two kernel matrices $K_x = F_x^T F_x$ and $K_y = F_y^T F_y$, where $F_x$ and $F_y$ are two $M \times M$ matrices whose columns are composed of the projections $\{\phi_x(x^i)\}_{i=1}^{M}$ and $\{\phi_y(y^i)\}_{i=1}^{M}$, respectively.
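As a short sketch of how the centered Gram matrices $K_x$ and $K_y$ can be computed in practice: the centering formula is the same one used for kernel PCA, while the Gaussian kernel, its bandwidth and the helper names are assumptions made purely for illustration.

import numpy as np

def rbf_kernel_matrix(Z, sigma=1.0):
    """Gram matrix K[i, j] = exp(-||z_i - z_j||^2 / (2 sigma^2)) for the columns z_i of Z."""
    sq = np.sum(Z**2, axis=0)
    d2 = sq[:, None] + sq[None, :] - 2.0 * Z.T @ Z
    return np.exp(-d2 / (2.0 * sigma**2))

def center_kernel(K):
    """Center the Gram matrix in feature space, as for kernel PCA:
    Kc = K - 1_M K - K 1_M + 1_M K 1_M, with 1_M the M x M matrix of entries 1/M."""
    M = K.shape[0]
    one = np.full((M, M), 1.0 / M)
    return K - one @ K - K @ one + one @ K @ one

# For the two datasets X (N x M) and Y (q x M):
# Kx = center_kernel(rbf_kernel_matrix(X))
# Ky = center_kernel(rbf_kernel_matrix(Y))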

The weights $w_x$, $w_y$ can be expressed as a linear combination of the training examples in feature space, i.e. $w_x = F_x \alpha_x$ and $w_y = F_y \alpha_y$. Substituting into the equation for linear CCA yields the following optimization:

$$\max_{\alpha_x, \alpha_y} \rho = \max_{\alpha_x, \alpha_y} \frac{\alpha_x^T K_x K_y \alpha_y}{\left(\alpha_x^T K_x^2 \alpha_x\right)^{1/2} \left(\alpha_y^T K_y^2 \alpha_y\right)^{1/2}} \qquad (5.16)$$
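A minimal numerical sketch of solving Eq. (5.16): setting the derivatives of the associated Lagrangian to zero leads to a generalized eigenvalue problem in $(\alpha_x, \alpha_y)$. The small ridge term `reg` added to $K_x^2$ and $K_y^2$ is an assumption made here for numerical stability; it is not part of the formulation above, which becomes degenerate when the kernel matrices are invertible.

import numpy as np
from scipy.linalg import eigh

def kernel_cca(Kx, Ky, reg=1e-3):
    """Sketch of kernel CCA, Eq. (5.16), on centered Gram matrices Kx, Ky (M x M)."""
    M = Kx.shape[0]
    I = np.eye(M)
    # Stationarity conditions (with the assumed ridge term):
    #   Kx Ky alpha_y = rho (Kx^2 + reg I) alpha_x
    #   Ky Kx alpha_x = rho (Ky^2 + reg I) alpha_y
    # Eliminating alpha_y gives a symmetric generalized eigenproblem in alpha_x alone.
    Rx = Kx @ Kx + reg * I
    Ry = Ky @ Ky + reg * I
    A = Kx @ Ky @ np.linalg.solve(Ry, Ky @ Kx)
    vals, vecs = eigh(A, Rx)                          # eigenvalues are rho^2, ascending
    alpha_x = vecs[:, -1]                             # leading canonical direction
    alpha_y = np.linalg.solve(Ry, Ky @ Kx @ alpha_x)  # corresponding alpha_y (up to scale)
    rho = np.sqrt(max(vals[-1], 0.0))
    return alpha_x, alpha_y, rho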

© A.G.Billard 2004 – Last Update March 2011
