MACHINE LEARNING TECHNIQUES - LASA
5.4 Kernel CCA
The linear version of CCA was treated in Section 2.2. Here we consider its extension to the case where non-linear projections are used to find each pair of eigenvectors. We start with a brief recall of CCA.
Consider a pair of multivariate datasets $X = \{x^i \in \mathbb{R}^N\}_{i=1}^{M}$ and $Y = \{y^i \in \mathbb{R}^q\}_{i=1}^{M}$, of which we measure a sample of $M$ instances $(x^i, y^i)_{i=1}^{M}$. CCA consists in determining a set of projection vectors $w_x$ and $w_y$ for $X$ and $Y$ such that the correlation $\rho$ between the projections $X' = w_x^T X$ and $Y' = w_y^T Y$ (the canonical variates) is maximized:

$$\rho = \max_{w_x, w_y} \operatorname{corr}(X', Y') = \max_{w_x, w_y} \frac{w_x^T E\{XY^T\} w_y}{\sqrt{E\{(w_x^T X)^2\}}\,\sqrt{E\{(w_y^T Y)^2\}}} = \max_{w_x, w_y} \frac{w_x^T C_{xy} w_y}{\sqrt{w_x^T C_{xx} w_x}\,\sqrt{w_y^T C_{yy} w_y}} \qquad (5.15)$$
where $C_{xy}$ is the inter-set covariance matrix and $C_{xx}$, $C_{yy}$ are the within-set covariance matrices. $C_{xy}$ is $N \times q$, $C_{xx}$ is $N \times N$ and $C_{yx} = C_{xy}^T$ is $q \times N$.
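To make the recall concrete, here is a minimal numpy sketch of how (5.15) can be solved, via the standard reduction to an eigenvalue problem on $C_{xx}^{-1} C_{xy} C_{yy}^{-1} C_{yx}$. It assumes invertible covariance matrices; the function and variable names are illustrative, not part of the course material.

```python
import numpy as np

def linear_cca(X, Y):
    """First pair of projection vectors (w_x, w_y) maximizing Eq. (5.15).

    X: N x M data matrix, Y: q x M data matrix (one sample per column).
    Assumes C_xx and C_yy are invertible.
    """
    M = X.shape[1]
    Xc = X - X.mean(axis=1, keepdims=True)      # center the samples
    Yc = Y - Y.mean(axis=1, keepdims=True)
    Cxx = Xc @ Xc.T / (M - 1)                   # within-set covariance, N x N
    Cyy = Yc @ Yc.T / (M - 1)                   # within-set covariance, q x q
    Cxy = Xc @ Yc.T / (M - 1)                   # inter-set covariance,  N x q
    # Maximizing (5.15) is equivalent to the eigenproblem
    #   Cxx^{-1} Cxy Cyy^{-1} Cyx w_x = rho^2 w_x.
    A = np.linalg.solve(Cxx, Cxy) @ np.linalg.solve(Cyy, Cxy.T)
    vals, vecs = np.linalg.eig(A)
    k = int(np.argmax(vals.real))
    w_x = vecs[:, k].real
    rho = float(np.sqrt(max(vals[k].real, 0.0)))
    # The matching w_y is proportional to Cyy^{-1} Cyx w_x.
    w_y = np.linalg.solve(Cyy, Cxy.T @ w_x)
    w_y /= np.linalg.norm(w_y)
    return w_x, w_y, rho
```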
Non-linear Case:<br />
Kernel CCA extends this notion to non-linear projections. As for kernel PCA, let us assume that both sets of data have been projected into a feature space through non-linear maps $\phi_x$, $\phi_y$, such that we now have the two sets $\{\phi_x(x^i)\}_{i=1}^{M}$ and $\{\phi_y(y^i)\}_{i=1}^{M}$. Let us further assume that the data are centered in feature space, i.e. $\sum_{i=1}^{M} \phi_x(x^i) = 0$ and $\sum_{i=1}^{M} \phi_y(y^i) = 0$ (if the data are not centered in feature space, one can find a Gram matrix that ensures that these are centered, as done for kernel PCA; see the exercise session). Kernel canonical correlation analysis aims at maximizing the correlation between the data in their corresponding projection spaces. Similarly to kernel PCA, we can construct the two kernel matrices $K_x = F_x^T F_x$ and $K_y = F_y^T F_y$, where $F_x$ and $F_y$ are two $M \times M$ matrices whose columns are composed of the projections $\{\phi_x(x^i)\}_{i=1}^{M}$ and $\{\phi_y(y^i)\}_{i=1}^{M}$, respectively.
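The feature-space centering assumed above never has to be performed on $\phi$ explicitly; as for kernel PCA, it can be applied directly to the Gram matrices. A minimal sketch (the function name is mine):

```python
import numpy as np

def center_gram(K):
    """Center a Gram matrix in feature space: K -> H K H, H = I - (1/M) 1 1^T.

    After this, the implicit feature vectors behind K sum to zero,
    which is the centering assumption used in the text.
    """
    M = K.shape[0]
    H = np.eye(M) - np.ones((M, M)) / M         # centering matrix
    return H @ K @ H
```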
The weights $w_x$, $w_y$ can be expressed as linear combinations of the training examples in feature space, i.e. $w_x = F_x \alpha_x$ and $w_y = F_y \alpha_y$. Substituting into the equation for linear CCA (note that $C_{xy} \propto F_x F_y^T$, so that $w_x^T C_{xy} w_y \propto \alpha_x^T F_x^T F_x F_y^T F_y \alpha_y = \alpha_x^T K_x K_y \alpha_y$, and similarly for the within-set terms) yields the following optimization:
$$\max_{\alpha_x, \alpha_y} \rho = \max_{\alpha_x, \alpha_y} \frac{\alpha_x^T K_x K_y \alpha_y}{\left(\alpha_x^T K_x^2 \alpha_x\right)^{1/2} \left(\alpha_y^T K_y^2 \alpha_y\right)^{1/2}} \qquad (5.16)$$
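As a sketch of how (5.16) can be solved in practice: setting the gradient with respect to $\alpha_x$ and $\alpha_y$ to zero leads to a symmetric generalized eigenproblem in the stacked vector $(\alpha_x, \alpha_y)$. The small ridge term `eps` below is a standard practical addition that is not part of the derivation above; without it the maximum of (5.16) is trivially 1 whenever $K_x$ and $K_y$ are invertible. Names are illustrative.

```python
import numpy as np
from scipy.linalg import eigh

def kernel_cca(Kx, Ky, eps=1e-3):
    """First coefficient pair (alpha_x, alpha_y) maximizing Eq. (5.16).

    Kx, Ky: centered M x M kernel matrices (see center_gram above).
    eps: small regularizer added to the denominator blocks.
    """
    M = Kx.shape[0]
    Z = np.zeros((M, M))
    I = np.eye(M)
    # Stationarity of (5.16) gives:
    #   [ 0      Kx Ky ] [ax]         [ Kx^2 + eps I      0        ] [ax]
    #   [ Ky Kx  0     ] [ay] = rho * [ 0             Ky^2 + eps I ] [ay]
    A = np.block([[Z, Kx @ Ky], [Ky @ Kx, Z]])
    B = np.block([[Kx @ Kx + eps * I, Z], [Z, Ky @ Ky + eps * I]])
    vals, vecs = eigh(A, B)              # eigenvalues in ascending order
    alpha = vecs[:, -1]                  # top eigenvector
    return alpha[:M], alpha[M:], float(vals[-1])
```

For instance, $K_x$ and $K_y$ could be built from a Gaussian kernel $k(x^i, x^j) = \exp(-\|x^i - x^j\|^2 / 2\sigma^2)$ evaluated on the training samples and then centered with `center_gram`.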