MACHINE LEARNING TECHNIQUES - LASA
2.2 Canonical Correlation Analysis

Adapted from Hardoon, Szedmak & Shawe-Taylor, Neural Computation, 16:12, p. 2639-2664, 2004
Canonical Correlation Analysis (CCA) finds basis vectors for two sets of variables such that the correlation between the projections of the variables onto these basis vectors is mutually maximized. CCA is thus a generalized version of PCA for two or more multidimensional datasets. Note that the two multidimensional datasets need not have the same dimensions nor live in the same basis; they may code for completely different types of information.

Take for example the case where you want to develop a biometric system that identifies different people based on audio recordings of each person's voice and video recordings of each person's face while talking. Audio and video are recorded simultaneously, and hence, taken as a pair, they may reveal more information than either taken individually. CCA aims at extracting the features in each dataset (audio and video) that correlate best for each person individually. In each space (audio and video), CCA finds one or more eigenvectors that maximize the correlation across the pairs of first, second, third, etc. eigenvectors in each basis. In other words, CCA tries to find a linear combination of audio features and a linear combination of video features that correlate best. Data projected onto each of these separate sets of eigenvectors would then be most representative of a person and can hence be used for discrimination afterwards (e.g. by feeding the projected data into a classifier).
While PCA works with a single dataset and maximizes the variance of the projections of the data onto a set of eigenvectors forming a basis of this dataset, CCA works with a pair of random vectors determined in each basis separately and maximizes the correlation between the sets of projections. In other words, while PCA leads to an eigenvector problem, CCA leads to a generalized eigenvector problem.
Consider a pair of multivariate datasets $X = \{x^i \in \mathbb{R}^N\}_{i=1}^{M}$ and $Y = \{y^i \in \mathbb{R}^q\}_{i=1}^{M}$, of which we measure a sample of $M$ paired instances $\{(x^i, y^i)\}_{i=1}^{M}$. CCA consists in determining a pair of projection vectors $w_x$ and $w_y$ for $X$ and $Y$ such that the correlation $\rho$ between the projections $X' = w_x^T X$ and $Y' = w_y^T Y$ (the canonical variates) is maximized:
$$\rho = \max_{w_x, w_y} \operatorname{corr}\left(X', Y'\right) = \max_{w_x, w_y} \frac{w_x^T E\{XY^T\} w_y}{\left\|w_x^T X\right\| \left\|w_y^T Y\right\|} = \max_{w_x, w_y} \frac{w_x^T C_{xy} w_y}{\sqrt{w_x^T C_{xx} w_x}\,\sqrt{w_y^T C_{yy} w_y}} \qquad (2.16)$$
where $C_{xy}$, $C_{xx}$ and $C_{yy}$ are respectively the inter-set and within-set covariance matrices: $C_{xy}$ is $N \times q$, $C_{xx}$ is $N \times N$, and $C_{yy}$ is $q \times q$.
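The covariance blocks entering (2.16) can be estimated directly from centered data. The sketch below, with purely illustrative random data and sizes ($M = 200$, $N = 3$, $q = 2$), forms the three empirical matrices:

```python
import numpy as np

# Illustrative toy data: M = 200 paired samples, x^i in R^3, y^i in R^2.
rng = np.random.default_rng(0)
M, N, q = 200, 3, 2
X = rng.standard_normal((M, N))   # rows are the samples x^i
Y = rng.standard_normal((M, q))   # rows are the samples y^i

# Center each dataset, then form the empirical covariance blocks.
Xc = X - X.mean(axis=0)
Yc = Y - Y.mean(axis=0)
C_xx = Xc.T @ Xc / (M - 1)        # N x N within-set covariance of X
C_yy = Yc.T @ Yc / (M - 1)        # q x q within-set covariance of Y
C_xy = Xc.T @ Yc / (M - 1)        # N x q inter-set covariance

assert C_xx.shape == (N, N) and C_yy.shape == (q, q) and C_xy.shape == (N, q)
```

Note that $C_{yx} = C_{xy}^T$, so only the three blocks above need to be stored.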
Given that the correlation is not affected by rescaling the norms of the vectors $w_x$, $w_y$, we can set:

$$w_x^T C_{xx} w_x = w_y^T C_{yy} w_y = 1 \qquad (2.17)$$
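One standard way to solve the resulting generalized eigenvector problem, sketched below under illustrative assumptions (synthetic data sharing a latent signal, and full-rank covariance blocks), is to whiten each block with $C_{xx}^{-1/2}$ and $C_{yy}^{-1/2}$ and take the SVD of the whitened cross-covariance; the singular values are then the canonical correlations, and the recovered directions satisfy the normalization (2.17) by construction:

```python
import numpy as np

# Synthetic paired data driven by a shared 2-D latent signal (illustrative).
rng = np.random.default_rng(1)
M, N, q = 500, 4, 3
Z = rng.standard_normal((M, 2))
X = Z @ rng.standard_normal((2, N)) + 0.5 * rng.standard_normal((M, N))
Y = Z @ rng.standard_normal((2, q)) + 0.5 * rng.standard_normal((M, q))

Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
C_xx = Xc.T @ Xc / (M - 1)
C_yy = Yc.T @ Yc / (M - 1)
C_xy = Xc.T @ Yc / (M - 1)

def inv_sqrt(C):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(C)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

# Whitened cross-covariance; its singular values are the canonical correlations.
K = inv_sqrt(C_xx) @ C_xy @ inv_sqrt(C_yy)
U, s, Vt = np.linalg.svd(K)

w_x = inv_sqrt(C_xx) @ U[:, 0]    # first canonical direction for X
w_y = inv_sqrt(C_yy) @ Vt[0, :]   # first canonical direction for Y
rho = s[0]                        # first canonical correlation

# The normalization (2.17) holds by construction.
assert np.isclose(w_x @ C_xx @ w_x, 1.0)
assert np.isclose(w_y @ C_yy @ w_y, 1.0)
```

Because the data share a latent signal, the leading canonical correlation is high, while the trailing ones drop off; the remaining columns of $U$ and rows of $V^T$ give the higher-order canonical pairs.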
© A.G.Billard 2004 – Last Update March 2011