
2.2 Canonical Correlation Analysis

Adapted from Hardoon, Szedmak & Shawe-Taylor, Neural Computation, 16(12), pp. 2639–2664, 2004.

Canonical Correlation Analysis (CCA) finds basis vectors for two sets of variables such that the correlation between the projections of the variables onto these basis vectors is mutually maximized. CCA is thus a generalized version of PCA for two (or more) multi-dimensional datasets. Note that the two datasets need not have the same dimension, nor live in the same basis; they may code for completely different types of information.

Take, for example, the case where you want to develop a biometric system that identifies different people based on audio recordings of their voices and video recordings of their faces while talking. Audio and video are recorded simultaneously, and hence, taken as a pair, they may reveal more information than either modality taken individually. CCA aims at extracting, for each person, the features in each dataset (audio and video) that correlate best. In each space (audio and video), CCA finds one or more eigenvectors such that the correlation across each pair of first, second, third, etc. eigenvectors is maximized. In other words, CCA looks for a linear combination of audio features and a linear combination of video features that correlate best. The data, once projected onto each of these separate sets of eigenvectors, are then most representative of a person and can hence be used for discrimination (e.g. by feeding the projected data to a classifier).
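To make the biometric example concrete, here is a minimal sketch (not part of the original notes) using scikit-learn's CCA on synthetic stand-ins for paired audio and video feature matrices; the matrix shapes and the shared latent structure are assumptions made purely for illustration.

import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
M, N, q = 200, 10, 8          # M paired samples, N audio and q video features (assumed values)

# Synthetic stand-ins: both modalities share a 2-dimensional latent signal plus noise.
latent = rng.normal(size=(M, 2))
X_audio = latent @ rng.normal(size=(2, N)) + 0.5 * rng.normal(size=(M, N))
Y_video = latent @ rng.normal(size=(2, q)) + 0.5 * rng.normal(size=(M, q))

# Find the linear combinations of audio and video features that correlate best.
cca = CCA(n_components=2)
X_c, Y_c = cca.fit_transform(X_audio, Y_video)   # canonical variates for each modality

# Correlation of the first pair of canonical variates.
print(np.corrcoef(X_c[:, 0], Y_c[:, 0])[0, 1])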

While PCA works with a single dataset and maximizes the variance of the projections of the data onto a set of eigenvectors forming a basis for this dataset, CCA works with a pair of datasets, determines a projection vector in each basis separately, and maximizes the correlation between the two sets of projections. In other words, while PCA leads to an eigenvector problem, CCA leads to a generalized eigenvector problem.
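For reference, a sketch of this contrast (a standard result following Hardoon et al., 2004, not a statement from these notes), using the covariance matrices $C_{xx}$, $C_{yy}$ and $C_{xy}$ defined below:

$$\text{PCA:}\quad C\,w = \lambda\, w \qquad\qquad \text{CCA:}\quad C_{xy}\, C_{yy}^{-1}\, C_{yx}\, w_x = \rho^2\, C_{xx}\, w_x$$

the latter being a generalized eigenvalue problem whose eigenvalue is the squared canonical correlation.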

Consider a pair of multivariate datasets $X = \{x^i \in \mathbb{R}^N\}_{i=1}^{M}$ and $Y = \{y^i \in \mathbb{R}^q\}_{i=1}^{M}$, of which we measure a sample of $M$ paired instances $\{(x^i, y^i)\}_{i=1}^{M}$. CCA consists in determining a pair of projection vectors $w_x$ and $w_y$ for $X$ and $Y$ such that the correlation $\rho$ between the projections $X' = w_x^T X$ and $Y' = w_y^T Y$ (the canonical variates) is maximized.

$$\rho \;=\; \max_{w_x,\, w_y} \operatorname{corr}\!\left(X', Y'\right) \;=\; \max_{w_x,\, w_y} \frac{w_x^T E\{XY^T\}\, w_y}{\left\lVert w_x^T X \right\rVert \left\lVert w_y^T Y \right\rVert} \;=\; \max_{w_x,\, w_y} \frac{w_x^T C_{xy}\, w_y}{\sqrt{w_x^T C_{xx}\, w_x}\; \sqrt{w_y^T C_{yy}\, w_y}} \qquad (2.16)$$

where $C_{xy}$, $C_{xx}$ and $C_{yy}$ are, respectively, the inter-set and within-set covariance matrices; $C_{xy}$ is $N \times q$, $C_{xx}$ is $N \times N$ and $C_{yy}$ is $q \times q$.

Given that the correlation is not affected by rescaling the norms of the vectors $w_x$, $w_y$, we can require that:

$$w_x^T C_{xx}\, w_x \;=\; w_y^T C_{yy}\, w_y \;=\; 1 \qquad (2.17)$$
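Below is a minimal NumPy sketch (an illustration under the above definitions, not code from the notes) of one standard way to solve (2.16) under the constraint (2.17): whiten each set with $C_{xx}^{-1/2}$ and $C_{yy}^{-1/2}$ and take the SVD of the whitened cross-covariance, which is equivalent to the generalized eigenvector problem mentioned above. The function name, the sample data and the small regularization term reg are assumptions made for the example.

import numpy as np

def cca_first_pair(X, Y, reg=1e-8):
    # X is M x N, Y is M x q, with rows as paired samples.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    M = X.shape[0]
    Cxx = Xc.T @ Xc / (M - 1) + reg * np.eye(X.shape[1])   # within-set covariance (N x N)
    Cyy = Yc.T @ Yc / (M - 1) + reg * np.eye(Y.shape[1])   # within-set covariance (q x q)
    Cxy = Xc.T @ Yc / (M - 1)                              # inter-set covariance  (N x q)

    def inv_sqrt(C):
        # Inverse matrix square root via eigendecomposition (C symmetric positive definite).
        vals, vecs = np.linalg.eigh(C)
        return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

    Cxx_is, Cyy_is = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # Singular vectors of the whitened cross-covariance give the canonical directions.
    U, s, Vt = np.linalg.svd(Cxx_is @ Cxy @ Cyy_is)
    w_x = Cxx_is @ U[:, 0]
    w_y = Cyy_is @ Vt[0, :]
    return w_x, w_y, s[0]          # s[0] is the first canonical correlation rho

# Quick check: rho should match the empirical correlation of the projections,
# and the constraint (2.17) w_x^T Cxx w_x = w_y^T Cyy w_y = 1 holds by construction.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 5))
Y = X @ rng.normal(size=(5, 3)) + rng.normal(size=(500, 3))
w_x, w_y, rho = cca_first_pair(X, Y)
print(rho, np.corrcoef((X - X.mean(0)) @ w_x, (Y - Y.mean(0)) @ w_y)[0, 1])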
