MACHINE LEARNING TECHNIQUES - LASA

[Figure 5-2 consists of four contour plots over the axes X1 and X2: Eigenvector 1 (eigenvalue 0.246) and Eigenvector 2 (eigenvalue 0.232) for σ = 0.1, and Eigenvector 1 (eigenvalue 0.160) and Eigenvector 2 (eigenvalue 0.149) for σ = 0.04.]

Figure 5-2: Example of clustering done with kernel PCA with a Gaussian kernel and kernel width σ = 0.1 (top) and σ = 0.04 (bottom). Reconstruction in the original data space of the projections onto the first two eigenvectors. The contour lines represent regions with equal projection value.
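To make the construction behind Figure 5-2 concrete, the following is a minimal sketch of kernel PCA with a Gaussian kernel: it builds the Gram matrix, centers it in feature space, and projects the training points onto the leading eigenvectors. The function names, the toy data and the chosen kernel width are illustrative assumptions of this sketch, not taken from the text.

import numpy as np

def gaussian_gram(X, sigma):
    # K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2)) for the rows of X.
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(X**2, axis=1)[None, :] - 2.0 * X @ X.T
    return np.exp(-sq / (2.0 * sigma**2))

def center_gram(K):
    # Centering of the Gram matrix in feature space.
    M = K.shape[0]
    one = np.ones((M, M)) / M
    return K - one @ K - K @ one + one @ K @ one

def kernel_pca_projections(X, sigma, n_components=2):
    K = center_gram(gaussian_gram(X, sigma))
    eigvals, eigvecs = np.linalg.eigh(K)                 # ascending order
    idx = np.argsort(eigvals)[::-1][:n_components]       # keep the largest eigenvalues
    alphas = eigvecs[:, idx] / np.sqrt(eigvals[idx])     # unit-norm feature-space eigenvectors
    return K @ alphas, eigvals[idx]                      # projections and Gram eigenvalues

# Illustrative usage on toy 2-D data (two noisy clusters, an assumption of this sketch).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-0.5, 0.15, (50, 2)), rng.normal(0.5, 0.15, (50, 2))])
proj, eigenvalues = kernel_pca_projections(X, sigma=0.1)
print(proj.shape, eigenvalues)

The projection values returned for each training point are the quantities whose level sets appear as the contour lines in Figure 5-2.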

5.4 Kernel CCA

The linear version of CCA was treated in Section 2.2. Here we consider its extension to nonlinear projections to find each pair of eigenvectors. We start with a brief recall of CCA.

Consider a pair of multivariate datasets $X = \{x^i \in \mathbb{R}^N\}_{i=1}^M$ and $Y = \{y^i \in \mathbb{R}^q\}_{i=1}^M$, of which we measure a sample of $M$ instance pairs $(x^i, y^i)$. CCA consists in determining a pair of projection vectors $w_x$ for $X$ and $w_y$ for $Y$ such that the correlation $\rho$ between the projections $X' = w_x^T X$ and $Y' = w_y^T Y$ (the canonical variates) is maximized:

$$\rho = \max_{w_x, w_y} \operatorname{corr}(X', Y') = \max_{w_x, w_y} \frac{w_x^T E\{XY^T\} w_y}{\|w_x^T X\|\,\|w_y^T Y\|} = \max_{w_x, w_y} \frac{w_x^T C_{xy} w_y}{\sqrt{w_x^T C_{xx} w_x \; w_y^T C_{yy} w_y}} \qquad (5.15)$$

where $C_{xy}$, $C_{xx}$ and $C_{yy}$ are respectively the inter-set and within-set covariance matrices; $C_{xy}$ is $N \times q$, $C_{xx}$ is $N \times N$ and $C_{yy}$ is $q \times q$.

Non-linear case: Kernel CCA extends this notion to non-linear projections. As for kernel PCA, let us assume that both sets of data have been projected into a feature space through non-linear maps $\phi_x$ and $\phi_y$, such that we now have the two sets $\{\phi_x(x^i)\}_{i=1}^M$ and $\{\phi_y(y^i)\}_{i=1}^M$. Let us further assume that the data are centered in feature space, i.e. $\sum_{i=1}^M \phi_x(x^i) = 0$ and $\sum_{i=1}^M \phi_y(y^i) = 0$ (if the data are not centered in feature space, one can compute a Gram matrix that ensures that they are centered, as done for kernel PCA; see the exercise session). Kernel canonical correlation analysis aims at maximizing the correlation between the data in their corresponding projection spaces. Similarly to kernel PCA, we can construct the two kernel matrices $K_x = F_x^T F_x$ and $K_y = F_y^T F_y$, where $F_x$ and $F_y$ are matrices whose $M$ columns are the projections $\phi_x(x^i)$ and $\phi_y(y^i)$, respectively. The weights $w_x$, $w_y$ can be expressed as linear combinations of the training examples in feature space, i.e. $w_x = F_x \alpha_x$ and $w_y = F_y \alpha_y$. Substituting into the equation for linear CCA yields the following optimization:

$$\rho = \max_{\alpha_x, \alpha_y} \frac{\alpha_x^T K_x K_y \alpha_y}{\left(\alpha_x^T K_x^2 \alpha_x\right)^{1/2} \left(\alpha_y^T K_y^2 \alpha_y\right)^{1/2}} \qquad (5.16)$$
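Introducing Lagrange multipliers for the constraints $\alpha_x^T K_x^2 \alpha_x = \alpha_y^T K_y^2 \alpha_y = 1$ turns the maximization of Eq. (5.16) into a generalized eigenvalue problem on the two Gram matrices. The sketch below illustrates this; the Gaussian kernel, the toy data, the function names and the small ridge term reg are assumptions of this sketch and not part of the text (the ridge is a commonly used regularization, since the unregularized problem is ill-conditioned when the Gram matrices are invertible).

import numpy as np
from scipy.linalg import eigh

def gaussian_gram(X, sigma):
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(X**2, axis=1)[None, :] - 2.0 * X @ X.T
    return np.exp(-sq / (2.0 * sigma**2))

def center_gram(K):
    M = K.shape[0]
    one = np.ones((M, M)) / M
    return K - one @ K - K @ one + one @ K @ one

def kernel_cca(Kx, Ky, reg=1e-3):
    # Maximize Eq. (5.16) via the equivalent generalized eigenvalue problem
    #   [ 0      Kx Ky ] a = rho [ Kx^2 + reg I   0            ] a,  with a = [alpha_x; alpha_y];
    #   [ Ky Kx  0     ]         [ 0              Ky^2 + reg I ]
    # the reg*I terms are an added regularization, not part of Eq. (5.16).
    M = Kx.shape[0]
    Z = np.zeros((M, M))
    A = np.block([[Z, Kx @ Ky], [Ky @ Kx, Z]])
    B = np.block([[Kx @ Kx + reg * np.eye(M), Z],
                  [Z, Ky @ Ky + reg * np.eye(M)]])
    rhos, vecs = eigh(A, B)                     # eigenvalues in ascending order
    return rhos[-1], vecs[:M, -1], vecs[M:, -1] # largest rho and its (alpha_x, alpha_y)

# Illustrative usage on two nonlinearly related toy datasets (an assumption of this sketch).
rng = np.random.default_rng(0)
t = rng.uniform(-1.0, 1.0, (100, 1))
X = np.hstack([t, t**2]) + 0.05 * rng.standard_normal((100, 2))
Y = np.hstack([np.sin(np.pi * t), t]) + 0.05 * rng.standard_normal((100, 2))
Kx = center_gram(gaussian_gram(X, sigma=0.5))
Ky = center_gram(gaussian_gram(Y, sigma=0.5))
rho, alpha_x, alpha_y = kernel_cca(Kx, Ky)
print("leading canonical correlation:", rho)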


© A.G.Billard 2004 – Last Update March 2011