MACHINE LEARNING TECHNIQUES - LASA
5.3 Kernel PCA
We start our review of kernel methods with kernel PCA, a non-linear extension of Principal Component Analysis (PCA).

To recall, PCA is a powerful technique for extracting structure from high-dimensional data. In Sections 2.1 and 6.6.3 of these lecture notes, we review two ways of computing PCA: either through an analytical decomposition into the eigenvalues and eigenvectors of the data covariance, or through an iterative decomposition using Hebbian learning in a multiple input-output network.

A key assumption of PCA is that the transformation applied to the data is linear. Kernel PCA is a generalization of standard PCA, in that it allows an arbitrary linear or non-linear transformation of the data. We first reintroduce the PCA notation and then show how to go from linear PCA to non-linear PCA.
Linear PCA:

Assume that the dataset is composed of a set of vectors $x^i \in \mathbb{R}^N$, $i = 1, \dots, M$. Assume further that the dataset is zero-mean, i.e. $\sum_{i=1}^{M} x^i = 0$.

PCA finds an orthonormal basis such that the projections along each axis maximize the variance of the data. To do so, PCA proceeds by diagonalizing the covariance matrix of the dataset, $C = \frac{1}{M} X X^T$. $C$ is positive semi-definite and can thus be diagonalized with non-negative eigenvalues.
The principal components are then all the vectors $v^i$, $i = 1, \dots, N$, solutions of:

$$C v^i = \lambda_i v^i \qquad (5.4)$$

where $\lambda_i$ is a scalar and corresponds to the eigenvalue associated with the eigenvector $v^i$.
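As a concrete illustration of (5.4), linear PCA can be computed by eigendecomposition of the covariance matrix. The following is a minimal NumPy sketch; the data, variable names, and dimensions are illustrative, not part of the lecture notes:

```python
import numpy as np

# Illustrative data: M = 100 datapoints of dimension N = 3, stored as columns of X.
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 100))
X = X - X.mean(axis=1, keepdims=True)    # enforce the zero-mean assumption

M = X.shape[1]
C = (X @ X.T) / M                        # covariance matrix, N x N

# C is symmetric positive semi-definite, so eigh returns real,
# non-negative eigenvalues (up to numerical precision).
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]        # sort by decreasing variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Projection of the data onto the first principal component
y = eigvecs[:, 0].T @ X
```

Each column of `eigvecs` satisfies (5.4), and the leading column captures the direction of maximal variance.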
Non-Linear Case:

Observe first that one can rewrite the eigenvalue decomposition problem of linear PCA in terms of dot products across pairs of the training datapoints. Indeed, each principal component $v^i$ can be expressed as a linear combination of the datapoints. Using $C = \frac{1}{M} \sum_{j=1}^{M} x^j (x^j)^T$ and replacing in (5.4), we obtain:

$$C v^i = \frac{1}{M} \sum_{j=1}^{M} x^j (x^j)^T v^i = \lambda_i v^i \qquad (5.5)$$

which allows us to express each eigenvector as follows:
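The dot-product form (5.5) is the key step behind kernel PCA: every dot product $(x^j)^T x$ can be replaced by a kernel evaluation $k(x^j, x)$, so the eigenproblem can be solved on the Gram matrix instead of the covariance matrix. As a minimal numerical sketch of the resulting procedure, assuming an RBF kernel and the standard Gram-matrix centering (the kernel choice, `gamma` value, and variable names are illustrative assumptions, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((2, 50))         # N x M data matrix, M datapoints as columns

def rbf_kernel(X, gamma=1.0):
    # Gram matrix K[i, j] = exp(-gamma * ||x^i - x^j||^2)
    sq = np.sum(X**2, axis=0)
    d2 = sq[:, None] + sq[None, :] - 2 * X.T @ X
    return np.exp(-gamma * d2)

M = X.shape[1]
K = rbf_kernel(X)

# Center the Gram matrix, i.e. enforce the zero-mean assumption in feature space
one = np.ones((M, M)) / M
Kc = K - one @ K - K @ one + one @ K @ one

# Eigendecomposition of the centered Gram matrix
eigvals, alphas = np.linalg.eigh(Kc)
order = np.argsort(eigvals)[::-1]
eigvals, alphas = eigvals[order], alphas[:, order]

# Keep components with positive eigenvalue and normalize the expansion
# coefficients so each feature-space eigenvector has unit norm.
mask = eigvals > 1e-12
alphas = alphas[:, mask] / np.sqrt(eigvals[mask])

# Projections of the training data onto the two leading kernel principal components
Y = Kc @ alphas[:, :2]
```

Note that the eigenvectors of the feature-space covariance are never formed explicitly; only their expansion coefficients over the datapoints are computed, which is exactly what (5.5) makes possible.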
© A.G.Billard 2004 – Last Update March 2011