MACHINE LEARNING TECHNIQUES - LASA


5.3 Kernel PCA

We start our review of kernel methods with kernel PCA, a non-linear extension of Principal Component Analysis (PCA).

To recall, PCA is a powerful technique for extracting structure from high-dimensional data. In Section 2.1 and Section 6.6.3 of these lecture notes, we review two ways of computing PCA: either through an analytical decomposition into eigenvalues and eigenvectors of the data space, or through an iterative decomposition using Hebbian learning in multiple input-output networks.

A key assumption of PCA is that the transformation applied to the data is linear. Kernel PCA is a generalization of standard PCA, in that it allows for any linear or non-linear transformation of the data. We first reintroduce the PCA notation and then show how to go from linear PCA to non-linear PCA.

Linear PCA:

Assume that the dataset is composed of a set of vectors $x^i \in \mathbb{R}^N$, $i = 1, \dots, M$. Assume further that the dataset is zero mean, i.e. $\sum_{i=1}^{M} x^i = 0$.
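
In practice, the zero-mean assumption is enforced by subtracting the empirical mean from every datapoint. A minimal sketch, assuming numpy and a hypothetical dataset stored row-wise (one datapoint per row):

import numpy as np

# Hypothetical dataset: M = 200 datapoints of dimension N = 5, one per row.
X = np.random.default_rng(0).normal(size=(200, 5))

# Centre the data so that the zero-mean assumption sum_i x^i = 0 holds.
X = X - X.mean(axis=0)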

PCA finds an orthonormal basis such that the projections along each axis maximize the variance of the data. To do so, PCA proceeds by diagonalizing the covariance matrix of the dataset, $C = \frac{1}{M} X X^T$, where $X = [x^1, \dots, x^M]$. $C$ is symmetric and positive semi-definite and can thus be diagonalized with non-negative eigenvalues.

The principal components are then all the vectors $v^i$, $i = 1, \dots, N$, solutions of:

$$ C v^i = \lambda_i v^i \qquad (5.4) $$

where $\lambda_i$ is a scalar and corresponds to the eigenvalue associated with the eigenvector $v^i$.
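
A minimal sketch of linear PCA along these lines, assuming numpy and datapoints stored row-wise (so the covariance written above for column-wise data becomes X.T @ X / M; the function name and parameters are illustrative, not from these notes):

import numpy as np

def linear_pca(X, n_components):
    # X: (M, N) array of zero-mean datapoints, one per row.
    M = X.shape[0]
    C = X.T @ X / M                        # N x N covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)   # C is symmetric: real spectrum
    order = np.argsort(eigvals)[::-1]      # sort by decreasing variance
    keep = order[:n_components]
    return eigvecs[:, keep], eigvals[keep]

# Usage: project centred data onto the two leading principal components.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X = X - X.mean(axis=0)
V, lam = linear_pca(X, n_components=2)
Y = X @ V                                  # (M, 2) array of projections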

Non-Linear Case:

Observe first that one can rewrite the eigenvalue decomposition problem of linear PCA in terms of dot products across pairs of the training datapoints. Indeed, each principal component $v^i$ can be expressed as a linear combination of the datapoints. Using $C = \frac{1}{M} \sum_{j=1}^{M} x^j (x^j)^T$ and replacing in (5.4), we obtain:

$$ C v^i = \frac{1}{M} \sum_{j=1}^{M} x^j (x^j)^T v^i = \lambda_i v^i \qquad (5.5) $$

which allows us to express each eigenvector as follows:
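
As a quick numerical check of this dot-product view (assuming numpy; the M x M matrix of pairwise dot products below is only illustrative and anticipates the kernel matrix introduced later), the covariance matrix and the matrix of pairwise dot products share their non-zero eigenvalues, so the PCA problem can indeed be solved from dot products alone:

import numpy as np

rng = np.random.default_rng(0)
M, N = 50, 5
X = rng.normal(size=(M, N))
X = X - X.mean(axis=0)                 # zero-mean datapoints, one per row

C = X.T @ X / M                        # N x N covariance matrix
G = X @ X.T / M                        # M x M matrix of dot products <x^i, x^j> / M

# The non-zero eigenvalues of C and G coincide.
eig_C = np.sort(np.linalg.eigvalsh(C))[::-1]
eig_G = np.sort(np.linalg.eigvalsh(G))[::-1][:N]
print(np.allclose(eig_C, eig_G))       # True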

