
If $X = \{x^i\}_{i=1}^{M}$ is the set of our $M$ observations, we can construct a projection of $X$ onto the feature space $H$ through a non-linear mapping $\phi$:

$$\phi : X \rightarrow H, \qquad x \mapsto \phi(x). \qquad (5.1)$$

Note that if the dimension of $X$ is $N$, then the dimension of the feature space may be greater than $N$.
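
For instance, the quadratic feature map $\phi(x) = \left(x_1^2,\ \sqrt{2}\,x_1 x_2,\ x_2^2\right)$ sends a 2-dimensional input into a 3-dimensional feature space. The short Python sketch below illustrates this; the map and the sample point are chosen purely for illustration and are not taken from the notes.

```python
import numpy as np

def phi(x):
    """Quadratic feature map from R^2 to R^3 (illustrative choice)."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2.0) * x1 * x2, x2 ** 2])

x = np.array([1.0, -2.0])
print(x.shape, phi(x).shape)  # input dimension N = 2, feature dimension 3 > N
```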

Finding the appropriate non-linear transformation $\phi$ is difficult in practice. The kernel trick relates to the observation that most methods rely on computing an inner product $\left\langle x^i, x^j \right\rangle$ across pairs of observations $x^i, x^j \in X$ (e.g., recall that in order to perform PCA, we needed to compute the covariance matrix on $X$, which is given by the dot product on all the data points, i.e. $XX^T$; similarly, when performing linear regression, we had to multiply by the inverse of the covariance on $X$).

Hence, determining an expression for this inner product in feature space may save us from having to compute the mapping $\phi$ explicitly. This inner product in feature space is expressed through the kernel function (or simply the kernel) $k(\cdot,\cdot)$.
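
To make this concrete with the quadratic map introduced above: for $x^i, x^j \in \mathbb{R}^2$ one has $\left\langle \phi(x^i), \phi(x^j) \right\rangle = \left\langle x^i, x^j \right\rangle^2$, so the degree-2 polynomial kernel returns the feature-space inner product without ever forming $\phi$. A minimal sketch, assuming the same illustrative map as before:

```python
import numpy as np

def phi(x):
    """Explicit quadratic feature map from R^2 to R^3 (illustrative)."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2.0) * x1 * x2, x2 ** 2])

def k(xi, xj):
    """Degree-2 polynomial kernel: the same inner product, computed in input space."""
    return np.dot(xi, xj) ** 2

xi = np.array([1.0, -2.0])
xj = np.array([0.5, 3.0])

print(np.dot(phi(xi), phi(xj)))  # map explicitly, then take the dot product
print(k(xi, xj))                 # kernel trick: identical value, no mapping computed
```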

All the kernel methods we will see in these lecture notes assume that the feature space $H$ into which the data are sent through the non-linear function $\phi$ is a Reproducing Kernel Hilbert Space (RKHS). $H$ is hence a space $H := \{h_1(\cdot), h_2(\cdot), \ldots\}$ of functions from $\mathbb{R}^N$ to $\mathbb{R}$. In this space, $\phi(\cdot, x^i), \phi(\cdot, x^j),\ \forall x^i, x^j \in X$, defines a set of functions indexed by $x$. The "reproducing property" of the kernel ensures that taking the inner product between a pair of functions in $H$ yields a new function in $H$. Hence, setting $k(\cdot, x) := \phi(\cdot, x)$ yields the following kernel:

$$k\left(x^i, x^j\right) = \left\langle \phi(\cdot, x^i), \phi(\cdot, x^j) \right\rangle = \int_{-\infty}^{\infty} \phi(z, x^i)\,\phi(z, x^j)\, dz \qquad (5.2)$$

The kernel in (5.2) is the dot product in feature space. Most techniques covered in these lecture notes will be based on this kernel.
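
Equation (5.2) can also be checked numerically. The sketch below assumes Gaussian "bump" features $\phi(z, x) = \exp\!\left(-(z-x)^2/(2\sigma^2)\right)$, chosen here only for illustration; for this choice the integral in (5.2) has the closed form $\sigma\sqrt{\pi}\,\exp\!\left(-(x^i - x^j)^2/(4\sigma^2)\right)$, i.e. a Gaussian kernel.

```python
import numpy as np

# Illustrative values; sigma, x_i, x_j are not parameters from the notes.
sigma = 1.0
x_i, x_j = 0.3, 1.7

def phi(z, x):
    """One real-valued feature function phi(., x) per data point x."""
    return np.exp(-(z - x) ** 2 / (2.0 * sigma ** 2))

# Approximate the inner product in (5.2) by a Riemann sum over a wide, fine grid.
z = np.linspace(-20.0, 20.0, 400001)
k_numeric = np.sum(phi(z, x_i) * phi(z, x_j)) * (z[1] - z[0])

# Closed form of the same integral for this choice of phi.
k_closed = sigma * np.sqrt(np.pi) * np.exp(-(x_i - x_j) ** 2 / (4.0 * sigma ** 2))

print(k_numeric, k_closed)  # the two values agree to high precision
```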

Note that, in the literature, people tend to omit the open parameter on $\phi(\cdot, x^i)$ and write simply $\phi(x^i)$. We will follow this notation in the remainder of this document.

For proper use in the algorithms we will see next, the kernel must satisfy a number of properties that follow from Mercer's theorem. Among these, we will retain that $k$ is a symmetric continuous positive function that maps:

$$k : X \times X \rightarrow \mathbb{R}, \qquad k\left(x^i, x^j\right) \mapsto \left\langle \phi(x^i), \phi(x^j) \right\rangle \qquad (5.3)$$
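
These conditions can be probed numerically on a finite sample: for a valid kernel such as the Gaussian kernel $k(x^i, x^j) = \exp\!\left(-\|x^i - x^j\|^2/(2\sigma^2)\right)$, the Gram matrix $K_{ij} = k(x^i, x^j)$ is symmetric and positive semi-definite. The sketch below uses a random toy sample and a kernel width chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))  # M = 20 toy observations in R^3 (illustrative data)
sigma = 1.0

def gaussian_kernel(xi, xj):
    """Gaussian (RBF) kernel: symmetric, continuous, and positive-definite."""
    return np.exp(-np.sum((xi - xj) ** 2) / (2.0 * sigma ** 2))

# Gram matrix K with entries K[i, j] = k(x_i, x_j).
K = np.array([[gaussian_kernel(xi, xj) for xj in X] for xi in X])

print(np.allclose(K, K.T))                    # symmetry
print(np.linalg.eigvalsh(K).min() >= -1e-10)  # eigenvalues >= 0 up to numerical error
```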

