If $X = \{x_i\}_{i=1}^{M}$ is the set of our $M$ observations, we can construct a projection of $X$ onto the feature space $H$ through a non-linear mapping $\varphi$:

$\varphi : X \rightarrow H, \qquad x \mapsto \varphi(x).$    (5.1)

Note that if the dimension of $X$ is $N$, then the dimension of the feature space may be greater than $N$.
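As an illustration of such a mapping, here is a minimal sketch (not taken from these notes; the specific map and its coefficients are an assumed example) that sends points of $\mathbb{R}^2$ to the degree-2 polynomial features of $\mathbb{R}^6$, so the feature-space dimension is indeed larger than $N = 2$:

import numpy as np

def phi(x):
    # Explicit degree-2 polynomial feature map: R^2 -> R^6.
    # (Hypothetical example; the sqrt(2) factors make its inner product
    # match the polynomial kernel (1 + x.y)^2.)
    x1, x2 = x
    return np.array([1.0,
                     np.sqrt(2.0) * x1,
                     np.sqrt(2.0) * x2,
                     x1 ** 2,
                     x2 ** 2,
                     np.sqrt(2.0) * x1 * x2])

x = np.array([0.5, -1.0])
print(x.shape, "->", phi(x).shape)   # (2,) -> (6,): the feature space is larger than X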
Finding the appropriate non-linear transformation $\varphi$ is difficult in practice. The kernel trick relates to the observation that most methods rely on computing an inner product $\langle x_i, x_j \rangle$ across pairs of observations $x_i, x_j \in X$ (e.g., recall that in order to perform PCA, we needed to compute the covariance matrix on $X$, which is given by the dot product on all the data points, i.e. $XX^T$; similarly, when performing linear regression, we had to multiply by the inverse of the covariance on $X$).
Hence, determining an expression for this inner product in feature space may save us from having to compute explicitly the mapping $\varphi$. This inner product in feature space is expressed through the kernel function (or simply the kernel) $k(\cdot,\cdot)$.
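A minimal sketch of this idea, assuming a homogeneous degree-2 polynomial kernel (an illustrative choice, not one prescribed by these notes): the inner product in feature space is obtained directly from the inner product in input space, without ever forming $\varphi(x)$.

import numpy as np

def phi(x):
    # Explicit feature map for the homogeneous degree-2 polynomial kernel,
    # R^2 -> R^3 (hypothetical example map).
    x1, x2 = x
    return np.array([x1 ** 2, x2 ** 2, np.sqrt(2.0) * x1 * x2])

def k(x, y):
    # Kernel trick: the same inner product, computed in input space only.
    return (x @ y) ** 2

x = np.array([0.5, -1.0])
y = np.array([2.0, 0.3])

print(phi(x) @ phi(y))                         # inner product after mapping to feature space
print(k(x, y))                                 # identical value, phi is never evaluated
print(np.isclose(phi(x) @ phi(y), k(x, y)))    # True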
All the kernel methods we will see in these lecture notes assume that the feature space $H$ into which the data are sent through the non-linear function $\varphi$ is a Reproducing Kernel Hilbert Space (RKHS). $H$ is hence a space $H := \{h_1(\cdot), h_2(\cdot), \ldots\}$ of functions from $\mathbb{R}^N$ to $\mathbb{R}$. In this space, $\varphi(\cdot, x_i), \varphi(\cdot, x_j) \ \forall x_i, x_j \in X$ defines a set of functions indexed by $x$. The "reproducing property" of the kernel ensures that taking the inner product between a pair of functions in $H$ yields a new function in $H$. Hence, setting $k(\cdot, x) := \varphi(\cdot, x)$ yields the following kernel:

$k(x_i, x_j) = \langle \varphi(\cdot, x_i), \varphi(\cdot, x_j) \rangle = \int_{-\infty}^{\infty} \varphi(z, x_i)\,\varphi(z, x_j)\,dz$    (5.2)
The kernel in (5.2) is the dot product in feature space. Most techniques covered in these lecture notes will be based on this kernel.
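To make the role of this kernel concrete, here is a minimal sketch (the Gaussian kernel and the random toy data are illustrative assumptions) that assembles the Gram matrix $K_{ij} = k(x_i, x_j)$, i.e. the feature-space dot products of (5.2) over all pairs of observations:

import numpy as np

def k_gauss(xi, xj, sigma=1.0):
    # Gaussian (RBF) kernel; its associated feature space is infinite-dimensional.
    return np.exp(-np.sum((xi - xj) ** 2) / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))             # M = 5 observations in R^3 (toy data)

K = np.array([[k_gauss(xi, xj) for xj in X] for xi in X])
print(K.shape)                          # (5, 5): one feature-space dot product per pair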
Note that, in the literature, people tend to omit the open parameter on $\varphi(\cdot, x_i)$ and write simply $\varphi(x_i)$. We will follow this notation in the remainder of this document.
For proper use in the algorithms we will see next, the kernel must satisfy a number of properties that follow from Mercer's theorem. Among these, we will retain that $k$ is a symmetric, continuous, positive function that maps:

$k : X \times X \rightarrow \mathbb{R}, \qquad (x_i, x_j) \mapsto \langle \varphi(x_i), \varphi(x_j) \rangle$    (5.3)
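These conditions can be probed numerically on a finite sample: the Gram matrix of a valid kernel must be symmetric and positive semi-definite (all eigenvalues non-negative). The sketch below does this for a Gaussian kernel; the kernel choice, the toy data, and the numerical tolerance are illustrative assumptions.

import numpy as np

def k_gauss(xi, xj, sigma=1.0):
    # Candidate kernel to be checked (illustrative choice: Gaussian/RBF).
    return np.exp(-np.sum((xi - xj) ** 2) / (2.0 * sigma ** 2))

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 2))            # toy sample of M = 20 points in R^2
K = np.array([[k_gauss(xi, xj) for xj in X] for xi in X])

print(np.allclose(K, K.T))                          # symmetry
print(np.all(np.linalg.eigvalsh(K) >= -1e-10))      # positive semi-definiteness (up to tolerance)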