MACHINE LEARNING TECHNIQUES - LASA



The kernel $k$ provides a measure of similarity across datapoints. Using $k$ may allow one to extract features common to all training datapoints; these features are non-linear correlations that are not visible in the original space.

Classical kernels one finds in the literature are:

• Homogeneous Polynomial Kernels: $k(x, x') = \langle x, x' \rangle^p$, $p \in \mathbb{N}$;

• Inhomogeneous Polynomial Kernels: $k(x, x') = \left(\langle x, x' \rangle + c\right)^p$, $p \in \mathbb{N}$, $c \geq 0$;

• Hyperbolic Tangent Kernel (similar to the sigmoid function): $k(x, x') = \tanh\left(\theta + \langle x, x' \rangle\right)$, $\theta \in \mathbb{R}$;

• Gaussian Kernel (translation-invariant): $k(x, x') = e^{-\frac{\|x - x'\|^2}{\sigma^2}}$, $\sigma \in \mathbb{R}$.

The most popular kernel is the Gaussian kernel.
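As an illustrative sketch, the kernels above can be written directly in plain Python (function and parameter names are my own, not from the text):

```python
import math

def poly_kernel(x, xp, p, c=0.0):
    # Inner product <x, x'> raised to the power p.
    # c = 0 gives the homogeneous case, c > 0 the inhomogeneous one.
    return (sum(a * b for a, b in zip(x, xp)) + c) ** p

def tanh_kernel(x, xp, theta):
    # Hyperbolic tangent kernel: tanh(theta + <x, x'>).
    return math.tanh(theta + sum(a * b for a, b in zip(x, xp)))

def gaussian_kernel(x, xp, sigma):
    # Translation-invariant: depends only on the distance ||x - x'||.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, xp))
    return math.exp(-sq_dist / sigma ** 2)
```

Note that the Gaussian kernel of a point with itself is always 1, regardless of $\sigma$, since the exponent vanishes.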

5.2 Which kernel, when?<br />

A recurrent question we often hear in class is: "how does one choose the kernel?" A number of recent works have addressed this problem by offering techniques to learn the kernel. This often amounts to building estimators that rely on a mixture of kernels; one then learns how to combine these kernels to find the optimal mixture. Unfortunately, however, there is no good recipe to determine which kernel to use when. One may try different kernels in an iterative manner and look at which provides the best result for a given technique. For instance, if you perform classification, you may want to compare the performance obtained on both training and testing sets after cross-validation when using either a Gaussian kernel or a polynomial kernel. You may then pick the kernel that yields the best performance.
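A minimal sketch of this iterative comparison, using leave-one-out validation and a deliberately crude classifier (each point takes the label of its most kernel-similar training point) on made-up toy data; the data, classifier, and parameter values are illustrative assumptions, not from the text:

```python
import math

# Two candidate kernels with fixed, hypothetical hyperparameters.
def gaussian(x, xp, sigma=1.0):
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, xp)) / sigma ** 2)

def poly(x, xp, p=2):
    return sum(a * b for a, b in zip(x, xp)) ** p

def loo_accuracy(kernel, X, y):
    # Leave-one-out: classify each point by its most similar other point.
    correct = 0
    for i in range(len(X)):
        best = max((j for j in range(len(X)) if j != i),
                   key=lambda j: kernel(X[i], X[j]))
        correct += int(y[best] == y[i])
    return correct / len(X)

# Toy data: two well-separated 2-D clusters.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
     [2.0, 2.1], [2.2, 2.0], [2.1, 1.9]]
y = [0, 0, 0, 1, 1, 1]

scores = {k.__name__: loo_accuracy(k, X, y) for k in (gaussian, poly)}
best_kernel = max(scores, key=scores.get)
```

On this particular data the Gaussian kernel wins, because the polynomial kernel's similarity grows with the magnitude of the inner product rather than with proximity; a different dataset could reverse the outcome, which is exactly why the comparison has to be run empirically.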

While choosing the kernel is already an issue, once a kernel is chosen, one is left with the problem of choosing the hyperparameters of the kernel. These are, for instance, the variance σ in the Gaussian kernel or the order p of the polynomial kernels. There again, there is no good recipe. When using a Gaussian kernel, a number of approaches, known under the term kernel polarization, have been proposed whereby one learns the optimal covariance parameters. Most of these approaches, however, are iterative by nature and rely on a discrete sampling of values taken by the parameters, using some heuristics or boosting techniques to optimize the search through these parameter values.
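The discrete-sampling step can be sketched as a simple grid search over candidate values of σ, scored by leave-one-out accuracy. Here the classifier assigns each point to the class with the highest mean Gaussian similarity (a Parzen-style rule); the grid, data, and classifier are illustrative assumptions:

```python
import math

def gaussian(x, xp, sigma):
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, xp)) / sigma ** 2)

def predict(x, X, y, sigma, exclude):
    # Class score = mean Gaussian similarity to that class's training points.
    scores = {}
    for cls in sorted(set(y)):
        sims = [gaussian(x, X[j], sigma)
                for j in range(len(X)) if y[j] == cls and j != exclude]
        scores[cls] = sum(sims) / len(sims)
    return max(scores, key=scores.get)

def loo_accuracy(sigma, X, y):
    hits = sum(predict(X[i], X, y, sigma, exclude=i) == y[i]
               for i in range(len(X)))
    return hits / len(X)

# Toy data: two well-separated 2-D clusters.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
     [2.0, 2.1], [2.2, 2.0], [2.1, 1.9]]
y = [0, 0, 0, 1, 1, 1]

# Discrete sampling of candidate variances, as described above.
sigmas = [0.01, 0.1, 1.0, 10.0]
best_sigma = max(sigmas, key=lambda s: loo_accuracy(s, X, y))
```

In practice the grid is usually laid out on a logarithmic scale, as here, since useful values of σ can span several orders of magnitude.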

© A.G.Billard 2004 – Last Update March 2011
