MACHINE LEARNING TECHNIQUES - LASA
The kernel $k$ provides a measure of similarity across datapoints. Using $k$ may allow one to extract features common to all training datapoints; these features are non-linear correlations that are not visible in the original space.
Classical kernels one finds in the literature are:

• Homogeneous polynomial kernels: $k(x, x') = \langle x, x' \rangle^{p}$, $p \in \mathbb{N}$;

• Inhomogeneous polynomial kernels: $k(x, x') = \left( \langle x, x' \rangle + c \right)^{p}$, $p \in \mathbb{N}$, $c \geq 0$;

• Hyperbolic tangent kernel (similar to the sigmoid function): $k(x, x') = \tanh\left( \theta + \langle x, x' \rangle \right)$, $\theta \in \mathbb{R}$;

• Gaussian kernel (translation-invariant): $k(x, x') = e^{-\frac{\left\| x - x' \right\|^{2}}{\sigma^{2}}}$, $\sigma \in \mathbb{R}$.

The most popular kernel is the Gaussian kernel.
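To make the notation concrete, here is a minimal sketch of these four kernels in Python/NumPy. The function names and default parameter values are illustrative choices, not part of the notes above.

```python
import numpy as np

def homogeneous_poly_kernel(x, xp, p=2):
    """Homogeneous polynomial kernel k(x, x') = <x, x'>^p, p a natural number."""
    return np.dot(x, xp) ** p

def inhomogeneous_poly_kernel(x, xp, p=2, c=1.0):
    """Inhomogeneous polynomial kernel k(x, x') = (<x, x'> + c)^p, with c >= 0."""
    return (np.dot(x, xp) + c) ** p

def tanh_kernel(x, xp, theta=0.0):
    """Hyperbolic tangent kernel k(x, x') = tanh(theta + <x, x'>)."""
    return np.tanh(theta + np.dot(x, xp))

def gaussian_kernel(x, xp, sigma=1.0):
    """Gaussian kernel k(x, x') = exp(-||x - x'||^2 / sigma^2)."""
    diff = x - xp
    return np.exp(-np.dot(diff, diff) / sigma ** 2)

# Example: similarity between two datapoints under the Gaussian kernel
x1 = np.array([1.0, 2.0])
x2 = np.array([1.5, 1.0])
print(gaussian_kernel(x1, x2, sigma=2.0))
```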
5.2 Which kernel, when?<br />
A recurrent question we often hear in class is "how does one choose the kernel?". A number of recent works have addressed this problem by offering techniques to learn the kernel. This often amounts to building estimators that rely on a mixture of kernels; one then learns how to combine these kernels to find the optimal mixture. Unfortunately, there is no good recipe to determine which kernel to use when. One may try different kernels in an iterative manner and look at which provides the best result for a given technique. For instance, if you perform classification, you may want to compare the performance obtained on both training and testing sets after cross-validation with either a Gaussian kernel or a polynomial kernel, and then pick the kernel that yields the best performance, as in the sketch below.
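The notes do not prescribe a particular classifier, so this sketch assumes a support vector classifier from scikit-learn and a synthetic dataset standing in for real training data; only the kernel choice varies.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic classification data standing in for a real training set
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Compare a Gaussian (RBF) kernel against a polynomial kernel
# using 5-fold cross-validation accuracy, then pick the better one.
for kernel in ("rbf", "poly"):
    clf = SVC(kernel=kernel)
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{kernel}: mean CV accuracy = {scores.mean():.3f}")
```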
While choosing the kernel is already an issue, once a kernel is chosen, one is left with the problem of choosing the hyperparameters of the kernel. These are, for instance, the variance $\sigma$ in the Gaussian kernel or the degree $p$ in the polynomial kernels. Here again, there is no good recipe. When using a Gaussian kernel, a number of approaches, known under the term of kernel polarization, have been proposed whereby one learns the optimal covariance parameters. Most of these approaches, however, are iterative by nature and rely on a discrete sampling of the values taken by the parameters, using some heuristics or boosting techniques to optimize the search through these parameter values.
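As a plain illustration of sampling hyperparameter values discretely (a simpler stand-in for kernel polarization, which the notes do not detail), one can run an exhaustive grid search with cross-validation; the parameter grids below are arbitrary example values.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic data standing in for a real training set
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Discrete sampling of hyperparameter values:
# gamma plays the role of 1/sigma^2 for the Gaussian kernel,
# degree is the polynomial degree p.
param_grid = [
    {"kernel": ["rbf"], "gamma": [0.01, 0.1, 1.0, 10.0]},
    {"kernel": ["poly"], "degree": [2, 3, 4]},
]
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```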
© A.G.Billard 2004 – Last Update March 2011