

5.7 Support Vector Machines

Adapted from "Learning with Kernels" by B. Schölkopf and A. Smola, MIT Press, 2002, and "A Tutorial on Support Vector Machines for Pattern Recognition" by C.J.C. Burges, Data Mining and Knowledge Discovery, 2, 1998.

The support vector machine (SVM) is probably the most popular application of kernel methods. SVM exploits the kernel trick to extend classical linear classification to non-linear classification. It has been shown to be very powerful at separating highly intertwined data. Its simplicity of use and the large number of available software implementations make it easy to apply. We will here very briefly review the principle and the derivation of the algorithm, and highlight its sensitivity to some hyperparameters so as to guide the potential user and ensure optimal use of the algorithm.

Linear Case

Let us start by considering a very simple classification problem which illustrates well the reasoning behind using kernel methods for classification. Suppose we are given two classes of objects. We are then faced with a new object, and we have to assign it to one of the two classes. This problem can be formalized as follows:

Consider a training set composed of $M$ input-output pairs, where each input $x^i \in \mathbb{R}^N$, $i = 1 \dots M$, is associated with a label $y^i$, $i = 1 \dots M$. The label $y^i$ denotes the class to which the pattern $x^i$ belongs. In SVM, we consider solely binary classification problems, i.e.:

$$\{x^i, y^i\} \in X \times \{\pm 1\}, \quad i = 1 \dots M \qquad (5.27)$$
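To make the notation concrete, here is a minimal sketch in Python of a training set in the form of Eq. (5.27). The values are made-up toy data chosen for illustration, not taken from the lecture notes:

```python
import numpy as np

# Toy training set in the form of Eq. (5.27): M = 4 pairs {x^i, y^i},
# with x^i in R^N (here N = 2) and y^i in {-1, +1}.
X = np.array([[0.9, 1.1],
              [1.2, 0.8],
              [-0.9, -1.2],
              [-1.1, -0.8]])             # inputs, shape (M, N)
y = np.array([+1, +1, -1, -1])           # binary labels in {-1, +1}

M, N = X.shape
assert y.shape == (M,) and set(y) <= {-1, +1}
```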

Note that there exist extensions of SVM to multiclass problems. We will here focus on the binary classification case.

Given this training set, we wish to build a model of the relationship between the input points and their associated class labels that would be a good predictor of the class to which each pattern belongs and would allow us to do inference: that is, given a new pattern $x$, we could estimate the class to which this new pattern belongs. In some sense, for a given new pattern $x$, we would choose a corresponding $y$ so that the pair $\{x, y\}$ is somewhat similar to the training examples. To this end, we need a notion of similarity in $X$ and in $\{\pm 1\}$.
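This idea already suggests a naive decision rule: score each candidate label by how similar the new pattern is, on average, to the training patterns carrying that label, and pick the better-scoring label. The sketch below is only a hypothetical illustration of this intuition, not the SVM itself; the function name `predict_by_similarity` and the generic `sim` argument are our own:

```python
import numpy as np

def predict_by_similarity(x_new, X, y, sim):
    """Naive rule: pick the label whose training patterns are, on
    average, most similar to x_new under the similarity measure `sim`."""
    score_pos = np.mean([sim(x_new, xi) for xi in X[y == +1]])
    score_neg = np.mean([sim(x_new, xi) for xi in X[y == -1]])
    return +1 if score_pos >= score_neg else -1

# Same toy data as above; similarity = negative Euclidean distance.
X = np.array([[0.9, 1.1], [1.2, 0.8], [-0.9, -1.2], [-1.1, -0.8]])
y = np.array([+1, +1, -1, -1])
neg_dist = lambda a, b: -np.linalg.norm(a - b)
print(predict_by_similarity(np.array([1.0, 0.9]), X, y, neg_dist))  # -> 1
```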

Similarity Measures: Characterizing the similarity of the outputs $\{\pm 1\}$ is easy. In binary classification, only two situations occur: two labels can either be identical or different. The choice of the similarity measure for the inputs, on the other hand, is more complex and is tightly linked to the idea of a kernel. As we have seen previously in these lecture notes, the kernel $k(x, x')$ gives a measure of similarity across two datapoints $x$ and $x'$. A natural choice for the kernel when considering the simple linear classification problem outlined above is the dot product, i.e.:

$$k(x, x') = \langle x, x' \rangle = \sum_{i=1}^{N} x_i x'_i \qquad (5.28)$$

The geometrical interpretation of the canonical dot product is that it computes the cosine of the angle between the vectors $x$ and $x'$, provided they are normalized to unit length.
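As a quick illustration of Eq. (5.28) and its geometric reading, the sketch below (plain NumPy, with toy vectors of our own choosing) computes the linear kernel and checks the cosine interpretation on unit-length vectors:

```python
import numpy as np

def linear_kernel(x, xp):
    """Canonical dot product k(x, x') = sum_i x_i * x'_i  (Eq. 5.28)."""
    return float(np.dot(x, xp))

# Cosine interpretation: for unit-length vectors, the dot product
# equals the cosine of the angle between them.
x, xp = np.array([3.0, 4.0]), np.array([4.0, 3.0])
xn, xpn = x / np.linalg.norm(x), xp / np.linalg.norm(xp)
print(linear_kernel(xn, xpn))   # 0.96 = cos(angle between x and x')
```

Plugging `linear_kernel` in as the `sim` argument of the earlier `predict_by_similarity` sketch yields the same kind of naive decision rule, now based on the dot-product similarity.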

