
quadratic programming problem, since the objective function is itself convex, and those points which satisfy the constraints also form a convex set (any linear constraint defines a convex set, and a set of $N$ simultaneous linear constraints defines the intersection of $N$ convex sets, which is also a convex set). This means that we can equivalently solve the following dual problem: maximize $L_P$, subject to the constraints that the gradient of $L_P$ with respect to $\mathbf{w}$ and $b$ vanish, and subject also to the constraints that $\alpha_i \geq 0$.
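For reference, the primal Lagrangian being differentiated below is presumably the standard hard-margin form of (5.38) on the preceding page (a sketch of the assumed expression; the notation there may differ slightly):

$$L_P = \frac{1}{2}\,\|\mathbf{w}\|^2 - \sum_{i=1}^{M} \alpha_i \left[ y_i\,(\mathbf{w}\cdot\mathbf{x}_i + b) - 1 \right], \qquad \alpha_i \geq 0.$$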

Requiring that the gradient of $L_P$ with respect to $\mathbf{w}$ and $b$ vanish,

$$\frac{\partial L_P}{\partial w_j} = w_j - \sum_i \alpha_i\, y_i\, x_{ij} = 0, \qquad j = 1, \dots, N, \tag{5.39}$$

$$\frac{\partial L_P}{\partial b} = -\sum_i \alpha_i\, y_i = 0, \tag{5.40}$$

gives the conditions:

$$\mathbf{w} = \sum_{i=1}^{M} \alpha_i\, y_i\, \mathbf{x}_i \tag{5.41}$$

$$\sum_{i=1}^{M} \alpha_i\, y_i = 0. \tag{5.42}$$

Since these are equality constraints in the dual formulation, we can substitute them into (5.38) to give:

$$L_D(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i\, \alpha_j\, y_i\, y_j\, \langle \mathbf{x}_i, \mathbf{x}_j \rangle. \tag{5.43}$$
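To spell out the substitution (a sketch assuming the standard hard-margin form of $L_P$ noted above): expanding $L_P$ and inserting (5.41), while using (5.42) to eliminate the term in $b$, gives

$$L_P = \frac{1}{2}\,\mathbf{w}\cdot\mathbf{w} - \sum_i \alpha_i\, y_i\, \mathbf{w}\cdot\mathbf{x}_i - b \sum_i \alpha_i\, y_i + \sum_i \alpha_i = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i\, \alpha_j\, y_i\, y_j\, \langle \mathbf{x}_i, \mathbf{x}_j \rangle,$$

since $\mathbf{w}\cdot\mathbf{w} = \sum_i \alpha_i y_i\, \mathbf{w}\cdot\mathbf{x}_i = \sum_{i,j} \alpha_i \alpha_j y_i y_j \langle \mathbf{x}_i, \mathbf{x}_j \rangle$ by (5.41) and the $b$ term vanishes by (5.42); this is exactly $L_D(\alpha)$ in (5.43).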

Note that we have now given the Lagrangian different labels ($P$ for primal, $D$ for dual) to emphasize that the two formulations are different: $L_P$ and $L_D$ arise from the same objective function but with different constraints, and the solution is found either by minimizing $L_P$ or by maximizing $L_D$. Note also that if we formulate the problem with $b = 0$, which amounts to requiring that all hyperplanes contain the origin (this is a mild restriction for high-dimensional spaces, since it amounts to reducing the number of degrees of freedom by one), support vector training (for the separable, linear case) then amounts to maximizing $L_D$ with respect to the $\alpha_i$, subject to constraint (5.42) and positivity of the $\alpha_i$, with the solution given by (5.41). Notice that there is a Lagrange multiplier $\alpha_i$ for every training point. In the solution, those points for which $\alpha_i > 0$ are called support vectors and lie on one of the hyperplanes $H_1$, $H_2$. All other training points have $\alpha_i = 0$ and lie either on $H_1$ or $H_2$, or on that side of $H_1$ or $H_2$ such that the strict inequality in (5.37) is satisfied.
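As a numerical illustration (not part of the original notes), the following minimal Python sketch fits a linear SVM on an assumed toy separable data set and checks that only the support vectors carry non-zero multipliers, that (5.41) recovers $\mathbf{w}$, and that (5.42) holds. It assumes NumPy and scikit-learn are available, and approximates the hard-margin case with a very large $C$:

```python
# Minimal sketch: only support vectors receive non-zero Lagrange multipliers.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_pos = rng.normal(loc=[+2.0, +2.0], scale=0.5, size=(20, 2))   # class +1 cloud (toy data)
X_neg = rng.normal(loc=[-2.0, -2.0], scale=0.5, size=(20, 2))   # class -1 cloud (toy data)
X = np.vstack([X_pos, X_neg])
y = np.array([+1] * 20 + [-1] * 20)

clf = SVC(kernel="linear", C=1e6).fit(X, y)   # very large C approximates the hard margin

# dual_coef_ stores alpha_i * y_i for the support vectors only (the summands of (5.41))
alpha_y = clf.dual_coef_.ravel()
sv = clf.support_vectors_

# Reconstruct w from (5.41): w = sum_i alpha_i y_i x_i (only support vectors contribute)
w_from_dual = alpha_y @ sv
print("w from (5.41): ", w_from_dual)
print("w from sklearn:", clf.coef_.ravel())

# Check the equality constraint (5.42): sum_i alpha_i y_i = 0
print("sum alpha_i y_i =", alpha_y.sum())

# Non-support vectors have alpha_i = 0; sklearn simply omits them from support_
print("support vector indices:", clf.support_)
```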

For these machines, the support vectors are the critical elements of the training set. They lie closest to the decision boundary; if all other training points were removed (or moved around, but so as not to cross $H_1$ or $H_2$) and training was repeated, the same separating hyperplane would be found.
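A short follow-up sketch of the same kind (again an illustration on assumed toy data, not from the notes): retraining on the support vectors alone reproduces the same hyperplane, up to numerical tolerance:

```python
# Minimal sketch: discarding non-support vectors and retraining leaves (w, b) unchanged.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([+2.0, +2.0], 0.5, (20, 2)),
               rng.normal([-2.0, -2.0], 0.5, (20, 2))])
y = np.array([+1] * 20 + [-1] * 20)

clf_all = SVC(kernel="linear", C=1e6).fit(X, y)             # train on the full set
keep = clf_all.support_                                      # indices of the support vectors
clf_sv = SVC(kernel="linear", C=1e6).fit(X[keep], y[keep])   # retrain on support vectors only

# The separating hyperplane is determined by the support vectors alone
print("w (all points):", clf_all.coef_.ravel(), " b:", clf_all.intercept_[0])
print("w (SVs only):  ", clf_sv.coef_.ravel(),  " b:", clf_sv.intercept_[0])
```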

