
$f(x) = \operatorname{sign}\left( \langle w, x \rangle + b \right)$  (5.34)
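As an illustration, here is a minimal sketch of evaluating the decision function (5.34) for a given normal vector and bias; the numerical values of w and b below are placeholders, not those of an optimal hyperplane:

```python
import numpy as np

def decision(x, w, b):
    """Linear decision function f(x) = sign(<w, x> + b), cf. (5.34)."""
    return np.sign(np.dot(w, x) + b)

# Placeholder hyperplane parameters (illustrative only).
w = np.array([1.0, -2.0])
b = 0.5
print(decision(np.array([3.0, 1.0]), w, b))   # -> 1.0
print(decision(np.array([-1.0, 2.0]), w, b))  # -> -1.0
```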

We can now define a learning algorithm for linearly separable problems. First, observe that<br />

among all hyperplanes separating the data, there exists a unique optimal hyperplane,<br />

distinguished by the maximum margin of separation between any training point and the<br />

hyperplane, defined by:<br />

$\underset{w \in \mathcal{H},\, b \in \mathbb{R}}{\text{maximize}}\ \min\left\{ \left\| x - x_i \right\| \,:\, x \in \mathcal{H},\ \langle w, x \rangle + b = 0,\ i = 1, \dots, M \right\}$  (5.35)

While in the simple classification problem presented earlier it was sufficient to compute the distance between the two clusters' means to define the normal vector, and hence the hyperplane, here the problem of finding the normal vector that leads to the largest margin is slightly more complex.

To construct the optimal hyperplane, we have to solve for the objective function $\tau(w)$:

$\underset{w \in \mathcal{H},\, b \in \mathbb{R}}{\text{minimize}}\ \tau(w) = \frac{1}{2}\left\| w \right\|^2$  (5.36)

subject to the inequality constraints:

$y_i\left( \langle w, x_i \rangle + b \right) \geq 1, \quad \forall\, i = 1, \dots, M$  (5.37)
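As a minimal numerical sketch, the primal problem (5.36)-(5.37) can be handed to a generic convex solver; the example below assumes the cvxpy library and a small linearly separable toy data set, both of which are illustrative choices and not part of the text:

```python
import cvxpy as cp
import numpy as np

# Toy linearly separable data with labels y_i in {-1, +1}.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w = cp.Variable(2)   # normal vector
b = cp.Variable()    # bias

# Objective (5.36): minimize 1/2 ||w||^2 ...
objective = cp.Minimize(0.5 * cp.sum_squares(w))
# ... subject to the margin constraints (5.37): y_i (<w, x_i> + b) >= 1.
constraints = [cp.multiply(y, X @ w + b) >= 1]
cp.Problem(objective, constraints).solve()

print("w =", w.value, "b =", b.value)
print("margin 2/||w|| =", 2.0 / np.linalg.norm(w.value))
```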

Consider the points for which the equality in (5.37) holds (requiring that there exists such a point is equivalent to choosing a scale for $w$ and $b$). These points lie on two hyperplanes, $H_1$: $\langle w, x_i \rangle + b = 1$ and $H_2$: $\langle w, x_i \rangle + b = -1$, both with normal $w$ and with perpendicular distances from the origin $\left|1-b\right| / \|w\|$ and $\left|-1-b\right| / \|w\|$, respectively. Hence $d_{+} = d_{-} = 1 / \|w\|$ and the margin is simply $2 / \|w\|$. Note that $H_1$ and $H_2$ are parallel (they have the same normal) and that no training points fall between them. Thus we can find the pair of hyperplanes which gives the maximum margin by minimizing $\|w\|^2$, subject to the constraints (5.37), which ensure that the class label for a given $x_i$ will be $+1$ if $y_i = +1$, and $-1$ if $y_i = -1$.
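For clarity, a short worked step behind $d_{+} = d_{-} = 1/\|w\|$, using the fact that the distance between two parallel hyperplanes $\langle w, x \rangle = c_1$ and $\langle w, x \rangle = c_2$ is $\left|c_1 - c_2\right| / \|w\|$:

```latex
% Distances of H1 and H2 from the separating hyperplane <w, x> + b = 0:
d_{+} = \frac{\left|(1-b)-(-b)\right|}{\|w\|} = \frac{1}{\|w\|},
\qquad
d_{-} = \frac{\left|(-1-b)-(-b)\right|}{\|w\|} = \frac{1}{\|w\|},
\qquad
\text{margin} = d_{+} + d_{-} = \frac{2}{\|w\|}.
```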

Let us now rephrase the constrained minimization problem given by (5.36) and (5.37) in terms of Lagrange multipliers $\alpha_i$, $i = 1, \dots, M$, one for each of the inequality constraints in (5.37). Recall that the rule is that, for constraints of the form $c_i \geq 0$, the constraint equations are multiplied by positive Lagrange multipliers and subtracted from the objective function (5.36) to form the Lagrangian. For equality constraints, the Lagrange multipliers are unconstrained. This gives the Lagrangian:

$L_P(w, b, \alpha) \equiv \frac{1}{2}\left\| w \right\|^2 - \sum_{i=1}^{M} \alpha_i y_i \left( \langle w, x_i \rangle + b \right) + \sum_{i=1}^{M} \alpha_i$  (5.38)

We must now minimize $L_P$ with respect to $w$ and $b$, and simultaneously require that the derivatives of $L_P$ with respect to all the $\alpha_i$ vanish, all subject to the constraints $\alpha_i \geq 0$. This is a convex quadratic programming problem.
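For concreteness, setting the derivatives of $L_P$ with respect to $w$ and $b$ to zero yields the standard stationarity conditions:

```latex
% Stationarity of the Lagrangian (5.38) with respect to w and b:
\frac{\partial L_P}{\partial w} = 0 \;\Rightarrow\; w = \sum_{i=1}^{M} \alpha_i y_i x_i,
\qquad
\frac{\partial L_P}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{M} \alpha_i y_i = 0.
```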

