MACHINE LEARNING TECHNIQUES - LASA
$$ f(x) = \operatorname{sign}\left( \langle w, x \rangle + b \right) \qquad (5.34) $$
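As a minimal sketch of the decision rule (5.34), assuming vectors are plain Python tuples: the helper names `dot` and `decision` and the hyperplane values are hypothetical, chosen only for illustration. Assigning +1 to points lying exactly on the hyperplane is also an assumption, since the value of sign(0) is not fixed by the text.

```python
def dot(u, v):
    """Inner product <u, v> of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def decision(w, b, x):
    """Linear classifier f(x) = sign(<w, x> + b), returning +1 or -1.

    Assumption: points exactly on the hyperplane (<w, x> + b == 0) get +1.
    """
    return 1 if dot(w, x) + b >= 0 else -1


w, b = (1.0, 1.0), -3.0            # hypothetical hyperplane x1 + x2 = 3
print(decision(w, b, (2.0, 2.0)))  # 1
print(decision(w, b, (0.5, 1.0)))  # -1
```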
We can now define a learning algorithm for linearly separable problems. First, observe that<br />
among all hyperplanes separating the data, there exists a unique optimal hyperplane,<br />
distinguished by the maximum margin of separation between any training point and the<br />
hyperplane, defined by:<br />
$$ \underset{w \in H,\, b \in \mathbb{R}}{\text{maximize}} \; \min \left\{ \lVert x - x_i \rVert \;:\; x \in H,\ \langle w, x \rangle + b = 0,\ i = 1, \dots, M \right\} \qquad (5.35) $$
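Because the distance from a point $x_i$ to the hyperplane $\langle w, x \rangle + b = 0$ has the closed form $|\langle w, x_i \rangle + b| / \lVert w \rVert$, the inner minimum in (5.35) can be evaluated directly. The sketch below does this for two candidate hyperplanes that both separate a small toy data set; the data, the candidates and the helper names are hypothetical. The optimal hyperplane of (5.35) is the one for which this minimum is largest.

```python
import math


def dot(u, v):
    """Inner product <u, v> of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def distance_to_hyperplane(w, b, x):
    """Distance from x to {z : <w, z> + b = 0}, i.e. |<w, x> + b| / ||w||."""
    return abs(dot(w, x) + b) / math.sqrt(dot(w, w))


def margin(w, b, points):
    """Inner minimum of (5.35): smallest distance from any point to the hyperplane."""
    return min(distance_to_hyperplane(w, b, x) for x in points)


# Hypothetical toy data: two points per class, separated along x1.
X = [(0.0, 0.0), (0.0, 1.0), (2.0, 0.0), (2.0, 1.0)]

print(margin((1.0, 0.0), -1.0, X))   # vertical hyperplane x1 = 1
print(margin((1.0, 0.5), -1.25, X))  # tilted hyperplane x1 + 0.5*x2 = 1.25
```

Here the vertical hyperplane yields a margin of 1.0, against roughly 0.67 for the tilted one, so (5.35) prefers the vertical hyperplane among these two candidates.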
In the simple classification problem presented earlier, it was sufficient to compute the difference
between the two cluster means to define the normal vector, and hence the hyperplane. Here, finding
the normal vector that yields the largest margin is more complex.
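To see why the mean-based construction is not enough, the sketch below assumes that the earlier classifier took the normal to be the difference of the two class means and placed the hyperplane at their midpoint (an assumed reconstruction, with hypothetical helper names and toy data). A single distant positive point drags the positive mean away, and the resulting hyperplane misclassifies the nearest positive point, whereas the hyperplane $x_1 = 0$ separates all three points with margin 1.

```python
def dot(u, v):
    """Inner product <u, v> of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def mean(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(len(points[0])))


def mean_difference_hyperplane(pos, neg):
    """Assumed earlier classifier: normal = mu_pos - mu_neg, hyperplane through the midpoint."""
    mu_pos, mu_neg = mean(pos), mean(neg)
    w = tuple(a - c for a, c in zip(mu_pos, mu_neg))
    midpoint = tuple((a + c) / 2 for a, c in zip(mu_pos, mu_neg))
    return w, -dot(w, midpoint)


def sign_label(w, b, x):
    """Label +1 or -1 given by sign(<w, x> + b)."""
    return 1 if dot(w, x) + b >= 0 else -1


# Hypothetical toy data: the far positive point (13, 0) pulls the positive mean.
pos = [(1.0, 0.0), (13.0, 0.0)]
neg = [(-1.0, 0.0)]

w, b = mean_difference_hyperplane(pos, neg)
print(w, b)                                  # (8.0, 0.0) -24.0
print([sign_label(w, b, x) for x in pos])    # [-1, 1]: (1, 0) is misclassified
```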
To construct the optimal hyperplane, we have to minimize the objective function τ(w):

$$ \underset{w \in H,\, b \in \mathbb{R}}{\text{minimize}} \; \tau(w) = \frac{1}{2} \lVert w \rVert^2 \qquad (5.36) $$

subject to the inequality constraints:

$$ y_i \left( \langle w, x_i \rangle + b \right) \geq 1, \quad \forall\, i = 1, \dots, M \qquad (5.37) $$
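The problem defined by (5.36) and (5.37) can be checked numerically for any candidate $(w, b)$: a candidate is admissible only if it satisfies every constraint (5.37), and among admissible candidates the one with the smallest $\tau(w)$ is preferred. The sketch below, with hypothetical helper names and toy data, evaluates three hand-picked candidates; it does not solve the quadratic program itself, for which a dedicated QP solver would be used in practice.

```python
def dot(u, v):
    """Inner product <u, v> of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def objective(w):
    """tau(w) = 0.5 * ||w||^2, the quantity minimized in (5.36)."""
    return 0.5 * dot(w, w)


def is_feasible(w, b, X, y):
    """True if every pair satisfies y_i (<w, x_i> + b) >= 1, as in (5.37)."""
    return all(yi * (dot(w, xi) + b) >= 1 for xi, yi in zip(X, y))


# Hypothetical toy data and candidate hyperplanes.
X = [(0.0, 0.0), (0.0, 1.0), (2.0, 0.0), (2.0, 1.0)]
y = [-1, -1, 1, 1]
candidates = [((1.0, 0.0), -1.0), ((2.0, 0.0), -2.0), ((1.0, 0.5), -1.25)]

for w, b in candidates:
    print(w, b, is_feasible(w, b, X, y), objective(w))
```

The third candidate separates the two classes but, at this scale, violates (5.37) for the points (0, 1) and (2, 0). Of the two admissible candidates, $w = (1, 0)$, $b = -1$ has the smaller objective (0.5 versus 2.0).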
Consider the points for which the equality in (5.37) holds (requiring that such a point exists is
equivalent to choosing a scale for w and b). These points lie on two hyperplanes,
$H_1: \langle w, x_i \rangle + b = 1$ and $H_2: \langle w, x_i \rangle + b = -1$, both with normal w and with
perpendicular distances from the origin $|1 - b| / \lVert w \rVert$ and $|-1 - b| / \lVert w \rVert$, respectively.
Hence $d_+ = d_- = 1 / \lVert w \rVert$, where $d_+$ and $d_-$ are the distances from the separating hyperplane
to $H_1$ and $H_2$, and the margin is simply $2 / \lVert w \rVert$. Note that $H_1$ and $H_2$ are parallel (they
have the same normal) and that no training points fall between them. Thus we can find the pair of
hyperplanes that gives the maximum margin by minimizing $\lVert w \rVert^2$, subject to the constraints (5.37),
which ensure that the class label assigned to a given $x_i$ is +1 if $y_i = +1$ and −1 if $y_i = -1$.
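The points lying on $H_1$ and $H_2$ are exactly those for which (5.37) holds with equality, and the distance between the two hyperplanes is $2 / \lVert w \rVert$. The sketch below, again with hypothetical helper names and toy data, lists those points for a given $(w, b)$ and computes the margin; the small tolerance `tol` is an assumption that guards against floating-point round-off.

```python
import math


def dot(u, v):
    """Inner product <u, v> of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def points_on_margin(w, b, X, y, tol=1e-9):
    """Indices i with y_i (<w, x_i> + b) = 1 (within tol), i.e. points on H1 or H2."""
    return [i for i, (xi, yi) in enumerate(zip(X, y))
            if abs(yi * (dot(w, xi) + b) - 1) <= tol]


def geometric_margin(w):
    """Distance between H1 and H2: 2 / ||w||."""
    return 2.0 / math.sqrt(dot(w, w))


# Hypothetical toy data; (w, b) below satisfies every constraint (5.37).
X = [(0.0, 0.0), (0.0, 1.0), (-1.0, 0.5), (2.0, 0.0), (2.0, 1.0), (3.0, 0.5)]
y = [-1, -1, -1, 1, 1, 1]
w, b = (1.0, 0.0), -1.0

print(points_on_margin(w, b, X, y))  # [0, 1, 3, 4]
print(geometric_margin(w))           # 2.0
```

For this data, the two points at $x_1 = 0$ lie on $H_2$ and the two at $x_1 = 2$ lie on $H_1$, while $(-1, 0.5)$ and $(3, 0.5)$ lie strictly outside the margin.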
Let us now rephrase the constrained minimization problem given by (5.36) and (5.37) in terms of the
Lagrange multipliers $\alpha_i$, i = 1, ..., M, one for each of the inequality constraints in (5.37).
Recall the rule: for constraints of the form $c_i \geq 0$, the constraint equations are multiplied by
non-negative Lagrange multipliers and subtracted from the objective function (5.36) to form the
Lagrangian. For equality constraints, the Lagrange multipliers are unconstrained. This gives the
Lagrangian:
$$ L_P(w, b, \alpha) \equiv \frac{1}{2} \lVert w \rVert^2 - \sum_{i=1}^{M} \alpha_i y_i \left( \langle w, x_i \rangle + b \right) + \sum_{i=1}^{M} \alpha_i \qquad (5.38) $$

We must now minimize $L_P$ with respect to w and b, and simultaneously require that the derivatives
of $L_P$ with respect to all the $\alpha_i$ vanish, all subject to the constraints $\alpha_i \geq 0$. This is a convex
© A.G.Billard 2004 – Last Update March 2011