MACHINE LEARNING TECHNIQUES - LASA
5.8.1 ν-SVR
The ε parameter in ε-SVR determines the desired accuracy of the approximation, and determining a good ε may be difficult in practice. ν-SVR is an alternative method that removes the need to pre-specify the parameter ε in (5.65) by introducing (yet another!) parameter ν. The idea is that this new parameter makes it easier to estimate ε, as it expresses a tradeoff between model complexity and the slack variables.
By introducing a penalty through ν ≥ 0, the optimization of (5.65) becomes:

$$
\begin{aligned}
\min_{w,\,\xi,\,\xi^{*},\,\varepsilon}\;\; & L\left(w,\xi,\xi^{*},\varepsilon\right)
=\frac{1}{2}\left\|w\right\|^{2}
+C\left(\nu\varepsilon+\frac{1}{M}\sum_{i=1}^{M}\left(\xi_{i}+\xi_{i}^{*}\right)\right)\\[2pt]
\text{subject to}\;\; & \left\langle w,\phi\!\left(x^{i}\right)\right\rangle+b-y^{i}\le\varepsilon+\xi_{i},\\
& y^{i}-\left\langle w,\phi\!\left(x^{i}\right)\right\rangle-b\le\varepsilon+\xi_{i}^{*},\\
& \xi_{i}\ge 0,\;\xi_{i}^{*}\ge 0,\;\varepsilon\ge 0
\end{aligned}
\qquad(5.74)
$$
Notice that we now also optimize over the value of ε. The term νε in the objective function is a weighting term that balances a growth of ε against the effect this has on the number of poorly fit datapoints (last term of the objective function).

The new inequality constraint ε ≥ 0 yields a Lagrangian with an associated multiplier β. One can then proceed through the same steps as when solving ε-SVR, i.e. write the Lagrangian, take the partial derivatives, write the dual and solve the KKT conditions (see Schölkopf & Smola 2002 for details). This yields the following ν-SVR optimization problem:
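This tradeoff gives ν a concrete interpretation: it upper-bounds the fraction of points left outside the ε-tube and lower-bounds the fraction of support vectors, while ε itself is found automatically. The behavior can be sketched with scikit-learn's `NuSVR` (a minimal sketch, assuming scikit-learn is available; the sinc toy data and the chosen `nu` values are illustrative, not from the text):

```python
import numpy as np
from sklearn.svm import NuSVR

# Toy 1-D regression data: noisy sinc function (illustrative choice).
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(-3.0, 3.0, size=(100, 1)), axis=0)
y = np.sinc(X).ravel() + 0.1 * rng.standard_normal(100)

# nu in (0, 1] lower-bounds the fraction of support vectors and
# upper-bounds the fraction of points outside the (auto-sized) tube,
# so larger nu should retain more support vectors.
for nu in (0.1, 0.5, 0.9):
    model = NuSVR(nu=nu, C=1.0, kernel="rbf").fit(X, y)
    frac_sv = len(model.support_) / len(X)
    print(f"nu={nu:.1f}: fraction of support vectors = {frac_sv:.2f}")
```

Increasing `nu` trades a wider margin of tolerated errors for a sparser model, which is exactly the ε-versus-slack balance described above.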
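To see where the extra dual constraint on the sum of the multipliers comes from, the relevant step of the derivation can be sketched as follows (a sketch following Schölkopf & Smola 2002; here η_i, η_i* ≥ 0 denote the multipliers of the constraints ξ_i, ξ_i* ≥ 0, β ≥ 0 that of ε ≥ 0, and the α's are labeled so that the signs match (5.75)–(5.76)):

$$
\begin{aligned}
L ={}& \frac{1}{2}\|w\|^{2}+C\left(\nu\varepsilon+\frac{1}{M}\sum_{i=1}^{M}\left(\xi_{i}+\xi_{i}^{*}\right)\right)
-\beta\varepsilon-\sum_{i=1}^{M}\left(\eta_{i}\xi_{i}+\eta_{i}^{*}\xi_{i}^{*}\right)\\
&-\sum_{i=1}^{M}\alpha_{i}^{*}\left(\varepsilon+\xi_{i}-\left\langle w,\phi\!\left(x^{i}\right)\right\rangle-b+y^{i}\right)
-\sum_{i=1}^{M}\alpha_{i}\left(\varepsilon+\xi_{i}^{*}+\left\langle w,\phi\!\left(x^{i}\right)\right\rangle+b-y^{i}\right).
\end{aligned}
$$

Setting the partial derivative with respect to ε to zero gives $C\nu-\beta-\sum_{i=1}^{M}\left(\alpha_{i}+\alpha_{i}^{*}\right)=0$, and since β ≥ 0 this implies $\sum_{i=1}^{M}\left(\alpha_{i}+\alpha_{i}^{*}\right)\le C\nu$, which is the last constraint appearing in (5.75).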
For ν ≥ 0, C > 0:

$$
\begin{aligned}
\max_{\alpha^{(*)}\in\mathbb{R}^{M}}\;\; & \sum_{i=1}^{M}\left(\alpha_{i}-\alpha_{i}^{*}\right)y^{i}
-\frac{1}{2}\sum_{i,j=1}^{M}\left(\alpha_{i}-\alpha_{i}^{*}\right)\left(\alpha_{j}-\alpha_{j}^{*}\right)k\!\left(x^{i},x^{j}\right)\\[2pt]
\text{subject to}\;\; & \sum_{i=1}^{M}\left(\alpha_{i}-\alpha_{i}^{*}\right)=0,\\
& \alpha_{i}^{(*)}\in\left[0,\frac{C}{M}\right],\\
& \sum_{i=1}^{M}\left(\alpha_{i}+\alpha_{i}^{*}\right)\le C\nu.
\end{aligned}
\qquad(5.75)
$$
The regression estimate then takes the same form as before and is expressed as a linear combination of the kernel evaluated at each of the datapoints with nonzero $\alpha_{i}^{(*)}$ (the support vectors), i.e.:
$$
f\!\left(x\right)=\sum_{i=1}^{M}\left(\alpha_{i}-\alpha_{i}^{*}\right)k\!\left(x^{i},x\right)+b.
\qquad(5.76)
$$
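The expansion (5.76) can be checked numerically against a fitted ν-SVR model (a sketch assuming scikit-learn; `dual_coef_`, `support_vectors_` and `intercept_` are sklearn's names for $(\alpha_i-\alpha_i^{*})$, the support vectors $x^i$, and $b$; the sine data and `gamma` value are illustrative):

```python
import numpy as np
from sklearn.svm import NuSVR
from sklearn.metrics.pairwise import rbf_kernel

# Toy 1-D data (illustrative choice).
rng = np.random.default_rng(1)
X = rng.uniform(-2.0, 2.0, size=(80, 1))
y = np.sin(X).ravel()

gamma = 0.5
model = NuSVR(nu=0.5, C=1.0, kernel="rbf", gamma=gamma).fit(X, y)

X_test = np.linspace(-2.0, 2.0, 5).reshape(-1, 1)
# dual_coef_ stores (alpha_i - alpha_i^*) for the support vectors only,
# so f(x) = sum_i (alpha_i - alpha_i^*) k(x^i, x) + b as in (5.76).
K = rbf_kernel(model.support_vectors_, X_test, gamma=gamma)
f_manual = (model.dual_coef_ @ K).ravel() + model.intercept_

assert np.allclose(f_manual, model.predict(X_test))
```

Only the support vectors enter the sum, which is what makes the model sparse: all other datapoints have $\alpha_i = \alpha_i^{*} = 0$.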
As we did before solely for b, we can now find both b and ε by solving the KKT conditions.
© A.G.Billard 2004 – Last Update March 2011