
$$ C(x) = \sum_{k=1}^{K} w_k\, C_k(x) $$

with $w_1, \dots, w_K$ the weights associated with each classifier.
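As a minimal sketch of this weighted combination (assuming weak classifiers that output labels in $\{-1,+1\}$ and that the final decision is taken as the sign of the weighted sum; the function and argument names below are hypothetical):

```python
import numpy as np

def ensemble_predict(classifiers, weights, X):
    """Weighted combination C(x) = sum_k w_k * C_k(x).

    classifiers: list of K callables, each mapping an (N, d) array to labels in {-1, +1}
    weights:     array of shape (K,) holding the classifier weights w_k
    X:           data matrix of shape (N, d)
    Returns the real-valued score C(x) and the {-1, +1} decision sign(C(x)).
    """
    scores = np.zeros(X.shape[0])
    for w_k, C_k in zip(weights, classifiers):
        scores += w_k * C_k(X)
    return scores, np.sign(scores)
```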

Boosting builds classifiers incrementally so as to minimize the classification error. To do this, one first updates the weights associated with each datapoint. These weights represent the probability with which the datapoint should be used: the less well the point is classified, the more likely it is to be included in the training set for the new classifier.

First, one computes how well each classifier represents the whole dataset by computing a local error:

$$ e_k = \frac{1}{M} \sum_{i:\, C_k(x_i) \neq y_i} v_i \qquad (3.27) $$

The weight of each classifier is then computed to reflect how well this particular classifier represents all the data at hand, according to $w_k = \log\left(1/\beta_k\right)$.

Then, one re-estimates the weight of each datapoint so as to give more weight to points that are misclassified. If $\beta_k = \frac{e_k}{1 - e_k},\; k = 1 \dots K$, then

$$ v_i \rightarrow v_i \cdot \beta_k \quad \text{for all } i \text{ such that } C_k(x_i) = y_i, $$

followed by a normalization of all the weights of the datapoints.
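A minimal sketch of one such boosting round (assuming labels in $\{-1,+1\}$, a hypothetical `train_weak_classifier` helper that fits a weak learner to the weighted data, and datapoint weights $v_i$ kept normalized so that the $1/M$ factor of Eq. (3.27) is absorbed into the normalization):

```python
import numpy as np

def boosting_round(X, y, v, train_weak_classifier):
    """One boosting iteration: local error, classifier weight, datapoint re-weighting.

    X: (M, d) data matrix, y: (M,) labels in {-1, +1}
    v: (M,) current datapoint weights, normalized to sum to 1
    train_weak_classifier: callable(X, y, v) -> classifier C_k (itself callable on X)
    Returns the trained classifier C_k, its weight w_k, and the updated weights.
    """
    C_k = train_weak_classifier(X, y, v)      # fit the weak learner on the weighted data
    misclassified = C_k(X) != y

    e_k = np.sum(v[misclassified])            # local error, Eq. (3.27) up to the normalization of v
    beta_k = e_k / (1.0 - e_k)                # beta_k = e_k / (1 - e_k)
    w_k = np.log(1.0 / beta_k)                # classifier weight w_k = log(1 / beta_k)

    v_new = np.where(misclassified, v, v * beta_k)   # down-weight correctly classified points
    v_new /= v_new.sum()                             # renormalize the datapoint weights
    return C_k, w_k, v_new
```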

More recent advances in boosting techniques determine the weights associated with each classifier by making a line search on the global error function. The error of the complete classifier is given by:

$$ e(x) = \sum_{i=1}^{M} \operatorname{sgn}\left( -C(x_i) \cdot y_i \right) \qquad (3.28) $$

Since this function is discontinuous and may change drastically as an effect of adding a new classifier, one prefers to optimize an upper bound on this function, using the following loss function:

$$ L(C(X)) = \sum_{i=1}^{M} \exp\left( -C(x_i) \cdot y_i \right) \qquad (3.29) $$
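A small numerical sketch of why (3.29) upper-bounds (3.28), term by term (assuming labels in $\{-1,+1\}$; the scores below are made-up values for illustration only):

```python
import numpy as np

# Hypothetical ensemble scores C(x_i) and labels y_i, chosen only for illustration.
scores = np.array([2.1, -0.3, 0.7, -1.5])     # C(x_i)
labels = np.array([1.0, 1.0, -1.0, -1.0])     # y_i in {-1, +1}

margins = scores * labels                     # positive margin = correctly classified point
error_01 = np.sum(np.sign(-margins))          # Eq. (3.28): discontinuous, sign-based error
loss_exp = np.sum(np.exp(-margins))           # Eq. (3.29): smooth exponential loss

# exp(-m) >= sgn(-m) for every margin m, so the exponential loss bounds the error from above.
print(error_01, loss_exp)
```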

which is continuous, and on which one can then perform a line search, a sort of gradient descent on the error. The local loss reduction resulting from adding a new classifier $C_{k+1}$ with associated weight $w_{k+1}$ can be computed as follows:

$$ \frac{\partial}{\partial w_{k+1}} L\left( C(X) + w_{k+1}\, C_{k+1}(X) \right) \qquad (3.30) $$
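A minimal sketch of such a line search (assuming labels in $\{-1,+1\}$, the current ensemble scores `C_X` and the candidate classifier outputs `C_new_X` already evaluated on all datapoints; the derivative-based bisection below is just one possible line-search strategy):

```python
import numpy as np

def line_search_weight(C_X, C_new_X, y, w_max=10.0, tol=1e-6):
    """Find w_{k+1} minimizing L(C(X) + w * C_{k+1}(X)) of Eq. (3.29).

    C_X:      (M,) current ensemble scores C(x_i)
    C_new_X:  (M,) outputs of the candidate classifier C_{k+1}(x_i), in {-1, +1}
    y:        (M,) labels in {-1, +1}
    The exponential loss is convex in w, so its derivative (Eq. 3.30) is monotone
    and a bisection on [0, w_max] is enough for this sketch.
    """
    def dL(w):
        margins = (C_X + w * C_new_X) * y
        return np.sum(-C_new_X * y * np.exp(-margins))   # derivative of Eq. (3.29) w.r.t. w

    lo, hi = 0.0, w_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if dL(mid) < 0:        # loss still decreasing: the optimum lies to the right
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```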

Deriving this expression, and recomputing the weights of each datapoint for the new classifier $C_{k+1}$ with

$$ v_i = \frac{e_{k+1}(x_i)}{\sum_{j=1}^{M} e_{k+1}(x_j)}, $$

one gets the same form of weight update as before:

$$ w_{k+1} = \frac{1}{2} \log\left( \frac{1 - e_{k+1}}{e_{k+1}} \right) \qquad (3.31) $$
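A minimal sketch of this closed-form weight (assuming `C_new_X` holds the outputs of $C_{k+1}$ in $\{-1,+1\}$ and `v` the normalized datapoint weights; the names are hypothetical):

```python
import numpy as np

def classifier_weight(C_new_X, y, v):
    """Closed-form weight w_{k+1} of Eq. (3.31).

    C_new_X: (M,) outputs of C_{k+1}(x_i) in {-1, +1}
    y:       (M,) labels in {-1, +1}
    v:       (M,) normalized datapoint weights (sum to 1)
    """
    e_k1 = np.sum(v[C_new_X != y])               # weighted error of the new classifier
    return 0.5 * np.log((1.0 - e_k1) / e_k1)     # w_{k+1} = 1/2 log((1 - e_{k+1}) / e_{k+1})
```

For a weak classifier that does better than chance ($e_{k+1} < 1/2$) this weight is positive, and when $C_{k+1}$ outputs labels in $\{-1,+1\}$ it coincides with the minimizer found by the line search above.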

