MACHINE LEARNING TECHNIQUES - LASA
$C(x) = \sum_{k=1}^{K} w_k C_k(x)$

with $w_1, \dots, w_K$ the weights associated with each classifier.
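A minimal sketch of this weighted combination, assuming binary base classifiers that output $\pm 1$ labels (function and variable names are hypothetical):

```python
import numpy as np

def ensemble_predict(classifiers, weights, X):
    """Weighted vote C(x) = sum_k w_k * C_k(x) over K base classifiers.

    classifiers : list of K callables, each mapping inputs to +/-1 labels
    weights     : array of the K classifier weights w_1, ..., w_K
    X           : array of shape (M, d) holding the M datapoints
    """
    votes = np.array([clf(X) for clf in classifiers])  # shape (K, M)
    return np.sign(weights @ votes)                    # sign of the weighted sum
```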
Boosting builds classifiers incrementally so as to minimize the classification error. To do this, one first updates the weights associated with each datapoint. These weights represent the probability with which the datapoint should be used: the less well a point is classified, the more likely it is to be included in the training set for the new classifier.
First, one computes how well each classifier represents the whole dataset by computing a local error:

$e_k = \frac{1}{M} \sum_{i:\, C_k(x^i) \neq y^i} v_i \qquad (3.27)$
The weight of each classifier is then computed to reflect how well this particular classifier represents all the data at hand, according to $w_k = \log(1/\beta_k)$.
Then, one re-estimates the weight of each datapoint so as to give more weight to points that are misclassified. If $\beta_k = \frac{e_k}{1 - e_k}$, $k = 1 \dots K$, then

$v_i \rightarrow v_i \cdot \beta_k \quad \text{for all } i \text{ such that } C_k(x^i) = y^i,$

followed by a normalization of all the weights of the datapoints.
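A sketch of one such boosting round, combining the local error (3.27), the classifier weight $w_k = \log(1/\beta_k)$, and the datapoint weight update (names hypothetical; the datapoint weights $v$ are taken to be already normalized, so the $1/M$ factor of (3.27) is absorbed into them):

```python
import numpy as np

def boosting_round(clf, X, y, v):
    """One boosting round: local error, classifier weight, datapoint reweighting.

    clf : a trained base classifier, callable returning +/-1 predictions
    X, y: the M datapoints and their +/-1 labels
    v   : current datapoint weights, assumed normalized, with 0 < e_k < 1/2
    """
    miss = clf(X) != y
    e_k = np.sum(v[miss])               # local error, cf. Eq. (3.27)
    beta_k = e_k / (1.0 - e_k)
    w_k = np.log(1.0 / beta_k)          # classifier weight w_k = log(1/beta_k)
    v = np.where(miss, v, v * beta_k)   # shrink weights of well-classified points
    return w_k, v / np.sum(v)           # renormalize the datapoint weights
```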
More recent advances in boosting techniques determine the weights associated with each classifier by performing a line search on the global error function. The error of the complete classifier is given by:

$e(X) = \sum_{i=1}^{M} \operatorname{sgn}\left(-C(x^i) \cdot y^i\right) \qquad (3.28)$
Since this function is discontinuous and may change drastically when a new classifier is added, one prefers to optimize an upper bound on it, given by the following loss function:

$L(C(X)) = \sum_{i=1}^{M} \exp\left(-C(x^i) \cdot y^i\right) \qquad (3.29)$
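The bound holds pointwise, since $\exp(-z) \geq \operatorname{sgn}(-z)$ for every real $z$; a small numeric check on a few hypothetical margins $C(x^i) \cdot y^i$:

```python
import numpy as np

# Hypothetical margins C(x^i) * y^i: positive means correctly classified.
margins = np.array([1.3, 0.2, -0.5, -2.0])

err = np.sum(np.sign(-margins))    # the discontinuous error of Eq. (3.28)
bound = np.sum(np.exp(-margins))   # the smooth upper bound of Eq. (3.29)
print(err, bound)                  # 0.0 <= ~10.13: the bound holds
```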
which is continuous and on which one can then perform a line search, akin to a gradient descent on the error. The local loss reduction resulting from adding a new classifier $C_{k+1}$ with associated weight $w_{k+1}$ can be computed as follows:

$\frac{\partial L\left(C(X) + w_{k+1} C_{k+1}(X)\right)}{\partial w_{k+1}} \qquad (3.30)$
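A sketch of this line search, minimizing the exponential loss (3.29) over the weight of a candidate classifier; scipy's bounded scalar minimizer stands in for the line search, the search interval is an arbitrary assumption, and all names are hypothetical:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def best_weight(F, h, y):
    """Line search of Eq. (3.30): pick w minimizing L(C(X) + w * C_{k+1}(X)).

    F : current ensemble outputs C(x^i), shape (M,)
    h : candidate classifier outputs C_{k+1}(x^i) in {-1, +1}
    y : labels in {-1, +1}
    """
    loss = lambda w: np.sum(np.exp(-(F + w * h) * y))  # Eq. (3.29) after adding h
    return minimize_scalar(loss, bounds=(0.0, 10.0), method="bounded").x
```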
Taking this derivative and recomputing the weights of each datapoint for the new classifier $C_{k+1}$, with

$v_i = \frac{e^{k+1}(x^i)}{\sum_{j=1}^{M} e^{k+1}(x^j)},$

one gets the same reformulation of the weight update as before:

$w_{k+1} = \frac{1}{2} \log\left(\frac{1 - e_{k+1}}{e_{k+1}}\right) \qquad (3.31)$
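In code, this closed-form weight is a one-liner (with e_k1 the weighted error $e_{k+1}$ of the new classifier); for instance, an error of 0.2 gives $w_{k+1} = \frac{1}{2}\log 4 \approx 0.69$:

```python
import numpy as np

def classifier_weight(e_k1):
    """Closed-form line-search solution of Eq. (3.31)."""
    return 0.5 * np.log((1.0 - e_k1) / e_k1)

print(classifier_weight(0.2))  # ~0.693
```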