Object Detection Using Nested Cascades of Boosted Classifiers


5.2.3 Multiclass Weak Classifiers

The multiclass weak classifiers can have many different structures, and these structures need not be tied to the boosting algorithm being used. This means that the boosting algorithm and the multiclass weak classifier structure do not need to be connected (as is done in [Torralba et al., 2007] and [Huang et al., 2007]), and that other boosting approaches, such as GentleBoost [Lienhart et al., 2003] or LogitBoost [Friedman et al., 1998], could be used with the multiclass weak classifiers instead of the proposed generalized version of Adaboost. Taking this into consideration, we identify three different ways of selecting the functions $h_t(f_t(x),m)$.

In the first one, which we call independent components, the components of $\vec{h}_t(f_t(x))$, $\{h_t(f_t(x),m)\}_{m=1,\dots,M}$, are chosen independently of each other (as in [Huang et al., 2007]). In the second case, joint components, the components are chosen jointly (as in [Torralba et al., 2007]): the same function $\bar{h}_t(\cdot)$ is used for several components/classes, and the remaining components output a zero value:

$$h_t(f_t(x),m) = \beta^m_t \, \bar{h}_t(f_t(x)), \quad \beta^m_t \in \{0,1\}. \qquad (5.3)$$

In the third case, we introduce the concept of coupled components, where the components share a common function, but for each component this function is multiplied by a different scalar value $\gamma^m_t$:

$$h_t(f_t(x),m) = \gamma^m_t \, \bar{h}_t(f_t(x)), \quad \gamma^m_t \in \mathbb{R}. \qquad (5.4)$$

This last case, coupled components (Equation 5.4), resembles the case of joint components (Equation 5.3), but it presents several advantages: its training time is much shorter (unlike [Torralba et al., 2007] it does not need a heuristic search), it scales with the number of classes, and, as we will see later, it is much more accurate. In some cases coupled components also present advantages over independent components, e.g. fewer parameters need to be estimated, which can help to avoid over-fitting, in particular when small training sets are used (this is shown in [Torralba et al., 2007] for joint components). The use of fewer parameters can also be useful when the memory footprint is important, for example when the detector is to be used on mobile devices or robots with low memory capacity. Note, however, that there is a trade-off, not only between the complexity of the weak components and the size of the training set, but also with the size of the boosted classifier.

Optimization Problem

As in Chapter 4, the weak classifiers are designed following the domain-partitioning weak hypotheses paradigm [Schapire and Singer, 1999]. Each feature domain $\mathcal{F}$ is partitioned into disjoint blocks $F_1,\dots,F_J$, and a weak classifier $h(f(x),m)$ has one output per partition block of its associated feature. During training, the use of coupled, independent or joint components, together with domain-partitioning classifiers, leads to different optimization problems, which are outlined in the following. As part of the optimization problem, in all three cases the values $W^{j,m}_{l}$, with $j \in \{1,\dots,J\}$, $m \in \{1,\dots,M\}$, and $l \in \{-1,+1\}$, need to be evaluated, where $W^{j,m}_{l}$ represents bin $j$ of a weighted histogram (of the feature values) for component $m$ over the "positive" ($l=+1$) or "negative" ($l=-1$) examples:

$$W^{j,m}_{l} = \sum_{i:\, f(x_i) \in F_j \,\wedge\, (\vec{a}_i)_m = l} w_{i,m}, \qquad (5.5)$$

with $(\vec{a}_i)_m$ representing the value of component $m$ of the vector $\vec{a}_i$.
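To make these definitions concrete, the following minimal sketch shows how the three output structures (Equations 5.3 and 5.4) and the weighted histograms of Equation (5.5) could be computed. It assumes per-example, per-component weights $w_{i,m}$ and vector labels in $\{-1,+1\}$; the function and variable names (`weighted_histograms`, `bin_edges`, etc.) are illustrative and do not come from the original text.

```python
import numpy as np

# Output structures of the multiclass weak classifiers.
# h_bar_value is the shared scalar value h_bar_t(f_t(x)) of Eqs. (5.3)/(5.4).

def joint_output(h_bar_value, beta):
    """Joint components (Eq. 5.3): h_t(f_t(x), m) = beta_m * h_bar, beta_m in {0, 1}."""
    return beta * h_bar_value            # beta: (M,) binary vector

def coupled_output(h_bar_value, gamma):
    """Coupled components (Eq. 5.4): h_t(f_t(x), m) = gamma_m * h_bar, gamma_m real."""
    return gamma * h_bar_value           # gamma: (M,) real vector

def weighted_histograms(f_values, bin_edges, a, w):
    """Weighted histograms W_l^{j,m} of Eq. (5.5).

    f_values  : (N,)   feature values f(x_i)
    bin_edges : (J+1,) edges defining the partition blocks F_1, ..., F_J
    a         : (N, M) vector labels, with (a_i)_m in {-1, +1}
    w         : (N, M) boosting weights w_{i,m} (assumed per example and component)
    Returns W of shape (J, M, 2), where W[j, m, 0] accumulates the l = -1 mass
    and W[j, m, 1] the l = +1 mass.
    """
    N, M = a.shape
    J = len(bin_edges) - 1
    # Partition block of every training example, found in a single pass.
    block = np.clip(np.digitize(f_values, bin_edges) - 1, 0, J - 1)
    W = np.zeros((J, M, 2))
    l_idx = (a > 0).astype(int)                 # 0 -> l = -1, 1 -> l = +1
    rows = np.repeat(block, M)                  # block index of each (i, m) pair
    cols = np.tile(np.arange(M), N)             # component index of each pair
    np.add.at(W, (rows, cols, l_idx.ravel()), w.ravel())
    return W
```

For independent components no shared function $\bar{h}_t$ is needed: each component simply receives its own per-block value, as made explicit in the optimization below.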

It is important to note that the evaluation of $\{W^{j,m}_{l}\}_{m,j,l}$ takes order $O(N)$, with $N$ the number of training examples, and does not depend on $J$ (the number of partitions) nor on $M$ (the number of classes/components). This linear dependency on $N$ only is very important for keeping the computational complexity of the training algorithm low.

In contrast to the one-class case, the output of the domain-partitioning weak classifier also depends on the component of the multiclass classifier; therefore the output of the classifier, for a feature $f$, is $h(f(x),m) = c_{j,m} \in \mathbb{R}$ such that $f(x) \in F_j$. For each weak classifier, the value associated to each partition block ($c_{j,m}$), i.e. its output, is selected to minimize $Z_t$:

$$\min_{c_{j,m} \in \mathbb{R}} Z_t = \min_{c_{j,m} \in \mathbb{R}} \sum_{j,m} \left( W^{j,m}_{+1} e^{-c_{j,m}} + W^{j,m}_{-1} e^{c_{j,m}} \right), \qquad (5.6)$$

and the value of $c_{j,m}$ depends on the kind of weak classifier being used.

In the case of independent components, this minimization problem has an analytic solution (using standard calculus):

$$c_{j,m} = \frac{1}{2} \ln \left( \frac{W^{j,m}_{+1} + \varepsilon_m}{W^{j,m}_{-1} + \varepsilon_m} \right), \qquad (5.7)$$

with $\varepsilon_m$ a regularization parameter. Note that the output components are independent, i.e., the values of $c_{j,m_1}$ and $c_{j,m_2}$, for $m_1 \neq m_2$, are calculated independently and do not depend on each other.

It is important to note how this regularization value, $\varepsilon_m$, is selected. In [Schapire and Singer, 1999] it is shown, for a two-class classification problem, that it is appropriate to choose $\varepsilon \ll \frac{1}{2J}$, with $J$ the number of partitions, and the authors recommend using $\varepsilon$ on the order of $1/n$, with $n$ the number of training examples. If the same analysis is carried out in the multiclass setting used here, the value of $\varepsilon_m$ should be on the order of $1/n_m$, with $n_m$ the number of training examples for class $m$ (this follows directly from Equation (11) in [Schapire and Singer, 1999]). This corresponds to smoothing the weighted histograms taking into account the number of training samples used to evaluate them.

In the case of weak classifiers with joint components, the optimization problem also has an analytical solution, but it requires solving a combinatorial optimization problem. More formally, in this case $c_{j,m} = \beta_m c_j$, with $\beta_m \in \{0,1\}$. The optimization problem to be solved is:

$$\min_{c_j \in \mathbb{R},\, \beta_m \in \{0,1\}} Z_t = \min_{\beta_m \in \{0,1\}} \min_{c_j \in \mathbb{R}} \sum_{j,m} \left( W^{j,m}_{+1} e^{-c_j \beta_m} + W^{j,m}_{-1} e^{c_j \beta_m} \right). \qquad (5.8)$$

In order to obtain the optimal value of this problem, $2^M$ problems of the form

$$\min_{\hat{c}_j \in \mathbb{R}} \sum_j \left( \hat{W}^{j}_{+1} e^{-\hat{c}_j} + \hat{W}^{j}_{-1} e^{\hat{c}_j} \right) \qquad (5.9)$$

must be solved, with each of these problems having a solution of the form

$$\hat{c}_j = \frac{1}{2} \ln \left( \frac{\hat{W}^{j}_{+1} + \varepsilon}{\hat{W}^{j}_{-1} + \varepsilon} \right), \quad \text{with} \quad \hat{W}^{j}_{l} = \sum_{m:\, \beta_m = 1} W^{j,m}_{l}, \ l \in \{-1,+1\}. \qquad (5.10)$$
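As a rough illustration of how these closed-form solutions could be evaluated in practice, the sketch below computes the per-block outputs $c_{j,m}$ for independent components (Equation 5.7, with the smoothing $\varepsilon_m$ taken on the order of $1/n_m$) and performs the exhaustive search over the $2^M$ subsets required by joint components (Equations 5.8 to 5.10). It reuses the `W` array from the previous sketch; function names such as `independent_outputs` and `best_joint_subset` are hypothetical, and the coupled-components solution discussed later in the text is not covered here.

```python
import numpy as np
from itertools import product

def independent_outputs(W, n_per_class):
    """Per-block outputs c_{j,m} for independent components (Eq. 5.7).

    W           : (J, M, 2) histograms; W[..., 0] = W_{-1}^{j,m}, W[..., 1] = W_{+1}^{j,m}
    n_per_class : (M,) number of training examples per class; the smoothing
                  term is taken as eps_m ~ 1/n_m, as suggested in the text.
    """
    eps = 1.0 / np.asarray(n_per_class, dtype=float)        # eps_m, shape (M,)
    return 0.5 * np.log((W[:, :, 1] + eps) / (W[:, :, 0] + eps))

def best_joint_subset(W, eps=1e-8):
    """Exhaustive search over the 2^M binary vectors beta (joint components).

    For every beta, the histograms of the selected components are pooled
    (Eq. 5.10), the pooled problem (Eq. 5.9) is solved in closed form, and the
    loss Z_t of Eq. (5.8) is evaluated; the best configuration is returned.
    """
    J, M, _ = W.shape
    best_beta, best_c, best_Z = None, None, np.inf
    for bits in product((0, 1), repeat=M):
        beta = np.array(bits)
        W_hat = (W * beta[None, :, None]).sum(axis=1)        # pooled histograms, (J, 2)
        c_hat = 0.5 * np.log((W_hat[:, 1] + eps) / (W_hat[:, 0] + eps))  # Eq. (5.10)
        c_full = c_hat[:, None] * beta[None, :]              # c_{j,m} = beta_m * c_j
        Z = np.sum(W[:, :, 1] * np.exp(-c_full) + W[:, :, 0] * np.exp(c_full))
        if Z < best_Z:
            best_beta, best_c, best_Z = beta, c_hat, Z
    return best_beta, best_c, best_Z
```

Note that the exhaustive search makes joint-components training exponential in the number of classes $M$, which is precisely the scalability issue that the coupled-components formulation is intended to avoid.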

