Human Detection in Video over Large Viewpoint Changes

w.r.t. the classifier score. The weight of the k-th classifier over the i-th sample is updated by

    w_{ki} = \frac{\partial \log J}{\partial y_{ki}} = \frac{t_i - P_i}{P_i} \, P_{ki}(x_i).    (11)

We sum up the training algorithms of MC and SC in Table 2.

Table 2: Learning algorithms of MC and SC.

Input:  Sample set S = {(x_i, y_i) | 1 ≤ i ≤ m} where y_i = ±1; detection rate r in each layer.
Output: H_k(x) = \sum_t \alpha_{kt} h_{kt}(x), k = 1, ..., K.
Loop: For t = 1, ..., T
  (MC)
    – Find weak classifiers h_t (h_{kt} = h_t, k = 1, ..., K) that maximize \sum_{k=1}^{K} \sum_i w_{ki} h_{kt}(x_i).
    – Find the weak-learner weights \alpha_{kt} (k = 1, ..., K) that maximize \Gamma(H + \alpha_{kt} h_{kt}).
    – Update weights by Eq. (11).
  (SC) For k = 1, ..., K
    – Find the weak classifier h_{kt} that maximizes \sum_i w_{ki} h_{kt}(x_i).
    – Find the weak-learner weight \alpha_{kt} that maximizes \Gamma(H + \alpha_{kt} h_{kt}).
    – Update weights by Eq. (11).
Update thresholds \theta_k (k = 1, ..., K) to satisfy detection rate r.
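For illustration, the per-round bookkeeping in Table 2 can be sketched in a few lines of Python: the weight update of Eq. (11), the MC selection of one weak classifier shared by all K clusters, and the SC selection of a separate weak classifier per cluster. The array names, the representation of the candidate pool as precomputed responses, the reading of t_i as a 0/1 target, and the noisy-OR combination used to form P_i are illustrative assumptions rather than the implementation used here; the search for the weak-learner weights \alpha_{kt} via \Gamma is omitted.

import numpy as np

def update_weights(t, P, P_k):
    # Eq. (11): w_ki = (t_i - P_i) / P_i * P_ki(x_i); returns a (K, m) weight matrix.
    return (t - P) / P * P_k

def select_weak_mc(w, H):
    # MC: one shared weak classifier h_t maximizing sum_k sum_i w_ki * h(x_i);
    # H holds the responses of J candidate weak classifiers on m samples, shape (J, m).
    return int(np.argmax(H @ w.sum(axis=0)))

def select_weak_sc(w_k, H):
    # SC: for cluster k, the weak classifier maximizing sum_i w_ki * h(x_i).
    return int(np.argmax(H @ w_k))

# Toy usage: m samples, K clusters, J candidate weak classifiers with +/-1 responses.
rng = np.random.default_rng(0)
m, K, J = 100, 3, 50
t = rng.integers(0, 2, size=m).astype(float)      # assumed 0/1 targets t_i
P_k = rng.uniform(0.1, 0.9, size=(K, m))          # stand-in for P_ki(x_i)
P = 1.0 - np.prod(1.0 - P_k, axis=0)              # assumed noisy-OR combination for P_i
H = rng.choice([-1.0, 1.0], size=(J, m))          # candidate weak-classifier responses

w = update_weights(t, P, P_k)
print("MC picks candidate", select_weak_mc(w, H))
print("SC picks per cluster", [select_weak_sc(w[k], H) for k in range(K)])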

4.3 A general EMC-Boost

The three components of EMC-Boost have different properties. CC treats all samples as one cluster, while MC and SC consider the whole sample space to consist of multiple clusters. CC tends to distinguish positives from negatives; SC tends to cluster the sample space; and MC can do both at the same time, though not as accurately as CC in separating positives from negatives or as SC in clustering. Compared with SC, one particular advantage of MC is that weak features are shared among all clusters. We combine these three components into a general EMC-Boost, as shown in Fig. 3 (d), which contains five steps (note that Step 2 is similar to CBT [14]):

Step 1. CC learns a classifier for all samples, considered as one category.
Step 2. The k-means algorithm clusters the sample space with the learned weak features.
Step 3. MC clusters the sample space coarsely.
Step 4. SC clusters the sample space further.
Step 5. CC learns a classifier for each cluster center.
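A structural sketch of these five steps is given below, using scikit-learn's KMeans for Step 2 and plain logistic-regression scores as stand-ins for the boosted CC classifiers; the MC and SC clustering of Steps 3 and 4 is only approximated by a k-means refinement, so this outlines the shape of the pipeline rather than the actual learner. All function and variable names are illustrative assumptions.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

def emc_boost_outline(X, y, n_clusters=4):
    # Step 1: one classifier treating all positives as a single category (stand-in for CC).
    cc_all = LogisticRegression(max_iter=1000).fit(X, y)

    # Step 2: k-means on the positives in a space of learned responses; reusing the raw
    # features plus the decision score is an illustrative stand-in for "learned weak features".
    pos = X[y == 1]
    responses = np.hstack([pos, cc_all.decision_function(pos)[:, None]])
    coarse = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(responses)

    # Steps 3-4: MC clusters coarsely and SC refines further; approximated here by
    # re-running k-means initialised from the coarse centres.
    fine = KMeans(n_clusters=n_clusters, init=coarse.cluster_centers_,
                  n_init=1, random_state=0).fit(responses)

    # Step 5: one classifier per cluster (that cluster's positives versus all negatives).
    neg = X[y == 0]
    per_cluster = []
    for k in range(n_clusters):
        in_k = fine.labels_ == k
        Xk = np.vstack([pos[in_k], neg])
        yk = np.hstack([np.ones(in_k.sum()), np.zeros(len(neg))])
        per_cluster.append(LogisticRegression(max_iter=1000).fit(Xk, yk))
    return cc_all, fine, per_cluster

# Toy usage on random data (200 samples, 10 features).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = rng.integers(0, 2, size=200)
cc, clusters, branch_classifiers = emc_boost_outline(X, y)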

5 Multiple Video Sampling

In order to deal with changes in video frame rate and with abrupt motion, we introduce a Multiple Video Sampling (MVS) strategy, illustrated in Fig. 4 (a) and (b). Considering five consecutive frames in (a), a positive sample is made up of two frames, where one is the first frame and the other comes from the next four frames, as shown in (b). In other words, one annotation corresponds to five consecutive frames and generates 4 positives. Some more positives are shown in Fig. 4 (c).

Fig. 4: The MVS strategy and some positive samples.

Suppose that the original frame rate is R and the positives used consist of the 1st and the r-th frames (r > 1); then the frame rate covered by the MVS strategy is R/(r − 1). If these positives are extracted from 30 fps videos, the trained detector is able to deal with 30 fps (30/1), 15 fps (30/2), 10 fps (30/3) and 7.5 fps (30/4) videos, where r is 2, 3, 4 and 5 respectively.
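The sampling rule and the covered frame rates follow directly from this description; the short Python sketch below (frame indices and function names are illustrative) reproduces the 30 fps example:

def mvs_positive_pairs(first_frame, window=5):
    # For one annotation on first_frame, pair it with each of the next window-1 frames.
    return [(first_frame, first_frame + d) for d in range(1, window)]

def covered_frame_rates(R=30.0, window=5):
    # Effective frame rates R/(r-1) handled by positives built from the 1st and r-th frames.
    return {r: R / (r - 1) for r in range(2, window + 1)}

print(mvs_positive_pairs(0))        # [(0, 1), (0, 2), (0, 3), (0, 4)]
print(covered_frame_rates(30.0))    # {2: 30.0, 3: 15.0, 4: 10.0, 5: 7.5}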

Fig. 5: Two-Stage Tree Structure in (a) and an example in (b). The number in each box gives the percentage of samples belonging to that branch.

6 Overview of Our Approach

We adopt EMC-Boost, selecting I2CF as weak features, to learn a strong classifier for multi-viewpoint human detection, in which the positive samples are obtained through the MVS strategy. Due to the large number of samples and features, it is difficult to learn a detector directly with the general EMC-Boost. We therefore modify the detector structure slightly and propose a new structure containing two stages, called the two-stage tree structure, shown in Fig. 5 (a) with an example in (b): the 1st stage uses only appearance information for learning and clustering; the 2nd stage uses both appearance and motion information, first for clustering and then for learning classifiers for all clusters.
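A minimal sketch of how such a two-stage tree might be traversed for one detection window is given below, assuming that 1st-stage classifiers score the window from appearance features alone, that the window is routed to the highest-scoring branch, and that 2nd-stage classifiers score the concatenated appearance and motion features. The class, the routing rule and all names are illustrative assumptions rather than the detector described here.

from dataclasses import dataclass
from typing import Callable, List
import numpy as np

Scorer = Callable[[np.ndarray], float]

@dataclass
class TwoStageTree:
    stage1: List[Scorer]           # appearance-only classifiers, one per 1st-stage branch
    stage2: List[List[Scorer]]     # per branch, appearance+motion classifiers, one per cluster

    def score(self, appearance: np.ndarray, motion: np.ndarray) -> float:
        # Stage 1: route the window to the best branch using appearance only.
        branch = int(np.argmax([h(appearance) for h in self.stage1]))
        # Stage 2: within that branch, take the best appearance+motion cluster score.
        joint = np.concatenate([appearance, motion])
        return max(h(joint) for h in self.stage2[branch])

# Toy usage with random linear scorers (purely illustrative).
rng = np.random.default_rng(0)
d_app, d_mot, n_branch, n_cluster = 8, 4, 2, 3
stage1 = [(lambda x, w=rng.normal(size=d_app): float(w @ x)) for _ in range(n_branch)]
stage2 = [[(lambda x, w=rng.normal(size=d_app + d_mot): float(w @ x))
           for _ in range(n_cluster)] for _ in range(n_branch)]
tree = TwoStageTree(stage1, stage2)
print(tree.score(rng.normal(size=d_app), rng.normal(size=d_mot)))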

7 Experiments

We carry out experiments to evaluate our approach by False Positives Per Image (FPPI) on several challenging real-world datasets: ETHZ, PETS2007 and our own collected dataset. When the intersection between a detection response
