Human Detection in Video over Large Viewpoint Changes

w.r.t. the classifier score. The weight of the k-th classifier over the i-th sample is updated by

    w_{ki} = \frac{\partial \log J}{\partial y_{ki}} = \frac{t_i - P_i}{P_i} \, P_{ki}(x_i).    (11)

We sum up the training algorithms of MC and SC in Table 2.

Table 2: Learning algorithms of MC and SC.

Input:  Sample set S = {(x_i, y_i) | 1 ≤ i ≤ m} where y_i = ±1; detection rate r in each layer.
Output: H_k(x) = \sum_t \alpha_{kt} h_{kt}(x), k = 1, ..., K.
Loop: For t = 1, ..., T
  (MC)
    – Find weak classifiers h_t (h_{kt} = h_t, k = 1, ..., K) that maximize \sum_{k=1}^{K} \sum_i w_{ki} h_{kt}(x_i).
    – Find the weak-learner weights \alpha_{kt} (k = 1, ..., K) that maximize \Gamma(H + \alpha_{kt} h_{kt}).
    – Update weights by Eq. (11).
  (SC) For k = 1, ..., K
    – Find the weak classifier h_{kt} that maximizes \sum_i w_{ki} h_{kt}(x_i).
    – Find the weak-learner weight \alpha_{kt} that maximizes \Gamma(H + \alpha_{kt} h_{kt}).
    – Update weights by Eq. (11).
Update thresholds \theta_k (k = 1, ..., K) to satisfy detection rate r.
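For illustration, the per-round bookkeeping in Table 2 can be sketched in a few lines of Python: the weight update of Eq. (11), the MC selection of one weak classifier shared by all K clusters, and the SC selection of a separate weak classifier per cluster. The array names, the representation of the candidate pool as precomputed responses, the reading of t_i as a 0/1 target, and the noisy-OR combination used to form P_i are illustrative assumptions rather than the implementation used here; the search for the weak-learner weights \alpha_{kt} via \Gamma is omitted.

import numpy as np

def update_weights(t, P, P_k):
    # Eq. (11): w_ki = (t_i - P_i) / P_i * P_ki(x_i); returns a (K, m) weight matrix.
    return (t - P) / P * P_k

def select_weak_mc(w, H):
    # MC: one shared weak classifier h_t maximizing sum_k sum_i w_ki * h(x_i);
    # H holds the responses of J candidate weak classifiers on m samples, shape (J, m).
    return int(np.argmax(H @ w.sum(axis=0)))

def select_weak_sc(w_k, H):
    # SC: for cluster k, the weak classifier maximizing sum_i w_ki * h(x_i).
    return int(np.argmax(H @ w_k))

# Toy usage: m samples, K clusters, J candidate weak classifiers with +/-1 responses.
rng = np.random.default_rng(0)
m, K, J = 100, 3, 50
t = rng.integers(0, 2, size=m).astype(float)      # assumed 0/1 targets t_i
P_k = rng.uniform(0.1, 0.9, size=(K, m))          # stand-in for P_ki(x_i)
P = 1.0 - np.prod(1.0 - P_k, axis=0)              # assumed noisy-OR combination for P_i
H = rng.choice([-1.0, 1.0], size=(J, m))          # candidate weak-classifier responses

w = update_weights(t, P, P_k)
print("MC picks candidate", select_weak_mc(w, H))
print("SC picks per cluster", [select_weak_sc(w[k], H) for k in range(K)])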

4.3 A general EMC-Boost

The three components of EMC-Boost have different properties. CC treats all samples as one cluster, while MC and SC consider the whole sample space to consist of multiple clusters. CC tends to distinguish positives from negatives; SC tends to cluster the sample space; and MC can do both at the same time, though not as accurately as CC in separating positives from negatives or as SC in clustering. Compared with SC, one particular advantage of MC is that weak features are shared among all clusters. We combine these three components into a general EMC-Boost, as shown in Fig. 3 (d), which contains five steps (note that Step 2 is similar to CBT [14]):

Step 1. CC learns a classifier for all samples, considered as one category.
Step 2. The k-means algorithm clusters the sample space with the learned weak features.
Step 3. MC clusters the sample space coarsely.
Step 4. SC clusters the sample space further.
Step 5. CC learns a classifier for each cluster center.
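A structural sketch of these five steps is given below, using scikit-learn's KMeans for Step 2 and plain logistic-regression scores as stand-ins for the boosted CC classifiers; the MC and SC clustering of Steps 3 and 4 is only approximated by a k-means refinement, so this outlines the shape of the pipeline rather than the actual learner. All function and variable names are illustrative assumptions.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

def emc_boost_outline(X, y, n_clusters=4):
    # Step 1: one classifier treating all positives as a single category (stand-in for CC).
    cc_all = LogisticRegression(max_iter=1000).fit(X, y)

    # Step 2: k-means on the positives in a space of learned responses; reusing the raw
    # features plus the decision score is an illustrative stand-in for "learned weak features".
    pos = X[y == 1]
    responses = np.hstack([pos, cc_all.decision_function(pos)[:, None]])
    coarse = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(responses)

    # Steps 3-4: MC clusters coarsely and SC refines further; approximated here by
    # re-running k-means initialised from the coarse centres.
    fine = KMeans(n_clusters=n_clusters, init=coarse.cluster_centers_,
                  n_init=1, random_state=0).fit(responses)

    # Step 5: one classifier per cluster (that cluster's positives versus all negatives).
    neg = X[y == 0]
    per_cluster = []
    for k in range(n_clusters):
        in_k = fine.labels_ == k
        Xk = np.vstack([pos[in_k], neg])
        yk = np.hstack([np.ones(in_k.sum()), np.zeros(len(neg))])
        per_cluster.append(LogisticRegression(max_iter=1000).fit(Xk, yk))
    return cc_all, fine, per_cluster

# Toy usage on random data (200 samples, 10 features).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = rng.integers(0, 2, size=200)
cc, clusters, branch_classifiers = emc_boost_outline(X, y)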

5 Multiple Video Sampling

In order to deal with changes in video frame rate and with abrupt motion, we introduce a Multiple Video Sampling (MVS) strategy, illustrated in Fig. 4 (a) and (b). Considering five consecutive frames in (a), a positive sample is made up of two frames, where one is the first frame and the other comes from the next four frames, as shown in (b). In other words, one annotation corresponds to five consecutive frames and generates 4 positives. Some more positives are shown in Fig. 4 (c).

Fig. 4: The MVS strategy and some positive samples.

Suppose that the original frame rate is R and the positives used consist of the 1st and the r-th frames (r > 1); then the frame rate covered by the MVS strategy is R/(r − 1). If these positives are extracted from 30 fps videos, the trained detector is able to deal with 30 fps (30/1), 15 fps (30/2), 10 fps (30/3) and 7.5 fps (30/4) videos, where r is 2, 3, 4 and 5 respectively.
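The sampling rule and the covered frame rates follow directly from this description; the short Python sketch below (frame indices and function names are illustrative) reproduces the 30 fps example:

def mvs_positive_pairs(first_frame, window=5):
    # For one annotation on first_frame, pair it with each of the next window-1 frames.
    return [(first_frame, first_frame + d) for d in range(1, window)]

def covered_frame_rates(R=30.0, window=5):
    # Effective frame rates R/(r-1) handled by positives built from the 1st and r-th frames.
    return {r: R / (r - 1) for r in range(2, window + 1)}

print(mvs_positive_pairs(0))        # [(0, 1), (0, 2), (0, 3), (0, 4)]
print(covered_frame_rates(30.0))    # {2: 30.0, 3: 15.0, 4: 10.0, 5: 7.5}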

Fig. 5: Two-Stage Tree Structure in (a) and an example in (b). The number in each box gives the percentage of samples belonging to that branch.

6 Overview of Our Approach

We adopt EMC-Boost, selecting I2CF as weak features, to learn a strong classifier for multi-viewpoint human detection, in which the positive samples are obtained through the MVS strategy. Due to the large number of samples and features, it is difficult to learn a detector directly with the general EMC-Boost. We therefore modify the detector structure slightly and propose a new structure containing two stages, called the two-stage tree structure, shown in Fig. 5 (a) with an example in (b): the 1st stage uses only appearance information for learning and clustering; the 2nd stage uses both appearance and motion information, first for clustering and then for learning classifiers for all clusters.
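A minimal sketch of how such a two-stage tree might be traversed for one detection window is given below, assuming that 1st-stage classifiers score the window from appearance features alone, that the window is routed to the highest-scoring branch, and that 2nd-stage classifiers score the concatenated appearance and motion features. The class, the routing rule and all names are illustrative assumptions rather than the detector described here.

from dataclasses import dataclass
from typing import Callable, List
import numpy as np

Scorer = Callable[[np.ndarray], float]

@dataclass
class TwoStageTree:
    stage1: List[Scorer]           # appearance-only classifiers, one per 1st-stage branch
    stage2: List[List[Scorer]]     # per branch, appearance+motion classifiers, one per cluster

    def score(self, appearance: np.ndarray, motion: np.ndarray) -> float:
        # Stage 1: route the window to the best branch using appearance only.
        branch = int(np.argmax([h(appearance) for h in self.stage1]))
        # Stage 2: within that branch, take the best appearance+motion cluster score.
        joint = np.concatenate([appearance, motion])
        return max(h(joint) for h in self.stage2[branch])

# Toy usage with random linear scorers (purely illustrative).
rng = np.random.default_rng(0)
d_app, d_mot, n_branch, n_cluster = 8, 4, 2, 3
stage1 = [(lambda x, w=rng.normal(size=d_app): float(w @ x)) for _ in range(n_branch)]
stage2 = [[(lambda x, w=rng.normal(size=d_app + d_mot): float(w @ x))
           for _ in range(n_cluster)] for _ in range(n_branch)]
tree = TwoStageTree(stage1, stage2)
print(tree.score(rng.normal(size=d_app), rng.normal(size=d_mot)))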

7 Experiments

We carry out experiments to evaluate our approach by False Positives Per Image (FPPI) on several challenging real-world datasets: ETHZ, PETS2007 and our own collected dataset. When the intersection between a detection response
