27.09.2014 Views

Human Detection in Video over Large Viewpoint Changes

Human Detection in Video over Large Viewpoint Changes

Human Detection in Video over Large Viewpoint Changes

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

<strong>Human</strong> <strong>Detection</strong> <strong>in</strong> <strong>Video</strong> <strong>over</strong> <strong>Large</strong> Viewpo<strong>in</strong>t <strong>Changes</strong> 1255<br />

1 2 3 4 5<br />

(a)<br />

1 2 1 3 1 4 1 5<br />

(b)<br />

(c)<br />

Fig. 4: The MVS strategy and some positive samples.<br />

Appearance Motion Appearance+Motion<br />

23.3%<br />

i th cluster<br />

k th cluster<br />

34.9%<br />

15.5%<br />

1 st stage 2 nd stage<br />

(a)<br />

1 st stage 2 nd stage<br />

(b)<br />

10.5%<br />

15.8%<br />

Fig. 5: Two-Stage Tree Structure <strong>in</strong> (a) and an example <strong>in</strong> (b). The number <strong>in</strong> the box<br />

gives the percentage of samples belong<strong>in</strong>g to that branch.<br />

next four frames as shown <strong>in</strong> (b). In other words, one annotation corresponds to<br />

five consecutive frames and generates 4 positives. Some more positives are shown<br />

<strong>in</strong> Fig. 4 (c). Suppose that the orig<strong>in</strong>al frame rate is R and the used positives<br />

consist of the 1 st and the r th frames (r > 1), then the possible frame rate c<strong>over</strong>ed<br />

by MVS strategy is R/(r −1). If these positives are extracted from 30 fps videos,<br />

the tra<strong>in</strong>ed detector is able to deal with 30fps(30/1),15fps(30/2),10fps(30/3) and<br />

7.5fps(30/4) videos where r is 2, 3, 4 and 5 respectively.<br />

6 Overview of Our Approach<br />

We adopt EMC-Boost select<strong>in</strong>g I 2 CF as weak features to learn a strong classifier<br />

for multiple viewpo<strong>in</strong>t human detection, <strong>in</strong> which positive samples are achieved<br />

through MVS strategy. Due to the large amount of samples and features, it is<br />

difficult to learn a detector directly by a general EMC-Boost. We modify the<br />

detector structure slightly and propose a new structure conta<strong>in</strong><strong>in</strong>g two stages<br />

as shown <strong>in</strong> Fig. 5 (a) with an example <strong>in</strong> (b), which is called two-stage tree<br />

structure: In the 1 st stage, it only uses appearance <strong>in</strong>formation for learn<strong>in</strong>g and<br />

cluster<strong>in</strong>g; In the 2 nd stage, it uses both appearance and motion <strong>in</strong>formation for<br />

cluster<strong>in</strong>g first, and then for learn<strong>in</strong>g classifiers for all clusters.<br />

7 Experiments<br />

We carry out some experiments to evaluate our approach by False Positive Per<br />

Image (FPPI) on several real-world challeng<strong>in</strong>g datasets, ETHZ, PET2007 and<br />

our own collected dataset. When the <strong>in</strong>tersection between a detection response

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!