
Human Detection in Video over Large Viewpoint Changes


1256 G. Duan, H. Ai, and S. Lao

and a ground-truth box is larger than 50% of their union, we consider it to be a successful detection. Only one detection per annotation is counted as correct. For simplicity, the three typical viewpoints shown in Fig. 1 are denoted, from left to right, as Horizontal Viewpoint (HV), Slant Viewpoint (SV) and Vertical Viewpoint (VV).
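The matching rule above can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: it assumes axis-aligned boxes given as `(x_min, y_min, x_max, y_max)`, assumes the overlap measure is intersection area divided by union area, and assumes detections are processed in descending confidence order with a greedy one-to-one assignment. The names `iou` and `count_detections` are hypothetical.

```python
from typing import List, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection area divided by union area of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def count_detections(
    detections: List[Box], annotations: List[Box], threshold: float = 0.5
) -> Tuple[int, int, int]:
    """Return (correct detections, false positives, missed annotations).

    Assumption: `detections` is already sorted by descending confidence.
    Each annotation can be claimed by at most one detection, so extra
    detections on an already matched person count as false positives.
    """
    matched = set()
    correct = 0
    for det in detections:
        best_index, best_overlap = -1, threshold
        for index, gt in enumerate(annotations):
            if index in matched:
                continue
            overlap = iou(det, gt)
            if overlap > best_overlap:  # strictly larger than 50%
                best_index, best_overlap = index, overlap
        if best_index >= 0:
            matched.add(best_index)
            correct += 1
    return correct, len(detections) - correct, len(annotations) - correct
```

With this sketch, a second detection that overlaps an already matched annotation is counted as a false positive, which is one way to read "only one detection per annotation is counted as correct".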

Datasets. The datasets used in the experiments are the ETHZ dataset [20], the PETS2007 dataset [21], and our own collected dataset. The ETHZ dataset provides four video sequences, Seq.#0∼Seq.#3 (640×480 pixels at 15 fps). This dataset, whose viewpoint is near HV, is recorded using a pair of cameras, and we use only the images provided by the left camera. The PETS2007 dataset contains 9 sequences, S00∼S08 (720×576 pixels at 30 fps). Each sequence has 4 fixed cameras, and we choose the 3rd camera, whose viewpoint is near SV. There are 3 scenarios in PETS2007, in order of increasing scene complexity: loitering (S01 and S02), attended luggage removal (S03, S04, S05 and S06), and unattended luggage (S07 and S08). In the experiments, we use S01, S03, S05 and S06. In addition, we have collected several sequences with hand-held DV cameras: 2 sequences near HV (853×480 pixels at 30 fps), 2 sequences near SV (1280×720 pixels at 30 fps), and 8 sequences near VV (2 sequences are 1280×720 pixels at 30 fps and the others are 704×576 pixels at 30 fps).

Training and testing datasets. S01, S03, S05 and S06 of PETS2007 and our own dataset are labeled manually every five frames for training and testing, while the ETHZ dataset already provides the ground truth. The training datasets contain Seq.#0 of ETHZ; S01, S03 and S06 of PETS2007; and 2 sequences (near HV), 2 sequences (near SV) and 6 sequences (near VV) of ours. The testing datasets contain Seq.#1, Seq.#2 and Seq.#3 of ETHZ, S05 of PETS2007, and 2 sequences of ours (near VV). Note that the ground truth for the unlabeled intermediate frames in the testing datasets is obtained through interpolation. Properties of the testing datasets, such as camera motion (fixed or moving) and illumination conditions (slight or significant light changes), may affect all detectors. Details about the testing datasets are summarized in Table 3.
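The text does not say which interpolation scheme is used for the unlabeled frames. One plausible reading, sketched below, is linear interpolation of box coordinates between two manually labeled frames of the same person. The function name `interpolate_box`, the `(x_min, y_min, x_max, y_max)` box layout, and the assumption that identities are linked across labeled frames are all illustrative and not taken from the paper.

```python
from typing import Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def interpolate_box(
    box_a: Box, frame_a: int, box_b: Box, frame_b: int, frame: int
) -> Box:
    """Linearly interpolate a person's box at `frame`.

    `box_a` and `box_b` are the labeled boxes of the same person at
    `frame_a` and `frame_b` (for example, five frames apart).
    """
    if frame_a >= frame_b:
        raise ValueError("frame_a must be earlier than frame_b")
    if not frame_a <= frame <= frame_b:
        raise ValueError("frame must lie between the two labeled frames")
    t = (frame - frame_a) / (frame_b - frame_a)  # 0 at frame_a, 1 at frame_b
    x0, y0, x1, y1 = (a + t * (b - a) for a, b in zip(box_a, box_b))
    return (x0, y0, x1, y1)


# Usage: a box labeled at frames 0 and 5, queried at frame 2.
mid_box = interpolate_box((0, 0, 10, 20), 0, (5, 10, 15, 30), 5, 2)
```

Linear interpolation is a reasonable approximation over a gap of five frames at 30 fps, but it may drift for fast motion or a moving camera.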

Training detectors. We have labeled 11768 different humans in total and obtain 47072 positives after MVS, where the number of positives near HV, SV

Table 3: Some details about the testing datasets.

| Description   | Seq. 1   | Seq. 2   | S05      | Seq.#1   | Seq.#2   | Seq.#3        |
|---------------|----------|----------|----------|----------|----------|---------------|
| Source        | ours     | ours     | PETS2007 | ETHZ     | ETHZ     | ETHZ          |
| Camera        | Fixed    | Fixed    | Fixed    | Moving   | Moving   | Moving        |
| Light changes | Slightly | Slightly | Slightly | Slightly | Slightly | Significantly |
| Frame rate    | 30 fps   | 30 fps   | 30 fps   | 15 fps   | 15 fps   | 15 fps        |
| Size          | 704×576  | 704×576  | 720×576  | 640×480  | 640×480  | 640×480       |
| Frames        | 2420     | 1781     | 4500     | 999      | 450      | 354           |
| Annotations   | 591      | 1927     | 17067    | 5193     | 2359     | 1828          |
