Human Detection in Video over Large Viewpoint Changes

and gradient of an object to some extent. HOG computes an oriented gradient distribution in a rectangular image window. An edgelet is a short segment of a line or curve, predefined based on prior knowledge. 2) Detection over videos, as in [1][2]; both are mentioned in the previous section. 3) Object tracking, as in [8][9]; some methods need manual initialization, as in [8], and some are aided by detection, as in [9]. 4) Detecting events or human behaviors: 3D volumetric features [10], which can be seen as 3D Haar-like features, are designed for event detection, and the ST-patch [11] is used for detecting behaviors. Inspired by these works, we propose Intra-frame and Inter-frame Comparison Features (I²CFs) to combine appearance and motion information.
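As a rough illustration of the idea (not the paper's exact definition, which is developed in Sec. 3), comparison features of this kind can be sketched as differences of mean grey-levels between square granules, taken within one frame for appearance and across consecutive frames for motion. The function names and the granule parameterization `(x, y, s)` below are illustrative assumptions:

```python
import numpy as np

def granule_mean(frame, x, y, s):
    """Mean grey-level of a square granule of side 2**s at offset (x, y).
    The (x, y, s) parameterization is an assumption for illustration."""
    size = 2 ** s
    return frame[y:y + size, x:x + size].mean()

def intra_frame_feature(frame, g1, g2):
    """Appearance cue: compare two granules inside the same frame."""
    return granule_mean(frame, *g1) - granule_mean(frame, *g2)

def inter_frame_feature(frame_t, frame_t1, g):
    """Motion cue: compare the same granule across two frames."""
    return granule_mean(frame_t, *g) - granule_mean(frame_t1, *g)
```

A large inter-frame response at a granule suggests motion there, while intra-frame responses capture local appearance contrast, which is the combination the I²CFs aim at.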

Due to the large variation in human appearance over wide viewpoint changes, it is impossible to train a usable detector by taking the sample space as a whole. The solution is divide and conquer: cluster the sample space into subspaces during training. Each subspace can then be treated as one class, so the difficulty lies mainly in clustering the sample space. An efficient way is to cluster the sample space automatically, as in [12][13][14]. Clustered Boosting Tree (CBT) [14] splits the sample space automatically during training, using the already learned discriminative features, for pedestrian detection. Mixture of Experts (MoE) [12] jointly learns multiple classifiers and data partitions. It emphasizes local experts and is suitable when the input data can be naturally divided into homogeneous subsets, which does not hold even for a fixed viewpoint of a human, as shown in Fig. 1. MC-Boost [13] co-clusters images and visual features by simultaneously learning image clusters and boosting classifiers. A risk map, defined on pixel-level distances between samples, is also used in [13] to reduce the search space of the weak classifiers. To solve our problem, we propose an Enhanced Multiple Clusters Boost (EMC-Boost) algorithm that co-clusters the sample space and discriminative features automatically, combining the benefits of Cascade [15], CBT [14], and MC-Boost [13]. The choice of EMC-Boost over MC-Boost is discussed in Sec. 7.
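To make the co-clustering idea concrete, here is a toy sketch (not the paper's EMC-Boost) of the alternation common to this family of methods: positives are assigned to the cluster whose strong classifier scores them highest, and each cluster then greedily adds one threshold weak learner separating its positives from all negatives. All names and the weak-learner form are assumptions for illustration:

```python
import numpy as np

def co_cluster_boost_sketch(X, y, n_clusters=2, n_rounds=10):
    """Toy co-clustering boosting loop: alternate (a) re-assigning each
    positive to its best-scoring cluster and (b) growing each cluster's
    classifier by one threshold weak learner. Illustrative only."""
    rng = np.random.default_rng(0)
    pos, neg = X[y == 1], X[y == 0]
    # Tiny random scores break the initial assignment symmetry.
    scores = rng.standard_normal((len(pos), n_clusters)) * 1e-3
    learners = [[] for _ in range(n_clusters)]
    for _ in range(n_rounds):
        assign = scores.argmax(axis=1)  # co-clustering step
        for k in range(n_clusters):
            Pk = pos[assign == k]
            if len(Pk) == 0:
                continue
            # Greedily pick the feature/threshold best separating Pk from neg.
            best = None
            for j in range(X.shape[1]):
                thr = (Pk[:, j].mean() + neg[:, j].mean()) / 2
                acc = ((Pk[:, j] > thr).mean() + (neg[:, j] <= thr).mean()) / 2
                if best is None or acc > best[0]:
                    best = (acc, j, thr)
            _, j, thr = best
            learners[k].append((j, thr))
            # Every positive is re-scored under cluster k's grown classifier.
            scores[:, k] += (pos[:, j] > thr).astype(float)
    return learners
```

The real EMC-Boost additionally folds in a cascade structure [15] and CBT-style splitting [14]; this sketch only shows the simultaneous clustering/boosting loop it shares with MC-Boost [13].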

Our contributions are four-fold: 1) Intra-frame and Inter-frame Comparison Features (I²CFs) are proposed to combine appearance and motion information for human detection in video over large viewpoint changes; 2) an Enhanced Multiple Clusters Boost (EMC-Boost) algorithm is proposed to co-cluster the sample space and discriminative features automatically; 3) a Multiple Video Sampling (MVS) strategy makes our approach robust to human motion and video frame rate changes; 4) a two-stage tree-structured detector is presented to fully mine the discriminative features of the appearance and motion information. Experiments on challenging real-world scenes show that our approach is robust to human motion and frame rate changes.

3 Intra-frame and Inter-frame Comparison Features<br />

3.1 Granular space<br />

Our proposed discriminative features are defined in Granular space [16]. A granule is a square patch in a grey-level image, defined as a triplet
