27.09.2014 Views

Human Detection in Video over Large Viewpoint Changes

Human Detection in Video over Large Viewpoint Changes

Human Detection in Video over Large Viewpoint Changes

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

<strong>Human</strong> <strong>Detection</strong> <strong>in</strong> <strong>Video</strong> <strong>over</strong> <strong>Large</strong> Viewpo<strong>in</strong>t <strong>Changes</strong> 1251<br />

g(2,4,1)<br />

g(3,11,2)<br />

g(16,2,0)<br />

g(10,6,3)<br />

g2<br />

g1<br />

g2<br />

g1<br />

g3<br />

g4<br />

g1<br />

g2<br />

g3<br />

g4<br />

(a)<br />

(b) (c) (d)<br />

Fig. 2: Our proposed I 2 CF . (a) Granular space with four scales (s = 0,1,2,3) of granules<br />

comes from [5]. (b)Two granules g 1 and g 2 connected by a solid l<strong>in</strong>e form one pair of<br />

granules applied <strong>in</strong> APCF [5]. (c) Two pairs of granules are used <strong>in</strong> each cell of I 2 CF .<br />

The solid l<strong>in</strong>e between g 1 and g 2 (or g 3 and g 4) means that g 1 and g 2 (or g 3 and g 4)<br />

come from the same frame. The dashed l<strong>in</strong>e connect<strong>in</strong>g g 1 and g 3 (or g 2 and g 4) means<br />

that the locations of g 1 and g 3 (or g 2 and g 4) are related. This relation of locations<br />

is shown <strong>in</strong> (d). For example, g 3 is <strong>in</strong> the neighborhood of g 1. This way reduces the<br />

feature pool a lot but still reserves the discrim<strong>in</strong>ative weak features.<br />

The motion filters <strong>in</strong> [1] [2] calculate the difference between one region and<br />

a shifted one by mov<strong>in</strong>g it up, left, right or bottom 1 or 2 pixels <strong>in</strong> the second<br />

frame. There are three ma<strong>in</strong> differences between the D-mode and those methods:<br />

1) The restriction for the locations of these regions is def<strong>in</strong>ed spatially and much<br />

looser; 2) D-mode considers two pair of regions each time; 3) The only operation<br />

of D-mode is a comparison operator after subtractions.<br />

Consistent-mode (C-mode). C-mode compares the sums of two pairs of<br />

granules to take advantage of consistent <strong>in</strong>formation <strong>in</strong> the appearance of one<br />

frame or successive frames, def<strong>in</strong>ed as:<br />

f C (g1, i g j 1 , gi 2, g j 2 ) = (gi 1 + g2) i ≥ (g j 1 + gj 2 ). (5)<br />

C-mode is much simpler and can be quickly calculated compared with 3D<br />

volumetric features [10] and spatial temporal patches [11].<br />

An I 2 CF of length n is represented as {c 0 , c 1 , · · · , c n−1 } and its feature<br />

value is def<strong>in</strong>ed as a b<strong>in</strong>ary concatenation of correspond<strong>in</strong>g functions of cells <strong>in</strong><br />

reverse order as f I2 CF = [b n−1 b n−2 · · · b 2 b 1 ], where b k = f(mode, g i 1, g j 1 , gi 2, g j 2 )<br />

for 0 ≤ k < n.<br />

⎧<br />

⎪⎨ f A (g1, i g j<br />

f(mode, g1, i g j 1 , gi 2, g j 1 , gi 2, g j 2 ), mode = A,<br />

2 ) = f<br />

⎪ D (g1, i g j 1 , gi 2, g j 2 ), mode = D, (6)<br />

⎩<br />

f C (g1, i g j 1 , gi 2, g j 2 ), mode = C.<br />

3.3 Heuristic learn<strong>in</strong>g I 2 CF s<br />

Feature reduction. For 58 × 58 samples, there are ∑ 3<br />

s=0 (58 − 2s + 1) × (58 −<br />

2 s +1) = 12239 granules <strong>in</strong> total and the feature pool conta<strong>in</strong>s 3×12239 2 ≃ 6.7×<br />

10 16 weak features without any restrictions, which make the tra<strong>in</strong><strong>in</strong>g time and

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!