Human Detection in Video over Large Viewpoint Changes

27.09.2014
G. Duan, H. Ai, and S. Lao

$g(x, y, s)$, where $(x, y)$ is the position and $s$ is the scale. For instance, $g(x, y, s)$ denotes a granule of size $2^s \times 2^s$ whose top-left corner is at position $(x, y)$ of an image. In an image $I$, it is calculated as

$$g(x, y, s) = \frac{1}{2^s \times 2^s} \sum_{j=0}^{2^s - 1} \sum_{k=0}^{2^s - 1} I(x + k, y + j). \tag{1}$$

In this paper $s$ is set to 0, 1, 2 or 3, and the four typical granules are shown in Fig. 2 (a).

In order to calculate the distance between two granules, the granular space $G$ is mapped into a 3D space $\mathcal{I}$, where for each element $g \in G$ and $\gamma \in \mathcal{I}$, $g(x, y, s) \rightarrow \gamma(x + 2^s, y + 2^s, 2^s)$. The distance between two granules in $G$ is defined as the Euclidean distance between the two corresponding points in $\mathcal{I}$: $d(g_1, g_2) = d(\gamma_1, \gamma_2)$, where $g_1, g_2 \in G$, $\gamma_1, \gamma_2 \in \mathcal{I}$, and $\gamma_1, \gamma_2$ correspond to $g_1, g_2$ respectively.

$$d(\gamma_1(x_1, y_1, z_1), \gamma_2(x_2, y_2, z_2)) = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2}. \tag{2}$$

3.2 Intra-frame and Inter-frame Comparison Features (I²CFs)

Similar to the approach in [1], we consider two frames each time, the previous one and the latter one, from which two pairs of granules are extracted to fully capture the appearance and motion features of an object. An I²CF can be represented as a five-tuple $c = (\mathit{mode}, g_1^i, g_1^j, g_2^i, g_2^j)$, which is also called a cell according to [5]. The mode is Appearance mode, Difference mode or Consistent mode, and $g_1^i$, $g_1^j$, $g_2^i$ and $g_2^j$ are four granules. The first pair of granules, $g_1^i$ and $g_1^j$, comes from the previous frame and describes the appearance of an object. The second pair of granules, $g_2^i$ and $g_2^j$, comes from either the previous or the latter frame and describes either appearance or motion information.
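The granule value of Eq. (1) and the granule distance of Eq. (2) can be sketched in a few lines. This is only an illustration under stated assumptions: the image is a plain list of rows indexed as `image[y][x]`, and the names `granule`, `to_point` and `granule_distance` are hypothetical helpers, not part of the paper.

```python
import math

def granule(image, x, y, s):
    """Mean intensity of the 2^s x 2^s square with top-left corner (x, y), per Eq. (1).

    Assumption: image is a list of rows, so pixel I(x, y) is image[y][x].
    """
    size = 2 ** s
    total = 0.0
    for j in range(size):        # vertical offset, y + j
        for k in range(size):    # horizontal offset, x + k
            total += image[y + j][x + k]
    return total / (size * size)

def to_point(x, y, s):
    """Map g(x, y, s) to the 3D point (x + 2^s, y + 2^s, 2^s) given in the text."""
    size = 2 ** s
    return (x + size, y + size, size)

def granule_distance(g1, g2):
    """Euclidean distance between the 3D points of two granules, per Eq. (2)."""
    return math.dist(to_point(*g1), to_point(*g2))

# Toy 4 x 4 image whose pixel at row r, column c holds r * 4 + c.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
value = granule(image, 1, 1, 1)                  # mean of 5, 6, 9, 10
dist = granule_distance((0, 0, 0), (0, 0, 1))    # points (1,1,1) and (2,2,2)
```

A direct double loop keeps the sketch close to Eq. (1); a practical detector would more likely precompute an integral image so that each granule costs a constant number of lookups, but that is a design choice outside what this section states.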
When the second pair is from the previous frame, so that both pairs are from the previous frame, the feature is an Intra-frame Comparison Feature (Intra-frame CF); when the second pair comes from the latter frame, the feature is an Inter-frame Comparison Feature (Inter-frame CF). These two kinds of comparison features are combined into the Intra-frame and Inter-frame Comparison Feature (I²CF).

Appearance mode (A-mode). The Pairing Comparison of Color feature (PCC) is shown to be simple, fast and efficient in [5]. As PCC can describe the invariance of color to some extent, we extend this idea to 3D space. A-mode compares two pairs of granules simultaneously:

$$f_A(g_1^i, g_1^j, g_2^i, g_2^j) = g_1^i \ge g_1^j \;\&\&\; g_2^i \ge g_2^j. \tag{3}$$

The PCC feature is a special case of A-mode: $f_A(g_1^i, g_1^j, g_2^i, g_2^j) = g_1^i \ge g_1^j$ when $g_1^i = g_2^i$ and $g_1^j = g_2^j$.

Difference mode (D-mode). D-mode compares the absolute differences of two pairs of granules, defined as:

$$f_D(g_1^i, g_1^j, g_2^i, g_2^j) = |g_1^i - g_2^i| \ge |g_1^j - g_2^j|. \tag{4}$$
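As a rough illustration of Eqs. (3) and (4), the two modes reduce to a pair of comparisons on four granule values. The function names `f_a` and `f_d` are hypothetical, and the sketch assumes the four granule values have already been computed as plain numbers.

```python
def f_a(g1i, g1j, g2i, g2j):
    """A-mode, Eq. (3): both pairs must satisfy the same ordering."""
    return int(g1i >= g1j and g2i >= g2j)

def f_d(g1i, g1j, g2i, g2j):
    """D-mode, Eq. (4): compare the absolute differences across the two pairs."""
    return int(abs(g1i - g2i) >= abs(g1j - g2j))

# PCC as a special case of A-mode: the second pair repeats the first,
# so the result is just the single comparison g1i >= g1j.
pcc_bit = f_a(5, 3, 5, 3)

# D-mode example: |10 - 4| = 6 is compared against |7 - 6| = 1.
d_bit = f_d(10, 7, 4, 6)
```

Each mode yields a single bit, which is what allows several cells to be packed into one feature value later in the section.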

Fig. 2: Our proposed I²CF. (a) Granular space with four scales ($s = 0, 1, 2, 3$) of granules, from [5], illustrated with g(2,4,1), g(3,11,2), g(16,2,0) and g(10,6,3). (b) Two granules $g_1$ and $g_2$ connected by a solid line form one pair of granules, as applied in APCF [5]. (c) Two pairs of granules are used in each cell of I²CF. The solid line between $g_1$ and $g_2$ (or $g_3$ and $g_4$) means that $g_1$ and $g_2$ (or $g_3$ and $g_4$) come from the same frame. The dashed line connecting $g_1$ and $g_3$ (or $g_2$ and $g_4$) means that the locations of $g_1$ and $g_3$ (or $g_2$ and $g_4$) are related. This relation of locations is shown in (d); for example, $g_3$ is in the neighborhood of $g_1$. This restriction greatly reduces the feature pool while still preserving the discriminative weak features.

The motion filters in [1] [2] calculate the difference between one region and a shifted one, obtained by moving it up, left, right or down by 1 or 2 pixels in the second frame. There are three main differences between D-mode and those methods: 1) the restriction on the locations of these regions is defined spatially and is much looser; 2) D-mode considers two pairs of regions each time; 3) the only operation in D-mode is a comparison after the subtractions.

Consistent mode (C-mode). C-mode compares the sums of two pairs of granules to take advantage of consistent information in the appearance of one frame or of successive frames, defined as:

$$f_C(g_1^i, g_1^j, g_2^i, g_2^j) = (g_1^i + g_2^i) \ge (g_1^j + g_2^j). \tag{5}$$

C-mode is much simpler and faster to compute than 3D volumetric features [10] and spatial temporal patches [11].

An I²CF of length $n$ is represented as $\{c_0, c_1, \cdots, c_{n-1}\}$, and its feature value is defined as the binary concatenation of the corresponding cell functions in reverse order, $f_{I^2CF} = [b_{n-1} b_{n-2} \cdots b_1 b_0]$, where $b_k = f(\mathit{mode}, g_1^i, g_1^j, g_2^i, g_2^j)$ is evaluated on cell $c_k$ for $0 \le k < n$.
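The binary concatenation can be sketched as packing one bit per cell into an integer. This is a minimal sketch, assuming that bit $b_k$ comes from cell $c_k$ (so the last cell is the most significant bit), that each cell already carries its four granule values, and that the per-cell function selects among the three modes by name. The names `MODES` and `i2cf_value` are hypothetical.

```python
def f_a(g1i, g1j, g2i, g2j):
    """A-mode, Eq. (3)."""
    return int(g1i >= g1j and g2i >= g2j)

def f_d(g1i, g1j, g2i, g2j):
    """D-mode, Eq. (4)."""
    return int(abs(g1i - g2i) >= abs(g1j - g2j))

def f_c(g1i, g1j, g2i, g2j):
    """C-mode, Eq. (5): compare the sums of the i-granules and the j-granules."""
    return int(g1i + g2i >= g1j + g2j)

# Dispatch table from mode label to its comparison function.
MODES = {"A": f_a, "D": f_d, "C": f_c}

def i2cf_value(cells):
    """Pack the bits of cells c_0 .. c_{n-1} into an integer, c_0 in the lowest bit."""
    value = 0
    for k, (mode, g1i, g1j, g2i, g2j) in enumerate(cells):
        value |= MODES[mode](g1i, g1j, g2i, g2j) << k
    return value

# Three cells holding already-computed granule values (illustrative numbers).
cells = [
    ("A", 5, 3, 4, 4),   # b0: 5 >= 3 and 4 >= 4
    ("D", 5, 9, 5, 2),   # b1: |5 - 5| >= |9 - 2|
    ("C", 1, 2, 3, 1),   # b2: 1 + 3 >= 2 + 1
]
feature = i2cf_value(cells)
```

With $n$ cells the feature takes one of $2^n$ values, so it can index a lookup table directly, which is one plausible reason for preferring a bit-packed value over a vector of separate booleans.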
$$f(\mathit{mode}, g_1^i, g_1^j, g_2^i, g_2^j) = \begin{cases} f_A(g_1^i, g_1^j, g_2^i, g_2^j), & \mathit{mode} = A, \\ f_D(g_1^i, g_1^j, g_2^i, g_2^j), & \mathit{mode} = D, \\ f_C(g_1^i, g_1^j, g_2^i, g_2^j), & \mathit{mode} = C. \end{cases} \tag{6}$$

3.3 Heuristic learning I²CFs

Feature reduction. For $58 \times 58$ samples, there are $\sum_{s=0}^{3} (58 - 2^s + 1) \times (58 - 2^s + 1) = 12239$ granules in total, and the feature pool (three modes, four granules per cell) contains $3 \times 12239^4 \simeq 6.7 \times 10^{16}$ weak features without any restrictions, which make the training time and
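The granule count and the pool size quoted in the feature-reduction paragraph can be checked with a short calculation. The sketch assumes the unrestricted pool is three modes times four independently chosen granules, i.e. $3 \times 12239^4$, which is the reading that reproduces the quoted $6.7 \times 10^{16}$; the function name `count_granules` is hypothetical.

```python
def count_granules(width, height, scales=(0, 1, 2, 3)):
    """Number of 2^s x 2^s granule placements that fit in a width x height window."""
    return sum((width - 2 ** s + 1) * (height - 2 ** s + 1) for s in scales)

# Per-scale counts for a 58 x 58 sample: 58^2, 57^2, 55^2 and 51^2.
n_granules = count_granules(58, 58)

# Assumed unrestricted pool: 3 modes, 4 granules per cell.
pool_size = 3 * n_granules ** 4
print(n_granules, f"{pool_size:.2e}")
```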



