Fast Human Detection Using Node-Combined Part Detector

2011 18th IEEE International Conference on Image Processing 


Song CAO
Department of Electronic Engineering, Tsinghua University, Beijing 100084, China

Genquan DUAN, Haizhou AI
Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China

ABSTRACT 

Detecting people under occlusion and in articulated poses remains a major challenge in computer vision. To obtain a fast and accurate human detection algorithm, this paper proposes the Node-Combined Part Detector (NCPD) Model. We make two major contributions: (1) we propose a novel method, torso-nodes combination, to integrate part detectors; (2) we adopt stable part detectors described by Associated Pairing Comparison Features (APCF) and trained with the Real AdaBoost algorithm. The resulting algorithm is much faster than previous work while maintaining accuracy competitive with state-of-the-art human detection systems, and it performs better at low false alarm rates. In average time per image, our algorithm achieves a speedup of about 10x over the Deformable Part based Model (DPM) and of over 125x over the Poselet Model.

Index Terms— Object Detection, Node-Combined Part 

Detector, Occlusion, High Articulation 

1. INTRODUCTION 

Object detection, the task of locating objects in images (e.g. face detection [1] and pedestrian detection [2]), is well studied in computer vision. However, detecting people under occlusion and high articulation remains a major challenge. Human detection faces two main difficulties: 1) Humans are non-rigid objects, which causes variations in contour, shape and color, so it is hard for one holistic classifier to describe all situations and variations. 2) Occlusions occur, caused by accessories such as backpacks, clothes and bags, or by other persons and objects. To handle this challenge, part based models have become popular [3] [4] [5]; they can be regarded as providing more variables to describe a highly varied object. But how should these part detectors be selected and trained? And how should they be integrated into an efficient, robust human detector?

Various algorithms have been proposed to deal with occlusion or articulated pose in human detection. The Deformable Part based Model (DPM) [5], based on Histograms of Oriented Gradients (HOG) features [2] combined with a Latent Support Vector Machine (LSVM) training strategy, was proposed for object detection; in it, several part detectors are learned within the model root (a bounding box of the object). The authors established a star model in which each part detector has a deformable position relative to the model root. The inner part detectors give a better description of the inner details of an object, which provides more information for object detection.

Poselet is an innovative approach first proposed in [6], which achieves state-of-the-art results in the detection and segmentation of humans on the PASCAL Visual Object Classes (PASCAL VOC) challenge [7]. In Poselet, the authors randomly select patches from the training images as seed poselets (a poselet can be folded hands, occluded legs, raised hands and so on). Each poselet is described by HOG features and trained with a linear SVM. The randomly selected poselet detectors are then clustered, each giving its own prediction of the potential human location. Together, these many weak, randomly selected poselets indicate the human position, and they have achieved state-of-the-art results in PASCAL VOC human detection in recent years. However, two issues exist in the Poselet based detection algorithm. First, it is relatively time-consuming, because much of the time is spent on detecting poselets and exploiting context among them. Second, most of the randomly selected poselet detectors have relatively low accuracy, and most poselets indicate the same body parts, such as face and head shoulder.

Reviewing progress on detection problems, Boosting-trained detectors, e.g. for face detection [1] and pedestrian detection [8], have proven to be efficient and accurate. To achieve a highly efficient detection algorithm, we propose the Node-Combined Part Detector (NCPD) Model, which involves four stable part detectors described by Associated Pairing Comparison Features (APCF) and trained with the Real AdaBoost algorithm. Our approach is an experimental study of AdaBoost based part detectors for human detection.

We consider precise, well-trained part detectors to be the key to real-time human detection under occlusion and high articulation. We select several stable part detectors and integrate them through the torso-nodes, as shown in Fig. 1. Our stable part detectors should not only have high detection accuracy but also cover most of the poselets used in [4]. Therefore, in our implementation, four stable part detectors (i.e. face, head shoulder, upper body and whole body) are adopted.

978-1-4577-1302-6/11/$26.00 ©2011 IEEE


We integrate the stable part detectors through torso-nodes to establish our NCPD Model. This new human detection algorithm speeds up the detection procedure significantly while maintaining accuracy competitive with existing state-of-the-art methods.

Fig. 1. NCPD Model. The left image shows the structure of our NCPD Model. The right image shows our stable part detectors.

Our contributions are summarized as follows: (1) The Node-Combined Part Detector (NCPD) Model is proposed to integrate stable part detectors through torso-nodes. (2) Stable part detectors are learned by AdaBoost using APCF features, which yields high efficiency in human detection.

The rest of this paper is organized as follows. Section 2 gives an overview of our approach. Section 3 presents the proposed NCPD Model. Section 4 describes the training of our stable part detectors. Quantitative experiments and evaluations on PASCAL VOC test datasets are reported in Section 5. Finally, Section 6 offers conclusions and future work.

2. OVERVIEW OF OUR APPROACH 

Our approach contains three main steps. The first step is to train our part detectors. To improve human detection accuracy, we require our part detectors to be robust, with few variations in the parts they describe. Based on this idea, we train detectors for the face, head shoulder, upper body and whole body, as explained in Sec. 4. The second step is to integrate our stable part detectors into an efficient, robust human detector. We propose the Node-Combined Part Detector (NCPD) Model in Sec. 3.2, where each stable part detector gives a prediction of the positions of the torso-nodes. Finally, post-processing is performed by non-maximum suppression. Following this procedure, we obtain an efficient human detector that achieves competitive results on several challenging datasets.
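The post-processing step can be illustrated with a minimal sketch of greedy non-maximum suppression. The paper does not specify which variant or overlap threshold is used, so the box format `(x1, y1, x2, y2)`, the function names and the 0.5 threshold below are assumptions chosen for illustration only.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection area divided by union area of two boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_maximum_suppression(
    boxes: Sequence[Box],
    scores: Sequence[float],
    overlap_threshold: float = 0.5,  # assumed value, not taken from the paper
) -> List[int]:
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap strongly with a better one.
        if all(iou(boxes[i], boxes[j]) <= overlap_threshold for j in kept):
            kept.append(i)
    return kept
```

Visiting candidates in descending score order means a weaker detection can only be suppressed by a stronger one that has already been kept.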

3. NODE-COMBINED PART DETECTOR (NCPD) MODEL

3.1. Stable Part Detectors 

We consider that a human under high articulation and occlusion can be described by many variables. Assume there are N poselets in the human detection system and that each poselet represents one variable; each person can then be described by a vector of length N in the poselet representation. However, for a detection problem, an N-dimensional space (usually N > 150) is large and makes the detection task considerably more complex. Observing that some variables are redundant and carry the same semantic meaning (e.g. many poselets are similar to a face), we reduce the dimension of the space by using a limited number of principal variables. In practice, we suggest using stable part detectors as the principal variables, since these parts vary little even in a highly articulated or occluded human. Motivated by [3], we define our part detectors to be face, head shoulder, upper body and whole body. These four detectors are stable and suitable for human detection. Even in the Poselet framework, most of the effective poselets are similar to these four body parts; conversely, these four stable parts cover most of the useful poselets when poselets are applied to the detection task. We also considered adding further detectors, such as legs, left body and right body, to our algorithm. However, these parts vary greatly and are less discriminative against the background. To achieve high accuracy and efficiency, we do not adopt them in our current algorithm.

3.2. Integration of Stable Part Detectors 

In other tree-structured models [5] [9], all the parts are integrated through one model root. Based on the empirical observation that the torso is always below the head and has few spatial variations, our Node-Combined Part Detector (NCPD) Model, similarly to Pictorial Structures [9] [10], sets the torso as its root. However, unlike [10], we adopt a new method, named torso-nodes combination, to integrate our stable part detectors into an efficient, robust human detector. Our method applies the idea of Hough voting and uses the distribution of the root configuration, instead of the spatial center of the root, to integrate our stable part detectors. Assume that, after running all four stable part detectors, we obtain n part recalls (part detections), ranked in descending order of detection score as $P_1, P_2, \ldots, P_n$, so that $P_1$ is the highest-probability part recall. Let $L_i(N_i^1, N_i^2, N_i^3, N_i^4)$ represent the root configuration of part $P_i$. We can regard $L_i$ as the torso-nodes distribution, where each $N_i^k$ is a Gaussian distribution learned from the training dataset. (In our implementation, the four torso-nodes are the left and right shoulders and the left and right hips.) We integrate two part recalls i and j using the Kullback-Leibler divergence as follows:

$$S_{ij} = \sum_{k=1}^{4} \left[ D_{KL}(N_i^k, N_j^k) + D_{KL}(N_i^k, N_j^k) \right] \qquad (1)$$

where $S_{ij}$ is an integration distance. If $S_{ij}$ is no larger than a threshold, part $P_i$ and part $P_j$ are taken to belong to the same person. We integrate part recalls starting from the one with the highest score. We adopt this greedy search procedure because it uses the most reliable information first, which gives it a computational advantage. The scores of all part recalls that belong to one potential human location are summed to give the final human detection score. In this way, we integrate our stable part detectors under a framework of spatial consistency, using the information from the less varied torso-nodes. An example of the integration strategy is shown in Fig. 2.
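The integration distance and the greedy search can be sketched as follows. This is only an illustration under several assumptions that the text does not state: each torso-node is modelled as a 2-D Gaussian with diagonal covariance, the two divergence terms are taken in opposite directions (a symmetrised Kullback-Leibler divergence), and every remaining part recall is compared with the highest-scoring recall of its group. All class and function names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Gaussian2D:
    mean: Tuple[float, float]
    var: Tuple[float, float]  # diagonal covariance (assumption)


@dataclass
class PartRecall:
    name: str
    score: float
    nodes: Tuple[Gaussian2D, ...]  # four torso-node distributions


def kl(p: Gaussian2D, q: Gaussian2D) -> float:
    """Closed-form KL divergence D(p || q) for diagonal Gaussians."""
    total = 0.0
    for mp, vp, mq, vq in zip(p.mean, p.var, q.mean, q.var):
        total += 0.5 * (math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0)
    return total


def integration_distance(a: PartRecall, b: PartRecall) -> float:
    """Sum over torso-nodes of the symmetrised KL divergence (assumed form)."""
    return sum(kl(na, nb) + kl(nb, na) for na, nb in zip(a.nodes, b.nodes))


def greedy_integrate(
    parts: List[PartRecall], threshold: float
) -> List[Tuple[float, List[PartRecall]]]:
    """Group part recalls into human hypotheses, best-scoring recall first."""
    remaining = sorted(parts, key=lambda p: p.score, reverse=True)
    humans = []
    while remaining:
        seed = remaining.pop(0)  # most reliable recall not yet assigned
        group, rest = [seed], []
        for p in remaining:
            if integration_distance(seed, p) <= threshold:
                group.append(p)
            else:
                rest.append(p)
        remaining = rest
        # The final human score is the sum of the grouped part scores.
        humans.append((sum(p.score for p in group), group))
    return humans
```

The threshold is a free parameter here; the paper does not report the value it uses.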

Fig. 2. Integration of Stable Part Detectors. Red, yellow and blue bounding boxes show the detection recalls of the face, head shoulder and upper body detectors, respectively. Because the torso-nodes distributions of the face and the upper body are close, they are integrated into the same potential human location.
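How a part recall could cast its vote for the torso-nodes is sketched below. The paper only states that each node distribution is a Gaussian learned from training data; the parameterisation used here (per-part mean offsets and variances expressed in units of the detection box size) is an assumption made for illustration, and the names and numeric values are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height), assumed format
# One node model: (mean offset, variance), both relative to the box size.
NodeModel = Tuple[Tuple[float, float], Tuple[float, float]]
NodeGaussian = Tuple[Tuple[float, float], Tuple[float, float]]  # (mean, variance) in pixels


def predict_torso_nodes(box: Box, node_models: Sequence[NodeModel]) -> List[NodeGaussian]:
    """Map learned, box-relative node Gaussians into image coordinates."""
    x, y, w, h = box
    predicted = []
    for (mx, my), (vx, vy) in node_models:
        mean = (x + mx * w, y + my * h)   # offsets scale with the box size
        var = (vx * w * w, vy * h * h)    # variances scale with the squared size
        predicted.append((mean, var))
    return predicted
```

For example, a hypothetical face model whose left-shoulder node lies half a box width to the right and one box height below the box origin would place that node at `(30, 60)` for a detection box `(10, 20, 40, 40)`.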

4. TRAINING STABLE PART DETECTORS 

4.1. Weak Features 

HOG features combined with a linear SVM form a classic method in pedestrian detection; it captures gradient information well, but at a high computational cost in both memory and time. We consider both gradient and appearance features important in a detection procedure, and therefore adopt Associated Pairing Comparison Features (APCF) [8], which have proved very efficient and accurate in pedestrian detection. APCF describes, to some extent, the invariance of color and gradient of an object, and it contains two essential elements: Pairing Comparison of Color (PCC) and Pairing Comparison of Gradient (PCG). A PCC is a Boolean color comparison of two granules and a PCG is a Boolean gradient comparison of two granules, where a granule is a square window patch. For more details, please refer to [8].
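A pairing comparison of two granules can be illustrated with a small sketch. The exact comparison rule and the way several comparisons are associated into one feature are defined in [8] and are not reproduced in this paper, so the choices below (comparing granule means with `>=`, and packing several Boolean results into an integer code) are assumptions. The same function would serve for a PCC when `channel` is a color channel and for a PCG when it is a gradient map.

```python
from typing import Sequence, Tuple

Granule = Tuple[int, int, int]  # (x, y, size): top-left corner and side of a square patch


def granule_mean(channel: Sequence[Sequence[float]], g: Granule) -> float:
    """Mean value of a square patch of a 2-D channel stored as rows."""
    x, y, s = g
    total = sum(channel[r][c] for r in range(y, y + s) for c in range(x, x + s))
    return total / (s * s)


def pairing_comparison(
    channel: Sequence[Sequence[float]], g1: Granule, g2: Granule
) -> bool:
    """Boolean comparison of two granules (assumed rule: mean(g1) >= mean(g2))."""
    return granule_mean(channel, g1) >= granule_mean(channel, g2)


def associated_code(
    channel: Sequence[Sequence[float]], pairs: Sequence[Tuple[Granule, Granule]]
) -> int:
    """Pack several pairing comparisons into one integer code (assumed association)."""
    code = 0
    for g1, g2 in pairs:
        code = (code << 1) | int(pairing_comparison(channel, g1, g2))
    return code
```

With k pairs the code takes one of 2^k values, which makes it a natural index into a lookup-table weak classifier.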

4.2. The Training Algorithm 

Real AdaBoost [11] is used to learn a Nested Cascade Detector [12] for each part. Interested readers may refer to [11] [12] for more details.
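As a reminder of what one Real AdaBoost round computes, the sketch below builds a confidence-rated lookup-table weak classifier in the domain-partitioning form of [11]: each feature bin j receives the output 0.5 ln(W+_j / W-_j), and the sample weights are then re-normalised. It covers a single round for one already-binned feature; feature selection and the nested cascade structure of [12] are omitted. The function name and the smoothing constant `eps` are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def real_adaboost_round(
    bins: Sequence[int],       # bin index of the feature for each sample
    labels: Sequence[int],     # +1 for positive samples, -1 for negative samples
    weights: Sequence[float],  # current sample weights, summing to 1
    n_bins: int,
    eps: float = 1e-6,         # smoothing term to avoid log(0), assumed value
) -> Tuple[List[float], List[float]]:
    """Return the lookup table of one weak classifier and the updated weights."""
    w_pos = [0.0] * n_bins
    w_neg = [0.0] * n_bins
    for b, y, w in zip(bins, labels, weights):
        if y > 0:
            w_pos[b] += w
        else:
            w_neg[b] += w
    # Confidence-rated output per bin.
    lut = [0.5 * math.log((w_pos[j] + eps) / (w_neg[j] + eps)) for j in range(n_bins)]
    # Samples the weak classifier gets wrong gain weight, the others lose it.
    new_w = [w * math.exp(-y * lut[b]) for b, y, w in zip(bins, labels, weights)]
    z = sum(new_w)
    return lut, [w / z for w in new_w]
```

In a full training loop, the candidate feature minimising the normaliser Z = 2 Σ_j sqrt(W+_j W-_j) would be selected at each round, as described in [11].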

5. EXPERIMENTS 

We use the PASCAL VOC 2009 training dataset for training, in which we annotated the positions of the four stable parts and the torso-nodes. To demonstrate the effectiveness and efficiency of our NCPD Model, we run experiments on the PASCAL VOC test datasets using the same criterion as the PASCAL VOC detection competition: a detection is regarded as a true positive only if the ratio of its overlap area to the union area with the ground truth reaches 50%. However, unlike previous work on the Deformable Part based Model (DPM) and Poselet, we do not use a bounding box adjustment strategy as a post-processing step, although this adjustment is reported to improve detection average precision by about 1% to 3%. All experiments are run on a computer with an Intel Core 2 processor at 2.63GHz and 4GB of RAM.
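The overlap criterion can be made concrete with a short sketch that labels detections as true or false positives. The box format `(x1, y1, x2, y2)` and the function names are assumptions. Two further points follow the usual PASCAL VOC practice rather than anything stated in this paper: each ground-truth box may be matched only once, so duplicate detections count as false positives, and the behaviour exactly at the 50% boundary is chosen here as inclusive.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def overlap_ratio(a: Box, b: Box) -> float:
    """Overlap area divided by union area."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def label_detections(
    detections: Sequence[Tuple[float, Box]],  # (score, box) pairs
    ground_truth: Sequence[Box],
    min_overlap: float = 0.5,
) -> List[bool]:
    """True/false-positive flags, listed in descending score order."""
    matched = set()
    flags = []
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best, best_idx = 0.0, -1
        for idx, gt in enumerate(ground_truth):
            o = overlap_ratio(box, gt)
            if o > best:
                best, best_idx = o, idx
        if best >= min_overlap and best_idx not in matched:
            matched.add(best_idx)  # each annotation can be claimed once
            flags.append(True)
        else:
            flags.append(False)
    return flags
```

Sweeping a score threshold over these flags gives the detection rate and false alarm counts from which ROC curves are drawn.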

Performance comparison. We compare detection accuracy with two of the best human detection methods, the Deformable Part based Model (DPM) and Poselet. The comparison with our NCPD Model is shown in Fig. 3. These ROC curves are computed on the part of the PASCAL test datasets that was released with annotations. Our NCPD Model gives a detection rate up to about 5% higher than that of the existing methods. We achieve better detection accuracy than Poselet on PASCAL VOC 2008 and 2010, while on PASCAL VOC 2009 we obtain similar performance. However, we do not outperform the Deformable Part based Model (DPM) on PASCAL VOC 2010.

Speed comparison. We also measure the speedup of our model. The average time per image for each model and the speedup of the NCPD Model are summarized in Table 1 and Table 2, measured on the PASCAL VOC 2008, 2009 and 2010 test datasets. The results show that Poselet is time-consuming. Our NCPD Model achieves a speedup of about 10x over DPM and of more than 125x over Poselet. Cascade DPM [13] has already improved the speed of DPM; however, our method still reaches a speedup of about 2x over it. Moreover, as reported in [13], cascade DPM may lose some accuracy compared with the original DPM in order to achieve its efficiency.

Fig. 4 shows some results comparing Poselet with our method. Our method deals better with occlusion and articulated pose than Poselet (e.g. (a)(b) in Fig. 4). Our NCPD Model also shows its effectiveness in integrating part detectors (e.g. (c)(d) in Fig. 4). The torso-nodes combination helps us achieve higher performance at low false alarm rates by effectively integrating our boosted stable part detectors. We therefore obtain a fast and accurate human detection algorithm with our NCPD Model.

Table 1. Average time per image for different models.

PASCAL test dataset    2008     2009     2010
Poselet                112s     118s     121s
DPM                    8.95s    9.03s    9.01s
NCPD model             0.89s    0.87s    0.93s



Fig. 3. ROC curve comparison for the three models. (a) PASCAL 2008 dataset (197 pictures, 412 annotations). (b) PASCAL 2009 dataset (72 pictures, 162 annotations). (c) PASCAL 2010 dataset (505 pictures, 737 annotations).

Table 2. NCPD model speedup rate.

PASCAL test dataset    2008      2009      2010
cf. Poselet            125.8x    135.6x    130.1x
cf. DPM                10.1x     10.4x     9.7x
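The speedup rates in Table 2 follow directly from the timings in Table 1, as the ratio of a baseline's average time per image to that of the NCPD model. The short check below reproduces them; the dictionary layout and function name are only illustrative.

```python
from typing import Dict

# Average time per image in seconds, copied from Table 1.
avg_time: Dict[str, Dict[int, float]] = {
    "Poselet": {2008: 112.0, 2009: 118.0, 2010: 121.0},
    "DPM": {2008: 8.95, 2009: 9.03, 2010: 9.01},
    "NCPD": {2008: 0.89, 2009: 0.87, 2010: 0.93},
}


def speedup(baseline: str, ours: str = "NCPD") -> Dict[int, float]:
    """Speedup of `ours` over `baseline` per dataset year, rounded to one decimal."""
    return {
        year: round(avg_time[baseline][year] / avg_time[ours][year], 1)
        for year in avg_time[ours]
    }


print(speedup("Poselet"))  # {2008: 125.8, 2009: 135.6, 2010: 130.1}
print(speedup("DPM"))      # {2008: 10.1, 2009: 10.4, 2010: 9.7}
```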

Fig. 4. Detection Results. The first row shows the detection results of Poselet. The second row shows the detection results of our NCPD model.

6. CONCLUSION 

In this paper, we focus on human detection under occlusion and high articulation, which remains a challenging problem in computer vision. We propose the Node-Combined Part Detector (NCPD) Model, which integrates stable part detectors through the less varied torso-nodes into an efficient and robust human detector. Unlike most previous part based work, we use AdaBoost with APCF features to train our part detectors. Our approach performs well under occlusion and high articulation, and it demonstrates competitive detection accuracy and high speed for human detection. We believe that the model described in this paper for detecting people is also applicable to other object categories; this is the subject of ongoing research.

7. ACKNOWLEDGEMENT 

This work is supported by the National Science Foundation of China under grant No. 61075026.

8. REFERENCES 

[1] P. Viola and M. Jones, “Rapid object detection using a boosted cascade of simple features,” in Proc. CVPR, 2001.

[2] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in Proc. CVPR, 2005.

[3] G. Duan, H. Ai, and S. Lao, “A structural filter approach to human detection,” in Proc. ECCV, 2010.

[4] L. Bourdev, S. Maji, T. Brox, and J. Malik, “Detecting people using mutually consistent poselet activations,” in Proc. ECCV, 2010.

[5] P. Felzenszwalb, R. Girshick, D. McAllester, and D. Ramanan, “Object detection with discriminatively trained part based models,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 9, 2010.

[6] L. Bourdev and J. Malik, “Poselets: Body part detectors trained using 3D human pose annotations,” in Proc. ICCV, 2009.

[7] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, “The PASCAL visual object classes (VOC) challenge,” International Journal of Computer Vision, vol. 88, no. 2, 2010.

[8] G. Duan, C. Huang, H. Ai, and S. Lao, “Boosting associated pairing comparison features for pedestrian detection,” in Proc. ICCV Workshop, 2009.

[9] P. Felzenszwalb and D. Huttenlocher, “Pictorial structures for object recognition,” International Journal of Computer Vision, vol. 61, no. 1, pp. 55–79, 2005.

[10] M. Andriluka, S. Roth, and B. Schiele, “Pictorial structures revisited: People detection and articulated pose estimation,” in Proc. CVPR, 2009.

[11] R. E. Schapire and Y. Singer, “Improved boosting algorithms using confidence-rated predictions,” Machine Learning, pp. 297–336, 1999.

[12] C. Huang, H. Ai, B. Wu, and S. Lao, “Boosting nested cascade detector for multi-view face detection,” in Proc. ICPR, 2004.

[13] P. Felzenszwalb, R. Girshick, and D. McAllester, “Cascade object detection with deformable part models,” in Proc. CVPR, 2010.

