Robust Object Tracking with Radial Basis Function Networks


Fig. 1. (a) Object model development: training phase. (b) Object tracking: testing phase.

of the labeled object and background pixels. The object and non-object classifiers were tuned to maximize classification accuracy for object and background pixels respectively. Only the probability information of reliable object pixels, i.e. those classified as object by both classifiers, is used for building the target object model. The tracking phase is illustrated in Fig. 1(b). Object localization starts at the center of the object window in the frame where it was previously tracked. In order to find the object pixels, the extracted features from this location are tested with both the object and non-object classifiers. The displacement of the object is given by the shift in the centroid of the object pixels. The object location is iteratively shifted and tested until convergence. The cumulative displacement indicates the shift in object location for the current frame.

3. RBF NETWORKS-BASED OBJECT TRACKER

The following subsections explain the main modules of the proposed RBF networks-based tracking system, beginning with the procedure for separating foreground and background pixels.

3.1. Object-background separation

Efficient object tracking heavily depends on how well the object is modeled free from background pixels.
In order to obtain a reliable object model, we separate the object region from the surrounding background in the first frame of the video sequence. The object-background separation is used for labeling the object and background pixels. The R-G-B based joint probability density function (pdf) of the object region and that of a neighborhood surrounding the object are obtained. This process is illustrated in Fig. 2. The region within the red rectangle is used to obtain the object pdf, and the region between the green and red rectangles is used to obtain the background pdf [12]. The resulting log-likelihood ratio of the foreground/background regions is used to determine the object pixels. The log-likelihood of the ith pixel within the outer bounding rectangle (green rectangle in Fig. 2) is obtained as

    L_i = log( max{h_o(i), ε} / max{h_b(i), ε} )    (1)

where h_o(i) and h_b(i) are the probabilities of the ith pixel belonging to the object and the background respectively, and ε is a small non-zero value to avoid numerical instability. The non-linear log-likelihood maps the multimodal object/background distribution to positive values for colors associated with the foreground and negative values for the background.

Fig. 2. (a) Initial frame with object boundary (b) likelihood map L (c) mask obtained after morphological operations (T).

The binary weighting factor M for the ith pixel is

    M_i = 1 if L_i > τ_o, 0 otherwise    (2)

where τ_o is the threshold used to decide on the most reliable object pixels. Once the object is localized, by user interaction or detection in the first frame, the likelihood map of the object/background is obtained using (2). In our experiments, τ_o is set to 0.8.
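As a minimal sketch, the likelihood labeling of (1)-(2) can be implemented with joint R-G-B histograms as the pdf estimates. The 16-bin quantization below is an assumption for illustration; the paper does not state how the joint pdfs are discretized.

```python
import numpy as np

def likelihood_mask(obj_pixels, bg_pixels, candidate_pixels,
                    bins=16, tau_o=0.8, eps=1e-6):
    """Label reliable object pixels via the log-likelihood ratio of
    Eq. (1) and the threshold of Eq. (2).

    obj_pixels, bg_pixels, candidate_pixels: (N, 3) integer RGB arrays
    with values in [0, 255]. `bins` (assumed) is the per-channel
    quantization of the joint pdf.
    """
    edges = [np.linspace(0, 256, bins + 1)] * 3

    def joint_pdf(px):
        # Normalized joint R-G-B histogram as the pdf estimate.
        h, _ = np.histogramdd(px, bins=edges)
        return h / max(h.sum(), 1)

    h_o, h_b = joint_pdf(obj_pixels), joint_pdf(bg_pixels)

    # Look up each candidate pixel's bin in both pdfs.
    idx = np.minimum(candidate_pixels // (256 // bins), bins - 1)
    p_o = h_o[idx[:, 0], idx[:, 1], idx[:, 2]]
    p_b = h_b[idx[:, 0], idx[:, 1], idx[:, 2]]

    L = np.log(np.maximum(p_o, eps) / np.maximum(p_b, eps))  # Eq. (1)
    return (L > tau_o).astype(np.uint8)                      # Eq. (2)
```

In practice the resulting binary mask would still be cleaned by the morphological operations mentioned in Fig. 2(c).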
3.2. Feature Extraction

Feature extraction is one of the most computationally intensive modules in most classification problems. In order to perform object tracking in real time, we need to extract features that distinguish the classes effectively while incurring minimal computational load. In the proposed tracker, the features used for modeling the object are simple pixel-color-based features: the R-G-B and Y-Cb-Cr values are extracted for each pixel of the given video frame.

4. RADIAL BASIS FUNCTION BASED OBJECT MODEL

In this paper, the object model development is cast as the classification of object pixels against non-object (background) pixels. The object model is developed using two radial basis function (RBF) classifiers, namely the 'Object Classifier' and the 'Non-object Classifier'. The object RBF classifier is developed such that it maximizes the number of correctly classified object pixels in the region of interest; similarly, the non-object RBF classifier maximizes the number of correctly classified non-object pixels.

In general, a two-class problem can be stated as follows. Suppose we have N observation samples {U_i, T_i}_{i=1}^N, where U_i = [u_i1, ..., u_id] ∈ R^d is the d-dimensional feature vector of the ith sample and T_i ∈ {−1, +1} is its coded class label. If sample U_i is assigned to the object class then T_i is +1; otherwise (background class) it is −1. The objective of the classification problem is to estimate the functional relationship between the samples and their class labels from the known set of data.
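A sketch of the per-pixel feature extraction of Section 3.2 and the coded-label construction {U_i, T_i} described above. The BT.601 Y-Cb-Cr conversion coefficients are an assumption; the paper does not specify which YCbCr variant it uses.

```python
import numpy as np

def pixel_features(rgb):
    """Stack per-pixel R-G-B and Y-Cb-Cr values into one feature vector.

    rgb: (N, 3) array with values in [0, 255]. Uses the standard BT.601
    full-range conversion (assumed, not stated in the paper).
    """
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[:, 0], rgb[:, 1], rgb[:, 2]
    y  = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return np.column_stack([rgb, y, cb, cr])  # (N, 6) feature matrix

def build_training_set(obj_feats, bg_feats):
    """Form the labeled samples {U_i, T_i}: T_i = +1 for object pixels,
    T_i = -1 for background pixels."""
    U = np.vstack([obj_feats, bg_feats])
    T = np.concatenate([np.ones(len(obj_feats)),
                        -np.ones(len(bg_feats))])
    return U, T
```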
Here, we use radial basis function classifiers developed using the fast learning algorithm [9] to approximate this functional relationship.

It has been shown in the literature that a neural network classifier trained with the mean square error loss function approximates the posterior probability well [11]. Since the fast learning algorithm also uses a least squares estimate to minimize the error, the output of the RBF network approximates the posterior probability. The posterior probability of pixel U_i obtained using the object classifier is

    p̂_o(U_i|c) ≈ ( max(min(Ŷ, 1), −1) + 1 ) / 2    (3)
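Equation (3) simply clips the raw network output to [−1, 1] and rescales it to [0, 1]:

```python
import numpy as np

def posterior_object(y_hat):
    """Map the raw RBF output Ŷ to an approximate posterior probability
    via Eq. (3): clip to [-1, 1], then rescale to [0, 1]."""
    return (np.clip(y_hat, -1.0, 1.0) + 1.0) / 2.0
```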


Similarly, the posterior probability of an object pixel U_i obtained using the non-object classifier is

    p̂_b(U_i|c) ≈ 1 − ( max(min(Ŷ, 1), −1) + 1 ) / 2    (4)

Two classifiers are used so that the posterior probability of the object pixels is estimated reliably. Since the 'object classifier' maximizes the number of object pixels, it might include some background pixels as object; similarly, the 'non-object classifier' might include some object pixels as background. The objective here is to identify the object pixels with high confidence. Hence, we neglect the pixels which are assigned different class labels by the object and non-object classifiers. The posterior probabilities of the two classifiers are combined to obtain the object model p̂_t as given below:

    p̂_t(U_i|c) = min[ p̂_o, p̂_b ]    (5)

The posterior probability of the target model is used for localizing the object in the subsequent frames.

4.1. Fast Learning Radial Basis Function Classifier

In this section, we present a brief overview of the fast learning algorithm for radial basis function networks [9]. A radial basis function network is a three-layered feed-forward network. The first layer is linear and only distributes the input signal, the second layer is nonlinear and uses Gaussian functions, and the third layer linearly combines the Gaussian outputs. By the universal approximation property, a single-hidden-layer feed-forward network with a sufficient number of hidden neurons can approximate any function to any arbitrary level of accuracy.
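The two-classifier combination of (3)-(5) can be sketched as follows, taking the raw outputs of the object and non-object networks and keeping only pixels that both networks agree are object:

```python
import numpy as np

def object_model(y_obj, y_bg):
    """Combine both classifier outputs into the target model of Eq. (5).

    y_obj, y_bg: raw outputs Ŷ of the object and non-object RBF
    networks for the same pixels.
    """
    p_o = (np.clip(y_obj, -1.0, 1.0) + 1.0) / 2.0        # Eq. (3)
    p_b = 1.0 - (np.clip(y_bg, -1.0, 1.0) + 1.0) / 2.0   # Eq. (4)
    return np.minimum(p_o, p_b)                           # Eq. (5)
```

The element-wise minimum drives p̂_t toward zero whenever either network disagrees, which is exactly the "neglect pixels with conflicting labels" behavior described above.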
Let μ_j ∈ R^d and σ_j ∈ R^+ be the center and width of the jth Gaussian hidden neuron, α be the 1 × K vector of output weights, and N be the number of pixels. The output Ŷ of an RBF network with K neurons has the following form:

    Ŷ = Σ_{j=1}^{K} α_j exp( −‖U − μ_j‖ / σ_j )    (6)

Equation (6) can be written in matrix form as

    Ŷ = α Y_H    (7)

Here, Y_H (of dimension K × N) is the hidden layer output matrix of the neural network; the jth row of Y_H is the jth hidden neuron's output with respect to the inputs U_1, U_2, ..., U_N. For most practical problems, the number of hidden neurons is assumed to be less than the number of training samples. In the fast learning algorithm, for a given number of hidden neurons K, the centers μ and widths σ of the hidden neurons are selected randomly. The output weights are then estimated analytically as

    α = T Y_H^†    (8)

where Y_H^† is the Moore-Penrose generalized pseudo-inverse of Y_H. The network output Ŷ is used for estimating the posterior probabilities using equations (3)-(5).

Fig. 3. Posterior probability of pixels for a given object window. (a) Object Classifier (b) Non-object Classifier (c) Object Model.

4.2. Object Localization

In this section we explain the development of the object model from the RBF classifiers and the object localization based on the estimated posterior probability. The posterior probabilities of the current object window estimated by the object and non-object classifiers for the object window (starting frame) of the PETS video sequence are shown in Fig. 3(a)-(b) respectively. The corresponding classification matrices are given below:

    C_o = [ 730   47 ]        C_b = [ 747   69 ]    (9)
          [  38  358 ]              [  21  336 ]

From (9), we can observe that the object classifier maximizes the classification accuracy for the object class (C_o(2,2) = 358 > C_b(2,2) = 336).
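The fast-learning training of (6)-(8) can be sketched directly in NumPy: random centers and widths, a Gaussian hidden-layer matrix, and a pseudo-inverse solve for the output weights. The choice of drawing centers from the training data and widths uniformly from [0.5, 2.0] is an illustrative assumption, not the paper's prescription.

```python
import numpy as np

def train_rbf(U, T, K=20, rng=None):
    """Fast-learning RBF classifier, Eqs. (6)-(8).

    U: (N, d) features; T: (N,) coded labels in {-1, +1};
    K: number of hidden neurons (K < N assumed).
    """
    rng = np.random.default_rng(rng)
    centers = U[rng.choice(len(U), K, replace=False)]  # random mu_j
    widths = rng.uniform(0.5, 2.0, K)                  # random sigma_j (assumed range)
    # Hidden layer output matrix Y_H (K x N), Eq. (6)
    D = np.linalg.norm(U[None, :, :] - centers[:, None, :], axis=2)
    Y_H = np.exp(-D / widths[:, None])
    alpha = T @ np.linalg.pinv(Y_H)                    # Eq. (8): alpha = T Y_H^+
    return centers, widths, alpha

def rbf_predict(U, centers, widths, alpha):
    """Network output Ŷ = alpha Y_H, Eq. (7)."""
    D = np.linalg.norm(U[None, :, :] - centers[:, None, :], axis=2)
    return alpha @ np.exp(-D / widths[:, None])
```

Because (8) is a single least-squares solve rather than iterative gradient descent, training both classifiers on the first frame is cheap, which is what makes the per-frame testing phase of the tracker so light.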
Similarly, the non-object classifier maximizes the classification accuracy for the background class (C_b(1,1) = 747 > C_o(1,1) = 730). The object model p̂_t is developed using (5) and is shown in Fig. 3(c). This target model uses only the reliable object pixels which are classified as object by both classifiers.

Let X_c^k be the object center, X_i^k be the candidate pixel locations (centered around X_c^k), and p̂_c^k(i) be the corresponding posterior probability of the ith candidate pixel at the kth iteration. The posterior probability at the kth iteration, p̂_c^k, is obtained by testing the features extracted at the locations X_i^k. The new location of the object center is then estimated as the centroid of the posterior probability p̂_c^k weighted by the target model p̂_t:

    X_c^{k+1} = Σ_i X_i^k p̂_c^k(i) p̂_t(i) / Σ_i p̂_c^k(i) p̂_t(i)    (10)

The iteration is terminated when the change in centroid location between two consecutive iterations falls below a threshold t_s. The typical value of t_s used in our experiments is in the range 0-2.

4.3. RBF-Networks Based Tracker Algorithm

1. Select the object to be tracked in the initial frame by drawing a rectangular window around it.
2. Separate the object from the background based on the object likelihood map.
3. Extract the object and background features from the labeled object and neighboring background pixels.
4. Obtain the object model using the RBF networks.
5. Test for the object in the next frame, starting from the previously obtained object location.
6. Recursively obtain the object location for the current frame using (10).
7. Go to step 5.
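The iterative centroid shift of Eq. (10) can be sketched as below. Here `posterior_fn` and the per-candidate target weights `p_t` are hypothetical stand-ins for the classifier posterior p̂_c^k and the target model p̂_t evaluated at each candidate pixel.

```python
import numpy as np

def localize(X_c, candidate_offsets, posterior_fn, p_t, t_s=1.0, max_iter=20):
    """Iterative object localization via the weighted centroid of Eq. (10).

    X_c: (2,) initial object center; candidate_offsets: (M, 2) offsets
    defining the candidate pixel locations X_i around the center;
    posterior_fn(locations) -> (M,) posterior p̂_c for those locations
    (hypothetical interface); p_t: (M,) target-model weights;
    t_s: convergence threshold on the centroid shift (0-2 in the paper).
    """
    X_c = np.asarray(X_c, dtype=float)
    for _ in range(max_iter):
        X_i = X_c + candidate_offsets                  # candidates around center
        w = posterior_fn(X_i) * p_t                    # p̂_c(i) * p̂_t(i)
        X_new = (X_i * w[:, None]).sum(axis=0) / w.sum()  # Eq. (10)
        if np.linalg.norm(X_new - X_c) < t_s:          # termination test
            return X_new
        X_c = X_new
    return X_c
```

This is the same hill-climbing structure as a mean-shift iteration; the difference claimed by the paper is that the per-pixel weights come from the RBF posterior rather than a kernel-weighted histogram.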


Fig. 4. Tracking result of the proposed system (solid yellow) against the mean-shift (dashed blue) tracker for the 'train' sequence. Frames shown: 1, 10, 30, 60, 100 and 150.

5. EXPERIMENTS AND DISCUSSIONS

The proposed algorithm has been tested on several complex video sequences. The 'train' and 'walk' videos used in our experiments were shot with a handheld camcorder and hence include a wide variety of camera operations; for example, the train sequence contains considerable jerky motion with camera pan, tilt and zoom operations. We have compared the performance of the proposed tracker against the mean-shift (MS) tracker [13], which is known for robust object tracking in cluttered environments. Fig. 4 shows the tracking results of the proposed and MS trackers for a toy train moving against a cluttered background. Though both trackers follow the object, the accuracy of the MS tracker degrades considerably during large displacements of the object caused by camera panning. The last two frames of Fig. 4 show how the accuracy of the MS tracker also suffers under a camera zoom-out operation. The proposed algorithm tracks the object with better precision during these camera operations. Fig. 5 shows the tracking results for a PETS sequence. The proposed algorithm tracks the object throughout the sequence with better accuracy than the MS algorithm, which fails towards the end of the sequence. The proposed tracker has also been tested at various object speeds by temporally downsampling the video; the result shown in Fig. 5 is for a downsampling factor of 4. These results clearly show that fast learning neural networks are well suited for tracking objects in real-world scenarios.
Further, the RBF tracker uses, on average, 1-2 iterations to localize the object, and the computational effort required to calculate the posterior probabilities in the testing phase is negligible. Hence the proposed tracker is suitable for real-time applications.

Fig. 5. Tracking result of the proposed system (solid yellow) against the mean-shift (dashed blue) tracker for the 'pets' sequence.

6. CONCLUSIONS

In this paper we have proposed an object tracker based on RBF networks. The fast learning algorithm is used to develop the object model in the initial frame using two RBF classifiers. Object localization is achieved using the target model and the posterior probability of the current object window. The performance of the proposed tracker was compared with the well-known MS tracker on various complex video sequences, and the proposed tracker provides better tracking accuracy. Since the testing phase of an RBF network incurs very low computational burden, the proposed tracker is suitable for real-time applications.

7. REFERENCES

[1] A. Blake, "Visual tracking: a very short research roadmap," Electronics Letters, vol. 42, no. 5, pp. 254-256, 2006.
[2] Yi Lu Murphey and Yun Luo, "Feature extraction for a multiple pattern classification neural network system," Pattern Recognition Proceedings, vol. 2, pp. 220-223, 2002.
[3] M. Voultsidou, S. Dodel, and J. M. Herrmann, "Neural networks approach to clustering of activity in fMRI data," IEEE Trans. on Medical Imaging, vol. 24, no. 8, pp. 987-996, Aug. 2005.
[4] E. Trentin and M. Gori, "Robust combination of neural networks and hidden Markov models for speech recognition," IEEE Trans. on Neural Networks, vol. 14, no. 6, pp. 1519-1531, Nov. 2003.
[5] C. Lee and D. A. Landgrebe, "Decision boundary feature extraction for neural networks," IEEE Trans. on Neural Networks, vol. 8, no. 1, pp. 75-83, 1997.
[6] M. T. Musavi, W. Ahmed, K. H. Chan, K. B. Faris, and D. M. Hummels, "On training of radial basis function classifiers," Neural Networks, vol. 5, pp. 595-603, July-Aug. 1992.
[7] J. Moody and C. J. Darken, "Fast learning in networks of locally-tuned processing units," Neural Computation, vol. 1, pp. 281-294, 1989.
[8] G.-B. Huang, Q. Y. Zhu, and C. K. Siew, "Extreme learning machine: theory and applications," Neurocomputing, 2006.
[9] G.-B. Huang and C.-K. Siew, "Extreme learning machine with randomly assigned RBF kernels," International Journal of Information Technology, vol. 11, no. 1, pp. 16-24, 2005.
[10] S. Avidan, "Support vector tracking," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 2001, vol. 1, pp. 184-191.
[11] R. Rojas, "A short proof of the posterior probability property of classifier neural networks," Neural Computation, vol. 8, no. 2, pp. 41-43, 1996.
[12] R. T. Collins, "Mean-shift blob tracking through scale space," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Madison, Wisconsin, 2003, pp. 234-241.
[13] D. Comaniciu, V. Ramesh, and P. Meer, "Kernel-based object tracking," IEEE Trans. Pattern Anal. Machine Intell., vol. 25, no. 5, pp. 564-577, 2003.
