Robust Object Tracking with Radial Basis Function Networks


Fig. 1. (a) Object model development: training phase. (b) Object tracking: testing phase.

of the labeled object and background pixels. The object and non-object classifiers were tuned to maximize classification accuracy for object and background pixels respectively. Only the probability information of reliable object pixels, i.e. those classified as object by both classifiers, is used for building the target object model. The tracking phase is illustrated in Fig. 1(b). Object localization starts at the center of the object window in the frame where it was previously tracked. In order to find the object pixels, the extracted features from this location are tested with both the object and non-object classifiers. The displacement of the object is given by the shift in the centroid of the object pixels. The object location is iteratively shifted and tested until convergence. The cumulative displacement indicates the shift in object location for the current frame.

3. RBF NETWORKS-BASED OBJECT TRACKER

The following subsections explain the main modules of the proposed RBF networks-based tracking system, beginning with the procedure for separating foreground and background pixels.

3.1. Object-background separation

Efficient object tracking heavily depends on how well the object is modeled free from background pixels.
In order to obtain a reliable object model, we separate the object region from the surrounding background in the first frame of the video sequence. The object-background separation is used for labeling the object and background pixels. The R-G-B based joint probability density function (pdf) of the object region and that of a neighborhood surrounding the object are obtained. This process is illustrated in Fig. 2. The region within the red rectangle is used to obtain the object pdf, and the region between the green and red rectangles is used to obtain the background pdf [12]. The resulting log-likelihood ratio of the foreground/background regions is used to determine the object pixels. The log-likelihood of the ith pixel within the outer bounding rectangle (green rectangle in Fig. 2) is obtained as

    L_i = log( max{h_o(i), ε} / max{h_b(i), ε} )    (1)

where h_o(i) and h_b(i) are the probabilities of the ith pixel belonging to the object and the background respectively, and ε is a small non-zero value to avoid numerical instability. The non-linear log-likelihood maps the multimodal object/background distribution to positive values for colors associated with the foreground and negative values for the background.

Fig. 2. (a) Initial frame with object boundary (b) likelihood map L (c) mask obtained after morphological operations (T).

The binary weighting factor M for the ith pixel is

    M_i = 1 if L_i > τ_o, 0 otherwise    (2)

where τ_o is the threshold used to decide on the most reliable object pixels. Once the object is localized, by user interaction or detection in the first frame, the likelihood map of the object/background is obtained using (2). In our experiments, τ_o is set to 0.8.
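As a minimal sketch, the likelihood labeling of (1)-(2) can be implemented with joint R-G-B histograms as the pdf estimates. The 16-bin quantization below is an assumption for illustration; the paper does not state how the joint pdfs are discretized.

```python
import numpy as np

def likelihood_mask(obj_pixels, bg_pixels, candidate_pixels,
                    bins=16, tau_o=0.8, eps=1e-6):
    """Label reliable object pixels via the log-likelihood ratio of
    Eq. (1) and the threshold of Eq. (2).

    obj_pixels, bg_pixels, candidate_pixels: (N, 3) integer RGB arrays
    with values in [0, 255]. `bins` (assumed) is the per-channel
    quantization of the joint pdf.
    """
    edges = [np.linspace(0, 256, bins + 1)] * 3

    def joint_pdf(px):
        # Normalized joint R-G-B histogram as the pdf estimate.
        h, _ = np.histogramdd(px, bins=edges)
        return h / max(h.sum(), 1)

    h_o, h_b = joint_pdf(obj_pixels), joint_pdf(bg_pixels)

    # Look up each candidate pixel's bin in both pdfs.
    idx = np.minimum(candidate_pixels // (256 // bins), bins - 1)
    p_o = h_o[idx[:, 0], idx[:, 1], idx[:, 2]]
    p_b = h_b[idx[:, 0], idx[:, 1], idx[:, 2]]

    L = np.log(np.maximum(p_o, eps) / np.maximum(p_b, eps))  # Eq. (1)
    return (L > tau_o).astype(np.uint8)                      # Eq. (2)
```

In practice the resulting binary mask would still be cleaned by the morphological operations mentioned in Fig. 2(c).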
3.2. Feature Extraction

Feature extraction is one of the most computationally intensive modules in most classification problems. In order to perform object tracking in real time, we need to extract features that distinguish the classes effectively while incurring minimal computational load. In the proposed tracker, the features used for modeling the object are simple pixel-color-based features: the R-G-B and Y-Cb-Cr values are extracted for each pixel of the given video frame.

4. RADIAL BASIS FUNCTION BASED OBJECT MODEL

In this paper, the object model development is cast as the classification of object pixels against non-object (background) pixels. The object model is developed using two radial basis function (RBF) classifiers, namely the 'Object Classifier' and the 'Non-object Classifier'. The object RBF classifier is developed such that it maximizes the number of correctly classified object pixels in the region of interest; similarly, the non-object RBF classifier maximizes the number of correctly classified non-object pixels.

In general, a two-class problem can be stated as follows. Suppose we have N observation samples {U_i, T_i}_{i=1}^N, where U_i = [u_i1, ..., u_id] ∈ R^d is the d-dimensional feature vector of the ith sample and T_i ∈ {−1, +1} is its coded class label. If sample U_i is assigned to the object class then T_i is +1; otherwise (background class) it is −1. The objective of the classification problem is to estimate the functional relationship between the samples and their class labels from the known set of data.
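A sketch of the per-pixel feature extraction of Section 3.2 and the coded-label construction {U_i, T_i} described above. The BT.601 Y-Cb-Cr conversion coefficients are an assumption; the paper does not specify which YCbCr variant it uses.

```python
import numpy as np

def pixel_features(rgb):
    """Stack per-pixel R-G-B and Y-Cb-Cr values into one feature vector.

    rgb: (N, 3) array with values in [0, 255]. Uses the standard BT.601
    full-range conversion (assumed, not stated in the paper).
    """
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[:, 0], rgb[:, 1], rgb[:, 2]
    y  = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return np.column_stack([rgb, y, cb, cr])  # (N, 6) feature matrix

def build_training_set(obj_feats, bg_feats):
    """Form the labeled samples {U_i, T_i}: T_i = +1 for object pixels,
    T_i = -1 for background pixels."""
    U = np.vstack([obj_feats, bg_feats])
    T = np.concatenate([np.ones(len(obj_feats)),
                        -np.ones(len(bg_feats))])
    return U, T
```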
Here, we use radial basis function classifiers developed using the fast learning algorithm [9] to approximate this functional relationship.

It has been shown in the literature that a neural network classifier trained with the mean square error loss function approximates the posterior probability well [11]. Since the fast learning algorithm also uses a least squares estimate to minimize the error, the output of the RBF network approximates the posterior probability. The posterior probability of pixel U_i obtained using the object classifier is

    p̂_o(U_i|c) ≈ ( max(min(Ŷ, 1), −1) + 1 ) / 2    (3)
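Equation (3) simply clips the raw network output to [−1, 1] and rescales it to [0, 1]:

```python
import numpy as np

def posterior_object(y_hat):
    """Map the raw RBF output Ŷ to an approximate posterior probability
    via Eq. (3): clip to [-1, 1], then rescale to [0, 1]."""
    return (np.clip(y_hat, -1.0, 1.0) + 1.0) / 2.0
```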


Similarly, the posterior probability of an object pixel U_i obtained using the non-object classifier is

    p̂_b(U_i|c) ≈ 1 − ( max(min(Ŷ, 1), −1) + 1 ) / 2    (4)

Two classifiers are used so that the posterior probability of the object pixels is estimated reliably. Since the 'object classifier' maximizes the number of object pixels, it might include some background pixels as object; similarly, the 'non-object classifier' might include some object pixels as background. The objective here is to identify the object pixels with high confidence. Hence, we neglect the pixels which are assigned different class labels by the object and non-object classifiers. The posterior probabilities of the two classifiers are combined to obtain the object model p̂_t as given below:

    p̂_t(U_i|c) = min[ p̂_o, p̂_b ]    (5)

The posterior probability of the target model is used for localizing the object in the subsequent frames.

4.1. Fast Learning Radial Basis Function Classifier

In this section, we present a brief overview of the fast learning algorithm for radial basis function networks [9]. A radial basis function network is a three-layered feed-forward network. The first layer is linear and only distributes the input signal, the second layer is nonlinear and uses Gaussian functions, and the third layer linearly combines the Gaussian outputs. By the universal approximation property, a single-hidden-layer feed-forward network with a sufficient number of hidden neurons can approximate any function to any arbitrary level of accuracy.
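The two-classifier combination of (3)-(5) can be sketched as follows, taking the raw outputs of the object and non-object networks and keeping only pixels that both networks agree are object:

```python
import numpy as np

def object_model(y_obj, y_bg):
    """Combine both classifier outputs into the target model of Eq. (5).

    y_obj, y_bg: raw outputs Ŷ of the object and non-object RBF
    networks for the same pixels.
    """
    p_o = (np.clip(y_obj, -1.0, 1.0) + 1.0) / 2.0        # Eq. (3)
    p_b = 1.0 - (np.clip(y_bg, -1.0, 1.0) + 1.0) / 2.0   # Eq. (4)
    return np.minimum(p_o, p_b)                           # Eq. (5)
```

The element-wise minimum drives p̂_t toward zero whenever either network disagrees, which is exactly the "neglect pixels with conflicting labels" behavior described above.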
Let μ_j ∈ R^d and σ_j ∈ R^+ be the center and width of the jth Gaussian hidden neuron, α be the 1 × K vector of output weights, and N be the number of pixels. The output Ŷ of an RBF network with K neurons has the following form:

    Ŷ = Σ_{j=1}^{K} α_j exp( −‖U − μ_j‖ / σ_j )    (6)

Equation (6) can be written in matrix form as

    Ŷ = α Y_H    (7)

Here, Y_H (of dimension K × N) is the hidden layer output matrix of the neural network; the jth row of Y_H is the jth hidden neuron's output with respect to the inputs U_1, U_2, ..., U_N. For most practical problems, the number of hidden neurons is assumed to be less than the number of training samples. In the fast learning algorithm, for a given number of hidden neurons K, the centers μ and widths σ of the hidden neurons are selected randomly. The output weights are then estimated analytically as

    α = T Y_H^†    (8)

where Y_H^† is the Moore-Penrose generalized pseudo-inverse of Y_H. The network output Ŷ is used for estimating the posterior probabilities using equations (3)-(5).

Fig. 3. Posterior probability of pixels for a given object window. (a) Object Classifier (b) Non-object Classifier (c) Object Model.

4.2. Object Localization

In this section we explain the development of the object model from the RBF classifiers and the object localization based on the estimated posterior probability. The posterior probabilities of the current object window estimated by the object and non-object classifiers for the object window (starting frame) of the PETS video sequence are shown in Fig. 3(a)-(b) respectively. The corresponding classification matrices are given below:

    C_o = [ 730   47 ]        C_b = [ 747   69 ]    (9)
          [  38  358 ]              [  21  336 ]

From (9), we can observe that the object classifier maximizes the classification accuracy for the object class (C_o(2,2) = 358 > C_b(2,2) = 336).
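The fast-learning training of (6)-(8) can be sketched directly in NumPy: random centers and widths, a Gaussian hidden-layer matrix, and a pseudo-inverse solve for the output weights. The choice of drawing centers from the training data and widths uniformly from [0.5, 2.0] is an illustrative assumption, not the paper's prescription.

```python
import numpy as np

def train_rbf(U, T, K=20, rng=None):
    """Fast-learning RBF classifier, Eqs. (6)-(8).

    U: (N, d) features; T: (N,) coded labels in {-1, +1};
    K: number of hidden neurons (K < N assumed).
    """
    rng = np.random.default_rng(rng)
    centers = U[rng.choice(len(U), K, replace=False)]  # random mu_j
    widths = rng.uniform(0.5, 2.0, K)                  # random sigma_j (assumed range)
    # Hidden layer output matrix Y_H (K x N), Eq. (6)
    D = np.linalg.norm(U[None, :, :] - centers[:, None, :], axis=2)
    Y_H = np.exp(-D / widths[:, None])
    alpha = T @ np.linalg.pinv(Y_H)                    # Eq. (8): alpha = T Y_H^+
    return centers, widths, alpha

def rbf_predict(U, centers, widths, alpha):
    """Network output Ŷ = alpha Y_H, Eq. (7)."""
    D = np.linalg.norm(U[None, :, :] - centers[:, None, :], axis=2)
    return alpha @ np.exp(-D / widths[:, None])
```

Because (8) is a single least-squares solve rather than iterative gradient descent, training both classifiers on the first frame is cheap, which is what makes the per-frame testing phase of the tracker so light.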
Similarly, the non-object classifier maximizes the classification accuracy for the background class (C_b(1,1) = 747 > C_o(1,1) = 730). The object model p̂_t is developed using (5) and is shown in Fig. 3(c). This target model uses only the reliable object pixels which are classified as object by both classifiers.

Let X_c^k be the object center, X_i^k be the candidate pixel locations (centered around X_c^k), and p̂_c^k(i) be the corresponding posterior probability of the ith candidate pixel at the kth iteration. The posterior probability at the kth iteration, p̂_c^k, is obtained by testing the features extracted at the locations X_i^k. The new location of the object center is then estimated as the centroid of the posterior probability p̂_c^k weighted by the target model p̂_t:

    X_c^{k+1} = Σ_i X_i^k p̂_c^k(i) p̂_t(i) / Σ_i p̂_c^k(i) p̂_t(i)    (10)

The iteration is terminated when the change in centroid location between two consecutive iterations falls below a threshold t_s. The typical value of t_s used in our experiments is in the range 0-2.

4.3. RBF-Networks Based Tracker Algorithm

1. Select the object to be tracked in the initial frame by drawing a rectangular window around it.
2. Separate the object from the background based on the object likelihood map.
3. Extract the object and background features from the labeled object and neighboring background pixels.
4. Obtain the object model using the RBF networks.
5. Test for the object in the next frame, starting from the previously obtained object location.
6. Recursively obtain the object location for the current frame using (10).
7. Go to step 5.
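The iterative centroid shift of Eq. (10) can be sketched as below. Here `posterior_fn` and the per-candidate target weights `p_t` are hypothetical stand-ins for the classifier posterior p̂_c^k and the target model p̂_t evaluated at each candidate pixel.

```python
import numpy as np

def localize(X_c, candidate_offsets, posterior_fn, p_t, t_s=1.0, max_iter=20):
    """Iterative object localization via the weighted centroid of Eq. (10).

    X_c: (2,) initial object center; candidate_offsets: (M, 2) offsets
    defining the candidate pixel locations X_i around the center;
    posterior_fn(locations) -> (M,) posterior p̂_c for those locations
    (hypothetical interface); p_t: (M,) target-model weights;
    t_s: convergence threshold on the centroid shift (0-2 in the paper).
    """
    X_c = np.asarray(X_c, dtype=float)
    for _ in range(max_iter):
        X_i = X_c + candidate_offsets                  # candidates around center
        w = posterior_fn(X_i) * p_t                    # p̂_c(i) * p̂_t(i)
        X_new = (X_i * w[:, None]).sum(axis=0) / w.sum()  # Eq. (10)
        if np.linalg.norm(X_new - X_c) < t_s:          # termination test
            return X_new
        X_c = X_new
    return X_c
```

This is the same hill-climbing structure as a mean-shift iteration; the difference claimed by the paper is that the per-pixel weights come from the RBF posterior rather than a kernel-weighted histogram.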


Fig. 4. Tracking result of the proposed system (solid yellow) against the mean-shift (dashed blue) tracker for the 'train' sequence. Frames shown: 1, 10, 30, 60, 100 and 150.

5. EXPERIMENTS AND DISCUSSIONS

The proposed algorithm has been tested on several complex video sequences. The 'train' and 'walk' videos used in our experiments were shot with a handheld camcorder and hence include a wide variety of camera operations; for example, the train sequence contains considerable jerky motion with camera pan, tilt and zoom operations. We have compared the performance of the proposed tracker against the mean-shift (MS) tracker [13], which is known for robust object tracking in cluttered environments. Fig. 4 shows the tracking results of the proposed and MS trackers for a toy train moving against a cluttered background. Though both trackers follow the object, the accuracy of the MS tracker degrades considerably during large displacements of the object caused by camera panning. The last two frames of Fig. 4 show how the accuracy of the MS tracker also suffers under a camera zoom-out operation. The proposed algorithm tracks the object with better precision during these camera operations. Fig. 5 shows the tracking results for a PETS sequence. The proposed algorithm tracks the object throughout the sequence with better accuracy than the MS algorithm, which fails towards the end of the sequence. The proposed tracker has also been tested at various object speeds by temporally downsampling the video; the result shown in Fig. 5 is for a downsampling factor of 4. These results clearly show that fast learning neural networks are well suited for tracking objects in real-world scenarios.
Further, the RBF tracker uses, on average, 1-2 iterations to localize the object, and the computational effort required to calculate the posterior probabilities in the testing phase is negligible. Hence the proposed tracker is suitable for real-time applications.

Fig. 5. Tracking result of the proposed system (solid yellow) against the mean-shift (dashed blue) tracker for the 'pets' sequence.

6. CONCLUSIONS

In this paper we have proposed an object tracker based on RBF networks. The fast learning algorithm is used to develop the object model in the initial frame using two RBF classifiers. Object localization is achieved using the target model and the posterior probability of the current object window. The performance of the proposed tracker was compared with the well-known MS tracker on various complex video sequences, and the proposed tracker provides better tracking accuracy. Since the testing phase of an RBF network incurs very low computational burden, the proposed tracker is suitable for real-time applications.

7. REFERENCES

[1] A. Blake, "Visual tracking: a very short research roadmap," Electronics Letters, vol. 42, no. 5, pp. 254-256, 2006.
[2] Yi Lu Murphey and Yun Luo, "Feature extraction for a multiple pattern classification neural network system," Pattern Recognition Proceedings, vol. 2, pp. 220-223, 2002.
[3] M. Voultsidou, S. Dodel, and J. M. Herrmann, "Neural networks approach to clustering of activity in fMRI data," IEEE Trans. on Medical Imaging, vol. 24, no. 8, pp. 987-996, Aug. 2005.
[4] E. Trentin and M. Gori, "Robust combination of neural networks and hidden Markov models for speech recognition," IEEE Trans. on Neural Networks, vol. 14, no. 6, pp. 1519-1531, Nov. 2003.
[5] C. Lee and D. A. Landgrebe, "Decision boundary feature extraction for neural networks," IEEE Trans. on Neural Networks, vol. 8, no. 1, pp. 75-83, 1997.
[6] M. T. Musavi, W. Ahmed, K. H. Chan, K. B. Faris, and D. M. Hummels, "On training of radial basis function classifiers," Neural Networks, vol. 5, pp. 595-603, July-Aug. 1992.
[7] J. Moody and C. J. Darken, "Fast learning in networks of locally-tuned processing units," Neural Computation, vol. 1, pp. 281-294, 1989.
[8] G.-B. Huang, Q. Y. Zhu, and C. K. Siew, "Extreme learning machine: theory and applications," Neurocomputing, 2006.
[9] G.-B. Huang and C.-K. Siew, "Extreme learning machine with randomly assigned RBF kernels," International Journal of Information Technology, vol. 11, no. 1, pp. 16-24, 2005.
[10] S. Avidan, "Support vector tracking," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 2001, vol. 1, pp. 184-191.
[11] R. Rojas, "A short proof of the posterior probability property of classifier neural networks," Neural Computation, vol. 8, no. 2, pp. 41-43, 1996.
[12] R. T. Collins, "Mean-shift blob tracking through scale space," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Madison, Wisconsin, 2003, pp. 234-241.
[13] D. Comaniciu, V. Ramesh, and P. Meer, "Kernel-based object tracking," IEEE Trans. Pattern Anal. Machine Intell., vol. 25, no. 5, pp. 564-577, 2003.
