An Intelligent Visual Tracking System Based on Adaptive Mean Shift Tracking Method

DOI : 10.17577/IJERTV4IS040572


P. Bhuvaneswari

PG Scholar of Electronics and Communication Engineering, Anna University Regional Office,

Madurai-7, Tamil Nadu, India.

S. Veluchamy

Faculty of Electronics and Communication Engineering, Anna University Regional Office,

Madurai-7, Tamil Nadu, India.

Abstract: Sparsity-based models generally yield impressive tracking performance; however, their computational complexity and their poor estimation of the scale and orientation changes of the target are aggravated under motion blur and background clutter. In this paper, the proposed method is based on a framework that projects the template matrix into the candidate space. By selecting and weighting sparse coefficients, the DSS map with a pooling method chooses the best candidate for tracking, and the scale and orientation adaptive mean shift tracking method estimates the scale and orientation of the target adaptively. The results show better tracking accuracy and robustness to the above challenges.

Keywords: Customized APG, DSS map, Adaptive Mean Shift Tracking.

  1. INTRODUCTION

An object, in image processing, is an identifiable portion of an image that can be interpreted as a single unit. Object representation covers the shape and appearance of the object. Object recognition is the task of finding a given object in an image or video sequence. Visual tracking is the process of locating and determining the dynamic configuration of one or more moving objects in each frame of one or several cameras; that is, it associates targets across consecutive video frames. Tracking can be especially difficult when objects move fast relative to the frame rate.

For these situations, video tracking systems usually employ a motion model which describes how the image of the target may change for different possible motions of the object. To perform video tracking, an algorithm analyzes consecutive video frames and outputs the movement of targets between frames. There are many algorithms, each with strengths and weaknesses. A visual tracking system has two major components: target representation and localization, and filtering and data association. Applications include motion-based recognition, such as human identification and automatic object detection. Although a variety of sparsity-based tracking methods exist, the problem becomes harder when the tracked object changes orientation over time.

    A. Related Research Work

Using patch-based appearance models [2], the bounding box of a target can be divided into multiple patches, and pertinent patches are then selected to construct the appearance model. This yields accurate tracking, but it has higher computational complexity than [4], which involves online tracking, and [7], which involves 2D principal component analysis; moreover, it is not very efficient at handling whole-object occlusion. The computational complexity decreases in [10], but there it is difficult to obtain a sequence of closed-form updates. The Block Orthogonal Matching Pursuit (BOMP) algorithm in [9] improves tracking performance compared with the prototype structured sparse representation model, but it has difficulty tracking fully occluded objects. The Monte Carlo tracking technique [8] can handle complete occlusion; its drawback is that it is hard to locate the target position under large pose variations and drastic illumination changes. In [11], a collaborative model (SDC and SGM) is used to handle occlusion, and it also alleviates the drift problem; in [12], a sparse approximation problem in the particle filter framework achieves excellent performance compared with previously proposed trackers. The same particle filter framework with a reverse sparse representation is used in [1], tracking objects with a superior cost-performance ratio, but scale and orientation changes cannot be adaptively estimated. In [3], visual tracking of non-rigid objects is obtained by spatial masking with an isotropic kernel, but it is restricted to a specific tracking task; in [6], a mean shift method tracks the object shape well but does not completely solve complex sequences. The SOAMST (Scale and Orientation Adaptive Mean Shift Tracking) algorithm of [5] estimates the scale and orientation changes of the target, but its tracking performance is low under motion blur and heavy occlusion. Hence, in this work, the AMST concept is applied within the DSS map visual tracking framework to obtain accurate tracking results even under motion blur and background clutter.

This paper is organized as follows. The proposed system, including the formation of the DSS (Discriminative Sparse Similarity) map and the establishment of the AMST algorithm, is presented in Section 2. Section 3 describes the datasets used to test the proposed method, and the performance analysis is given in Section 4. Section 5 illustrates the experimental results and discussions. Finally, the conclusion is given in Section 6.

2. PROPOSED SYSTEM

Through the above analysis, tracking an object becomes complex when image noise, rapid object motion, or scene illumination changes occur. The proposed method is therefore designed to handle motion blur and background clutter robustly while tracking an object with scale and orientation changes. The object is tracked with a reversed multi-task sparse tracking framework which projects the template matrix into the candidate space. By selecting and weighting the sparse coefficients of the DSS map, a pooling method selects the best candidate, and the SOAMST algorithm adaptively estimates the scale and orientation of the target, which leads to accurate tracking.

The block diagram of the proposed system is shown below. In the first stage, the input video is selected for tracking and recognition, and frame conversion is done. After that, the multi-task reverse sparse representation is formulated, and a Laplacian term is included to keep the similarity level of the coefficients in accordance with the candidate similarity. From this we obtain the Laplacian multi-task reverse sparse representation, and the DSS map is constructed from that similarity relationship using a large template set that combines multiple positive and hundreds of negative templates. Then the additive pooling approach is applied. Finally, a weight image obtained from the target model and the target candidate model is used to estimate the target scale and orientation: moment features are computed, and the width, height, and orientation of the object are estimated from the zeroth-order moment, the second-order central moments, and the Bhattacharyya coefficient between the target model and the target candidate model. By establishing this AMST algorithm within the DSS map method, the location of the target, together with its scale and orientation changes, is estimated accurately.

Figure 1. Block diagram of the proposed method: INPUT (VIDEO) -> FRAME CONVERSION -> SET AFFINE PARAMETERS FOR DSS MAP -> ESTABLISHING AMST ALGORITHM -> PROGRESS OF TRACKING -> OUTPUT.

1. Particle Filter Framework

The particle filter is a framework for estimating the posterior distribution of the state variables that characterize a dynamic system, and it provides a robust tracking algorithm. Let the observation set of the target be $A_t = \{A_1, A_2, \ldots, A_t\}$, and let $Z_t$ be the state variable of the object at time $t$. An affine transformation is used to model the motion of the object between two consecutive frames. The optimal state $\hat{Z}_t$ can then be computed by maximum a posteriori estimation as

$$\hat{Z}_t = \arg\max_{Z_t^i} \; p(Z_t^i \mid A_t) \qquad (1)$$

where $Z_t^i$ indicates the state of the $i$-th sample. The posterior probability can be obtained recursively from the Bayesian framework as

$$p(Z_t \mid A_t) \propto p(A_t \mid Z_t) \int p(Z_t \mid Z_{t-1}) \, p(Z_{t-1} \mid A_{t-1}) \, dZ_{t-1} \qquad (2)$$

where $p(Z_t \mid Z_{t-1})$ is the dynamic model and $p(A_t \mid Z_t)$ denotes the observation likelihood. The state variable $Z_t$ is composed of six independent affine parameters $\{\theta_1, \theta_2, \theta_3, \theta_4, t_1, t_2\}$, in which $\{\theta_1, \theta_2, \theta_3, \theta_4\}$ are deformation parameters and $\{t_1, t_2\}$ contain the 2D translation information. The dynamic model is modeled by a Gaussian distribution as

$$p(Z_t \mid Z_{t-1}) = \mathcal{N}(Z_t; Z_{t-1}, \Sigma) \qquad (3)$$

where $\Sigma$ is a diagonal covariance matrix whose elements are the variances of the affine parameters. By this method we obtain the candidate set $X = [x_1, x_2, \ldots, x_m] \in \mathbb{R}^{b \times m}$, in which $b$ is the feature dimension and $m$ is the number of candidates. The observation model $p(A_t \mid Z_t)$ essentially reflects the likelihood of observing $A_t$ at state $Z_t$; in this paper, $p(A_t \mid Z_t)$ is taken proportional to the discriminative score obtained by applying the additive pooling scheme to the DSS map.
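As an illustration of the dynamic model of Eq. (3), the sketch below (Python/NumPy; the paper's own implementation is in MATLAB, so the function name, state layout, and variance values here are assumptions for illustration) samples candidate states by Gaussian perturbation of the previous state:

```python
import numpy as np

def propagate_particles(z_prev, sigma, num_particles, rng=None):
    """Sample candidate states Z_t ~ N(Z_{t-1}, Sigma), as in Eq. (3).

    z_prev : (6,) affine state [theta1..theta4, t1, t2] from the previous frame
    sigma  : (6,) per-parameter standard deviations (diagonal covariance)
    """
    rng = np.random.default_rng() if rng is None else rng
    # Each particle is the previous state plus independent Gaussian noise.
    noise = rng.normal(0.0, sigma, size=(num_particles, 6))
    return z_prev + noise

# Example: 600 candidates around the last estimated state (values assumed).
z_prev = np.array([1.0, 0.0, 0.0, 1.0, 120.0, 80.0])    # identity deformation at (120, 80)
sigma = np.array([0.01, 0.002, 0.002, 0.01, 4.0, 4.0])  # assumed parameter variances
candidates = propagate_particles(z_prev, sigma, 600)
```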

      2. Discriminative Reverse Sparse Representation

In conventional sparse representation, an observed image patch associated with a state is reconstructed from numerous target templates. Here, instead, we construct the candidate set X to represent each target template, as in Equation (4) below.

$$\arg\min_{m} \; \|t - Xm\|_2^2 + \lambda \|m\|_1, \quad \text{s.t. } m \succeq 0 \qquad (4)$$

where $t$ denotes a representative template, $\lambda$ is a parameter to adjust the sparsity penalty, and $m$ represents the coefficient vector. We add the constraint $m \succeq 0$, which means all elements of $m$ are nonnegative, because each element represents the similarity between the template and a candidate, and negative elements are neglected.

Using this L1 minimization alone, the tracker is efficient, but the lack of negative templates weakens its discriminative power by ignoring the background information around the target, which may cause the tracker to drift away from the target.

Hence multiple positive target templates are exploited to make the tracker more robust, and numerous negative templates, which sketch out the periphery of the target area, are also used. The positive and negative template sets are respectively defined as $L_{pos} = [l_1, l_2, \ldots, l_p]$ and $L_{neg} = [l_{p+1}, l_{p+2}, \ldots, l_{p+n}]$, where $p$ and $n$ denote the number of positive and negative templates.

With the assumptions above, our problem formulation is expressed as finding the combination of particles and corresponding coefficients as follows:

$$\begin{aligned} &\arg\min_{m_1} \; \|t_1 - Xm_1\|_2^2 + \lambda \|m_1\|_1 \\ &\qquad \vdots \\ &\arg\min_{m_p} \; \|t_p - Xm_p\|_2^2 + \lambda \|m_p\|_1 \\ &\arg\min_{m_{p+1}} \; \|t_{p+1} - Xm_{p+1}\|_2^2 + \lambda \|m_{p+1}\|_1 \\ &\qquad \vdots \\ &\arg\min_{m_{p+n}} \; \|t_{p+n} - Xm_{p+n}\|_2^2 + \lambda \|m_{p+n}\|_1 \end{aligned} \qquad (5)$$

where $m_i = [m_i^1, m_i^2, \ldots, m_i^h]^T$ expresses the sparse coefficients of the $i$-th template, and $m_i \succeq 0$, $i = 1, 2, \ldots, (p+n)$ means all the elements in $m_i$ are nonnegative. An illustration of the basic idea of this formulation is shown below.

Figure 2. Illustration of the basic idea of the multi-task reverse sparse representation scheme: (a) the $p$ positive and $n$ negative template sets; (b) the $m$ sampled candidates.
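Each subproblem in Eq. (5) is a nonnegative L1-regularized least-squares problem. Below is a minimal sketch of one way to solve it, using projected ISTA rather than the authors' customized APG; the function name, parameter values, and iteration count are assumptions:

```python
import numpy as np

def nonneg_lasso(X, t, lam=0.01, n_iter=200):
    """Projected ISTA sketch for Eq. (4): min ||t - X m||_2^2 + lam*||m||_1, m >= 0.

    X : (b, m) matrix of m candidate feature vectors, t : (b,) one template.
    """
    # Step size from the Lipschitz constant of the quadratic term, 2*||X||_2^2.
    step = 1.0 / (2.0 * np.linalg.norm(X, 2) ** 2 + 1e-12)
    coef = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * X.T @ (X @ coef - t)                    # gradient of the data term
        coef = np.maximum(0.0, coef - step * (grad + lam))   # prox of L1 + nonnegativity
    return coef

# Eq. (5) solves one such problem per template column of L = [L_pos, L_neg]:
# C = np.column_stack([nonneg_lasso(X, L[:, i]) for i in range(L.shape[1])])
```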

      3. Laplacian Multi-task Reverse Sparse Representation

The multi-task reverse sparse representation problem starts from the equation

$$\arg\min_{C} \; \|L - XC\|_2^2 + \lambda \sum_i \|c_i\|_1, \quad \text{s.t. } c_i \succeq 0, \; i = 1, 2, \ldots, (p+n) \qquad (6)$$

To preserve the similarity of codes for similar candidate features, we introduce a Laplacian regularization term, beginning with the formulation

$$\arg\min_{C} \; \|L - XC\|_2^2 + \lambda \sum_i \|c_i\|_1 + \frac{\gamma}{2} \sum_{i,j} \|c_i - c_j\|^2 B_{ij}, \quad \text{s.t. } c_i \succeq 0, \; i = 1, 2, \ldots, (p+n) \qquad (7)$$

where $\gamma$ is a parameter to adjust the new regularization term and $B$ is a binary matrix. The Laplacian multi-task reverse optimization problem is then reformulated as

$$\arg\min_{C} \; \|L - XC\|_2^2 + \lambda \sum_i \|c_i\|_1 + \gamma \, \mathrm{tr}(CZC^T), \quad \text{s.t. } c_i \succeq 0, \; i = 1, 2, \ldots, (p+n) \qquad (8)$$

where $Z = D - B$ is the Laplacian matrix, and the degree of $c_i$ is defined as

$$D_i = \sum_{j=1}^{p+n} B_{ij} \quad \text{and} \quad D = \mathrm{diag}(D_1, D_2, \ldots, D_{p+n}) \qquad (9)$$

Finally, we apply the accelerated proximal gradient (APG) approach to solve the minimization problem with

$$F(C) = \|L - XC\|_2^2 + \lambda \mathbf{1}^T C \mathbf{1} + \gamma \, \mathrm{tr}(CZC^T), \qquad G(C) = I_{\{C \succeq 0\}}(C) \qquad (10)$$

where $F(C)$ is a differentiable convex function and $G(C)$, the indicator function of the nonnegativity constraint, is a non-smooth convex function. Following the APG method, we need to solve at each step the optimization problem

$$C_{k+1} = \arg\min_{C} \; \frac{\eta}{2} \left\| C - \tilde{C}_{k+1} + \frac{\nabla F(\tilde{C}_{k+1})}{\eta} \right\|_2^2 + G(C) \qquad (11)$$

where $\eta$ is the Lipschitz constant. Equation (11) is equivalent to

$$C_{k+1} = \max(0, g_{k+1}) \qquad (12)$$

where $g_{k+1} = \tilde{C}_{k+1} - \nabla F(\tilde{C}_{k+1})/\eta$.
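A sketch of the APG iteration of Eqs. (10)-(12) is shown below; the momentum schedule, Lipschitz bound, and parameter values are standard FISTA-style assumptions, not the authors' exact customized APG:

```python
import numpy as np

def apg_laplacian(L, X, Z, lam=0.01, gamma=0.1, n_iter=100):
    """APG sketch for Eqs. (10)-(12):
    F(C) = ||L - X C||_F^2 + lam * 1^T C 1 + gamma * tr(C Z C^T),
    G(C) = indicator of C >= 0, update C_{k+1} = max(0, C~ - grad F(C~)/eta).

    L : (b, p+n) templates, X : (b, m) candidates, Z : (p+n, p+n) Laplacian.
    """
    # Lipschitz bound for grad F: 2*||X||_2^2 + 2*gamma*||Z||_2 (Eq. (11)'s eta).
    eta = 2.0 * np.linalg.norm(X, 2) ** 2 + 2.0 * gamma * np.linalg.norm(Z, 2)
    C = C_prev = np.zeros((X.shape[1], L.shape[1]))
    a = a_prev = 1.0
    for _ in range(n_iter):
        # Momentum (extrapolation) step of accelerated proximal gradient.
        C_tilde = C + ((a_prev - 1.0) / a) * (C - C_prev)
        # grad F: data term + constant L1 gradient (C >= 0) + Laplacian term.
        grad = 2.0 * X.T @ (X @ C_tilde - L) + lam + 2.0 * gamma * C_tilde @ Z
        C_prev, C = C, np.maximum(0.0, C_tilde - grad / eta)   # Eq. (12)
        a_prev, a = a, (1.0 + np.sqrt(1.0 + 4.0 * a * a)) / 2.0
    return C
```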

      4. Refined DSS map

To avoid potential instability and achieve better robustness, we refine the DSS map with adaptive weights. The weight $W_{ij}$ for an element $X_{ij}$ of the similarity map is constructed from the difference between the $j$-th candidate $y_j$ and the $i$-th template $t_i$:

$$W_{ij} \propto \exp\!\left(-\|t_i - y_j\|_2^2\right) \qquad (13)$$

A candidate with a small difference from a foreground template shares a higher similarity with it. We separate the weight map into two submaps:

$$W_{pos} = [w_1^T, \ldots, w_p^T]^T, \qquad W_{neg} = [w_{p+1}^T, \ldots, w_{p+n}^T]^T \qquad (14)$$

where $w_i = [W_{i1}, \ldots, W_{im}]$ for $i = 1, 2, \ldots, (p+n)$. Finally, we obtain the two weighted DSS maps through

$$X'_{pos} = W_{pos} \odot X_{pos}, \qquad X'_{neg} = W_{neg} \odot X_{neg} \qquad (15)$$

where $\odot$ is the Hadamard product.
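Eqs. (13)-(15) amount to an element-wise reweighting of the similarity sub-maps. A minimal sketch, assuming raw template and candidate feature matrices as inputs (names hypothetical):

```python
import numpy as np

def refine_dss(X_pos, X_neg, T_pos, T_neg, Y):
    """Refine the DSS map with adaptive weights (Eqs. (13)-(15)), a sketch.

    T_pos : (b, p) positive templates, T_neg : (b, n) negative templates,
    Y     : (b, m) candidate features,
    X_pos : (p, m) and X_neg : (n, m) similarity sub-maps.
    """
    # W_ij ~ exp(-||t_i - y_j||_2^2): squared distances, templates vs. candidates.
    d_pos = ((T_pos[:, :, None] - Y[:, None, :]) ** 2).sum(axis=0)  # (p, m)
    d_neg = ((T_neg[:, :, None] - Y[:, None, :]) ** 2).sum(axis=0)  # (n, m)
    W_pos, W_neg = np.exp(-d_pos), np.exp(-d_neg)
    # Eq. (15): Hadamard (element-wise) products give the weighted DSS maps.
    return W_pos * X_pos, W_neg * X_neg
```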

      5. Additive Pooling

It consists of two steps, as follows.

Step 1: For each candidate $i$, the similarities to the positive and negative templates are taken from the similarity map $X$ as

$$x_i^{pos} = [X_{1i}, \ldots, X_{pi}] \qquad (16)$$

$$x_i^{neg} = [X_{(p+1)i}, \ldots, X_{(p+n)i}] \qquad (17)$$

$$x_i = [X_{1i}, \ldots, X_{pi}, X_{(p+1)i}, \ldots, X_{(p+n)i}] \qquad (18)$$

Then the largest $l$ coefficients in $x_i^{pos}$ and $x_i^{neg}$ are added to obtain $y_i^{pos}$ and $y_i^{neg}$:

$$y_i^{pos} = L(x_i^{pos}, 1) + \cdots + L(x_i^{pos}, l) \qquad (19)$$

$$y_i^{neg} = L(x_i^{neg}, 1) + \cdots + L(x_i^{neg}, l) \qquad (20)$$

where $L(x, k)$ denotes the $k$-th largest element of $x$.

Step 2: The discriminative score of candidate $i$ and the score set for all candidates are given by

$$y_i = y_i^{pos} - y_i^{neg}, \qquad Y = \{y_i\}, \; i = 1, \ldots, m \qquad (21)$$
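The pooling step reduces each candidate's column of the DSS map to one discriminative score. A sketch, assuming the first p rows of the stacked map correspond to positive templates:

```python
import numpy as np

def additive_pooling(X_dss, p, l=3):
    """Additive pooling over the DSS map (Eqs. (16)-(21)), a sketch.

    X_dss : ((p+n), m) weighted DSS map; rows 0..p-1 positive, rows p.. negative.
    Returns the scores y_i = y_i^pos - y_i^neg for all m candidates.
    """
    X_pos, X_neg = X_dss[:p, :], X_dss[p:, :]
    # Sum of the l largest coefficients per column (per candidate).
    y_pos = np.sort(X_pos, axis=0)[-l:, :].sum(axis=0)   # Eq. (19)
    y_neg = np.sort(X_neg, axis=0)[-l:, :].sum(axis=0)   # Eq. (20)
    return y_pos - y_neg                                 # Eq. (21)

# The tracking result is the candidate with the highest score: np.argmax(scores).
```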

      6. Adaptive Mean Shift Concept

In the mean shift iteration, the estimated target moves from position $y$ to the new position $y_1$, given by

$$y_1 = \frac{\sum_{i=1}^{n_h} x_i w_i \, g\!\left(\left\|\frac{y - x_i}{h}\right\|^2\right)}{\sum_{i=1}^{n_h} w_i \, g\!\left(\left\|\frac{y - x_i}{h}\right\|^2\right)} \qquad (22)$$

Using Equation (22), the mean shift tracking algorithm finds, in the new frame, the object most similar to the object in the reference frame. Note that the key parameters in the mean shift tracking algorithm are the weights $w_i$. The target area is estimated from the zeroth-order moment of the weight image:

$$M_{00} = \sum_{i=1}^{n} w(x_i) \qquad (23)$$

The Bhattacharyya coefficient $\rho$ is used to adjust $M_{00}$ when estimating the target area, denoted by $T$. We propose the following equation to estimate it:

$$T = b(\rho) \, M_{00} \qquad (24)$$

where $b(\rho)$ is a monotonically increasing function with respect to the Bhattacharyya coefficient $\rho$ ($0 \leq \rho \leq 1$). From the second-order central moments of the weight image, the covariance matrix can be written as

$$\mathrm{Cov} = \begin{bmatrix} \mu_{20} & \mu_{11} \\ \mu_{11} & \mu_{02} \end{bmatrix} \qquad (25)$$

Decomposing Cov by singular value decomposition gives $\mathrm{Cov} = U \, \mathrm{diag}(\lambda_1^2, \lambda_2^2) \, U^T$, with $U = \begin{bmatrix} u_{11} & u_{12} \\ u_{21} & u_{22} \end{bmatrix}$. Suppose that the target is represented by an ellipse, for which the lengths of the semi-major axis and semi-minor axis are denoted by $a$ and $b$, respectively. Instead of using $\lambda_1$ and $\lambda_2$ directly as the width $a$ and height $b$, it has been shown that the ratio of $\lambda_1$ to $\lambda_2$ can well approximate the ratio of $a$ to $b$, i.e., $\lambda_1 / \lambda_2 \approx a / b$. Thus we can set $a = k\lambda_1$ and $b = k\lambda_2$, where $k$ is a scale factor. Since we have estimated the target area $T$, we have $ab = (k\lambda_1)(k\lambda_2) = T$. Then it can be easily derived that

$$k = \sqrt{T / (\lambda_1 \lambda_2)} \qquad (26)$$

$$a = \sqrt{\lambda_1 T / \lambda_2}, \qquad b = \sqrt{\lambda_2 T / \lambda_1} \qquad (27)$$

Now the covariance matrix becomes

$$\mathrm{Cov} = \begin{bmatrix} u_{11} & u_{12} \\ u_{21} & u_{22} \end{bmatrix} \begin{bmatrix} a^2 & 0 \\ 0 & b^2 \end{bmatrix} \begin{bmatrix} u_{11} & u_{12} \\ u_{21} & u_{22} \end{bmatrix}^T \qquad (28)$$

From this we can estimate the width, height, and orientation of the target.
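The scale and orientation estimation of Eqs. (23)-(28) can be condensed into a short routine. The sketch below assumes a particular monotonically increasing b(rho), which the paper does not specify, and flattened coordinate/weight arrays:

```python
import numpy as np

def estimate_ellipse(xs, ys, w, rho, b_of_rho=lambda r: np.exp(r - 1.0)):
    """Estimate target area, axes, and orientation from the weight image
    (Eqs. (23)-(28)), a sketch with an assumed b(rho).

    xs, ys : pixel coordinates inside the search window; w : their weights;
    rho    : Bhattacharyya coefficient between target and candidate models.
    """
    M00 = w.sum()                                   # zeroth-order moment, Eq. (23)
    T = b_of_rho(rho) * M00                         # adjusted target area, Eq. (24)
    xc, yc = (w * xs).sum() / M00, (w * ys).sum() / M00
    mu20 = (w * (xs - xc) ** 2).sum() / M00         # second-order central moments
    mu02 = (w * (ys - yc) ** 2).sum() / M00
    mu11 = (w * (xs - xc) * (ys - yc)).sum() / M00
    cov = np.array([[mu20, mu11], [mu11, mu02]])    # Eq. (25)
    evals, evecs = np.linalg.eigh(cov)              # eigendecomposition of Cov
    l2, l1 = np.sqrt(np.maximum(evals, 1e-12))      # eigh sorts ascending: l1 >= l2
    a = np.sqrt(l1 * T / l2)                        # Eq. (27): semi-major axis
    b = np.sqrt(l2 * T / l1)                        #           semi-minor axis
    theta = np.degrees(np.arctan2(evecs[1, 1], evecs[0, 1]))  # major-axis angle
    return a, b, theta
```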

3. DESCRIPTION OF DATASETS

The algorithm is tested on the following three datasets.

Dataset 1: In the palm sequence video, tracking is performed clearly with scale and orientation changes. As the target undergoes abrupt motion, it is difficult to locate its position; however, even when the palm moves rapidly in frames 25 and 92, the estimated target scale and orientation are accurate and robust to the motion blur challenge.

Dataset 2: In the car sequence video, there is a complex background in frames 38 and 50. By introducing both template sets to model the foreground and background information, we obtain enough discriminative information and store it in the map. With the AMST algorithm we obtain accurate tracking results with scale and orientation changes, robust to this background clutter.

Dataset 3: In the walking man sequence, both in-plane and out-of-plane rotations occur. However, the tracker clearly tracks the target and estimates its scale and orientation changes successfully.

4. PERFORMANCE ANALYSIS

The table below lists the average number of mean shift iterations required by three different trackers on the three video sequences.

      Table I. Performance on average number of iterations.


Datasets             | Adaptive scale | EM-Shift | Our method
Palm sequence        | 14.62          | 6.52     | 3.30
Car sequence         | 11.25          | 6.27     | 2.34
Walking man sequence | 13.43          | 6.35     | 2.59

The adaptive-scale tracker requires the most iterations because it runs the mean shift algorithm three times per frame, whereas our method performs the estimation only once per frame; hence it is faster than the others.

Another table below gives the TAR (True Area Ratio) values of the three trackers on the three datasets. The TAR value is the ratio of the overlapped area between the tracking result and the ground truth to the area of the ground truth.
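For axis-aligned bounding boxes, TAR reduces to a simple intersection computation; the sketch below is only an approximation of the paper's measure, since the tracker's regions are oriented ellipses:

```python
def true_area_ratio(track_box, gt_box):
    """TAR = overlap(tracking result, ground truth) / area(ground truth).

    Boxes are (x, y, w, h) axis-aligned rectangles, an assumed simplification
    of the paper's oriented elliptical regions.
    """
    ax, ay, aw, ah = track_box
    bx, by, bw, bh = gt_box
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))  # overlap width
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))  # overlap height
    return (ix * iy) / (bw * bh)
```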

      Table II. TAR values by competing tracking methods.

Datasets             | OWN    | EM-Shift | Our method
Palm sequence        | 60%    | 85.82%   | 96.51%
Car sequence         | 85.63% | 90.30%   | 91.84%
Walking man sequence | 72.5%  | 64.23%   | 89.53%

As Table II shows, the proposed method achieves better results than the competing trackers.

5. RESULTS AND DISCUSSIONS

The proposed algorithm is implemented in MATLAB; the hardware and software environment is a Pentium IV 2.4 GHz processor with 512 MB RAM running Windows XP.

Tracking results for the three datasets are provided below.

In Figure 3, the palm sequence is tracked; its motions are rapid. Even under motion blur we obtain clear and accurate tracking with scale and orientation changes.

Figure 3. Tracking results for the palm sequence (frames 25 and 92).

In Figure 4, the car sequence is tracked effectively even when background clutter occurs.

Figure 4. Tracking results for the car sequence (frames 38 and 50).

In Figure 5, the walking man sequence is tracked effectively even under occlusion.

Figure 5. Tracking results for the walking man sequence (frames 58 and 100).

6. CONCLUSION

In this paper, an intelligent visual tracking method based on the adaptive mean shift algorithm is proposed. The DSS map, built from the template matrix, is used for tracking, and the adaptive mean shift concept is used to estimate the scale and orientation changes of the target. The proposed method achieves better tracking accuracy, with accurate scale and orientation estimates, even under challenges such as motion blur and background clutter.

REFERENCES

  1. Bohan Zhuang, Huchuan Lu, Ziyang Xiao, and Dong Wang, "Visual Tracking via Discriminative Sparse Similarity Map," IEEE Transactions on Image Processing, vol. 23, no. 4, pp. 1872-1881, 2014.

  2. Dae-Youn Lee, Jae-Young Sim, and Chang-Su Kim, "Visual Tracking Using Pertinent Patch Selection and Masking," Computer Vision Foundation, 2014.

  3. Dorin Comaniciu, Visvanathan Ramesh, and Peter Meer, "Kernel-Based Object Tracking," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 5, pp. 564-577, 2003.

  4. Feng Chen, Qing Wang, Song Wang, Weidong Zhang, and Wenli Xu, "Object Tracking via Appearance Modeling and Sparse Representation," Image and Vision Computing, vol. 29, pp. 787-796, 2011.

  5. Jifeng Ning, Lei Zhang, David Zhang, and Chengke Wu, "Scale and Orientation Adaptive Mean Shift Tracking," pp. 1-23, 2009.

  6. Katharina Quast and Andre Kaup, "Scale and Shape Adaptive Mean Shift Object Tracking," in Proc. 17th European Signal Processing Conference, pp. 1513-1517, 2009.

  7. Ming Li, Fang Lan Ma, and Fuzhong Nian, "Robust Visual Tracking via Appearance Modeling and Sparse Representation," Journal of Computers, vol. 9, no. 7, 2014.

  8. P. Perez, C. Hue, J. Vermaak, and M. Gangnet, "Color-Based Probabilistic Tracking," in Proc. European Conference on Computer Vision (ECCV), pp. 661-675, 2002.

  9. Tianxiang Bai and Youfu Li, "Robust Visual Tracking Using Flexible Structured Sparse Representation," IEEE Transactions on Industrial Informatics, vol. 10, no. 1, 2014.

  10. Tianzhu Zhang, Bernard Ghanem, Si Liu, and Narendra Ahuja, "Robust Visual Tracking via Structured Multi-Task Sparse Learning," International Journal of Computer Vision, vol. 101, pp. 367-383, 2013.

  11. Wei Zhong, Huchuan Lu, and Ming-Hsuan Yang, "Robust Object Tracking via Sparsity-Based Collaborative Model," in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1838-1845, 2012.

  12. Xue Mei and Haibin Ling, "Robust Visual Tracking Using L1 Minimization," in Proc. 12th International Conference on Computer Vision (ICCV), pp. 1436-1443, 2009.

  13. M. Yang, Y. Wu, and G. Hua, "Context-Aware Visual Tracking," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 7, pp. 1195-1209, Jul. 2009.

  14. G. Hager, M. Dewan, and C. Stewart, "Multiple Kernel Tracking with SSD," in Proc. Conference on Computer Vision and Pattern Recognition, pp. 790-797, 2004.

  15. B. Babenko, M.-H. Yang, and S. Belongie, "Visual Tracking with Online Multiple Instance Learning," in Proc. Conference on Computer Vision and Pattern Recognition, pp. 983-990, 2009.
