 Open Access
 Authors : A.Ramya, Dr. P.Raviraj
 Paper ID : IJERTV2IS101129
 Volume & Issue : Volume 02, Issue 10 (October 2013)
Published (First Online): 28-10-2013
ISSN (Online) : 2278-0181
 Publisher Name : IJERT
 License: This work is licensed under a Creative Commons Attribution 4.0 International License
A Survey and Comparative Analysis of Moving Object Detection and Tracking
A. Ramya¹, Dr. P. Raviraj²
¹PG Scholar, Kalaignar Karunanidhi Institute of Technology, Coimbatore, TN, India
²Professor, Kalaignar Karunanidhi Institute of Technology, Coimbatore, TN, India
Abstract
Moving object detection in a video sequence is a critical task in vision applications and an active research topic, and it must cope with both indoor and outdoor environments. Identifying moving objects is an important step in automated video analysis. This paper presents a survey of object detection and tracking for automated video analysis in vision applications. Background subtraction is the foremost approach for detecting moving objects. The background may be static or dynamic, and estimating it is essential for detecting an object in a video sequence. Tracking an object in a video sequence means continuously identifying its location while either the object or the camera is moving. Object tracking is required in vision applications that need the location and shape of an object in every frame. This paper surveys various object detection and background subtraction techniques.
Keywords: Moving object detection, Low-rank component, Sparse component, Background subtraction, Motion segmentation

Introduction
Automated video analysis involves three key steps [2]: object detection, object tracking and behaviour recognition. As a first step, object detection locates and segments the objects in a video. The objects can then be tracked from frame to frame, and the behaviour of the tracked objects is analysed. Both object detection and tracking therefore play an important role in many practical applications. Video surveillance, for instance, uses cameras and other sensors to monitor activities with the goal of automatically understanding the events happening at a site, and automatic detection of events in videos would enable efficient chronicling and automatic annotation. The detection and tracking of moving objects has drawn great attention from researchers in computer vision. Object detection is usually performed either by object detectors or by background subtraction. Detecting moving objects amounts to classifying each pixel of the video sequence as either foreground or background, and the usual approach to this classification is background subtraction: each pixel of a video frame that deviates from the background is taken to belong to a moving object.
For applications such as surveillance, there are many challenges in developing a good background subtraction algorithm [8]. First, background subtraction must be robust against illumination changes. Second, it should avoid detecting non-stationary background objects and the shadows cast by moving objects. A good background model should also react quickly to changes in the background and adapt itself to accommodate them, for example when a stationary object is moved from one place to another. For a real-time system, both a good foreground detection rate and a short processing time for background subtraction are essential. Object detection is divided into different stages, as shown in Figure 1. For applications such as surveillance, background subtraction is a hierarchy of techniques for segmenting out objects, and motion segmentation is one part of it. In motion segmentation, objects are detected by classifying pixels according to their motion patterns, which is usually called motion-based object detection.
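As a minimal illustration of the pixel-deviation idea and the adaptation requirement above, a running-average background model can be sketched as follows. This is a simple baseline rather than one of the surveyed methods, and the class name and the `alpha` and `threshold` values are hypothetical choices made for illustration.

```python
import numpy as np


class RunningAverageBackground:
    """Hypothetical baseline: exponential running-average background with a fixed threshold."""

    def __init__(self, first_frame, alpha=0.05, threshold=25.0):
        self.background = np.asarray(first_frame, dtype=float).copy()
        self.alpha = alpha          # adaptation rate: higher values react faster to background changes
        self.threshold = threshold  # intensity difference above which a pixel is foreground

    def apply(self, frame):
        """Return a boolean foreground mask for `frame`, then update the background."""
        frame = np.asarray(frame, dtype=float)
        mask = np.abs(frame - self.background) > self.threshold
        # Blind update: every pixel is blended in, so an object that stops moving
        # (e.g. a relocated stationary object) is gradually absorbed into the background.
        self.background = (1.0 - self.alpha) * self.background + self.alpha * frame
        return mask


if __name__ == "__main__":
    model = RunningAverageBackground(np.full((3, 3), 100.0))
    frame = np.full((3, 3), 100.0)
    frame[1, 1] = 200.0                    # a bright object appears at the centre
    print(model.apply(frame).astype(int))  # only the centre pixel is foreground
```

The blind update trades off two of the challenges listed above. It absorbs relocated objects, but a slow or temporarily stopped foreground object will also fade into the background.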
Figure 1. General block diagram of video surveillance system.
In motion segmentation [2], the objects of interest move continuously through the scene, and when the camera itself moves, the background moves as well.
An optical flow field assigns a translation (displacement) vector to each pixel, forming a dense field of displacement vectors. Partitioning the optical flow field is the most common approach to motion segmentation. Such methods [2] assume that the optical flow field is smooth within each motion layer and that sharp motion changes occur only at layer boundaries. In principle, optical flow estimation combined with segmentation can work in the presence of large camera motion, but these assumptions generally do not hold in practice. The foreground may be complex, with non-rigid shapes, and the background may also be complex, with varying textures and illumination changes. Object detection and background subtraction techniques are discussed further in the following sections.
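The partitioning idea can be illustrated with a deliberately simple two-layer sketch. It assumes that a dense flow field has already been computed by some optical flow estimator and that the background occupies most of the frame. Under those assumptions, the dominant (camera) motion is estimated robustly with a median, and pixels that deviate from it are labelled as moving. The function name and the tolerance `tau` are hypothetical. Layered methods proper enforce the smoothness within layers described above rather than thresholding each pixel independently.

```python
import numpy as np


def segment_by_flow(flow, tau=1.0):
    """Split a dense optical flow field into a moving layer and a background layer.

    flow: array of shape (H, W, 2) holding the (u, v) displacement of every pixel,
          assumed to be computed beforehand by an optical flow estimator.
    tau:  hypothetical tolerance (in pixels) on the deviation from the dominant motion.
    """
    flow = np.asarray(flow, dtype=float)
    # Assume the background covers most of the frame, so the per-component median
    # of the field approximates the camera-induced (background) motion.
    dominant = np.median(flow.reshape(-1, 2), axis=0)
    # Pixels whose displacement departs from the dominant motion form the moving layer.
    deviation = np.linalg.norm(flow - dominant, axis=2)
    return deviation > tau


if __name__ == "__main__":
    flow = np.zeros((4, 4, 2))
    flow[..., 0] = 2.0                   # the camera pans, so the background moves 2 px right
    flow[1:3, 1:3] = (5.0, -1.0)         # an object moves differently
    print(segment_by_flow(flow).astype(int))
```

Because the dominant motion is subtracted first, the panning background is not reported as moving. Only the 2x2 object region is.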

Background Subtraction Techniques

2.1. Eigen Background Subtraction
Eigen background subtraction, proposed by Oliver et al. [3], uses an eigenspace to model the background for moving object segmentation. A key strength of this method is that the background model can be learned from unconstrained video sequences, even when they contain moving foreground objects. PCA is used to reduce the dimensionality of the space. Moving objects do not stay at the same location across the sample images, so they contribute little to the principal components. After PCA, the reduced space should therefore represent only the static parts of the scene, even if moving objects are present in the samples.
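The eigenspace model described in this paragraph can be sketched in a few lines of NumPy, following the steps given in [1]. The sketch assumes grayscale frames flattened to vectors, and the function names and threshold are hypothetical. The eigenvectors are obtained from an SVD of the mean-centred frames rather than by forming the covariance matrix explicitly. The right singular vectors of the centred data are the eigenvectors of the sample covariance, so this gives the same principal directions.

```python
import numpy as np


def fit_eigenbackground(frames, n_components):
    """Learn an eigenbackground from training frames.

    frames: array of shape (n, p); each row is one image flattened to p pixels.
    Returns the mean image, shape (p,), and the top eigenvectors as rows, shape (n_components, p).
    """
    X = np.asarray(frames, dtype=float)
    mean = X.mean(axis=0)
    # Right singular vectors of the centred data = eigenvectors of the sample covariance,
    # so the p x p covariance matrix never has to be formed.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def eigen_foreground(image, mean, basis, threshold):
    """Project an image onto the eigenspace, reconstruct it and threshold the difference."""
    x = np.asarray(image, dtype=float).ravel()
    coeffs = basis @ (x - mean)                # projection onto the eigenspace
    reconstruction = basis.T @ coeffs + mean   # projection back to image space
    return (np.abs(x - reconstruction) > threshold).reshape(np.shape(image))


if __name__ == "__main__":
    background = np.linspace(50, 150, 16)      # a 16-pixel static scene
    lighting = np.linspace(0.5, 1.0, 16)       # a global lighting pattern
    frames = [background + c * lighting for c in np.linspace(-10, 10, 21)]
    mean, basis = fit_eigenbackground(frames, n_components=1)
    test = background + 5 * lighting           # new lighting level, seen only through the eigenspace
    test[3] += 60                              # a foreground object at pixel 3
    print(eigen_foreground(test, mean, basis, threshold=20.0).astype(int))
```

In the usage example, the lighting change lies inside the learned eigenspace and is reconstructed almost exactly. The object at pixel 3 cannot be reproduced, so it is the only pixel that exceeds the threshold.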
The main steps of the algorithm are as follows [1]:

1. From a sample of n images, each containing p pixels, compute the mean image $\mu_b$.

2. Compute the covariance matrix $C_b$ of the sample and keep the M eigenvectors with the largest eigenvalues as the rows of $\Phi_M$. When a new image I arrives, project it onto the eigenspace as $I' = \Phi_M (I - \mu_b)$ and project it back as $I'' = \Phi_M^{T} I' + \mu_b$.

3. Compute the difference $|I - I''|$.

4. Detect foreground points at the locations where $|I - I''| > T$.

The eigenvectors spanning this subspace represent the static, unmoving parts of the scene. Moving objects cannot be reproduced by the subspace, so they appear as large differences in step 3.
2.2. Kernel Density Estimation
Elgammal et al. [4] proposed using kernel density estimation to build statistical representations of the background and the foreground. Their background model and background subtraction process, based on non-parametric kernel density estimation, use pixel intensity as the basic feature for modelling the background. In the non-parametric kernel density approach, the density function is estimated directly from the data without any assumptions about the underlying distribution. The choice of a suitable kernel bandwidth is an important issue in kernel density estimation. A major drawback of using a non-parametric kernel density estimator is its computational cost.

2.3. Sparse Signal Recovery
Volkan et al. [6] proposed Lattice Matching Pursuit (LaMP) for the stable recovery of sparse signals modelled with a Markov random field (MRF). In this sparse signal representation, the nonzero coefficients are clustered together. In LaMP, the likelihood of the signal support is evaluated iteratively and optimised under an Ising model. As a first step, the data residual is calculated as in matching pursuit. The sparse signal is then estimated using a graphical model made up of a support model and a signal model. In this graphical model, the sparse support reduces ambiguity, and the size of the search space for the unknown part determines the speed of the algorithm. LaMP achieves its best performance with M = 2.5K measurements for a K-sparse signal when compared with other methods such as FPC and CoSaMP.
Junzhou et al. [5] proposed a new greedy sparse recovery algorithm in which the data residue in the iterative process is pruned according to both sparsity and group clustering, rather than sparsity alone. The proposed algorithm performs five main steps in each iteration:

1. Pruning the residue estimate.

2. Merging the support sets.

3. Estimating the signal by least squares.

4. Pruning the signal estimate.

5. Updating the signal/residue estimate and the support set.

Pruning in steps 1 and 4 uses dynamic group sparsity (DGS) rather than a plain k-sparse approximation. A range of sparsity levels can be specified, and the AdaDGS recovery algorithm runs until its halting condition is satisfied. AdaDGS-based background subtraction then yields an optimised background-subtracted image together with the background image.
Julien et al. [7] proposed computing the proximal operator with network flow algorithms in order to solve structured sparse problems whose regulariser is a sum of norms over groups of variables. The proximal operator associated with such a structured norm can be computed efficiently by solving a quadratic min-cost flow problem.
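The five-step iteration listed for the greedy algorithm of [5] follows the general CoSaMP template. The sketch below implements that template with plain k-sparse pruning in steps 1 and 4, which is the baseline that DGS replaces with group-clustered pruning. It is therefore not the AdaDGS algorithm itself, and the function name, iteration limit and tolerance are illustrative assumptions.

```python
import numpy as np


def cosamp(A, y, k, max_iter=50, tol=1e-10):
    """Greedy recovery of a k-sparse x from y = A @ x (CoSaMP-style iteration, k-sparse pruning)."""
    n = A.shape[1]
    x = np.zeros(n)
    support = np.array([], dtype=int)
    residual = y.copy()
    for _ in range(max_iter):
        # Step 1: prune the residue proxy A^T r to its 2k largest entries.
        proxy = A.T @ residual
        candidates = np.argsort(np.abs(proxy))[-2 * k:]
        # Step 2: merge the candidate support with the current support.
        merged = np.union1d(candidates, support)
        # Step 3: least-squares signal estimate restricted to the merged support.
        b = np.zeros(n)
        b[merged] = np.linalg.lstsq(A[:, merged], y, rcond=None)[0]
        # Step 4: prune the estimate to its k largest entries.
        support = np.argsort(np.abs(b))[-k:]
        # Step 5: update the signal estimate and the residue.
        x = np.zeros(n)
        x[support] = b[support]
        residual = y - A @ x
        if np.linalg.norm(residual) <= tol * np.linalg.norm(y):
            break
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m, n, k = 64, 128, 5
    A = rng.standard_normal((m, n)) / np.sqrt(m)   # random Gaussian measurement matrix
    x_true = np.zeros(n)
    x_true[rng.choice(n, size=k, replace=False)] = rng.choice([-1.0, 1.0], size=k) * rng.uniform(1.0, 2.0, size=k)
    x_hat = cosamp(A, A @ x_true, k)
    print(np.max(np.abs(x_hat - x_true)))          # close to zero when recovery succeeds
```

Replacing the two `argsort` pruning lines with a pruning rule that favours clustered supports is where a DGS-style method would differ.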
2.4. Motion Segmentation
For robust foreground segmentation, Spagnolo et al. [8] proposed an algorithm that combines background subtraction with temporal image analysis. The background image is updated at all pixels, including pixels covered by foreground objects. To find the truly moving points, the approach uses the radiometric similarity between corresponding regions of consecutive frames, and also between the reference background image and the current image, to segment the foreground objects.
Each pixel of the background image is updated according to the variations of the pixels in the image that have the same intensity value. A photometric gain is calculated for each pixel, and the mean photometric gain is computed over all pixels with the same intensity value. The algorithm copes with continuous variations in lighting conditions, reduces noise in the image, and detects objects accurately even when a foreground object starts moving after remaining still for a long time. Its drawbacks are that it can mistakenly detect objects that are motionless in the image and that it cannot eliminate shadows.
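The per-intensity update described above can be sketched as follows. This is one possible reading of the rule, under stated assumptions: 8-bit grayscale images, pixels grouped by their current background intensity, and the gain defined as the ratio of the current pixel value to the background value. The function name is hypothetical, and the sketch omits the radiometric-similarity test used to find moving points.

```python
import numpy as np


def update_background_by_gain(background, frame):
    """Update an 8-bit background image using the mean photometric gain per intensity level.

    Assumption (one reading of the rule): pixels are grouped by their current background
    intensity, and every pixel in a group is rescaled by that group's mean gain.
    """
    bg = np.asarray(background).astype(np.int64)   # background intensities 0..255
    cur = np.asarray(frame, dtype=float)
    gain = cur / np.maximum(bg, 1)                  # per-pixel photometric gain (avoids division by 0)
    sums = np.bincount(bg.ravel(), weights=gain.ravel(), minlength=256)
    counts = np.bincount(bg.ravel(), minlength=256)
    # Mean gain for each intensity level; levels that do not occur keep a gain of 1.
    mean_gain = np.divide(sums, counts, out=np.ones(256), where=counts > 0)
    return np.clip(bg * mean_gain[bg], 0, 255)


if __name__ == "__main__":
    bg = np.array([[100, 100], [50, 50]], dtype=np.uint8)
    print(update_background_by_gain(bg, bg * 1.2))  # a uniform 20% brightening is followed
```

Averaging the gain over all pixels of the same level is what lets a global lighting change be tracked. The same averaging also means that a bright foreground pixel slightly raises the estimate for its whole intensity level.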

Object Detection Techniques

Principal Component Pursuit
Wright et al. [9] proposed principal component pursuit (PCP), which recovers both the low-rank component and the sparse component of a data matrix. PCP is a convex program that recovers the low-rank $L_0$ and the sparse $S_0$ simply by minimising a weighted combination of the nuclear norm and the $\ell_1$ norm. Let $\|M\|_* := \sum_i \sigma_i(M)$ denote the nuclear norm of a matrix M, i.e. the sum of its singular values, and let $\|M\|_1 = \sum_{ij} |M_{ij}|$ denote the $\ell_1$ norm of M seen as a long vector in $\mathbb{R}^{n_1 \times n_2}$. The PCP estimate, obtained by solving
$$\min_{L,S} \; \|L\|_* + \lambda \|S\|_1 \quad \text{subject to} \quad L + S = M,$$
where $\lambda > 0$ weights the two terms, can exactly recover the low-rank $L_0$ and the sparse $S_0$. The principal components of a data matrix can therefore be recovered even when some of its entries are corrupted at random. To obtain a robust principal component analysis (RPCA) in the presence of noise, the observation model becomes
$$M = L_0 + S_0 + N_0,$$
where $N_0$ is dense noise. $L_0$ is still approximately low-rank, and the small errors $N_0$ affect all data entries. Robust PCA must therefore handle both sparse gross errors and dense small noise.
Figure 2. Background modelling from video: (a) frames of the original video M; (b) low-rank component L and (c) sparse component S obtained by PCP [9].
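PCP can be solved with a standard augmented Lagrangian (ADMM) scheme that alternates singular value thresholding for L with soft thresholding for S. The sketch below assumes the common defaults from the RPCA literature, λ = 1/√max(n1, n2) and a fixed penalty μ = n1·n2/(4‖M‖₁). These defaults, the function names and the stopping rule are illustrative choices, not details given in this paper. For video, each column of M would be one vectorised frame.

```python
import numpy as np


def shrink(X, tau):
    """Entry-wise soft thresholding: proximal operator of tau * ||X||_1."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def svt(X, tau):
    """Singular value thresholding: proximal operator of tau * ||X||_*."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def pcp(M, lam=None, mu=None, tol=1e-7, max_iter=1000):
    """Solve min ||L||_* + lam * ||S||_1 s.t. L + S = M with an augmented Lagrangian (ADMM) scheme."""
    M = np.asarray(M, dtype=float)
    n1, n2 = M.shape
    lam = 1.0 / np.sqrt(max(n1, n2)) if lam is None else lam
    mu = n1 * n2 / (4.0 * np.abs(M).sum()) if mu is None else mu
    S = np.zeros_like(M)
    Y = np.zeros_like(M)                 # Lagrange multiplier for the constraint L + S = M
    for _ in range(max_iter):
        L = svt(M - S + Y / mu, 1.0 / mu)     # minimise over L with S, Y fixed
        S = shrink(M - L + Y / mu, lam / mu)  # minimise over S with L, Y fixed
        Z = M - L - S                         # constraint violation
        Y = Y + mu * Z                        # dual ascent step
        if np.linalg.norm(Z) <= tol * np.linalg.norm(M):
            break
    return L, S


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    L0 = rng.standard_normal((50, 2)) @ rng.standard_normal((2, 50))   # rank-2 "background"
    S0 = np.where(rng.random((50, 50)) < 0.05, rng.uniform(-5, 5, (50, 50)), 0.0)
    L_hat, S_hat = pcp(L0 + S0)
    print(np.linalg.norm(L_hat - L0) / np.linalg.norm(L0))
```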

Mixture of Gaussian (MoG)
Stauffer and Grimson [12] proposed a Gaussian mixture model in which each background pixel is modelled as a mixture of Gaussians. Every new pixel value is compared with the existing set of Gaussians to find a match. The parameters of this method are the learning rate α and the background threshold T. The parameters of the matched Gaussian are updated according to the learning rate. If no match is found, the least probable Gaussian is discarded and replaced by a new Gaussian initialised with the current pixel value as its mean. Pixel values that do not fit the background distributions are considered foreground. The MoG method is robust across different cameras and scenes and to slow changes in lighting. Because the threshold is set automatically for each pixel, the method can recover quickly when the background reappears.
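The per-pixel procedure can be sketched for a single grayscale pixel as below. Apart from α and T, which the method names, several details are assumptions of this sketch: the number of components, the initial variance, the 2.5σ matching rule (the value used by Stauffer and Grimson), the variance floor and the simplified update rate ρ = α. A full model would run one such mixture per pixel, usually vectorised.

```python
import numpy as np


class PixelMoG:
    """Mixture of K Gaussians for a single grayscale pixel (Stauffer-Grimson style sketch)."""

    def __init__(self, k=3, alpha=0.01, T=0.7, init_var=225.0, match_sigmas=2.5):
        self.alpha = alpha                # learning rate
        self.T = T                        # minimum fraction of weight explained by the background
        self.init_var = init_var          # variance of newly created components (assumed value)
        self.match_sigmas = match_sigmas  # a value matches if within this many standard deviations
        self.w = np.zeros(k)              # component weights
        self.mu = np.zeros(k)             # component means
        self.var = np.full(k, init_var)   # component variances

    def _background(self):
        # Rank components by w / sigma; the first B whose cumulative weight exceeds T are background.
        order = np.argsort(-self.w / np.sqrt(self.var))
        b = int(np.searchsorted(np.cumsum(self.w[order]), self.T, side="right")) + 1
        return set(order[:b].tolist())

    def update(self, x):
        """Update the model with pixel value x and return True if x is classified as foreground."""
        x = float(x)
        sigma = np.sqrt(self.var)
        matched = np.flatnonzero((np.abs(x - self.mu) < self.match_sigmas * sigma) & (self.w > 0))
        if matched.size == 0:
            # No match: replace the least probable component with a new one centred on x.
            j = int(np.argmin(self.w / sigma))
            self.mu[j], self.var[j], self.w[j] = x, self.init_var, self.alpha
            self.w /= self.w.sum()
            return True
        # Best match = matching component with the highest w / sigma.
        j = int(matched[np.argmax(self.w[matched] / sigma[matched])])
        self.w *= 1.0 - self.alpha
        self.w[j] += self.alpha
        rho = self.alpha  # simplification: the original uses rho = alpha * N(x | mu_j, sigma_j)
        self.mu[j] = (1.0 - rho) * self.mu[j] + rho * x
        self.var[j] = max((1.0 - rho) * self.var[j] + rho * (x - self.mu[j]) ** 2, 1.0)  # variance floor
        return j not in self._background()
```

Feeding a steady value makes its component dominate the weights. A new value is reported as foreground until it persists long enough for its own component to gain weight, at which point it becomes part of the background.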

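Detecting Contiguous Outliers in the Low-Rank Representation (DECOLOR)
Xiaowei et al. [2] proposed DECOLOR for moving object detection. In DECOLOR, object detection and background estimation are performed simultaneously, without a training sequence. The background images in a video are assumed to be linearly correlated, so the matrix formed by the vectorised video frames can be approximated by a low-rank matrix, and moving objects can be detected as outliers in this low-rank representation. Detecting these outliers makes it easier to separate the foreground from the background. Let $D = [I_1, \ldots, I_n] \in \mathbb{R}^{m \times n}$ be the matrix of n vectorised frames, each with m pixels. Let $B \in \mathbb{R}^{m \times n}$ denote the underlying background images, and let $S \in \{0,1\}^{m \times n}$ denote the foreground support:
$$S_{ij} = \begin{cases} 0, & \text{if pixel } ij \text{ is background},\\ 1, & \text{if pixel } ij \text{ is foreground}. \end{cases}$$
To estimate the foreground and the background, a background model, a foreground model and a signal model are combined into an energy function to be minimised:
$$\min_{B,\; S_{ij} \in \{0,1\}} \; \frac{1}{2} \big\| P_{S^{\perp}}(D - B) \big\|_F^2 + \beta \|S\|_1 + \gamma \|A \operatorname{vec}(S)\|_1 \quad \text{s.t.} \quad \operatorname{rank}(B) \le K,$$
where $P_{S^{\perp}}(X)$ keeps the entries of X at background locations ($S_{ij} = 0$) and sets the others to zero. The term $\beta \|S\|_1$ penalises the size of the foreground, and $\gamma \|A \operatorname{vec}(S)\|_1$ penalises label changes between neighbouring pixels, with A the node-edge incidence matrix of the pixel neighbourhood graph. K bounds the rank of the background.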
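The DECOLOR energy combines three parts: a least-squares fit of D by B on background entries, a cost β for each foreground entry plus a smoothness term weighted by γ, and a rank limit K on B. To make the roles of these terms concrete, here is a heavily simplified sketch of the alternating minimisation, under stated assumptions. It sets γ = 0, so the foreground update has a closed form per entry. It handles the rank constraint with a truncated SVD (one hard-impute step per iteration) and runs a fixed number of iterations. DECOLOR itself relaxes the rank constraint to a nuclear-norm penalty and uses graph cuts for the foreground update when γ > 0. The function name and defaults are hypothetical.

```python
import numpy as np


def decolor_sketch(D, K, beta, n_iter=100):
    """Simplified DECOLOR-style alternation with gamma = 0 (no smoothness term).

    D: (m, n) matrix whose columns are vectorised frames.
    K: maximum rank of the background matrix B.
    beta: per-entry cost of labelling an entry as foreground.
    """
    D = np.asarray(D, dtype=float)
    S = np.zeros(D.shape, dtype=bool)   # foreground support, initially empty
    B = D.copy()
    for _ in range(n_iter):
        # B-step: replace foreground entries by the current background estimate, then take the
        # best rank-K approximation (one hard-impute step for the masked least-squares term).
        U, s, Vt = np.linalg.svd(np.where(S, B, D), full_matrices=False)
        B = (U[:, :K] * s[:K]) @ Vt[:K]
        # S-step: with gamma = 0 the energy decouples per entry, so an entry is foreground
        # exactly when its squared fitting cost 0.5 * (D - B)^2 exceeds beta.
        S = 0.5 * (D - B) ** 2 > beta
    return B, S


if __name__ == "__main__":
    clean = np.outer(np.linspace(50, 100, 40), 1.0 + 0.1 * np.sin(np.arange(60)))  # rank-1 background
    D = clean.copy()
    D[10:14, 20:25] += 100.0                       # a bright object in a few frames
    B_hat, S_hat = decolor_sketch(D, K=1, beta=0.5 * 40.0 ** 2)
    print(S_hat.sum(), np.abs(B_hat - clean).max())
```

Without the smoothness term, isolated noisy pixels would also be labelled foreground. This is why DECOLOR adds the γ term and solves the foreground step with graph cuts.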
The rank constraint requires the background images to form a low-rank matrix. Fig. 3(a) shows the averaged F-measure as a function of d, the number of frames during which the foreground does not move, for a background of true rank $\operatorname{rank}(B_0) = 3$. With the default parameter K = 7, the accuracy of DECOLOR decreases as soon as d > 0, because DECOLOR overfits the static foreground. When K is decreased to 3, DECOLOR remains stable until d > 6, which demonstrates that DECOLOR can tolerate a temporary stop of the foreground. The result of PCP is also shown for comparison in Fig. 3(b), and DECOLOR achieves higher accuracy than PCP. If the object is large or moves slowly, its interior region remains unchanged across frames. This allows the foreground to fit into the low-rank model, so DECOLOR fails.
Figure 3. (a) F-measure as a function of d, where d is the number of frames during which the foreground stops moving. (b) Fraction of accurate foreground detection as a function of  and W [2].
In the smoke detection example, the smoke is detected as foreground, so the background occluded by the smoke cannot be recovered.


Proposed Work

Graph Cut
The objective of graph cut here is to segment objects in a video. A graph is described mathematically as follows [10]:
$$G = (V, E)$$
is a graph in which V denotes the set of vertices and E the set of edges. A directed graph G consists of a set of vertices and a set of edges, each edge being an ordered pair of vertices. An s-t graph is a weighted directed graph with two special terminal nodes, the source s and the sink t. An s-t cut in G is a set of edges $E_{cut}$ whose removal leaves no path from the source to the sink. The cost of a cut is the sum of the weights of the edges in $E_{cut}$. The value of a flow f is defined as
$$|f| = \sum_{(s,v) \in E} f(s,v),$$
which represents the amount of flow passing from the source to the sink. The max-flow/min-cut problem is to find a flow of maximum value, or equivalently a cut of minimum cost. By the max-flow min-cut theorem, the two optimal values are equal.
Aureli et al. [11] proposed a method to track and segment multiple objects in a video using the min-cut/max-flow algorithm. Object segmentation with graph cuts has a modest computational cost. A binary labelling objective function is introduced for each object, combining low-level pixel-wise features with high-level observations obtained from an independent detection module. Let P denote the set of N pixels of an input image sequence. Each pixel s ∈ P at time t is associated with a feature vector $z_{s,t} = (z^{C}_{s,t}, z^{M}_{s,t})$, where $z^{C}_{s,t}$ is a 3-dimensional vector in RGB colour space and $z^{M}_{s,t}$ is a 2-dimensional vector of optical flow values at that pixel. Assume that $k_t$ objects are tracked at time t, and denote the i-th object at time t by $O_t^{(i)}$. An energy function is introduced and minimised using graph cut.
The graph used for this energy minimisation is illustrated as follows. The left panel shows time t−1: the optical flow vectors of the object are drawn in blue, white nodes represent the object and black nodes the background. The right panel shows the graph at time t: the red-circled pixel nodes correspond to the masks of the two observations, and the dashed box indicates the predicted mask.
Figure 4. (a) Reference frame; (b) current frame; (c) result of background subtraction (pixels in black are labelled as foreground) and derived object detections (indicated with red bounding boxes) [11].
The algorithm is robust to partial and complete occlusion, illumination changes and missing observations. The use of a secondary multi-label energy function allows objects to be tracked and segmented individually. The observations shown in Figure 4 were obtained by simple background subtraction based on a single reference frame.
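To make the s-t graph construction concrete, the following sketch labels a 1-D signal as foreground or background by computing a minimum cut with a plain Edmonds-Karp max-flow. It rests on several assumptions: source-side nodes are foreground, data costs are squared distances to assumed foreground and background intensities `mu_fg` and `mu_bg`, and neighbouring pixels pay a constant `lam` when their labels differ. This is a generic binary graph-cut energy, not the energy of Aureli et al. [11]. The function names are hypothetical, and practical systems use faster max-flow algorithms (such as Boykov-Kolmogorov) on 2-D pixel grids.

```python
from collections import deque


def max_flow_min_cut(capacity, s, t):
    """Edmonds-Karp max flow; capacity maps node -> {neighbour: capacity}.

    Returns the maximum flow value and the set of nodes on the source side of a minimum cut.
    """
    # Residual graph, with zero-capacity reverse edges added where missing.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            # No augmenting path: the nodes reached from s form the source side of a minimum cut.
            return flow, set(parent)
        # Bottleneck capacity along the path, then push that much flow.
        bottleneck, v = float("inf"), t
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


def segment_1d(pixels, mu_fg, mu_bg, lam):
    """Binary labelling of a 1-D signal: squared-intensity data terms plus a Potts smoothness term."""
    cap = {"s": {}, "t": {}}
    for i, x in enumerate(pixels):
        cap["s"][i] = (x - mu_bg) ** 2    # cut (paid) if pixel i ends up labelled background
        cap[i] = {"t": (x - mu_fg) ** 2}  # cut (paid) if pixel i ends up labelled foreground
    for i in range(len(pixels) - 1):
        cap[i][i + 1] = lam               # paid once if the neighbours get different labels
        cap[i + 1][i] = lam
    energy, source_side = max_flow_min_cut(cap, "s", "t")
    return [i in source_side for i in range(len(pixels))], energy


if __name__ == "__main__":
    labels, energy = segment_1d([10, 11, 9, 90, 92, 88, 10, 12], mu_fg=90, mu_bg=10, lam=50)
    print(labels, energy)
```

The returned flow value equals the minimum labelling energy, which illustrates the max-flow min-cut equality. Raising `lam` removes small isolated foreground regions, because each label change costs more.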


Conclusion
The availability, ease of use and automation of video applications, together with the growing popularity of video on the internet and the versatility of video applications, rely heavily on object detection and tracking in videos. The accuracy of object detection varies across scenarios, depending on the application. This paper discussed some techniques for detecting moving objects in video frames, along with their advantages and limitations. The graph cut algorithm is robust to occlusion and complex backgrounds. DECOLOR gives high accuracy when the background is static, but performs poorly when the object is large and moves slowly or is occluded. This suggests that the graph cut algorithm can perform better when the background is complex and objects are occluded.
Comparative Analysis of Object Detection Techniques
| Technique | Advantages | Disadvantages |
|---|---|---|
| Eigen Background Subtraction | | |
| Kernel Density Estimation | | |
| Mixture of Gaussians (MoG) | | |
| Background and temporal analysis | | |
| Principal Component Pursuit (PCP) | | |
| Detecting Contiguous Outliers in the Low-Rank Representation (DECOLOR) | | |
| Graph Cut | | – |
References
[1] M. Piccardi, "Background subtraction techniques: a review," Proc. IEEE Int. Conf. Systems, Man, and Cybernetics, 2004.
[2] Xiaowei Zhou, Can Yang, and Weichuan Yu, "Moving object detection by detecting contiguous outliers in the low-rank representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 3, March 2013.
[3] Kinjal A. Joshi and Darshak G. Thakore, "A survey on moving object detection and tracking in video surveillance system," International Journal of Soft Computing and Engineering (IJSCE), ISSN 2231-2307, vol. 2, issue 3, July 2012.
[4] Ahmed Elgammal, Ramani Duraiswami, David Harwood, and Larry S. Davis, "Background and foreground modeling using nonparametric kernel density estimation for visual surveillance."
[5] Junzhou Huang, Xiaolei Huang, and Dimitris Metaxas, "Learning with dynamic group sparsity."
[6] Volkan Cevher, Marco F. Duarte, Chinmay Hegde, and Richard G. Baraniuk, "Sparse signal recovery using Markov random fields."
[7] Julien Mairal, Rodolphe Jenatton, Guillaume Obozinski, and Francis Bach, "Network flow algorithms for structured sparsity."
[8] Sen-Ching S. Cheung and Chandrika Kamath, "Robust techniques for background subtraction in urban traffic video."
[9] P. Spagnolo, T. D'Orazio, M. Leo, and A. Distante, "Moving object segmentation by background and temporal analysis," Image and Vision Computing (Elsevier), vol. 24, pp. 411-423, 2006.
[10] E. Candès, X. Li, Y. Ma, and J. Wright, "Robust principal component analysis?" J. ACM, vol. 58, article 11, 2011.
[11] Zhayida Simayijiang and Stefanie Grimm, "Segmentation with graph cuts."
[12] Chris Stauffer and W. E. L. Grimson, "Adaptive background mixture models for real-time tracking," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 1999.