A Survey : on Multiple Object Detection and Tracking


Manisha R. Nangre

Electronics and Telecommunication Department, Sinhgad Institute of Technology and Science, Pune University

Pune-41, India

S. S. Havanoor

Electronics and Telecommunication Department, Sinhgad Institute of Technology and Science, Pune University

Pune-41, India

Abstract: Object tracking is an important and challenging task in the field of computer vision. Difficulties in tracking objects arise from abrupt object motion, changing appearance patterns of both the foreground and the background scene, nonrigid object structures, object-to-object and object-to-scene occlusions, and camera motion. This paper selectively reviews research papers on object detection and tracking methods.

Keywords: Object Representation, Object Tracking, Object Detection, Computer Vision.


    Object tracking is an important task within the field of computer vision. The availability of high-quality, inexpensive video cameras and the increasing need for automated video analysis have generated a great deal of interest in object tracking. Video analysis has three key steps: detecting moving objects, tracking those objects from frame to frame, and recognizing their behavior. Object tracking is therefore important in tasks such as motion-based recognition and automatic object detection, and in the following applications:

    • Automated surveillance;

    • Video indexing, that is, automatic annotation and retrieval of videos in multimedia databases;

    • Gesture recognition and eye gaze tracking for data input to computers;

    • Traffic monitoring and vehicle navigation.

      In its simplest form, tracking can be defined as the problem of estimating the trajectory of an object in the image plane as it moves around a scene.

      Tracking objects can be complex due to:

      • loss of information,

      • noise in images,

      • complex object motion,

      • nonrigid or articulated nature of objects,

      • partial and full object occlusions,

      • complex object shapes,

      • scene illumination changes, and

      • real-time processing requirements.

    Numerous approaches for object tracking have been proposed. These primarily differ from each other by the way they approach the following questions: Which object representation is suitable for tracking? Which image features should be used? How should the motion, appearance, and shape of the object be modeled?

    The answers to these questions depend on the context and environment in which tracking is performed. A large number of tracking methods have been proposed that attempt to answer these questions for a variety of scenarios. The goal of this survey is to group tracking methods into broad categories and provide comprehensive descriptions of representative methods in each category, so that readers can select the most suitable tracking algorithm for their application.


    In a tracking scenario, an object can be defined as anything that is of interest. For example, boats on the sea, fish inside an aquarium, vehicles on a road, planes in the air, and people walking on a road are all objects that may be important to track in a specific domain. Objects can be represented by their shapes and appearances. In this section, we describe the shape and appearance representations commonly used for tracking.

    1. Points: The object is represented by a point, that is, the centroid (Figure 1(a)) [2], or by a set of points (Figure 1(b)) [3]. In general, the point representation is suitable for tracking objects that occupy small regions in an image.

    2. Primitive geometric shapes: Object shape is represented by a rectangle, an ellipse, or a similar simple shape (Figure 1(c), (d)) [4].

    3. Object silhouette and contour: The contour representation defines the boundary of an object (Figure 1(g), (h)). The region inside the contour is called the silhouette of the object (see Figure 1(i)) [1].

    4. Articulated shape models: Articulated objects are composed of body parts that are held together with joints. For example, the human body is an articulated object with torso, legs, hands, head, and feet connected by joints, as shown in Figure 1(e).

    5. Skeletal models: The object skeleton can be extracted by applying the medial axis transform to the object silhouette (see Figure 1(f)). This model is commonly used as a shape representation for recognizing objects [5].

    6. Probability densities of object appearance: The probability density estimates of the object appearance can be either parametric, such as a Gaussian [8] or a mixture of Gaussians [7], or nonparametric, such as Parzen windows [6] and histograms.

    7. Templates: Templates are formed using simple geometric shapes or silhouettes [Fieguth and Terzopoulos 1997]. An advantage of a template is that it carries both spatial and appearance information.


    Fig. 1. Object representations. (a) Centroid, (b) multiple points, (c) rectangular patch, (d) elliptical patch, (e) part-based multiple patches, (f) object skeleton, (g) complete object contour, (h) control points on object contour, (i) object silhouette [9].
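
Several of the shape representations listed above can be derived from one binary object mask. The following is a minimal sketch in plain Python, not taken from any of the cited papers; the mask format and the function names (`object_pixels`, `centroid`, `bounding_rectangle`, `contour`) are assumptions made for illustration.

```python
from typing import List, Tuple

Mask = List[List[int]]  # assumed format: 1 = object pixel, 0 = background


def object_pixels(mask: Mask) -> List[Tuple[int, int]]:
    """Silhouette: every (row, col) pixel that belongs to the object."""
    return [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]


def centroid(mask: Mask) -> Tuple[float, float]:
    """Point representation: mean (row, col) of the silhouette pixels."""
    pixels = object_pixels(mask)
    n = len(pixels)
    return (sum(r for r, _ in pixels) / n, sum(c for _, c in pixels) / n)


def bounding_rectangle(mask: Mask) -> Tuple[int, int, int, int]:
    """Primitive geometric shape: (top, left, bottom, right), inclusive."""
    pixels = object_pixels(mask)
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    return (min(rows), min(cols), max(rows), max(cols))


def contour(mask: Mask) -> List[Tuple[int, int]]:
    """Contour: silhouette pixels with at least one 4-neighbour outside the object."""
    height, width = len(mask), len(mask[0])

    def inside(r: int, c: int) -> bool:
        return 0 <= r < height and 0 <= c < width and mask[r][c] == 1

    neighbours = ((-1, 0), (1, 0), (0, -1), (0, 1))
    return [
        (r, c)
        for r, c in object_pixels(mask)
        if not all(inside(r + dr, c + dc) for dr, dc in neighbours)
    ]


# A 3x3 square object inside a 5x5 image.
example_mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
```

For this mask the centroid is the middle pixel, the bounding rectangle coincides with the square, and the contour contains every object pixel except the middle one.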


  1. Feature Selection for Tracking

    Selecting the right features plays a critical role in tracking. In general, the most desirable property of a visual feature is its uniqueness so that the objects can be easily distinguished in the feature space. Feature selection is closely related to the object representation. The details of common visual features are as follows.

    1. Color. The apparent color of an object is influenced primarily by two physical factors: 1) the spectral power distribution of the illuminant and 2) the surface reflectance properties of the object. In image processing, the RGB (red, green, blue) color space is usually used to represent color.

    2. Edges. Object boundaries usually generate strong changes in image intensities. Edge detection is used to identify these changes. An important property of edges is that they are less sensitive to illumination changes than color features.

    3. Optical Flow. Optical flow is a dense field of displacement vectors which defines the translation of each pixel in a region. It is computed using the brightness constraint, which assumes brightness constancy of corresponding pixels in consecutive frames [Horn and Schunck 1981]. Optical flow is commonly used as a feature in motion-based segmentation and tracking applications.

    4. Texture. Texture is a measure of the intensity variation of a surface which quantifies properties such as smoothness and regularity. Compared to color, texture requires a processing step to generate the descriptors.
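
The brightness constraint mentioned for optical flow can be made concrete in one dimension. If a pixel keeps its brightness while moving by u pixels between two frames, a first-order expansion gives I_x * u + I_t = 0, where I_x is the spatial derivative and I_t the temporal derivative. The sketch below solves this for a single shared displacement by least squares; it is a simplified illustration under that assumption, not the dense method of Horn and Schunck, and the function name `estimate_shift` is hypothetical.

```python
from typing import List


def estimate_shift(prev: List[float], curr: List[float]) -> float:
    """Least-squares displacement u (in pixels) satisfying I_x * u + I_t = 0.

    Assumes every pixel of the 1-D signal moves by the same small amount.
    """
    numerator = 0.0
    denominator = 0.0
    for x in range(1, len(prev) - 1):
        i_x = (prev[x + 1] - prev[x - 1]) / 2.0  # spatial derivative (central difference)
        i_t = curr[x] - prev[x]                  # temporal derivative (frame difference)
        numerator += i_x * i_t
        denominator += i_x * i_x
    if denominator == 0.0:
        raise ValueError("no intensity gradient: the displacement is not observable")
    return -numerator / denominator


# A smooth intensity profile that moves half a pixel to the right.
true_shift = 0.5
prev_frame = [float(x * x) for x in range(10)]
curr_frame = [(x - true_shift) ** 2 for x in range(10)]
estimated = estimate_shift(prev_frame, curr_frame)
```

The estimate lands close to the true shift of 0.5 but not exactly on it, because the constraint is a linearization that holds only for small displacements. The explicit check for a zero gradient reflects a known limitation: in textureless regions the constraint carries no information about motion.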

        Fig. 2. Object tracking methods.

      1. Point Tracking: Objects detected in consecutive frames are represented by points, and the association of the points is based on the previous object state, which can include object position and motion. This approach requires an external mechanism to detect the objects in every frame.

      2. Kernel Tracking: Kernel refers to the object shape and appearance. For example, the kernel can be a rectangular template or an elliptical shape with an associated histogram. Objects are tracked by computing the motion of the kernel in consecutive frames. This motion is usually in the form of a parametric transformation such as a translation, a rotation, or an affine transformation.

      3. Silhouette Tracking: Tracking is performed by estimating the object region in each frame. Silhouette tracking methods use the information encoded inside the object region. This information can be in the form of appearance density and shape models, which are usually in the form of edge maps. Given the object models, silhouettes are tracked by either shape matching or contour evolution. Both of these methods can essentially be considered as object segmentation applied in the temporal domain using the priors generated from the previous frames.
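
Point tracking, as described above, reduces to associating detections in the current frame with the object states carried over from the previous frame. The following is a minimal sketch that uses position only and a greedy nearest-neighbour rule. This rule was chosen for brevity and is not the correspondence method of any cited paper; the names `associate` and `track` and the `max_dist` gate are assumptions.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def associate(tracks: Dict[int, Point], detections: List[Point],
              max_dist: float) -> Dict[int, int]:
    """Greedily pair each track with its nearest unused detection.

    Returns {track_id: detection_index}; pairs farther apart than max_dist are rejected.
    """
    candidates = sorted(
        (math.dist(position, detection), track_id, index)
        for track_id, position in tracks.items()
        for index, detection in enumerate(detections)
    )
    matches: Dict[int, int] = {}
    used = set()
    for distance, track_id, index in candidates:
        if distance <= max_dist and track_id not in matches and index not in used:
            matches[track_id] = index
            used.add(index)
    return matches


def track(frames: List[List[Point]], max_dist: float = 5.0) -> Dict[int, List[Point]]:
    """Build one trajectory per object from externally supplied per-frame detections."""
    trajectories: Dict[int, List[Point]] = {}
    current: Dict[int, Point] = {}
    next_id = 0
    for detections in frames:
        matches = associate(current, detections, max_dist)
        matched = set(matches.values())
        updated: Dict[int, Point] = {}
        for track_id, index in matches.items():
            updated[track_id] = detections[index]
            trajectories[track_id].append(detections[index])
        for index, detection in enumerate(detections):
            if index not in matched:  # unmatched detection starts a new track
                updated[next_id] = detection
                trajectories[next_id] = [detection]
                next_id += 1
        current = updated  # tracks without a match are dropped
    return trajectories


# Two objects whose detections arrive in a different order in each frame.
detected = [
    [(0.0, 0.0), (10.0, 10.0)],
    [(11.0, 10.0), (1.0, 0.0)],
    [(2.0, 1.0), (12.0, 11.0)],
]
paths = track(detected)
```

Each trajectory here follows one object even though the detection order changes between frames. A state that also included velocity would allow the association to predict where each object should appear, at the cost of a motion model that this sketch leaves out.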


In this article, we present a survey of object tracking methods. We divide the tracking methods into three categories based on the use of object representations, namely, methods establishing point correspondence, methods using primitive geometric models, and methods using contour evolution. Note that all these classes require object detection at some point. For instance, point trackers require detection in every frame, whereas geometric region-based or contour-based trackers require detection only when the object first appears in the scene. Recognizing the importance of object detection for tracking systems, we include a short discussion on popular object detection methods.
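
The difference in detection requirements noted above can be illustrated with a region-based tracker that is initialized once and then follows the object by appearance alone. The sketch below assumes a rectangular kernel with an intensity histogram and tests every translation in a small neighbourhood, scoring each with the Bhattacharyya coefficient. The exhaustive search is a simplification chosen for clarity, not the mean-shift procedure of kernel-based tracking [4], and all names and parameter values are hypothetical.

```python
import math
from typing import List, Tuple

Frame = List[List[int]]  # assumed format: grayscale intensities in 0..255


def histogram(frame: Frame, top: int, left: int, height: int, width: int,
              bins: int = 4) -> List[float]:
    """Normalized intensity histogram of the window whose corner is (top, left)."""
    counts = [0] * bins
    for r in range(top, top + height):
        for c in range(left, left + width):
            counts[frame[r][c] * bins // 256] += 1
    total = height * width
    return [n / total for n in counts]


def bhattacharyya(p: List[float], q: List[float]) -> float:
    """Similarity of two normalized histograms (1.0 means identical)."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def track_kernel(frame: Frame, model: List[float], prev: Tuple[int, int],
                 size: Tuple[int, int], radius: int = 2) -> Tuple[int, int]:
    """Return the translation within `radius` of prev that best matches the model."""
    height, width = size
    rows, cols = len(frame), len(frame[0])
    best_score, best_pos = -1.0, prev
    for top in range(max(0, prev[0] - radius), min(rows - height, prev[0] + radius) + 1):
        for left in range(max(0, prev[1] - radius), min(cols - width, prev[1] + radius) + 1):
            score = bhattacharyya(model, histogram(frame, top, left, height, width))
            if score > best_score:
                best_score, best_pos = score, (top, left)
    return best_pos


def make_frame(top: int, left: int, size: int = 8, obj: int = 2) -> Frame:
    """Synthetic frame: a bright obj-by-obj square on a dark background."""
    frame = [[20] * size for _ in range(size)]
    for r in range(top, top + obj):
        for c in range(left, left + obj):
            frame[r][c] = 200
    return frame


frames = [make_frame(1, 1), make_frame(2, 3), make_frame(4, 4)]
position = (1, 1)                         # supplied by a detector, first frame only
model = histogram(frames[0], 1, 1, 2, 2)  # appearance model built once
path = [position]
for frame in frames[1:]:
    position = track_kernel(frame, model, position, (2, 2))
    path.append(position)
```

After the first frame no detector is consulted: the stored histogram alone locates the object. The search radius bounds how far the object may move between frames, which is the assumption that makes this kind of local search workable.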


  1. YILMAZ, A., LI, X., AND SHAH, M. 2004. Contour-based object tracking with occlusion handling in video acquired using mobile cameras. IEEE Trans. Patt. Analy. Mach. Intell. 26, 11, 1531-1536.

  2. VEENMAN, C., REINDERS, M., AND BACKER, E. 2001. Resolving motion correspondence for densely moving points. IEEE Trans. Patt. Analy. Mach. Intell. 23, 1, 54-72.

  3. SERBY, D., KOLLER-MEIER, S., AND GOOL, L. V. 2004. Probabilistic object tracking using multiple features. In IEEE International Conference of Pattern Recognition (ICPR). 184-187.

  4. COMANICIU, D., RAMESH, V., AND MEER, P. 2003. Kernel-based object tracking. IEEE Trans. Patt. Analy. Mach. Intell. 25, 564-575.

  5. ALI, A. AND AGGARWAL, J. 2001. Segmentation and recognition of continuous human activity. In IEEE Workshop on Detection and Recognition of Events in Video. 28-35.


  6. ELGAMMAL, A., DURAISWAMI, R., HARWOOD, D., AND DAVIS, L. 2002. Background and foreground modeling using nonparametric kernel density estimation for visual surveillance. Proceedings of IEEE 90, 7, 1151-1163.

  7. PARAGIOS, N. AND DERICHE, R. 2002. Geodesic active regions and level set methods for supervised texture segmentation. Int. J. Comput. Vision 46, 3, 223-247.

  8. ZHU, S. AND YUILLE, A. 1996. Region competition: unifying snakes, region growing, and Bayes/MDL for multiband image segmentation. IEEE Trans. Patt. Analy. Mach. Intell. 18, 9, 884-900.

  9. YILMAZ, A., JAVED, O., AND SHAH, M. 2006. Object tracking: A survey. ACM Comput. Surv. 38, 4, Article 13 (Dec. 2006), 45 pages.

