Abandoned or Removed Objects Detection from Surveillance Video using Codebook

DOI : 10.17577/IJERTV2IS50297


Sajith K

    M.Tech Student (CSE)

      Viswajyothi College of Engineering and Technology Vazhakulam, Ernakulam (dist), Kerala, India

      Dr. K. N. Ramachandran Nair

      Head of the Department (CSE) Viswajyothi College of Engineering and Technology

      Vazhakulam, Ernakulam (dist), Kerala, India

      Abstract: Detection of abandoned or removed objects in complex surveillance videos is difficult due to many factors such as occlusion and quick lighting changes. The proposed system efficiently detects abandoned and removed objects in surveillance video from a stationary camera based on foreground analysis. Background subtraction, the most important phase, is implemented using the CODEBOOK method with several improvements such as dynamic updating of the background model, reduced computation time and efficient memory utilization. The same codebook method is used for static object detection by generating two background models at different frame rates. Moreover, static object detection is combined with tracking to reduce false detections. Classification into abandoned and removed objects is achieved by a matching method based on edge detection and user-defined parameters such as size and position. The system is further improved with a human detection method based on the histogram of oriented gradients.

      Keywords: abandoned object; removed object; codebook; background subtraction (BGS); video surveillance; antiterrorism


        Video surveillance allows us to remotely monitor a live or recorded video feed, which often includes people. As a tool for crime control and investigation, video surveillance now receives increased attention, and there has been a significant increase in the use of surveillance cameras in public locations such as stores, ATMs, schools, buses, subway stations, and airports in order to combat crime. Automatic systems that detect abnormal activities in surveillance videos are therefore gaining importance as a means to help security personnel. One such system is abandoned object detection.

        Detection of abandoned objects is very important to prevent attacks on landmarks, public transportation, and critical assets. It is very challenging for security officers, as well as for video surveillance solutions, to quickly detect objects that have been left behind in places with high traffic flow such as train and subway stations, airports, big cities, and other public places. Existing tracking-based approaches to abandoned object detection are unreliable in complex surveillance videos because of occlusions, lighting changes, and other factors. An abandoned object can be defined as a stationary object that has not been in the scene before, and a removed object as a stationary object that has been in the scene before but is no longer there. Hence the objective is to detect static regions that have recently changed in the scene and to determine whether they correspond to abandoned or removed objects, using background subtraction and foreground analysis based on user-defined parameters.


        1. Codebook For Background Model Generation

          The codebook method proposed by Kyungnam Kim, Thanarat H. Chalidabhongse, David Harwood and Larry Davis [1] presents a background subtraction algorithm in which the background is modeled using multiple codewords. The key features of the algorithm are:

          • A compact background model that captures structural background motion over a long period of time under limited memory.

          • It can model moving or repeatedly changing backgrounds.

          • It can cope with local and global illumination changes.

          • Unconstrained training allows moving foreground objects in the scene during the initial training period.

          • Layered modeling generates multiple layers of background.

        2. Construction of Codebook

          In order to generate a background model that captures a moving or repeatedly changing environment, each pixel in the video frame is modelled using a codebook. A codebook is a collection of codewords {c1, c2, ..., cL} corresponding to each pixel, and the number of codewords for each pixel may be different. Each codeword ci is a set of 9 values {Ri, Gi, Bi, Imin, Imax, f, MNRL, p, q}.

          Ri, Gi, Bi: the mean RGB values of the pixels assigned to this codeword.

          Imin, Imax: the minimum and maximum brightness of the pixels assigned to this codeword.

          f: the frequency with which the codeword has occurred.

          MNRL: the maximum negative run-length, defined as the longest interval during which the codeword has not recurred.

          p, q: the first and last access times, respectively, at which the codeword has occurred.

          During background generation, the value of each pixel in the current frame is compared to its codebook to determine which codeword cm it matches (if any). To find a matching codeword, a color distortion measure and brightness bounds are used.

          The color distortion for a pixel x with RGB values {R, G, B} with respect to a codeword ci with RGB values {Ri, Gi, Bi} can be calculated as below:

          $$\|x\|^2 = R^2 + G^2 + B^2, \qquad \|c_i\|^2 = R_i^2 + G_i^2 + B_i^2, \qquad \langle x, c_i \rangle^2 = (R_i R + G_i G + B_i B)^2$$

          $$\rho^2 = \frac{\langle x, c_i \rangle^2}{\|c_i\|^2}, \qquad \operatorname{colordist}(x, c_i) = \sqrt{\|x\|^2 - \rho^2}$$

          The brightness of a pixel x with RGB values {R, G, B} can be calculated as

          $$I = \sqrt{R^2 + G^2 + B^2}$$

          Each codeword stores Imin and Imax, the minimum and maximum brightness of all pixels assigned to that codeword, respectively. The brightness is allowed to vary within a certain range [Ilow, Ihigh] defined by two thresholds α and β, as given below:

          $$I_{low} = \alpha I_{max}, \qquad I_{high} = \min\left\{ \beta I_{max},\ \frac{I_{min}}{\alpha} \right\}$$

          where α < 1 and β > 1. Typically, α lies between 0.4 (for large bounds) and 0.7 (for tight bounds), which lets darker, shadowed background pixels still match; β limits Ihigh, and its value lies between 1.1 and 1.5. The brightness function is given by:

          $$\operatorname{brightness}(I, \langle I_{min}, I_{max} \rangle) = \begin{cases} \text{true} & \text{if } I_{low} \le I \le I_{high} \\ \text{false} & \text{otherwise} \end{cases}$$

        3. Construction of Background Model

          The background model is computed through a per-pixel on-line statistical analysis of the video frames in order. Let F(t) be the W × H input frame at time t, C the W × H codebook where C(x, y) is the codebook corresponding to the pixel at position (x, y), and L(x, y) the number of codewords in each codebook C(x, y). FDM and FDC are the W × H foreground detection mask and the foreground detection count (the number of codewords to be considered for FDM generation), respectively.

          1. Initialize the codebook: C = ∅ (null set), FDM = 0 (black frame), FDC = 1 (first codeword).

          2. For each frame F(t) do

            For each pixel F(x, y) do

              (a) Assign x = (R, G, B) and I = √(R² + G² + B²).

              (b) Find a matching codeword ci in C(x, y) based on two conditions (1) and (2):

                (1) colordist(x, ci) ≤ ε (ε is a threshold value)

                (2) brightness(I, ⟨Imin,i, Imax,i⟩) = true

              (c) If C(x, y) = ∅ or no matching codeword is found:

                i. Mark the pixel as foreground by setting it to white, i.e. FDM(x, y) = 1.

                ii. Increment L(x, y) by 1.

                iii. Create a new codeword cL = {R, G, B, I, I, 1, t − 1, t, t}.

                iv. Add it to codebook C(x, y).

              (d) Else update the matching codeword cm = {Rm, Gm, Bm, Imin,m, Imax,m, fm, MNRLm, pm, qm} as

                $$c_m \leftarrow \left\{ \frac{f_m R_m + R}{f_m + 1},\ \frac{f_m G_m + G}{f_m + 1},\ \frac{f_m B_m + B}{f_m + 1},\ \min(I, I_{min,m}),\ \max(I, I_{max,m}),\ f_m + 1,\ \max(\mathrm{MNRL}_m, t - q_m),\ p_m,\ t \right\}$$

              (e) If the codebook is due for filtering (it is filtered after every N frames):

                • Filter the codebook C(x, y).

                • Set FDC(x, y) = the number of codewords in the filtered codebook.

            End For

          End For
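The per-pixel matching, codeword creation and codeword update rules above can be sketched for a single pixel as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Codeword`, `colordist`, `brightness` and `process_pixel` are hypothetical, α = 0.5 and β = 1.3 are simply picked from the ranges quoted above, ε must be chosen by the user, and the first codeword that satisfies both conditions is taken as the match (the text only says "a matching codeword").

```python
import math
from dataclasses import dataclass


@dataclass
class Codeword:
    r: float      # mean R of the pixels assigned to this codeword
    g: float      # mean G
    b: float      # mean B
    i_min: float  # minimum brightness seen
    i_max: float  # maximum brightness seen
    f: int        # frequency of occurrence
    mnrl: int     # maximum negative run-length
    p: int        # first access time
    q: int        # last access time


def colordist(x, cw):
    """Color distortion of pixel x = (R, G, B) with respect to codeword cw."""
    R, G, B = x
    x2 = R * R + G * G + B * B
    v2 = cw.r ** 2 + cw.g ** 2 + cw.b ** 2
    if v2 == 0:  # black codeword: the projection onto it is zero
        return math.sqrt(x2)
    dot = cw.r * R + cw.g * G + cw.b * B
    rho2 = dot * dot / v2
    return math.sqrt(max(x2 - rho2, 0.0))  # clamp tiny negative rounding errors


def brightness(I, cw, alpha, beta):
    """True if I lies in [I_low, I_high] derived from the codeword's brightness range."""
    i_low = alpha * cw.i_max
    i_high = min(beta * cw.i_max, cw.i_min / alpha)
    return i_low <= I <= i_high


def process_pixel(codebook, x, t, eps, alpha=0.5, beta=1.3):
    """Match or extend one pixel's codebook at time t; return True if foreground."""
    R, G, B = x
    I = math.sqrt(R * R + G * G + B * B)
    for cw in codebook:  # assumption: the first codeword satisfying both conditions wins
        if colordist(x, cw) <= eps and brightness(I, cw, alpha, beta):
            n = cw.f
            cw.r = (n * cw.r + R) / (n + 1)
            cw.g = (n * cw.g + G) / (n + 1)
            cw.b = (n * cw.b + B) / (n + 1)
            cw.i_min = min(I, cw.i_min)
            cw.i_max = max(I, cw.i_max)
            cw.f = n + 1
            cw.mnrl = max(cw.mnrl, t - cw.q)  # uses the previous last-access time
            cw.q = t
            return False
    # No match: the pixel is foreground and a new codeword is created.
    codebook.append(Codeword(R, G, B, I, I, 1, t - 1, t, t))
    return True
```

Calling `process_pixel` for every pixel of every frame and writing its return value into FDM(x, y) mirrors the loop structure of the algorithm; periodic filtering is a separate step.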

        4. Filtering the Codebook

          Filtering is the process of removing unwanted codewords from each codebook, thereby reducing its size. The codebook needs to store only those codewords that represent static or repeatedly occurring background pixels, not moving foreground pixels. Codewords that do not recur within a specific number of frames (µ) are therefore removed from the codebook based on their MNRL value.

          The codebook is filtered at regular intervals, i.e. after every N frames. The filtered codebook contains those codewords whose MNRL value does not exceed a threshold µ, that is

          $$C \leftarrow \{ c_i \in C \mid \mathrm{MNRL}_i \le \mu \}$$

          Codewords with large MNRL values, i.e. greater than µ, are eliminated. Usually µ = N/2, where N is the number of frames after which the codebook is filtered, so a codeword must recur at least every N/2 frames to be kept.
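The filtering rule can be sketched as below. It is a minimal illustration only: the reduced `Codeword` (colour plus MNRL) and the helper names are hypothetical, and FDC is returned alongside the codebook because the algorithm resets FDC(x, y) only when a filtering actually happens.

```python
from dataclasses import dataclass


@dataclass
class Codeword:
    rgb: tuple  # mean colour (other codeword fields omitted in this sketch)
    mnrl: int   # maximum negative run-length


def filter_codebook(codebook, mu):
    """Keep only codewords that recur at least every mu frames (MNRL <= mu)."""
    return [cw for cw in codebook if cw.mnrl <= mu]


def maybe_filter(codebook, fdc, t, n_interval):
    """Every N frames, filter with mu = N / 2 and reset FDC; otherwise leave both as-is."""
    if t % n_interval == 0:
        codebook = filter_codebook(codebook, n_interval / 2)
        fdc = len(codebook)
    return codebook, fdc
```

With N = 100, a codeword whose MNRL is 80 is dropped at frame 100, while one with MNRL 50 survives because it still recurs at least every 50 frames.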

          Fig 1. Generated background models (rows: original video frame, background model BG, background model SBG; columns: 100th to 700th frame)

        5. Detecting the Static Foreground

          Static foreground objects may represent abandoned or removed objects. Hence two background models, together with foreground region tracking, are employed to detect static foreground objects. Once a codebook is constructed, there will be at least one codeword corresponding to each pixel. After the initial filtering of the codebook, the first codeword represents the static background pixels. Hence the background model is constructed using the RGB values of the first codeword of each pixel, i.e. the background

          $$BG(x, y) = \{ (R_1, G_1, B_1) \mid c_1 \in C(x, y) \}$$

          For detecting the static foreground pixels, two codebooks C1 and C2 are employed. The two codebooks differ only in the filtering rate and the value of the filtering threshold (µ). The first codebook C1 is used for the generation of the FDM (foreground detection mask), hence its filtering rate N must be sufficiently high and its filtering threshold is µ = N × n for the nth filtering. The second codebook C2 is used solely for detecting static foreground pixels, hence its filtering rate N is equal to the minimum number of frames for which an object must remain static before it can be classified as a static foreground object, and its filtering threshold is µ = N for every filtering. That is, the filtering threshold µ for C1 changes with time, while it remains constant for C2.

          Let BG be the background model generated from codebook C1 and SBG be the background model generated from codebook C2, as given below:

          $$BG(x, y) = \{ (R_1, G_1, B_1) \mid c_1 \in C_1(x, y) \}, \qquad SBG(x, y) = \{ (R_1, G_1, B_1) \mid c_1 \in C_2(x, y) \}$$

          Figure 1 shows the two background models:

          1. BG generated from C1 with the filtering rate 100 and µ = 100 × n for the nth filtering.

          2. SBG generated from C2 with the filtering rate 40 and µ = 40 for every filtering.
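The two filtering schedules and the first-codeword background can be sketched as follows. This is a hypothetical helper set, not the authors' code: `filtering_schedule` and `background_from_first_codeword` are illustrative names, and each codeword is reduced to its RGB tuple for brevity.

```python
def filtering_schedule(total_frames, n_rate, growing):
    """Return (frame, mu) pairs for each filtering up to total_frames.

    growing=True  -> mu = N * n for the nth filtering (codebook C1)
    growing=False -> mu = N for every filtering (codebook C2)
    """
    events = []
    for n, frame in enumerate(range(n_rate, total_frames + 1, n_rate), start=1):
        events.append((frame, n_rate * n if growing else n_rate))
    return events


def background_from_first_codeword(codebooks):
    """BG(x, y) = RGB of the first codeword in C(x, y).

    codebooks[y][x] is assumed to be a list of RGB tuples, first codeword first.
    """
    return [[cell[0] for cell in row] for row in codebooks]
```

With the Figure 1 settings, C1 is filtered at frames 100, 200 and 300 with µ = 100, 200 and 300, while C2 is filtered every 40 frames with µ fixed at 40, so a static object is absorbed into SBG much sooner than into BG.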

        Static object detection consists of the following steps:

        1. Generate the Static Object Detection Mask (SDM) and Foreground Detection Mask (FDM).

        2. Tracking the blobs in the FDM.

        3. Combining the SDM with tracking results.

        1. Generating the FDM and SDM

          The Foreground Detection Mask (FDM) is generated by the background model construction method described above using codebook C1, while the Static Object Detection Mask (SDM) is generated in two steps:

          1. Generating the Background Detection Mask (BDM).

          2. Multiplying BDM with FDM

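          The BDM is generated from the background models SBG and BG using the color distortion value, as given below:

          $$BDM(x, y) = \begin{cases} 1 & \text{if } \operatorname{colordist}(SBG(x, y), BG(x, y)) > \varepsilon \\ 0 & \text{otherwise} \end{cases}$$

          Each pixel in BDM is then multiplied with the corresponding pixel in FDM to generate the SDM, as given below:

          $$SDM(x, y) = \begin{cases} 1 & \text{if } BDM(x, y) = 1 \text{ and } FDM(x, y) = 1 \\ 0 & \text{otherwise} \end{cases}$$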

          Figure 2 shows the FDM, BDM and SDM generated at every 100th frame.

          Fig 2. Generated foreground masks (rows: Foreground Detection Mask (FDM), Background Detection Mask (BDM), Static Object Detection Mask (SDM); columns: 100th to 700th frame)
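The BDM and SDM rules can be sketched on small images stored as nested lists of RGB tuples. Two choices here are assumptions, since the text does not fix them: BG is treated as the reference colour in `colordist` (the measure is not symmetric), and a single user-chosen threshold `eps` is used. The function names are hypothetical.

```python
import math


def colordist(x, v):
    """Color distortion of colour x with respect to reference colour v."""
    x2 = sum(c * c for c in x)
    v2 = sum(c * c for c in v)
    if v2 == 0:
        return math.sqrt(x2)
    dot = sum(a * b for a, b in zip(x, v))
    return math.sqrt(max(x2 - dot * dot / v2, 0.0))


def bdm_mask(bg, sbg, eps):
    """BDM(x, y) = 1 where SBG differs in colour from BG by more than eps."""
    return [[1 if colordist(s, b) > eps else 0 for b, s in zip(bg_row, sbg_row)]
            for bg_row, sbg_row in zip(bg, sbg)]


def sdm_mask(bdm, fdm):
    """SDM(x, y) = BDM(x, y) * FDM(x, y): static only if both masks are set."""
    return [[b * f for b, f in zip(b_row, f_row)] for b_row, f_row in zip(bdm, fdm)]
```

Because colordist measures chromatic difference only, a sketch like this will not flag an object that differs from the background purely in intensity (for example, a grey bag on a lighter grey floor).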

        2. Tracking the Blobs

          The Foreground Detection Mask consists of black and white pixels, where the white ones denote foreground pixels as well as noise. Groups of neighboring white pixels, or blobs, in the FDM represent foreground objects, which must be tracked to detect static objects. Whenever an object (blob) is detected in a frame, a new track is generated for it; each track consists of two values, a hit count and a miss count. In the next frame, if the blob is in the same position and has the same size (area), the existing track for that object is updated by incrementing the hit count by 1 and setting the miss count to 0; otherwise the miss count is incremented by 1.

          An existing track is deleted when its miss count is greater than a threshold T1. T1 can be set as the number of adjacent frames for which the object can remain occluded before the track is dropped. So for a moving object, a new track is generated in each frame and dropped after T1 frames, while for a static object there is only a single track, with a hit count equal to the number of frames for which the object has remained static. Therefore a static object can be classified as an abandoned or removed object if its hit count is greater than a threshold T2. T2 can be set as the number of frames for which an object must remain static before it is classified as an abandoned or removed object.
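The hit/miss tracking scheme can be sketched as below. This is a minimal illustration: blobs are assumed to arrive as `(x, y, w, h)` bounding boxes, and because real masks jitter, the "same position and same area" test uses small tolerances (`pos_tol`, `area_tol`) that are hypothetical and not part of the described method. `t1` and `t2` play the roles of the two thresholds in the text.

```python
from dataclasses import dataclass


@dataclass
class Track:
    bbox: tuple    # (x, y, w, h) of the blob when the track was created
    hit: int = 1   # frames in which the blob was found unchanged
    miss: int = 0  # consecutive frames in which it was not found


def same_blob(a, b, pos_tol=2, area_tol=0.1):
    """Hypothetical 'same position and same area' test with small tolerances."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    area_a, area_b = aw * ah, bw * bh
    return (abs(ax - bx) <= pos_tol and abs(ay - by) <= pos_tol
            and abs(area_a - area_b) <= area_tol * max(area_a, area_b))


def update_tracks(tracks, blobs, t1):
    """Update hit/miss counts with this frame's blobs; drop tracks with miss > T1."""
    unmatched = list(blobs)
    for tr in tracks:
        match = next((b for b in unmatched if same_blob(tr.bbox, b)), None)
        if match is not None:
            tr.hit += 1
            tr.miss = 0
            unmatched.remove(match)
        else:
            tr.miss += 1
    kept = [tr for tr in tracks if tr.miss <= t1]
    kept.extend(Track(b) for b in unmatched)  # every unmatched blob starts a new track
    return kept


def static_tracks(tracks, t2):
    """Tracks whose hit count exceeds T2 are candidate abandoned/removed objects."""
    return [tr for tr in tracks if tr.hit > t2]
```

A blob that stays put accumulates hits on one track, while a moving blob spawns a fresh track every frame whose hit count never grows.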

        3. Combining SDM with Tracking Results

        The blobs in SDM represent the static objects in the video frame. As static objects are only gradually absorbed into the background model, a detected blob may initially cover only part of the static object and grows with time. Hence a static object detected in SDM is classified as an abandoned or removed object based on conditions (1) and (2):

        1. whether the area of the detected object in SDM is greater than half the area of the original object:

          $$a > \frac{A}{2}$$

          where a is the area of the object in SDM and A is the area of the original object.

        2. whether the hit count of the track corresponding to the detected object in SDM is greater than a threshold T2, where T2 is the number of frames for which an object must remain static before it is classified as an abandoned or removed object.

        If both conditions are satisfied, the static object is classified as an abandoned or removed object. Figure 3 shows that the proposed method can detect both removed and abandoned objects in the scene.

        Fig 3. Static object detection (100th to 700th frame)
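The two confirmation conditions can be sketched as a small check. How a and A are measured is an assumption here: a is taken as the number of SDM pixels inside the tracked object's bounding box, and A is passed in (for example, the blob area from the FDM track). The hit-count threshold is written `t2`; the function names are hypothetical.

```python
def sdm_area_in_bbox(sdm, bbox):
    """Count set SDM pixels inside bbox = (x, y, w, h); sdm is a list of rows of 0/1."""
    x, y, w, h = bbox
    return sum(sdm[r][c] for r in range(y, y + h) for c in range(x, x + w))


def confirm_static_object(sdm, bbox, original_area, hit_count, t2):
    """Condition (1): a > A / 2; condition (2): hit count > T2."""
    a = sdm_area_in_bbox(sdm, bbox)
    return a > original_area / 2 and hit_count > t2
```

Requiring both conditions means a partially absorbed object is not reported until its SDM blob covers more than half of it and it has stayed static long enough.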


        1. Static region type detection

          For static type detection, we first detect the edges in the original video frame using a standard edge detection algorithm. Then a fill-in function is started from the interior of the detected object towards its boundaries. The fill-in stops at the boundaries of the object, producing a segmented region. The same process is then applied to the background image BG, producing a second segmented region. This method is inspired by the heal type detection of YingLi Tian et al. [2].

          We can classify a detected static object as an abandoned or removed object by comparing the areas of the segmented regions. Let A be the area of the segmented region generated for the background BG and a be the area of the segmented region generated for the current frame. Then the static object:

          • is classified as an abandoned object if A > a

          • is classified as a removed object if A < a

          • is ignored if A = a, as it is likely caused by a lighting change rather than an object

          Figure 4 shows that the segmented region is larger for the generated background than for the frame containing the actual abandoned object.

          Fig 4. Static type detection (panels: 300th frame, edge detection, fill-in result, segmented area)
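The fill-in comparison can be sketched with a breadth-first flood fill over binary edge maps. The edge maps are assumed to be precomputed by some standard edge detector (1 = edge pixel), the seed is assumed to be an interior point of the static region, and the function names are hypothetical. The exact A = a test follows the text; a practical version would likely compare with a tolerance.

```python
from collections import deque


def fill_area(edges, seed):
    """Count pixels reachable from seed (row, col) without crossing edges (4-connected)."""
    h, w = len(edges), len(edges[0])
    sr, sc = seed
    if edges[sr][sc]:
        return 0
    seen = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < h and 0 <= nc < w and not edges[nr][nc] and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return len(seen)


def classify_static_region(edges_bg, edges_frame, seed):
    """A = fill area in BG, a = fill area in the current frame."""
    A = fill_area(edges_bg, seed)
    a = fill_area(edges_frame, seed)
    if A > a:
        return "abandoned"  # the object's edges stop the fill in the current frame
    if A < a:
        return "removed"    # the object's edges stop the fill in the background
    return "ignored"        # equal areas: likely a lighting change
```

For example, if the current frame contains a closed edge ring around the seed and the background has no edges there, the fill covers only the ring's interior in the frame but the whole area in BG, so the region is reported as abandoned.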

        2. Human Detection

          The main problem with the above detection method is that the system detects every static object, whether human or nonhuman. Hence a human detection method based on the Histogram of Oriented Gradients (HOG) descriptor is employed for feature extraction [3]. HOG features are computed for a training data set consisting of human and nonhuman images and are used to train a neural network classifier that separates humans from other objects.

        3. User Interface

        Fig 5. User Interface

        After a static region is healed and classified as an abandoned or removed object, some conditions need to be verified before an alert is triggered. These conditions are specified by the user through the system interface, as shown in Fig 5, and include the following inputs:

        1. Sizes: Minimum and maximum object size.

        2. Regions of interest: Rectangular regions manually drawn by the user in the image (objects are detected only in those regions).

        3. Abandoned/removed time: It indicates how long a foreground region corresponding to an abandoned/removed object should stay stationary in the scene in order to trigger an alert.

        4. Maximum Occlusion time: It indicates how long a foreground region corresponding to an abandoned/removed object can remain completely occluded before removing the alert.

        5. Human detection: If both human and nonhuman object classes are selected for abandoned and removed object detection, the human detection process will be skipped.

        6. Event Recording: Record the events of dropping or removing an object from the scene.
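The user-defined checks can be sketched as a settings object plus a filter applied to each classified static object. This is only an illustration under stated assumptions: all names are hypothetical, times are expressed in frames, an object counts as inside a region of interest if its bounding-box centre is inside, an empty ROI list means the whole frame, and event recording is left out.

```python
from dataclasses import dataclass, field


@dataclass
class AlertSettings:
    min_size: int                  # minimum object area in pixels
    max_size: int                  # maximum object area in pixels
    rois: list = field(default_factory=list)  # (x, y, w, h) rectangles; empty = whole frame
    static_frames: int = 0         # abandoned/removed time, expressed in frames
    max_occlusion_frames: int = 0  # maximum occlusion time, expressed in frames
    both_classes: bool = False     # human and nonhuman selected -> skip human detection


def in_any_roi(cx, cy, rois):
    return any(x <= cx < x + w and y <= cy < y + h for x, y, w, h in rois)


def should_alert(s, bbox, static_for, is_human=False):
    """Apply the user-defined checks to a classified static object."""
    x, y, w, h = bbox
    if not (s.min_size <= w * h <= s.max_size):
        return False
    if s.rois and not in_any_roi(x + w / 2, y + h / 2, s.rois):
        return False  # object centre lies outside every region of interest
    if static_for < s.static_frames:
        return False
    if is_human and not s.both_classes:
        return False  # only nonhuman objects are wanted, so humans are rejected
    return True


def alert_expired(s, occluded_for):
    """Remove the alert once the object has been fully occluded for too long."""
    return occluded_for > s.max_occlusion_frames
```

Keeping these thresholds in one settings object means the detector itself stays unchanged while operators tune alerts per camera.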


        The proposed system is tested using the PETS 2006 and PETS 2007 datasets [9], [10] to evaluate its effectiveness for abandoned/removed object detection in a variety of environments. The system provided good results with the training set.

        Table 1: PETS 2006 Test Results

        Dataset | Camera view | Static objects | Static persons | False results
        PETS 2006 | View 1 | | |
        PETS 2006 | View 2 | | |
        PETS 2006 | View 3 | | |
        PETS 2006 | View 4 | | |

        1. PETS 2006 and 2007 Dataset

          The PETS 2006 and 2007 datasets were designed to test abandoned object detection algorithms in a public space. The PETS dataset consists of sequences containing left-luggage scenarios with increasing scene complexity. There are different scenarios captured by four cameras from different viewpoints. The scenarios are relatively simple, without many occlusions or crowds. The proposed algorithm detected all abandoned items with zero false alarms. A static person was detected as an abandoned item in sequence S3 of the PETS 2006 dataset.

          Table 2: PETS 2007 Test Results

          Dataset | Camera view | Static objects | Static persons | False results
          PETS 2007 | View 1 | | |
          PETS 2007 | View 2 | | |
          PETS 2007 | View 3 | | |
          PETS 2007 | View 4 | | |

        2. Limitations

          The accuracy of the detection system is influenced by several factors:

          1. The abandoned object is too small, or it is occluded.

          2. Low-light conditions reduce the ability to distinguish one object from another, causing higher error rates.

          3. Static object detection in crowded scenes is difficult, leading to higher error rates.

          4. Quick lighting changes make it harder to detect abandoned or removed objects.

          5. Low-contrast situations lead to missed detections.


The proposed system robustly and efficiently detects abandoned and removed objects in real-time video surveillance. Two background models are generated with the same codebook method to detect both the background and static foregrounds. The static foregrounds are then classified as abandoned or removed objects by segmenting and comparing the background model with the foreground image. This method can handle occlusions in complex environments with crowds. The test results show that the system can be successfully applied in real-world surveillance applications.


  1. Kyungnam Kim, Thanarat H. Chalidabhongse, David Harwood, Larry Davis, "Real-time foreground-background segmentation using codebook model," Real-Time Imaging, Elsevier, 2005.

  2. YingLi Tian, Rogerio Schmidt Feris, Haowei Liu, Arun Hampapur, and Ming-Ting Sun, "Robust Detection of Abandoned and Removed Objects in Complex Surveillance Videos," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 41, no. 5, September 2011.

  3. Oswaldo Ludwig Junior, David Delgado, Valter Gonçalves, Urbano Nunes, "Trainable Classifier-Fusion Schemes: An Application to Pedestrian Detection," International IEEE Conference on Intelligent Transportation Systems, October 2009.

  4. C. Stauffer and W. E. L. Grimson, "Adaptive background mixture models for real-time tracking," in Proc. CVPR'99, Jun. 1999, vol. 2, pp. 246-252.

  5. Domenico Bloisi and Luca Iocchi, "Independent Multimodal Background Subtraction," Department of Computer, Control, and Management Engineering, Sapienza University of Rome, Italy.

  6. Mrs. Megha V. Gupta, Dr. S. D. Sawarkar, "Change Detection based Real Time Video Object Segmentation," International Journal of Engineering Research & Technology (IJERT), Vol. 1, Issue 7, September 2012.

  7. J. Wang and W. Ooi, "Detecting static objects in busy scenes," Dept. Computer Science, Cornell University, Tech. Rep. TR991730, Feb. 1999.

  8. F. Porikli, "Detection of temporarily static regions by processing video at different frame rates," in Proc. IEEE Int. Conf. Adv. Video Signal-Based Surveillance, London, U.K., Sep. 2007.

  9. www.cvg.rdg.ac.uk/PETS2006/data.html

  10. www.cvg.rdg.ac.uk/PETS2007/data.html
