Use Of Haar Cascade Classifier For Face Tracking System In Real Time Video

DOI : 10.17577/IJERTV2IS4381

K. H. Wanjale, Amit Bhoomkar, Ajay Kulkarni, Somnath Gosavi

Department Of Computer Engineering, V.I.I.T., Pune

ABSTRACT- In this paper, we present a face detection and tracking system that takes real time video as input. The system is divided into two main modules: face detection and recognition in the video is the first module, and tracking is the second. To detect faces in the video, a Face Name Graph Matching algorithm is used; it involves methods such as the Haar cascade classifier and the OpenCV library. A face clustering algorithm is used to track faces in the video. The system is mainly designed for security purposes: video recorded in public areas, showing known persons or suspicious activities, is used as input, and the system identifies and tracks one or more people captured in that video as its output. This paper describes the working of the face tracking system, the algorithms involved and the results of the system.

KEYWORDS- Face detection, Face recognition, Face tracking, Face clustering, Face name graph matching algorithm.


    Objective and Motivation:

    Object tracking in video sequences has been studied extensively for surveillance purposes. The ability to detect an object and then track it is of great interest in many applications, such as security, missile tracking and rescue operations. Tracking an object whose position and orientation vary over time is a basic ability of a computerized visual system. Throughout our lives, people present their faces as a form of identification. Because face identification is so pervasive in the natural world, it is reasonable to consider faces as a means of recognition by machines. The first semi-automated face recognition systems were developed in the 1960s; they required an administrator to locate features such as the eyes, ears, nose and mouth on photographs before the system calculated distances and ratios to a common reference point, which were then compared to reference data. Today, with advances in computers and signal processors, digital image processing has become the most common form of image processing because of its versatility and low cost. Major advances in face recognition technology have been made in the last 10 to 15 years; automated face recognition is therefore a relatively new concept.

    A face tracking system can be used to recognize and track suspicious people and to provide information about them. Nowadays, many security threats, crimes and other offenses occur in public places. To reduce the workload of police officers, an automated system is needed that helps the police recognize and track a suspect in recorded video.


The step-wise working of our system, which gives a brief idea of the modules used in the system, is as follows.

Step1- Browse the sample video from the hard disk or from any other storage device.

Step2- Face detection takes place in the video. Details about the detected face can be filled in, and the detected person can be marked as a suspect or non-suspect.

Step3- Extract good features of the detected face.

Step4- Face recognition is performed when the same video, or another browsed video, contains a person who was detected previously.

Step5- Track the recognized person if the person is marked as a suspect.

Step6- Save the suspect's record in the database with the location and time at which the suspect appears in the video sequence.

To understand the detailed working of the system, we must study the algorithms it uses. Brief explanations of these algorithms are given below.


    Facial detection is impossible if the face is not isolated from the background. Although many different algorithms exist to perform face detection, each has its own strengths and weaknesses, and their major problem is that they are computationally expensive. Analyzing pixels for face detection is time-consuming and difficult because of the wide variations of shape and pigmentation within a human face, and pixel-based analysis often has to be repeated for scaling and precision. Viola and Jones devised an algorithm, called Haar Classifiers, to rapidly detect any object, including human faces, using AdaBoost classifier cascades that are based on Haar-like features rather than pixels [2].


    The core basis for Haar classifier object detection is the Haar-like features. Rather than using the intensity values of individual pixels, these features use the change in contrast values between adjacent rectangular groups of pixels. The contrast variances between the pixel groups are used to determine relative light and dark areas. Two or three adjacent groups with a relative contrast variance form a Haar-like feature. Haar-like features, such as those shown in Figure 1, are used to detect objects in an image. Haar features can easily be scaled by increasing or decreasing the size of the pixel group being examined, which allows the features to detect objects of various sizes [2].

    Figure 1 Common Haar Features [2].
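The rectangle sums behind a Haar-like feature are usually computed from an integral image (summed-area table), as in Viola and Jones' detector, so that the sum over any rectangle costs four look-ups regardless of its size. The Python sketch below is illustrative only: it assumes a grayscale image given as a list of rows, and the function names are hypothetical, not the OpenCV or Emgu CV API.

```python
def integral_image(img):
    """Summed-area table with an extra zero row and column.

    ii[y][x] holds the sum of img over rows 0..y-1 and columns 0..x-1.
    """
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w x h rectangle with top-left corner (x, y): four look-ups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Horizontal two-rectangle Haar-like feature: right half minus left half.

    A large value means the right half is brighter than the left half,
    i.e. a vertical light/dark edge inside the window.
    """
    half = w // 2
    left = rect_sum(ii, x, y, half, h)
    right = rect_sum(ii, x + half, y, half, h)
    return right - left
```

Evaluating the same feature with a larger width and height is how a feature is scaled to find larger faces; because the integral image is computed once per frame, the larger feature costs no more to evaluate.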

    The cascading of the classifiers allows only the sub-images with the highest probability of containing the object to be analyzed for all the Haar features that distinguish that object. It also allows one to vary the accuracy of a classifier: decreasing the number of stages increases both the false alarm rate and the positive hit rate, and increasing the number of stages decreases both. Viola and Jones were able to achieve a 95% accuracy rate for the detection of a human face using only 200 simple features [2].
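The effect of the number of stages can be made concrete. In Viola and Jones' formulation, a window is accepted only if it passes every stage, so the overall detection rate D and false-positive rate F of the cascade are the products of the per-stage rates d_i and f_i. The Python sketch below is a toy illustration: the stage scores are arbitrary callables, and the per-stage rates (0.99 and 0.3) are illustrative values, not measurements from this system.

```python
from math import prod


def run_cascade(window, stages):
    """Pass a candidate window through the stages in order.

    stages is a list of (score_fn, threshold) pairs. The window is rejected
    at the first stage whose score falls below its threshold, so most
    background windows are discarded after only a few cheap stages.
    Returns (accepted, number_of_stages_evaluated).
    """
    for i, (score_fn, threshold) in enumerate(stages, start=1):
        if score_fn(window) < threshold:
            return False, i
    return True, len(stages)


def cascade_rates(stage_rates):
    """Overall (detection_rate, false_positive_rate) from (d_i, f_i) pairs."""
    detection = prod(d for d, _ in stage_rates)
    false_positive = prod(f for _, f in stage_rates)
    return detection, false_positive


# Fewer stages raise both rates; more stages lower both.
d10, f10 = cascade_rates([(0.99, 0.3)] * 10)  # about 0.904 and 5.9e-06
d5, f5 = cascade_rates([(0.99, 0.3)] * 5)     # about 0.951 and 0.0024
```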

    Detecting human facial features, such as the mouth, eyes and nose, requires that Haar classifier cascades first be trained. To train the classifiers, the Gentle AdaBoost algorithm and the Haar feature algorithms must be implemented. Fortunately, Intel developed an open source library devoted to easing the implementation of computer vision programs, called the Open Computer Vision Library (OpenCV). The OpenCV library is designed to be used in applications in the fields of HCI, robotics, biometrics, image processing and other areas where visualization is important, and it includes an implementation of Haar classifier detection and training [2].

    Thus, with the help of this algorithm, the system detects the person's face in the video, and a green square is drawn around the face to indicate detection. As soon as a face is detected, the user can pause the video and enter the data of the detected person, such as the person's name, address, profession and criminal record, if any. If the detected person has a criminal record, the person can be defined as a suspect; a check box is provided in the system where the user can tick whether or not the person is a suspect. This is the working of the first module, in which a sample video is browsed and faces are detected.


    The main purpose of using features rather than pixels directly is that features can encode ad-hoc domain knowledge that is difficult to learn from a finite quantity of training data. To regionalize the image, one must first determine the likely area where a facial feature might exist. The simplest method is to perform face detection on the image first, since the area containing the face will also contain the facial features [2]. The best method to eliminate extra feature detections is to further regionalize the area used for facial feature detection. It can be assumed that the eyes will be located near the top of the head, the nose in the center area and the mouth near the bottom.

    The upper 5/8 of the face is analyzed for the eyes. This area eliminates all other facial features while still allowing a wide variance in the tilt angle. The center of the face, an area that is 5/8 by 5/8 of the face, was used for detection of the nose. This area eliminates all but the upper lip of the mouth and the lower eyelid. The lower half of the facial image was used to detect the mouth. Since the face detector used sometimes eliminates the lower lip, the facial image was extended by an eighth for mouth detection only [2].
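These region rules translate directly into rectangle arithmetic on the face box returned by the detector. The Python sketch below assumes boxes are (x, y, width, height) tuples in pixels with y increasing downwards and uses integer division; the helper name is hypothetical. Because the mouth region is extended by an eighth of the face height, it can run past the bottom of the frame, so it is clipped when the frame height is supplied.

```python
def feature_regions(x, y, w, h, frame_height=None):
    """Search regions for eyes, nose and mouth inside a detected face box."""
    eyes = (x, y, w, 5 * h // 8)                 # upper 5/8 of the face
    nose = (x + 3 * w // 16, y + 3 * h // 16,    # centred 5/8 x 5/8 box
            5 * w // 8, 5 * h // 8)
    mouth_top = y + h // 2                       # lower half of the face...
    mouth_h = h // 2 + h // 8                    # ...extended by an eighth
    if frame_height is not None:
        # Keep the extended mouth region inside the frame.
        mouth_h = min(mouth_h, frame_height - mouth_top)
    return {"eyes": eyes, "nose": nose, "mouth": (x, mouth_top, w, mouth_h)}
```

For a 160 x 160 face at the origin this gives an eye region of 160 x 100, a nose region of 100 x 100 starting at (30, 30), and a mouth region of 160 x 100 starting at y = 80.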

    Facial feature extraction involves two steps. The first step is face detection, which requires analyzing the entire image. The second step uses the isolated face(s) to detect each feature. To extract features, we used the Shi and Tomasi method. This method is based on the general assumption that luminance intensity does not change during image acquisition [1].
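The Shi and Tomasi criterion scores a small window by the smaller eigenvalue of its 2x2 gradient structure matrix: a window is a good feature to track only when both eigenvalues are large, that is, when it has strong gradients in two directions (a corner) rather than one (an edge). OpenCV exposes this criterion as goodFeaturesToTrack. The pure-Python sketch below only shows the score computation, assuming a grayscale image as a list of rows and central-difference gradients; the function name is hypothetical.

```python
import math


def min_eigenvalue(img, cx, cy, half):
    """Shi-Tomasi score of the window centred at (cx, cy).

    Builds M = [[sum Ix^2, sum Ix*Iy], [sum Ix*Iy, sum Iy^2]] over the
    window and returns its smaller eigenvalue. The window must stay at
    least one pixel away from the image border.
    """
    a = b = c = 0.0
    for y in range(cy - half, cy + half + 1):
        for x in range(cx - half, cx + half + 1):
            ix = (img[y][x + 1] - img[y][x - 1]) / 2.0  # horizontal gradient
            iy = (img[y + 1][x] - img[y - 1][x]) / 2.0  # vertical gradient
            a += ix * ix
            b += ix * iy
            c += iy * iy
    # Closed-form smaller eigenvalue of a symmetric 2x2 matrix.
    return (a + c) / 2.0 - math.sqrt(((a - c) / 2.0) ** 2 + b * b)
```

A bright quadrant (a corner) scores well above zero, while a straight vertical edge scores zero because all of its gradient lies in one direction.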


    Face recognition refers to an automated or semi-automated process of matching facial images. This type of technology covers a wide group of technologies, which all work with the face but use different scanning techniques.

    All available face recognition techniques can be classified into four categories based on the way they represent the face:

    1. Appearance-based, which use holistic texture features.

    2. Model-based, which employ the shape and texture of the face, along with 3D depth information.

    3. Template-based face recognition.

    4. Techniques using neural networks.

    Figure 2 Classification of Face Recognition Methods.

    From the linear appearance-based methods, we used Principal Component Analysis (PCA) for face recognition. PCA is a way of identifying patterns in data and expressing the data so as to highlight their similarities and differences. The purpose of PCA is to reduce the large dimensionality of the data space (observed variables) to the smaller intrinsic dimensionality of the feature space (independent variables) needed to describe the data economically [3].

    The main idea of using PCA for face recognition is to express the large 1-D vector of pixels constructed from a 2-D facial image in terms of the compact principal components of the feature space. This method is also called eigenspace projection. With the help of PCA, we obtain a subset of principal components from a set of training faces. We project faces into this principal component space to obtain eigenface weight vectors, and comparison is performed by calculating the distance between these vectors. After PCA, the original data is expressed in terms of the eigenvectors of the covariance matrix.
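The eigenface procedure described above (mean-centre the training faces, take the leading eigenvectors of their covariance matrix, project, and compare weight vectors by distance) can be sketched in pure Python. This is a toy illustration under stated assumptions: faces are tiny flattened vectors, eigenvectors are found by power iteration with deflation, and recognition is nearest neighbour by Euclidean distance. Real eigenface systems work on thousands of pixels and usually diagonalise the much smaller M x M matrix of the M training images instead; the function names here are hypothetical.

```python
import math


def mean_vector(vectors):
    """Average face: element-wise mean of the flattened training faces."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def covariance(centered):
    """d x d covariance matrix of mean-centred vectors (toy sizes only)."""
    n, d = len(centered), len(centered[0])
    return [[sum(v[i] * v[j] for v in centered) / n for j in range(d)]
            for i in range(d)]


def top_eigenvectors(matrix, k, iterations=200):
    """Leading k eigenvectors of a symmetric matrix (power iteration + deflation)."""
    d = len(matrix)
    m = [row[:] for row in matrix]
    eigenvectors = []
    for _ in range(k):
        v = [1.0 + i for i in range(d)]  # arbitrary fixed start vector
        scale = math.sqrt(sum(x * x for x in v))
        v = [x / scale for x in v]
        for _ in range(iterations):
            w = [sum(m[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        mv = [sum(m[i][j] * v[j] for j in range(d)) for i in range(d)]
        eigenvalue = sum(a * b for a, b in zip(v, mv))  # Rayleigh quotient
        eigenvectors.append(v)
        # Remove the found component so the next pass finds the next one.
        m = [[m[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
             for i in range(d)]
    return eigenvectors


def project(face, mean, eigenvectors):
    """Eigenface weights: dot products of the centred face with each eigenvector."""
    centered = [f - m for f, m in zip(face, mean)]
    return [sum(u * c for u, c in zip(vec, centered)) for vec in eigenvectors]


def train(faces, k):
    mean = mean_vector(faces)
    centered = [[f - m for f, m in zip(face, mean)] for face in faces]
    eigenvectors = top_eigenvectors(covariance(centered), k)
    weights = [project(face, mean, eigenvectors) for face in faces]
    return mean, eigenvectors, weights


def recognize(face, model, labels):
    """Return (label, distance) of the nearest training face in face space."""
    mean, eigenvectors, weights = model
    w = project(face, mean, eigenvectors)
    distances = [math.dist(w, known) for known in weights]
    best = min(range(len(distances)), key=distances.__getitem__)
    return labels[best], distances[best]
```

For example, training on the two 4-pixel "faces" [10, 10, 0, 0] and [0, 0, 10, 10] with k=1 and querying [9, 11, 1, 0] matches the first face at a face-space distance of 0.5.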

    For face recognition, several types of decision can be made depending on the application. Face recognition is a broad term which may be further specified as one of the following tasks:

    • Identification:- where the labels of individuals must be obtained

    • Recognition:- recognition of a person, where it must be decided whether the individual has already been seen.

    • Categorization:- where the face must be assigned to a certain class

      PCA computes the basis of a space represented by its training vectors. These basis vectors, which are eigenvectors, point in the directions of largest variance of the training vectors. When a particular face is projected onto the face space, its vector in the face space describes the importance of each of those features in the face. The face is expressed in the face space by its eigenface coefficients (or weights). We can therefore handle a large input vector, a facial image, by taking only its small weight vector in the face space. This means that the original face can be reconstructed only with some error, since the dimensionality of the image space is much larger than that of the face space [4].
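In the usual eigenface notation (Turk and Pentland's symbols, an assumption here since the paper defines none), the projection and reconstruction described above can be written as follows, for M training faces Γ_1, ..., Γ_M flattened to vectors and K retained orthonormal eigenvectors u_1, ..., u_K of the covariance matrix C:

```latex
\[
\Psi = \frac{1}{M}\sum_{i=1}^{M}\Gamma_i, \qquad
\Phi_i = \Gamma_i - \Psi, \qquad
C = \frac{1}{M}\sum_{i=1}^{M}\Phi_i\Phi_i^{\top}
\]
\[
\omega_k = u_k^{\top}(\Gamma - \Psi), \quad k = 1,\dots,K, \qquad
\Omega = (\omega_1,\dots,\omega_K)^{\top}
\]
\[
\hat{\Gamma} = \Psi + \sum_{k=1}^{K}\omega_k u_k, \qquad
\varepsilon = \lVert \Gamma - \hat{\Gamma} \rVert
\]
```

Because K is much smaller than the number of pixels, a face is stored as the short weight vector Ω, and recognition compares Ω with the stored weight vectors by distance. The reconstruction Γ̂ is only approximate: ε is zero only when Γ − Ψ lies entirely in the span of u_1, ..., u_K.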

      With the help of this technique, the system can recognize an already detected face in the video. Under the framework of Face Name Graph Matching, a recognized face automatically receives its name and all other details. If the recognized face is a suspect's face, the recognition marker (the square box on the face) turns red. The recognition method is important and complex, and recognizing faces in real time video is a difficult and time-consuming process.
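The labelling behaviour described above (a green box for a detected face, the stored name and details for a recognized one, and a red box for a suspect) can be sketched as a small lookup. The Python below is a hypothetical illustration, not the paper's Face Name Graph Matching algorithm: the record fields mirror the details entered in the first module, match_id and distance are assumed to come from a nearest-neighbour recognizer, and max_distance is an assumed tuning threshold. Colours are given in BGR order, the channel order OpenCV uses for images.

```python
from dataclasses import dataclass
from typing import Optional

GREEN = (0, 255, 0)  # BGR: detected (or unrecognised) face
RED = (0, 0, 255)    # BGR: recognised suspect


@dataclass
class PersonRecord:
    """Details entered by the user when a face is first detected."""
    name: str
    address: str = ""
    profession: str = ""
    criminal_record: str = ""
    is_suspect: bool = False


def label_face(match_id: Optional[int], distance: float,
               records: dict, max_distance: float):
    """Return (caption, box_colour) for one face in the current frame."""
    if match_id is None or distance > max_distance:
        return "unknown", GREEN
    record = records[match_id]
    return record.name, RED if record.is_suspect else GREEN
```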


    Video is everywhere; the goal of tracking is to follow or identify one or more people or objects captured in video. Tracking of the face is based on its features and movements. Many techniques are used to recognize the movement of a face in video sequences. Some of these techniques are based on features and pattern recognition, while others are simply based on pixels. Examples include Block Matching Analysis and the Optical Flow Estimation Method.

    In computer vision, optical flow estimation techniques are widely applied in fields such as behavior recognition and video surveillance. These methods are fast enough to allow real time applications to be built. With the help of this technique, our system tracks the recognized face in the video sequences.
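The paper does not say which optical flow estimator is used; reference [1] uses the pyramidal Lucas-Kanade tracker. As an assumption-labelled illustration, the single-level Lucas-Kanade step below estimates how a small window around a tracked point moved between two grayscale frames, by solving the 2x2 least-squares system built from the brightness constancy equation Ix·u + Iy·v + It = 0. It is pure Python over lists of rows and the function name is hypothetical; a practical implementation would use an image pyramid (for example OpenCV's calcOpticalFlowPyrLK) to handle larger motions.

```python
def lucas_kanade(prev, curr, cx, cy, half):
    """Estimate the (u, v) motion of the window centred at (cx, cy).

    prev and curr are consecutive grayscale frames as lists of rows. The
    window spans cx-half..cx+half and cy-half..cy+half and must stay at
    least one pixel away from the image border.
    """
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(cy - half, cy + half + 1):
        for x in range(cx - half, cx + half + 1):
            ix = (prev[y][x + 1] - prev[y][x - 1]) / 2.0  # spatial gradients
            iy = (prev[y + 1][x] - prev[y - 1][x]) / 2.0
            it = curr[y][x] - prev[y][x]                  # temporal change
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    # Normal equations: [[sxx, sxy], [sxy, syy]] @ [u, v] = -[sxt, syt]
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("window lacks texture in two directions (aperture problem)")
    u = (-syy * sxt + sxy * syt) / det
    v = (sxy * sxt - sxx * syt) / det
    return u, v
```

A bowl-shaped intensity pattern shifted by a fraction of a pixel is recovered closely by this step; for motions of more than about a pixel the linearisation breaks down, which is why the pyramidal variant is used in practice.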

    We added a new method by which the record of a suspect's face can be tracked. Previously, when a suspect's face appeared in the video at different intervals of time, the user could not obtain those times and positions in the video. Therefore, to track the suspect's face in the video, we create a database in which each time the suspect is found in the video is saved automatically. With the help of this database, the user can learn how many times the suspect appears in the video.
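One way to turn per-frame recognitions into the appearance record described above is to convert frame indices to timestamps and merge detections separated by only a few frames into one appearance. The Python sketch below is hypothetical: the frame rate and the max_gap_frames tolerance (which bridges brief detection drop-outs) are assumptions, not parameters taken from the paper.

```python
def appearances(frame_indices, fps, max_gap_frames=1):
    """Merge frames in which a suspect was recognised into appearance intervals.

    Returns a list of (start_seconds, end_seconds) tuples; its length is the
    number of separate times the suspect appears in the video.
    """
    intervals = []
    for frame in sorted(set(frame_indices)):
        if intervals and frame - intervals[-1][1] <= max_gap_frames:
            intervals[-1][1] = frame          # continues the current appearance
        else:
            intervals.append([frame, frame])  # starts a new appearance
    return [(start / fps, end / fps) for start, end in intervals]
```

With a 25 fps video, recognitions in frames 10-12, 50-51 and 200 and a 5-frame tolerance give three appearances, starting at 0.4 s, 2.0 s and 8.0 s.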

    Face tracking clusters and error-correcting graph matching techniques are used to track the suspect and maintain the suspect database.
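The paper does not detail its face tracking clusters. As a simpler stand-in, and explicitly not the authors' method, the sketch below links each new face box to the existing track whose last box overlaps it most (intersection over union, IoU) and starts a new track otherwise, so that detections of one person across consecutive frames share a track id. Boxes are assumed to be (x, y, width, height) tuples, min_iou is an assumed threshold, and the function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0


def assign_to_tracks(tracks, detections, min_iou=0.3):
    """Greedily attach this frame's detections to tracks.

    tracks maps track_id -> last known box and is updated in place.
    Each track takes at most one detection per frame; unmatched detections
    start new tracks. Stale tracks are not aged out in this sketch.
    Returns {track_id: box} for the current frame.
    """
    next_id = max(tracks, default=-1) + 1
    used = set()
    assignment = {}
    for det in detections:
        best_id, best_score = None, min_iou
        for track_id, box in tracks.items():
            if track_id in used:
                continue
            score = iou(box, det)
            if score >= best_score:
                best_id, best_score = track_id, score
        if best_id is None:
            best_id = next_id
            next_id += 1
        used.add(best_id)
        assignment[best_id] = det
    tracks.update(assignment)
    return assignment
```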

    Tracking takes place in video sequences recorded in real time in public places. These videos are generally recorded by CCTV cameras and therefore show real scenes. Our system identifies the suspect in these videos, which is why we call it a real time video tracking system.

  5. RECORD MAINTAINING

The system separates out the frames in which a face was recognized in the video. These frames are saved on the hard disk for further use. Frames containing a suspect are also saved in the database in tabular form, together with their details.

This database is very easy to use and maintain. For database connectivity, we linked the .NET Framework with MS Access, and the data is stored in MS Access tables. Handling this database is simple and efficient.
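The system stores its records in MS Access through .NET. As a portable stand-in only, the Python sketch below uses the standard-library sqlite3 module to show one possible layout: a table of the person details entered by the user and a table of sightings holding the time of each appearance and the path of the saved frame. The table, column and function names are hypothetical, not the paper's schema.

```python
import sqlite3


def open_db(path=":memory:"):
    """Create (if needed) and open the hypothetical suspect database."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS person (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            address TEXT,
            profession TEXT,
            criminal_record TEXT,
            is_suspect INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE IF NOT EXISTS sighting (
            id INTEGER PRIMARY KEY,
            person_id INTEGER NOT NULL REFERENCES person(id),
            video_file TEXT NOT NULL,
            time_s REAL NOT NULL,
            frame_path TEXT
        );
    """)
    return conn


def log_sighting(conn, person_id, video_file, time_s, frame_path=None):
    """Record one appearance; the with-block commits the insert."""
    with conn:
        conn.execute(
            "INSERT INTO sighting (person_id, video_file, time_s, frame_path) "
            "VALUES (?, ?, ?, ?)",
            (person_id, video_file, time_s, frame_path),
        )


def sighting_count(conn, person_id):
    """How many times the person has been logged in the video(s)."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM sighting WHERE person_id = ?", (person_id,)
    ).fetchone()
    return count
```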

These are the modules and algorithms used in our proposed system. The architectural diagram of the system is given below.


    This system runs on Windows 7 as well as Windows XP. We use Visual Studio 2008 with the .NET Framework, and the code is written in C#. The Emgu CV library must also be installed on the machine; it is a .NET wrapper for the OpenCV library and is used for image processing. The database is developed with MS Access. The hardware requirements are a hard disk of at least 40 GB and at least 2 GB of RAM. These requirements are simple, and the components are easily available on the market.


    For better results, certain assumptions and dependencies must hold so that the output contains fewer errors. They are as follows.

    • The input video must be of good quality.

    • Frames in the video should be well lit, without heavy shadows.

    • The suspect's face in the video must be clearly visible.

    • A frontal view of the face is preferable.

    The person's face should appear in frontal view in the video, because this makes the detection process efficient and fast. If the face is slightly tilted or not perfectly centered, detection takes more time. These points should be taken into consideration while using this system.


    • It is used for real time videos recorded in public places for surveillance. Videos, rather than still images, are used as input, which is the big advantage of this system over previous ones.

    • The software required for the system is easily available, which makes the system inexpensive.

    • Database maintenance and handling are so simple that any user can do them independently.

    • The system is mainly used to recognize and track suspects in video, so it is well suited to security purposes.

    • A user-friendly GUI guides new users in operating the system.


    In this paper, we have presented a face tracking system for a real time video environment. To detect faces in the video, we used the Haar cascade classifier technique, which is based on Haar-like features. For recognition, we used the PCA technique, which gives good results. All the detection and recognition procedures are carried out within the Face Name Graph Matching algorithm.

    For the last part, i.e. tracking, face clustering algorithms are used, and the tracking database is maintained with MS Access. This describes the successful implementation of a face tracking system with real time video as input.


  1. Nandita Sethi and Alankrita Aggarwal, Robust Face Detection and Tracking Using Pyramidal Lucas Kanade Tracker Algorithm, IJCTA, Sept-Oct 2011.

  2. Philip Ian Wilson and Dr. John Fernandez, Facial Feature Detection Using Haar Classifiers, CCSC 2006.

  3. Dr. Lahouri Ghouti, Face Recognition System, project report, ICS411, King Fahd University of Petroleum and Minerals.

  4. Kyungnam Kim, Face Recognition Using Principal Component Analysis.

  5. Matthew Curtis Hesher, Automated Face Tracking and Recognition, thesis, Florida State University College of Arts and Commerce.

  6. Jitao Sang and Changsheng Xu, Robust Face-Name Graph Matching for Movie Character Identification, IEEE Transactions on Multimedia, vol. x, no. x, 20xx.

  7. Y. Higashijma, S. Takano and K. Niijma, Face Recognition Using Long Haar-like Filters, Department of Informatics, Kyushu University, Japan.
