DRIVER’s EYE: A Behavioural Monitoring System using Deep Learning


Sijo Joseph, S I Adithyan, Jishnuraj R, Aswin S
UG Scholars, Department of Computer Science, College of Engineering, Perumon

Kollam, India

Varun Chand H

Assistant Professor, Department of Computer Science, College of Engineering, Perumon

Kollam, India

Abstract— Road accidents across the globe have increased considerably, and driver distraction is one of their most common causes. Such accidents lead to health-related issues, loss of life, damage to vehicles, financial problems and, at times, traffic congestion. Distraction can be reduced to an extent by an alarm system that activates when driver distraction is detected. This paper proposes a system that observes real-time eye movements together with in-vehicular signals. The observed data are fed into a model that categorises the driver's state as either distracted or normal. When distraction is detected, the alarm system is activated, helping the driver return to the normal state. The Viola-Jones algorithm, along with other machine learning algorithms, is used for in-vehicular face analysis and vehicle signal analysis.

Keywords— Viola Jones; Deep Learning; Driver Distraction; In-vehicular Signals; Face Data Analysis.


    INTRODUCTION

    Road accidents have increased considerably over the past few years, resulting in serious health issues and even death [1, 2]. Several risk factors associated with drivers lead to accidents ranging from very minor to severe in nature [3]. The identified risk factors include professional driving, fatigue, large vehicle type, overloading, and terrain. Providing technological solutions to these identified risk factors can help reduce accidents to an extent. With advancements in technology, various smart and intelligent techniques are widely used in transportation systems [4-6]. Such techniques help to optimize road traffic, allocate parking space, and so on, and thereby reduce driver frustration and recklessness. Different driver monitoring systems are currently available that focus on driver eye movement, behavior, fatigue, etc.

    The basic idea behind the project "Driver's Eye" is real-time monitoring of the driver's eyes and facial features using a camera. The captured images are processed to identify behavioural conditions such as distraction, drowsiness and fatigue. Signals from vehicle parts such as the accelerator, brake and steering are also recorded, and both streams of data are processed simultaneously to determine whether the driver is distracted. If the driver is distracted, an alarm system placed inside the vehicle warns the driver, helping him or her recover from the distracted state to the normal driving state. The majority of driver distractions are caused by smartphone usage, social networking activities and the like while driving [7]. The proposed system can reduce the accidents caused by such activities.

    The proposed model uses the Viola-Jones algorithm for face analysis, combined with other machine learning algorithms for evaluating both in-vehicular and face data. Open datasets were used for training on both in-vehicular and face data.


    1. Viola Jones Algorithm

      The Viola-Jones algorithm, created by Paul Viola and Michael Jones in 2001, is an object-recognition algorithm that enables real-time identification of visual features. Despite its age, Viola-Jones remains highly effective, and its application to real-time face detection has proven exceptional. The Viola-Jones algorithm has two stages:

      1. Training

      2. Detection

    1. Training

      The algorithm reduces each image to 24 × 24 pixels and searches for the learned features. To recognise features in their many variable forms, it requires a large amount of facial image data; this is why a large training set must be provided. Viola and Jones fed 4,960 manually labelled photos to their algorithm. The mirror image of a given photo can also be supplied, which counts as brand-new training data.

      The algorithm must also be given non-facial images so it can tell the two classes apart; Viola and Jones provided 9,544 non-facial images to their system. Some of these photographs may resemble features of a face, but the algorithm learns which features are likely to belong to a face and which clearly are not.
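As a rough sketch of this data-preparation step, the window resizing and mirror augmentation described above might look like the following (the block-averaging helper `to_training_window` is our own illustrative stand-in, not Viola and Jones's exact resampling):

```python
import numpy as np

def to_training_window(img: np.ndarray, size: int = 24) -> np.ndarray:
    """Downsample a grayscale image to size x size by block averaging.
    (Illustrative stand-in for the resize step; Viola-Jones trains on 24x24.)"""
    h, w = img.shape
    ys = np.linspace(0, h, size + 1).astype(int)
    xs = np.linspace(0, w, size + 1).astype(int)
    out = np.empty((size, size))
    for i in range(size):
        for j in range(size):
            out[i, j] = img[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].mean()
    return out

def mirror_augment(faces: list) -> list:
    """Double the training set by adding the horizontal mirror of each face."""
    return faces + [np.fliplr(f) for f in faces]
```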

      1. Adaboost (Adaptive Boosting)

        The algorithm learns from the photographs it is given and can identify false positives and true negatives in the data, making it more accurate. Once all of the conceivable positions and combinations of those features have been examined, a highly accurate model results. Because of all the possible positions and combinations that must be verified for every single frame or image, training can be quite lengthy.
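A minimal decision-stump AdaBoost in the spirit described above can be sketched in a few lines of numpy. This is a generic AdaBoost over toy feature vectors, not the exact Haar-feature booster of the original paper:

```python
import numpy as np

def train_stump(X, y, w):
    """Pick the (feature, threshold, polarity) stump with lowest weighted error."""
    best = None
    for f in range(X.shape[1]):
        for thr in np.unique(X[:, f]):
            for pol in (1, -1):
                pred = np.where(pol * (X[:, f] - thr) >= 0, 1, -1)
                err = w[pred != y].sum()
                if best is None or err < best[0]:
                    best = (err, f, thr, pol)
    return best

def adaboost(X, y, rounds=5):
    """Reweight misclassified samples each round, as in Viola-Jones training."""
    n = len(y)
    w = np.full(n, 1.0 / n)
    clfs = []
    for _ in range(rounds):
        err, f, thr, pol = train_stump(X, y, w)
        err = max(err, 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)   # weight of this weak learner
        pred = np.where(pol * (X[:, f] - thr) >= 0, 1, -1)
        w *= np.exp(-alpha * y * pred)          # boost weights of mistakes
        w /= w.sum()
        clfs.append((alpha, f, thr, pol))
    return clfs

def predict(clfs, X):
    """Weighted vote of all weak learners."""
    s = sum(a * np.where(p * (X[:, f] - t) >= 0, 1, -1) for a, f, t, p in clfs)
    return np.where(s >= 0, 1, -1)
```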

      2. Cascading

        Cascading is another technique for improving the model's speed and accuracy. A sub-window is taken and searched for its most important, or best, feature. If that feature is not present, the sub-window is dismissed without further examination. If it is present, the second feature is checked within the sub-window; if that is absent, the sub-window is rejected. This continues through the full sequence of features, rejecting any sub-window that lacks one. Although a single evaluation may take only moments, performing it for every feature over every sub-window would take a long time; cascading accelerates this process.
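The early-rejection idea can be illustrated with a toy attentional cascade. The stage tests below are hypothetical stand-ins for real Haar-feature classifiers:

```python
def cascade(stages, window):
    """Run a sub-window through the attentional cascade: the first failing
    stage rejects it immediately, so most windows exit after very few checks.
    Returns (accepted, number_of_stage_evaluations)."""
    checks = 0
    for stage in stages:
        checks += 1
        if not stage(window):
            return False, checks
    return True, checks

# Illustrative stages ordered cheapest-first (hypothetical feature tests).
stages = [
    lambda w: w["contrast"] > 0.1,   # stage 1: quick global contrast test
    lambda w: w["eye_band_dark"],    # stage 2: dark horizontal eye band
    lambda w: w["nose_bright"],      # stage 3: bright nose-bridge region
]
```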

    2. Detection

      Viola-Jones was designed for frontal faces, so it recognises frontal faces better than faces turned sideways, upwards, or downwards. The image is converted to grayscale before face detection, since grayscale is easier to work with and there is less data to process; after detecting the face on the grayscale image, the algorithm maps its position back onto the colour image. Viola-Jones draws a box and searches within it for a face, essentially looking for Haar-like features, which are detailed later. After examining all of the tiles at one position, the box moves a step to the right. Many boxes identify face-like (Haar-like) features in smaller steps, and the combined data from all of those boxes helps the algorithm determine where the face is.

      1. Haar-like Features

        Haar-like features are named after Alfred Haar, the Hungarian mathematician who invented the Haar wavelet concept (something of an ancestor of Haar-like features). Each feature is a box with a light and a dark side, which the machine uses to determine what the feature is: at the edge of a brow, for example, one side may be lighter than the other, while a centre region brighter than the surrounding boxes can indicate a nose.

        The integral image lets us perform these time-consuming computations quickly, so we can determine whether a group of features meets the criteria. Each value (the green box) in the integral image is the sum of the highlighted region above and to the left of it in the regular image. Doing this for every box produces a running sequence through the grid, as in Fig 2.
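A small numpy sketch of the integral image and the constant-time box sums it enables (the two-rectangle feature helper is illustrative):

```python
import numpy as np

def integral_image(img):
    """ii[y, x] = sum of img[:y, :x]; a zero row/column is padded on so that
    box sums need no boundary special-cases."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def box_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1, x0:x1] using only four table lookups."""
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]

def haar_two_rect_horizontal(ii, y0, x0, h, w):
    """Value of a two-rectangle feature: left half minus right half."""
    half = w // 2
    return (box_sum(ii, y0, x0, y0 + h, x0 + half)
            - box_sum(ii, y0, x0 + half, y0 + h, x0 + w))
```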

      2. Controller Area Networks

        A Controller Area Network (CAN bus) is a reliable vehi- cle bus standard that enables microcontrollers and devices to interact with each other's applications without the need for a host computer. It's a message-based protocol that was initially developed to conserve copper by multiplexing electrical wir- ing in vehicles, but it's now utilised in a variety of different applications. The data in a frame is broadcasted sequentially for each device, but in such a way that if many devices transmit at the same time, the highest priority device can con- tinue while the others back off. All devices, including the sending device, get frames.

        In 1985, Bosch created the Controller Area Network (CAN) for in-vehicle networks. Previously, automotive manufacturers used point-to-point wiring to connect electronic devices in automobiles. As manufacturers began to incorporate more electronics into their vehicles, the result was enormous wiring harnesses that were both heavy and costly. They then replaced dedicated wiring with in-vehicle networks, which lowered wiring cost, complexity, and weight. The CAN protocol, a high-integrity serial bus system for networking intelligent devices, has become the industry-standard in-vehicle network. CAN was quickly embraced by the automotive industry and, in 1993, was designated an international standard, ISO 11898. Since 1994 CAN has formed the basis for numerous higher-level protocols, such as CANopen and DeviceNet; these protocols, now industry norms, have been widely adopted in other areas as well. Vehicle signals are passed over modern CAN networks, and we can readily access them from the CAN bus.
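As an illustration of the frame format and arbitration rule described above (a simplified sketch, not a real bus driver):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CanFrame:
    """Minimal sketch of a classic CAN data frame with a standard identifier."""
    can_id: int   # 11-bit arbitration identifier, 0..0x7FF
    data: bytes   # payload, at most 8 bytes in classic CAN

    def __post_init__(self):
        if not 0 <= self.can_id <= 0x7FF:
            raise ValueError("standard CAN identifier is 11 bits")
        if len(self.data) > 8:
            raise ValueError("classic CAN carries at most 8 data bytes")

def arbitrate(frames):
    """When several nodes transmit at once, a dominant bit (0) overrides a
    recessive bit (1) during the identifier phase, so the frame with the
    numerically lowest identifier keeps the bus and the rest back off."""
    return min(frames, key=lambda f: f.can_id)
```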


        Fig 1 Haar like feature application


    M. Sabet and colleagues introduced a technique for detecting driver tiredness and distraction [8]. A new module based on visual information and artificial intelligence is proposed for autonomous driver sleepiness detection.

        1. Warning System Design: Face Detection

          The suggested system employs the Viola and Jones (VJ) approach for face detection.

        2. Object Tracking

          They use the approach of Ning et al. [9] for object tracking, which is based on the mean shift. The method combines the LBP pattern with a colour histogram into a joint colour-texture histogram, which represents the target very distinctly and effectively. The LBP operator is used to label the pixels of an image.

          Fig 2 Integral images

          Fig.3 Block diagram

          The operator labels each pixel with the code LBP_{P,R} = Σ_{p=0}^{P-1} s(g_p − g_c)·2^p, where s(x) = 1 if x ≥ 0 and s(x) = 0 otherwise, g_c is the grey level of the centre pixel, and g_p is the grey level of each of the P pixels around the centre at radius R.
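A direct numpy transcription of this operator for the common P = 8, R = 1 case might look like:

```python
import numpy as np

def lbp_3x3(img):
    """Basic 8-neighbour LBP: each interior pixel gets an 8-bit code whose
    p-th bit is s(g_p - g_c), with s(x) = 1 if x >= 0 else 0."""
    g = img.astype(int)
    c = g[1:-1, 1:-1]                       # centre pixels
    # neighbours in a fixed clockwise order starting at the top-left
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c)
    for p, (dy, dx) in enumerate(offs):
        gp = g[1 + dy:g.shape[0] - 1 + dy, 1 + dx:g.shape[1] - 1 + dx]
        code += ((gp - c) >= 0).astype(int) << p
    return code
```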

        3. Feedback System

          After the reference and output LBP pictures have been computed, the discrepancy between the two images can be measured using two-dimensional correlation: the lower the divergence between the images, the higher the correlation value.

          When the divergence exceeds the threshold, the feedback system sends control to the detection system, which uses it to determine the true face position and re-initialize the object tracking system.
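A sketch of this divergence test, assuming plain Pearson-style two-dimensional correlation and an illustrative threshold:

```python
import numpy as np

def corr2(a, b):
    """Two-dimensional correlation coefficient between equal-size images."""
    a = a - a.mean()
    b = b - b.mean()
    return (a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum())

def needs_reinit(reference_lbp, output_lbp, threshold=0.5):
    """Hand control back to the detector when the tracked patch has diverged
    too far from the reference (threshold is an illustrative value)."""
    return corr2(reference_lbp, output_lbp) < threshold
```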

        4. Decision Rules

          At the start of the proposed system, when the first frame is loaded, the decision box keeps passing control to the detection system until the face is found. Once the face has been found at this level, the decision box sends control to the next section and initialises the tracking system with the latest face location.

          If the assumption of mild head movement holds and the face detection system outputs a location far from the previous position in a frame, we can assume the face detector made a mistake and the output is a false alarm. In that case, the decision box transmits control to the next section, which uses the face's most recent position.

        5. Eye Detection

          After obtaining the position of the face, the eyes can be located with greater precision. An AdaBoost-based detector is used to find the eyes.

        6. Eye State Analysis

          Once the eye is located, the LBP operator is used to extract eye characteristics. Because LBP is insensitive to lighting, eye state analysis can be more precise. The feature vector obtained from the sub-window is used in the next stage: feature vector d is computed from the LBP image G(x, y) with a sub-window size of 5 × 6.

        7. Distraction

          The driver's face should be examined to detect this trait, because the attitude of the face carries information about attentiveness, gaze, and level of exhaustion. The following approach has been implemented to verify driver distraction: face orientation is calculated from the eye positions.

          In this study, a new approach for monitoring and detecting driver tiredness and distraction is provided. The system makes use of computer vision and artificial intelligence technology. To avoid divergence and loss of the target, a feedback system based on LBP is used for face tracking; this feedback mechanism lets the detection and tracking modules work together.

          SVM was used to analyse the eye state, and the LBP operator was used to extract features. Under variable lighting, background changes, and changes in facial orientation, the suggested method for face tracking and eye state analysis proves resilient and accurate, and the system is reported to generate agreeable outcomes.

          S. Maralappanavar and colleagues suggested a method for detecting driver distraction using gaze estimation, evaluating whether the driver is distracted or not [10]. The driver's gaze direction is used to gauge attentiveness: the gaze is estimated with the help of the face, eyes, pupils, and eye corners, and the detected gaze is then classified as distracted or not.

          Fig. 4 Proposed Algorithm for Drivers Eye Detection Systems

          Different modules are used to implement the estimation of the driver's gaze direction. Face detection is done first, using the Viola-Jones approach, followed by accurate eye detection using the template matching method, and then gaze estimation. The driver's gaze is used to determine how distracted he or she is. A flowchart of the suggested algorithm is given in Fig. 4.

        8. Eye Detection with Template Matching Method

          Template matching works as follows. An eye template of size w × h is created and an input image of size W × H is taken. The normalized cross-correlation is computed by overlapping the template on the input image, with the input image resized to fit the template. In this case three templates are used: one for the left, one for the right, and one for the centre. Template matching returns the maximum value at the matched location; the highest value for each template is compared against the highest value obtained from the three combined. A white box indicating the template-matched output surrounds the matched locations, and the template-matched output makes the eye region accurate.
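A brute-force numpy sketch of normalized cross-correlation template matching as described (a real system would use an optimized routine such as OpenCV's `matchTemplate`):

```python
import numpy as np

def match_template(image, template):
    """Slide the h x w template over the H x W image and return the (y, x) of
    the best normalized cross-correlation score, plus the score itself."""
    H, W = image.shape
    h, w = template.shape
    t = template - template.mean()
    tnorm = np.sqrt((t * t).sum())
    best_score, best_pos = -2.0, (0, 0)
    for y in range(H - h + 1):
        for x in range(W - w + 1):
            patch = image[y:y + h, x:x + w]
            p = patch - patch.mean()
            denom = np.sqrt((p * p).sum()) * tnorm
            if denom == 0:
                continue  # flat patch: correlation undefined, skip it
            score = (p * t).sum() / denom
            if score > best_score:
                best_score, best_pos = score, (y, x)
    return best_pos, best_score
```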

        9. Gaze Direction Estimation

          Horizontal distance is calculated from the distance between the pupil boundary and the eye corners. A line segment is drawn on both eye corners. The distance between one pupil boundary and one eye-corner line segment is calculated as d, while the distance between the other pupil boundary and the other eye corner is calculated as d1. The position of the pupil and the placement of the eye corners are used to determine the direction in which the person is gazing, and the calculated distances determine whether or not the driver is distracted. The eye-corner segments and pupil boundaries are depicted in Fig. 5.

          Fig. 5 Segmentation of Eye
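The decision step above can be sketched as follows; the left/right mapping of d and d1 and the frame-count threshold are our own illustrative assumptions, not values from [10]:

```python
def gaze_direction(d, d1, ratio_thresh=1.5):
    """Classify gaze from the distances between each pupil boundary and its
    eye-corner line segment (d and d1 in the paper's notation). The mapping
    of larger-d to 'left' is an assumption for illustration."""
    if d > ratio_thresh * d1:
        return "left"
    if d1 > ratio_thresh * d:
        return "right"
    return "centre"

def is_distracted(direction, off_centre_frames, limit=15):
    """Flag distraction when the gaze stays off-centre for too many
    consecutive frames (limit is an illustrative value)."""
    return direction != "centre" and off_centre_frames >= limit
```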

          In their suggested system, face detection is performed using the Viola-Jones algorithm, followed by eye region localization using the integral projection function, pupil detection, and gaze categorization.

          L. Alam and colleagues proposed a real-time distraction detection system based on the driver's visual characteristics [11]. Its goal is to detect distraction in real time by assessing the driver's visual features in the facial region.

          Fig 6. Schematic diagram of drivers attention monitoring system

          The suggested method extracts essential information from visual cues such as eye and head movement to detect driver attention states and classify them as either attentive or distracted. In this strategy, the deviation of the eye centre and head from their normal positions over a period of time is considered a useful sign of lack of attention.

          Face detection is conducted first, followed by the extraction of the region of interest (ROI) – the eye and head region – using facial landmarks, and finally the detection of head and eye movements to identify the attention state. They conducted an experiment in a real-world driving setting with people of various characteristics to assess the system's performance. In all of the cases they evaluated, their system detected the attention state with an average accuracy of 92 percent.

        10. Eye Movements Detection

          The centres of the eyes are detected to estimate the rate of departure of the eye centres from their standard position in order to identify eye movement. To do so, the location of each eye's centre is found first. Some preprocessing is performed individually on both the left- and right-eye images, located via six (x, y)-coordinates from the preceding modules, to facilitate determination of the eye centre. The points in the ocular region used to identify eye movements are shown in Fig. 7.

          Fig 7. Eye Region Detection

          This work uses the video stream from a webcam installed on the dashboard to develop a system that detects distraction in real time. The system can detect distraction in drivers with a range of facial characteristics (such as different facial features, hairstyles, and accessories). Over the entire set of cases, the system had an accuracy of roughly 92 percent, and their tests showed the eye- and head-movement detection to be reliable. The addition of a warning system to alert the driver when distraction is detected is planned as future work.

          Distraction detection by in-vehicle signal processing was carried out by S. Im and colleagues [12], who present techniques and experimental findings for detecting distraction, without deliberately induced distraction, using in-vehicle signals. Normal and distracted driving features can be classified in real-world driving circumstances by combining two types of machine learning algorithms: unsupervised learning and supervised learning.

        11. Signal Selection and Data Processing

          Many in-vehicle signals are available, including steering angle, steering speed, acceleration/brake pedal value, yaw rate, longitudinal/lateral acceleration, and so on. Finding signals related to driving style is critical, and these signals should have characteristics that distinguish normal from distracted states. They identified signals meeting these criteria, which were then selected and analysed to derive characteristics of both normal and distracted driving.

        12. Data Processing

          Selected signals should be preprocessed before being used as inputs to the learning system. First and foremost, they used only signals recorded at a vehicle speed of more than 60 km/h, assuming, like other automakers' products, that the algorithm operates on open roads; when the vehicle speed is low, it is difficult to tell regular driving from distracted driving.
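The preprocessing described above can be sketched with an assumed record layout (`speed_kmh` and `steer` are hypothetical field names, and the standardization step is a common choice rather than the paper's exact pipeline):

```python
import statistics

def select_samples(records, min_speed_kmh=60.0):
    """Keep only samples above the speed floor; below it, normal and
    distracted driving are hard to tell apart (per the study's assumption)."""
    return [r for r in records if r["speed_kmh"] > min_speed_kmh]

def standardize(values):
    """Zero-mean, unit-variance scaling applied before the learning stage."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values) or 1.0   # guard against constant signals
    return [(v - mu) / sd for v in values]
```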

          Fig 8. Correlation coefficient of in-vehicle signal combinations

          Table 1. Statistical Analysis

        13. Distraction Detection Algorithm Using Machine Learning

          In terms of learning strategy, the driver distraction detection algorithm works in two steps: first, distraction detection using only normal driving patterns; then, if specific conditions are met, distraction detection using both normal and distracted driving patterns.

          Because it generates its model from normal driving data alone, the distraction detection strategy employing the GMM cannot draw the decision boundary between normal and distracted driving. Most prior experiments used supervised learning systems, which require not just regular driving data but also distracted driving patterns. Their study likewise used supervised learning, but the distracted driving pattern was obtained differently: drivers were not asked to perform any additional tasks, yet a distracted driving tendency was still observed. The statistical disparity between normal driving data and test data is high when drivers are preoccupied while driving, and additional thresholds on the statistical distance are needed to obtain a more reliable pattern.

          They devised an algorithm that combines two approaches and learning schemes: normal driving pattern learning (GMM) and normal/distracted driving pattern learning (ANN). They demonstrated that distracted driving data can be collected naturally on real roads. Comparing the algorithm's two phases, they found that the variant using both types of driving data performed better. The system knows nothing about the driver's state beyond how he uses the accelerator, brake, and so on, so qualitative as well as quantitative evaluation is vital.
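As a simplified stand-in for the first (GMM) stage, a single-Gaussian novelty detector over normal-driving features captures the same idea: fit on normal driving only, then flag statistically distant windows as candidate distracted driving (the threshold here is illustrative):

```python
import numpy as np

class NormalDrivingModel:
    """Single-Gaussian stand-in for the paper's GMM stage: fit on normal
    driving features only, then flag feature vectors far from the fitted
    distribution as candidate distracted driving."""

    def fit(self, X):
        self.mu = X.mean(axis=0)
        # small ridge keeps the covariance invertible for degenerate data
        self.inv_cov = np.linalg.inv(np.cov(X.T) + 1e-6 * np.eye(X.shape[1]))
        return self

    def mahalanobis(self, x):
        d = x - self.mu
        return float(np.sqrt(d @ self.inv_cov @ d))

    def is_distracted(self, x, threshold=3.0):
        """Distance threshold plays the role of the log-likelihood cut."""
        return self.mahalanobis(x) > threshold
```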

          Fig 9. An example of ANN output

          As a new technique for detecting distracted car drivers, F. Mizoguchi et al. construct rules for determining whether or not a driver is distracted, learned from acquired data about the driver's eye movement together with driving data [13]. To generate the rules, they employ a learning technology called a support vector machine (SVM). In addition, they explored the relationship between a qualitative model of a driver's cognitive mental load, produced in a previous study, and the driver's distraction, validating the eye movements and driving data that contradict the model during the investigation.

        14. SVM learning

    Figure 10 shows the data used for SVM learning; the parameters on a single line correspond to one saccade. The number on the left indicates whether the driver is not distracted (+1) or distracted (-1). After scaling, the SVM can learn these parameters, and the learning data is generated. For SVM learning, they used the Gaussian kernel as the kernel function.

    Fig 10. Parameters used in SVM learning
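The scaling and Gaussian (RBF) kernel used in the SVM step above can be written directly; the gamma value is an illustrative choice, not one reported in [13]:

```python
import numpy as np

def gaussian_kernel(x, z, gamma=0.5):
    """RBF kernel k(x, z) = exp(-gamma * ||x - z||^2), the kernel function
    used for the SVM learning step."""
    d = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return float(np.exp(-gamma * np.dot(d, d)))

def scale_features(X):
    """Per-parameter min-max scaling to [0, 1], the usual step before
    feeding saccade parameters to the SVM."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # avoid dividing by zero
    return (X - lo) / span
```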

    Through this new approach to detecting distracted driving, incorporating an SVM learning tool, their study creates new rules for visually distracted driving using acquired data on the driver's eye movement as well as driving data. By confirming the rules established through learning, they show that good precision can be reached when driving on a road with moderate traffic.

    Furthermore, they analyse the relationship between a qualitative model of a driver's cognitive mental load, produced in a previous study, and the driver's distraction, looking for any eye movement or driving data that contradicts the model. As a result, they can detect visually distracted driving and confirm that the frequency of saccades per unit time decreases during distraction; this behaviour is in line with the characteristics of cognitive distraction. It is thus possible to detect distraction during operation by consulting the mental-load qualitative model, which shows that there are approaches beyond interpreting the driving conditions alone.

    Fig 11. Flow Chart for DRIVERs Eye Algorithm


    The proposed methodology introduces an alarm system that alerts the driver when he is distracted. Eye-movement pattern analysis data are used together with in-vehicular signal pattern analysis data for this evaluation: the system is trained with collected eye and in-vehicular signal patterns and can then evaluate real-time patterns of both.

    The proposed model uses the Viola-Jones algorithm for face data analysis, then combines it with in-vehicular signal analysis data using a new algorithm.

    Using both kinds of data, the model forms two separately trained models; the results of the in-vehicular data model can then be used to set the thresholds for face data analysis. For example, at very low speed (< 20 km/h) the driver can be allowed to look away a little longer than usual.
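The speed-dependent thresholding can be sketched as follows; all limits other than the < 20 km/h case mentioned above are illustrative assumptions:

```python
def gaze_away_limit_s(speed_kmh):
    """Longest tolerated off-road glance in seconds, relaxed at very low
    speeds as the text suggests (thresholds are illustrative values)."""
    if speed_kmh < 20:
        return 4.0   # crawling traffic: allow a longer glance
    if speed_kmh < 60:
        return 2.0
    return 1.0       # highway speed: alarm quickly

def should_alarm(speed_kmh, seconds_looking_away):
    """Fire the in-vehicle alarm when the glance outlasts the speed-based limit."""
    return seconds_looking_away > gaze_away_limit_s(speed_kmh)
```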

    Fig 12. DFD Diagram

    Fig 13. Use Case Diagram

    We can also make use of open-source pre-trained models that are already good enough for face data evaluation, e.g. shape_predictor_68_face_landmarks (Dlib).

    For the in-vehicular data analysis model described in [12], some additions with respect to Indian roads can be made. Both kinds of data are evaluated with their own Haar-like features, and the outputs of both models are then taken together to decide the thresholds for facial feature monitoring.


CONCLUSION

A driver eye monitoring system has been proposed in this paper to alert the driver when his gaze is distracted from the normal state. Real-time analysis of the driver's eye movements and of in-vehicular signals, such as those from the brake, accelerator and steering, helps decide whether the driver is in a distracted or a normal state. Both eye movement and vehicular signal data are used to converge on a better solution for identifying driver distraction. The proposed model takes the driver's eye movements as input from the camera, while the driver's state is also evaluated through the vehicular system (CAN). The alarm system receives a signal when the driver's state changes to distraction and produces a sound to alert the driver, so that the driver can promptly return to the normal state from the distracted one.

Additional features can be added to improve the performance of the proposed model. If the cause of distraction is uncommon, such as witnessing an accident, misbehavior from a co-passenger, or road problems, the driver could report it to the concerned authorities; this can be implemented as an add-on to the proposed eye monitoring and alerting system. The proposed model could even check whether the driver is a minor and, if so, pass that information to the concerned authority.

The proposed system combines two of the most common methods and uses more refined algorithms. Hence the model is expected to produce more precise and faster results.


REFERENCES

[1]. S. K. Singh, Road Traffic Accidents in India: Issues and Challenges, Transportation Research Procedia, vol. 25, pp. 4708-4719, 2017.

[2]. Y. Darma, M. R. Karim and S. Abdullah, An analysis of Malaysia road traffic death distribution by road environment, Sadhana, vol. 42, pp. 1605-1615, 2017.

[3]. G. Liu, S. Chen, Z. Zeng, H. Cui and Y. Fang, Risk factors for extremely serious road accidents: Results from national Road Accident Statistical Annual Report of China, PLoS ONE, vol. 13, no. 8, 2018.

[4]. H. V. Chand and J. Karthikeyan, Recent Survey on Traffic Optimization in Intelligent Transportation System, Journal of Advanced Research in Dynamical and Control Systems, vol. 9, no. 15, pp. 89-100, 2017.

[5]. J. Zhang, Y. Yu and Y. Lei, The study on an optimized model of traffic congestion problem caused by traffic accidents, 2016 Chinese Control and Decision Conference (CCDC), pp. 688-692, 2016.

[6]. E. C. Eze, S. Zhang, E. Liu and J. C. Eze, Advances in vehicular ad- hoc networks (VANETs): Challenges and road-map for future devel- opment, International Journal of Automation and Computing, vol. 13, pp. 1-18, 2016.

[7]. F. D'Amico, A. Calvi, C. Ferrante and L. B. Ciampoli, Assessment of Driver Distraction Caused by Social Networking Activities Using the Smartphone: A Driving Simulator Study, in N. Stanton (ed.), Advances in Human Aspects of Transportation, Advances in Intelligent Systems and Computing, vol. 1212, Springer, Cham, 2020.

[8]. M. Sabet, R. A. Zoroofi, K. Sadeghniiat-Haghighi and M. Sabbaghian, A New System for Driver Drowsiness and Distraction Detection, 20th Iranian Conference on Electrical Engineering (ICEE2012), pp. 1247-1251, 2012.

[9]. J. Ning, L. Zhang, D. Zhang and C. Wu, Robust mean-shift tracking with corrected background-weighted histogram, IET Computer Vision, vol. 6, no. 1, pp. 62-69, 2012.

[10]. S. Maralappanavar, R. Behera and U. Mudenagudi, Driver's Distraction Detection based on Gaze Estimation, 2016 International Conference on Advances in Computing, Communications and Informatics (ICACCI), pp. 2489-2494, 2016.

[11]. L. Alam and M. M. Hoque, Real-Time Distraction Detection Based on Drivers Visual Features, International Conference on Electrical, Computer and Communication Engineering (ECCE), pp. 1-6, 2019.

[12]. S. Im, C. Lee, S. Yang, J. Kim and B. You, Driver Distraction Detection by In-Vehicle Signal Processing, 2014 IEEE Symposium on Computational Intelligence in Vehicles and Transportation Systems (CIVTS), pp. 64-68, 2014.

[13]. F. Mizoguchi, H. Nishiyama and H. Iwasaki, A New Approach to Detecting Distracted Car Drivers Using Eye-Movement Data, 2014 IEEE 13th International Conference on Cognitive Informatics and Cognitive Computing, pp. 266-272, 2014.
