Facial Emotion and Object Detection for Visually Impaired & Blind Persons

DOI : 10.17577/IJERTV10IS090108

Facial Emotion and Object Detection for Visually Impaired & Blind Persons

Based on Deep Neural Network Concept

Daniel D

PG Student Department of Computer Science,

Impact College Of Engineering & Applied Sciences, Bangalore, India

Rekha M S

Assistant Professor Department of Computer Science,

Impact College of Engineering & Applied Sciences, Bangalore, India

Abstract: Millions of people across the world are visually impaired or blind. Many aids are used to assist them, such as smart devices, human guides, and Braille literature for reading and writing. Technologies such as image processing and computer vision are used to develop these assistive devices. The main aim of this paper is to detect the face and emotion of the person in front of the visually impaired user, to identify the objects around the user, and to alert the user through audio output.

Keywords: Neural network, datasets, OpenCV, pyttsx, YOLO, bounding box


    According to surveys, more than 10 million people worldwide are visually challenged. Many efforts are being made to help and guide visually impaired and blind persons in leading a normal daily life.

    Language plays an important role in everyone's life for communicating with each other. Visually impaired and blind persons read and write through a special writing system known as Braille, and many schools now teach Braille, which enables the visually challenged to read and write without others' help. Another effective channel of communication is audible sound, through which the visually challenged can understand what is happening around them.

    Many assistive devices and applications have been developed using current technologies such as image processing, Raspberry Pi, computer graphics, and computer vision to guide and assist visually impaired and blind persons in their day-to-day life.

    The focus of this paper is identifying the person in front of the visually challenged user along with that person's facial emotion, identifying the objects around the user, and alerting the user with audible sound.


    Various systems exist to assist the visually challenged, such as:

    1. Braille literature: It is used to teach the visually challenged to read and write without a third person's help.

    2. Wearable devices: Using a Raspberry Pi along with computer vision technology, the content around the blind person is detected and reported in the local language.

    3. Smart specs: These detect printed text and produce sound output through a text-to-speech (TTS) method.

    4. Navigation system: An IoT-based system built with a Raspberry Pi and an ultrasonic sensor in a waist belt identifies objects and obstacles and produces audio signals.


Figure 1: problem description (blocks: Datasets Collection, Dividing into training and testing set, Pre-processing data, Build CNN, Train Model, Test Model, Predict Output)


The Objectives of the proposed system are:

  1. Studying existing systems.

  2. Working with image acquisition system.

  3. Face and emotion recognition.

  4. Object identification.

  5. Classifying objects using CNN.

  6. Finding the position of an object in the input frame.

  7. Providing the sound alert about the position of the object identified.

The processing framework includes the following blocks:

  • A block responsible for basic preprocessing steps as required by the module objectives.

  • A block for identifying face, emotion and object.

  • A block for providing voice output.

    The steps involved in the process are as follows.

    Step 1: The dataset is collected from different sources.

    Step 2: The collected dataset is divided into two parts: 75% for training and 25% for testing.

    Step 3: All the captured images are pre-processed.

    Step 4: The model is built using a CNN.

    Step 5: Once training and testing are done, the output is produced as an audio alert.
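The 75% / 25% division in Step 2 can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name `split_dataset`, the fixed seed, and the file names are assumptions made for the example and are not taken from the paper.

```python
import random


def split_dataset(samples, train_fraction=0.75, seed=42):
    """Shuffle labelled samples and split them into training and testing lists."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(samples)               # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical dataset of (file name, label) pairs.
dataset = [(f"img_{i:03d}.jpg", "known" if i % 2 == 0 else "unknown")
           for i in range(100)]
train_set, test_set = split_dataset(dataset)
print(len(train_set), len(test_set))  # 75 25
```

Shuffling before cutting keeps both label categories represented in each part; a stratified split would guarantee the exact proportions if that matters for the dataset at hand.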


    The modules used in designing the deep neural network model are:

    1. Dataset collection

    2. Dataset Splitting

    3. Network Training

    4. Evaluation

      Dataset Collection:

      This is the basic stage in the construction of a deep neural network. Images are collected along with their labels, and the labels are assigned based on the known and unknown image categories. The number of images should be the same for both categories.

      Dataset Splitting:

      The collected dataset is divided into a training set and a testing set. The training set is used to learn, from the input images, what the different categories of images look like. After training, the performance is evaluated on the testing set.

      Figure 2: system architecture (blocks: Capturing image, Loading Models, Image Processing, Face Emotion and Object Identification, Output in Text form, Audible sound form)

      Network Training:

      The main idea of training the network is for it to learn to recognize the labeled data categories. When the model makes a mistake, it corrects itself by adjusting its parameters.
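The idea that a model corrects itself when it makes a mistake can be illustrated with a single-neuron classifier. The sketch below uses the classic perceptron update rule on toy data; it is a stand-in for the error-driven principle only, not the CNN or the training procedure used in this paper, and all names in it are hypothetical.

```python
def predict(weights, bias, features):
    """Return 1 if the weighted sum is positive, otherwise 0."""
    activation = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, labels, epochs=20, learning_rate=1.0):
    """Adjust weights only on misclassified samples (perceptron rule)."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for features, target in zip(samples, labels):
            error = target - predict(weights, bias, features)  # 0 when correct
            if error != 0:
                mistakes += 1
                # Move the decision boundary toward the misclassified sample.
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, features)]
                bias += learning_rate * error
        if mistakes == 0:  # every sample is classified correctly
            break
    return weights, bias


# Toy, linearly separable data: label is 1 only when both inputs are 1.
samples = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
labels = [0, 0, 0, 1]
weights, bias = train_perceptron(samples, labels)
predictions = [predict(weights, bias, s) for s in samples]
print(predictions)  # [0, 0, 0, 1]
```

A CNN applies the same principle at scale: the prediction error is measured by a loss function and propagated back through many layers to update their weights.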


      Evaluation:

      The model has to be evaluated. The predictions of the model are tabulated and compared with the ground-truth labels of the different categories to form a report.
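One way to tabulate predictions against ground-truth labels is sketched below. The function `evaluation_report` and the emotion labels in the example are assumptions made for illustration; the paper does not state which metrics its report contains.

```python
from collections import Counter


def evaluation_report(true_labels, predicted_labels):
    """Compare predictions with ground truth.

    Returns overall accuracy, per-class recall, and a table of
    (true label, predicted label) counts.
    """
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    table = Counter(zip(true_labels, predicted_labels))
    correct = sum(count for (t, p), count in table.items() if t == p)
    accuracy = correct / len(true_labels)
    recall = {}
    for label in sorted(set(true_labels)):
        total = sum(count for (t, _), count in table.items() if t == label)
        recall[label] = table[(label, label)] / total
    return accuracy, recall, table


# Hypothetical test-set results for two emotion categories.
truth = ["happy", "sad", "happy", "sad", "happy"]
predicted = ["happy", "sad", "sad", "sad", "happy"]
accuracy, recall, table = evaluation_report(truth, predicted)
print(accuracy)                 # 0.8
print(table[("happy", "sad")])  # 1 happy face was mistaken for sad
```

The table of counts is a confusion matrix in dictionary form, which shows which categories are mistaken for each other rather than only how often the model is right.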















      Figure 3: Face and Emotion Detection

      Steps for execution:

      Step 1: Create the face recognizer and obtain the images of the faces that have to be recognized.

      Step 2: Convert the collected face images to numerical data.

      Step 3: Write the Python script using libraries such as OpenCV and NumPy.

      Step 4: Finally, the face and emotion of the person are recognized.

      Object Identification:

      Step 1: Download the model files yolov3.weights, yolov3.cfg, and coco.names.

      Step 2: Initialize parameters such as nmsThreshold, inpWidth, and inpHeight. Each bounding box is predicted together with a confidence score.

      Step 3: Load the model and the class names. The CPU is set as the target and OpenCV as the DNN backend.

      Step 4: Read the input image; the output frames with the detected bounding boxes are saved.

      Step 5: Pass each frame to the blobFromImage function, which converts it into a blob, the input format of the neural network.

      Step 6: Use the getUnconnectedOutLayers() function to identify the output layers of the network, from which their names are obtained.

      Step 7: Post-process the network output. Each bounding box is represented as a vector whose first 5 elements are the center x, center y, width, height, and object confidence, followed by the per-class scores.

      Step 8: Draw the boxes that remain after non-maximum suppression on the input frame. The identified object is reported as audio output through pyttsx.
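The post-processing in Steps 7 and 8 can be sketched in plain Python. The row layout assumed here (four box values relative to the frame size, an objectness value, then one score per class) follows the usual YOLOv3 output format; the function names, thresholds, and sample rows are hypothetical. OpenCV provides this filtering as cv2.dnn.NMSBoxes, so the hand-written version below is only meant to show what the step does.

```python
def decode_detection(row, frame_width, frame_height):
    """Turn one output row into ((left, top, width, height), confidence, class_id).

    Assumed layout: [center_x, center_y, width, height, objectness, scores...].
    """
    center_x, center_y, width, height = row[0:4]
    scores = row[5:]
    class_id = max(range(len(scores)), key=lambda i: scores[i])
    box_w = width * frame_width
    box_h = height * frame_height
    left = center_x * frame_width - box_w / 2
    top = center_y * frame_height - box_h / 2
    return (left, top, box_w, box_h), scores[class_id], class_id


def iou(a, b):
    """Intersection over union of two (left, top, width, height) boxes."""
    inter_w = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, confidences, conf_threshold=0.5, nms_threshold=0.4):
    """Keep the most confident boxes and drop those that overlap a kept box."""
    candidates = [i for i, c in enumerate(confidences) if c >= conf_threshold]
    candidates.sort(key=lambda i: confidences[i], reverse=True)
    kept = []
    for i in candidates:
        if all(iou(boxes[i], boxes[k]) <= nms_threshold for k in kept):
            kept.append(i)
    return kept


# Hypothetical network output for a 640x480 frame with two classes.
rows = [
    [0.5, 0.5, 0.25, 0.5, 0.9, 0.1, 0.8],       # object of class 1
    [0.5, 0.5, 0.25, 0.5, 0.8, 0.1, 0.7],       # duplicate of the same object
    [0.125, 0.25, 0.125, 0.25, 0.9, 0.9, 0.05], # object of class 0
    [0.8, 0.8, 0.1, 0.1, 0.2, 0.2, 0.1],        # low-confidence detection
]
decoded = [decode_detection(r, 640, 480) for r in rows]
boxes = [d[0] for d in decoded]
confidences = [d[1] for d in decoded]
kept = non_max_suppression(boxes, confidences)
print(kept)  # [2, 0]
```

Only the indices in `kept` are drawn on the frame and announced, so each physical object is reported once even if the network proposes several overlapping boxes for it.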


      When the Python script is executed, a window with several buttons appears, as shown in Figure 4. The buttons are for capturing the face, training on the captured face, recognizing the face and emotion, and detecting objects.

      Figure 4: window with various buttons.

      Once the face is captured, the system asks for the user name and trains on the face data, as shown in Figure 5.

      Figure 5: Training the face data

      The following figures show the recognition of face and different emotions.

      Figure 6: Face recognized with happy emotion

      Figure 7: Face recognized with Sad emotion

      The identified object is reported along with its position, as shown in Figure 8. The quit button exits the window.

      Figure 8: Identified object with its position
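The paper does not specify how the position of an object is derived from its bounding box. One simple possibility, shown here purely as an assumption, is to divide the frame into three vertical bands and name the band that contains the center of the box. The function names and message wording are hypothetical.

```python
def describe_position(box, frame_width):
    """Map a (left, top, width, height) box to 'left', 'center', or 'right'."""
    left, _top, width, _height = box
    center_x = left + width / 2
    if center_x < frame_width / 3:
        return "left"
    if center_x > 2 * frame_width / 3:
        return "right"
    return "center"


def build_alert(label, box, frame_width):
    """Compose the sentence that would be spoken to the user."""
    position = describe_position(box, frame_width)
    if position == "center":
        return f"{label} in front of you"
    return f"{label} on your {position}"


# Hypothetical detections in a 640-pixel-wide frame.
messages = [
    build_alert("chair", (40.0, 60.0, 80.0, 120.0), 640),
    build_alert("person", (240.0, 120.0, 160.0, 240.0), 640),
    build_alert("bottle", (500.0, 100.0, 100.0, 100.0), 640),
]
for message in messages:
    print(message)
```

Each message could then be handed to a text-to-speech engine. With the pyttsx3 package, for example, this would be `engine = pyttsx3.init()`, `engine.say(message)`, and `engine.runAndWait()`; that part is left out of the block so that it runs without audio hardware.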


        In this paper we have designed a system using deep neural network techniques. First, face images are captured with the face recognizer and divided into training and testing sets based on the known and unknown label categories. The face data is trained, and the face and emotion of the person are recognized. Objects are detected using the YOLO (You Only Look Once) technique: the object is captured, its bounding box is predicted with a confidence score, and the identified object and its position are reported to the user as audio. With this, visually impaired and blind persons can easily learn the facial emotion of the person in front of them and the position of the objects around them.


