Object Navigation for Visually Impaired People

DOI : 10.17577/IJERTCONV8IS14053


Likhitesh R

Department of MCA

      P.E.S College of Engineering Mandya, India

        Abstract: Many aid systems have been developed for visually impaired people, but most are built for a single purpose such as navigation, object detection, or distance perception. Most deployed aids also rely on indoor navigation, which requires prior knowledge of the environment. The main aim of this work is to detect objects with a prototype and guide visually impaired people to their destination so that they can perform their day-to-day tasks. The developed prototype invokes the navigation feature to get the target destination from the user. The methodology used is a CNN (convolutional neural network), a class of algorithm mainly used for image processing. The prototype calculates the virtual weights of the detected objects.

        Keywords: Object Detection, Navigation, CNN (convolutional neural network), YOLO, image.


          Computer vision (CV) is the branch of computer science that seeks to replicate human vision. Computer vision plays a vital role today, and we have used it to implement object detection, distance perception, and navigation for visually impaired people.

          This prototype helps visually impaired people perform day-to-day tasks. The system captures objects from a live video stream with the help of the OpenCV package. Once an image is captured, the system announces through the speech synthesizer that a particular object has been found and invokes the navigation feature, which asks the user for the target destination.

          CV systems outperform natural vision in several respects; for example, a digital camera can detect motion faster than the human eye. For this reason, CV has also been deployed in healthcare for visually impaired people. Navigation systems are one form of assistive system that helps someone move through an unfamiliar environment without getting lost or hurt.

          The objects detected through OpenCV are processed with a 5 × 5 matrix multiplication and filtered into a 3 × 3 matrix, which is compared with the virtual weights present in YOLO; the system then navigates the user to the desired destination.


          Luis A. Guerrero et al. [1] presented a navigation system that works only indoors. A floor plan of the building must be provided in advance, and the system gives directions only within that particular indoor property. Once the floor plan is available, the application guides the blind person to the destination.

          Dr. M.N Veena

          Professor, Department of MCA

          P.E.S College of Engineering Mandya, India

          Drawback: it is difficult to use if there is no floor plan. Hsueh-Cheng Wang et al. [2] describe a system in which blind people are given a wearable device for the hand. It guides the user without identifying objects: data is read through the wearable device, and processing is based on that input.

          Jinqiang Bai et al. [3] proposed a system for blind people walking on the road. The user wears a cap that captures images and carries a sonar sensor, which provides movement guidance. It navigates without identifying objects: it alerts the user when an obstacle is sensed but does not say what the obstacle is.

          Mekhalfi, M. L. et al. [4] presented a paper discussing how indoor operations can be performed with the help of a portable camera. The major drawbacks of the prototype were limited scalability to new objects and the high processing power required, which made the system bulky and power-hungry.

          Andreas Hub et al. [5] proposed a method that uses the enhanced processing power of mobile devices to allow real-time processing of 3D models together with local sensors for orientation and object identification. The flaw of the prototype was inaccurate depth measurement, which could have fatal results if deployed in the real world.

          Yelamarthi, K., & Laubhan, K. [6] describe an RFID- and GPS-integrated navigation system for the visually impaired, which requires a pre-deployed infrastructure of RFID tags. It also uses ultrasonic and infrared sensors to identify objects; these work only at very close range, so the user must be physically near an object to detect it. This makes the approach infeasible for real-life navigation, as the visually impaired person can collide with the object before detecting it.

          Bourbakis, N., & Kavraki, D. [7] describe a 2D vibration array for detecting dynamic changes in 3D space during navigation, part of the Tyflos prototype. It is based on the fusion of range data and image data collected by the cameras and on a 3D representation of the surrounding space. The level of vibration gives the user a sense of the space and of its changes.

          Kunhoth, J., Karkar, A., Al-Maadeed, S., & Al-Attiyah, A. [8] examine the performance and usability of two computer-vision-based systems in comparison with a BLE-based system. The first, called CamNav, uses a trained deep learning model to identify locations; the second, called QRNav, uses visual markers (QR codes) to identify locations. A field study with ten blindfolded users was performed using the three navigation systems.

          Gupta, S., Arbelaez, P., Girshick, R., & Malik, J. [9] propose a generic approach for long-range amodal completion of surfaces and show its effectiveness in grouping surfaces. They also show that their system can label each contour with its type (depth, normal, or albedo).

          Idrees, A., Iqbal, Z., & Ishfaq, M. [10] describe a system that allows navigation along predefined routes for the blind. Blind browsing is an accessibility tool that helps blind people use an Android smartphone in a simple way, with audio instructions for indoor navigation. QR codes are mounted on floor sections at a defined distance and serve as input points for detecting the current position and navigating from it. The device obtains details about the current location when a QR code is scanned.

          R. Kapoor, M. A. Bharathi and M. Sushama [11] note that more screen exposure, even at an early age, has raised visual concerns. The identification and recognition of text in natural images is very useful to visually impaired people as well as for numerous other uses. The proposed work uses a deep neural network to provide an easier and quicker text detection and recognition system than traditional procedures based on handcrafted features.


          Blindness makes life rather difficult for people who suffer from this health problem, but the use of technology can help in some day-to-day tasks.

          The present work focuses on a photo-to-speech application for the visually impaired. The project is called camera reading for blind people, and its ultimate purpose is to read text and deliver it as speech with the help of a TTS (text-to-speech) synthesizer.

          Going near an object and fetching it is definitely a tough job, and it requires the person to already be familiar with the object.

          So we are developing a prototype that assists visually impaired people in picking out objects, computes the distance and direction to them, and guides the user on that basis.

          Fig 1: Block Diagram

          The figure above shows the flowchart of the entire system, with each process laid out step by step. Once the user starts the prototype, it automatically captures images from the live stream using the OpenCV package. If there is no valid data, it stops automatically. The process repeats each time an image is captured. The system extracts the image, detects the objects in it, and then compares each object's weight configuration file with the virtual weights before checking the input. The speech synthesizer requests the input from the user. If the requested object is in the frame, the speech engine automatically guides the user to the object found by the prototype.
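The flow in the block diagram can be sketched as a simple loop. This is a minimal illustration, not the authors' implementation: `run_assistant` and its parameters `frames`, `detect`, `ask_target`, and `speak` are hypothetical names that stand in for the live stream, the YOLO detector, the speech input, and the speech synthesizer.

```python
from typing import Callable, Iterable, List


def run_assistant(
    frames: Iterable[object],
    detect: Callable[[object], List[str]],
    ask_target: Callable[[], str],
    speak: Callable[[str], None],
) -> None:
    """Walk through the block-diagram steps for each captured frame."""
    for frame in frames:
        if frame is None:
            # No valid data from the stream: stop, as in the flowchart.
            speak("No valid data, stopping.")
            return
        labels = detect(frame)  # object detection on the captured image
        if not labels:
            speak("No object found.")
            continue
        speak("Found: " + ", ".join(labels))
        target = ask_target()  # the speech engine asks the user for a target
        if target in labels:
            speak(f"Guiding you to the {target}.")
        else:
            speak(f"The {target} is not in the frame.")
```

Passing the four steps in as functions keeps the sketch runnable without a camera or microphone; a real system would plug in the OpenCV capture, the detector, and the speech engine.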

          The evaluation is done in two processes: a) data collection and b) the CNN algorithm.

          1. Data collection

            The data set used is a custom data set. It consists of the virtual weights designed for a selected set of objects. Once an object is detected, the configuration file of that object should match the virtual weights of the custom data set.

          2. Working of CNN algorithm

            YOLO (You Only Look Once) is the major component of the system and is built on a convolutional neural network. It is a real-time object detection algorithm. Once an image is captured from the live stream, it is passed through convolution layers and filtered by matrix multiplication. Softmax and ReLU are the two activation functions used in this object classification technique.

            Fig 2: Working of CNN

            Some of the convolution layers are hidden, and each involves many processes.

            The network is mainly divided into 2 stages:

            1. Feature Learning

            2. Classification

          Once an image is captured from the live stream, it is converted into layers and matrix multiplication takes place. A 5 × 5 multiplication is applied, followed by max pooling, until the desired weight of the object is reached. The result is passed to the next stage, where ReLU activation is applied ahead of the fully connected neural network, which then provides the desired output.
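The matrix multiplication step can be illustrated with a small sliding-window filter. This is a sketch in plain Python lists rather than the code used in the prototype; the function `convolve2d`, the 7 × 7 input, and the 5 × 5 kernel are assumptions chosen so that a 5 × 5 filter produces a 3 × 3 feature map.

```python
from typing import List

Matrix = List[List[float]]


def convolve2d(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide the kernel over the image (no padding, stride 1) as CNN layers do."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            # Multiply the window element-wise with the kernel and sum the result.
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Hypothetical 7 x 7 image containing a diagonal line of ones.
image = [[1.0 if r == c else 0.0 for c in range(7)] for r in range(7)]
# Hypothetical 5 x 5 kernel that responds to that diagonal pattern.
kernel = [[1.0 if i == j else 0.0 for j in range(5)] for i in range(5)]

feature_map = convolve2d(image, kernel)  # 3 x 3 feature map
```

The feature map is largest where the window lines up with the pattern the kernel encodes, which is how a convolution layer learns to respond to features.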

          Fig 3: Pooling

          The image is converted into layers through ReLU activation, and the 5 × 5 matrix is reduced by max pooling. The activation function is responsible for transforming the summed weighted input of a node into the activation, or output, of that node for that input.
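As a small illustration of these two operations, the sketch below applies ReLU and then 2 × 2 max pooling to a hand-made feature map. The helper names `relu` and `max_pool`, the window size, and the sample values are assumptions made for the example.

```python
from typing import List

Matrix = List[List[float]]


def relu(feature_map: Matrix) -> Matrix:
    """Replace every negative value with zero and keep positive values."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map: Matrix, size: int = 2) -> Matrix:
    """Keep the largest value in each non-overlapping size x size window.

    Trailing rows or columns that do not fill a whole window are dropped.
    """
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


sample = [
    [1.0, -2.0, 3.0, 0.0],
    [-1.0, 5.0, -6.0, 2.0],
    [0.0, -3.0, 4.0, -4.0],
    [7.0, 1.0, -1.0, -8.0],
]
activated = relu(sample)
pooled = max_pool(activated)  # [[5.0, 3.0], [7.0, 4.0]]
```

Pooling halves each dimension here, so later layers work on a smaller map that still carries the strongest responses.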

          Fig 4: Layers obtained

          The obtained image is filtered with the help of the softmax function. Softmax extends the idea of a single probability output to the multi-class case: it assigns a decimal probability to each class in a multi-class problem, and those probabilities must add up to 1.0. This constraint helps training converge more quickly. With candidate sampling, softmax calculates a probability for all the positive labels but only for a random sample of the negative labels.
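The softmax step can be written in a few lines. The sketch below assumes three hypothetical class labels and raw scores; it is meant only to show that the outputs are probabilities that add up to 1.0.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Turn raw class scores into probabilities that sum to 1.0."""
    peak = max(scores)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


labels = ["cup", "chair", "bottle"]  # hypothetical class names
probs = softmax([2.0, 1.0, 0.1])     # hypothetical raw scores
best = labels[probs.index(max(probs))]
```

The class with the highest score always receives the highest probability, so `best` is the label a classifier would report for these scores.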


          Fig 6: Live streaming and video capture

          The figure above shows live-stream video capture with the help of OpenCV. On this page, the image is captured, object detection takes place, and the system moves on to the further steps.

          Fig 5: Object Detection

          The figure above shows the objects detected in the live stream. The system shows the frame and asks the user which object must be found. It detects each and every object present in front of the camera.

          Fig 7: Navigation

          The figure above shows the object that has been captured and processed by matrix multiplication through the ReLU and softmax functions. Once the user gives the input, the speech synthesizer automatically guides the user to fetch the object.

          Failure cases: If there is nothing in front of the camera, no object can be detected. A check is written to handle this case: the engine says that no object is found and asks the user for input again.

          If the user's voice is not clear while giving the input, an alert message asks the user to repeat the input in a clear, audible voice.
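The paper does not state how the spoken guidance is derived from a detection. One simple possibility, shown here purely as an assumption, is to compare the horizontal center of the object's bounding box with the center of the frame. The function `direction_hint`, its parameters, and the `tolerance` value are hypothetical.

```python
def direction_hint(box_x: float, box_w: float, frame_w: float,
                   tolerance: float = 0.15) -> str:
    """Return 'left', 'right' or 'ahead' from a bounding box's horizontal position.

    box_x is the left edge of the box and box_w its width, both in pixels.
    """
    center = box_x + box_w / 2
    # Offset of the box center from the frame center, from -0.5 to +0.5.
    offset = center / frame_w - 0.5
    if offset < -tolerance:
        return "left"
    if offset > tolerance:
        return "right"
    return "ahead"
```

For a 640-pixel-wide frame, `direction_hint(0, 100, 640)` gives `"left"` and `direction_hint(270, 100, 640)` gives `"ahead"`; the returned word could then be passed to the speech synthesizer.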
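The unclear-voice case can be handled with a bounded retry loop. The sketch below is an assumption about how such a check might look: `ask_until_clear`, `listen`, `speak`, and `max_attempts` are hypothetical names, and an empty or missing result from `listen` stands for speech that could not be recognized.

```python
from typing import Callable, Optional


def ask_until_clear(
    listen: Callable[[], Optional[str]],
    speak: Callable[[str], None],
    max_attempts: int = 3,
) -> Optional[str]:
    """Ask for a target object, prompting again when the input is unclear."""
    for _ in range(max_attempts):
        speak("Which object should I find?")
        heard = listen()  # None or blank text models unrecognized speech
        if heard and heard.strip():
            return heard.strip().lower()
        speak("Please repeat the input in a clear, audible voice.")
    return None  # the caller decides what to do after giving up
```

Limiting the number of attempts keeps the assistant from prompting forever in a noisy environment.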

        5. CONCLUSION

The system proposed here is a novel method for obstacle detection and identification. It could be commercialized to benefit the visually impaired community. Using this prototype, blind people can perform day-to-day tasks more easily. Unlike other existing models, it does not require a large database, because it uses a pre-trained convolutional neural network model.


  1. Guerrero, L. A., Vasquez, F., & Ochoa, S. F. (2012). An indoor navigation system for the visually impaired. Sensors, 12(6), 8236-8258.

  2. Wang, H. C., Katzschmann, R. K., Teng, S., Araki, B., Giarré, L., & Rus, D. (2017, May). Enabling independent navigation for visually impaired people through a wearable vision-based feedback system. In 2017 IEEE international conference on robotics and automation (ICRA) (pp. 6533-6540). IEEE.

  3. Bai, J., Lian, S., Liu, Z., Wang, K., & Liu, D. (2018). Virtual-blind-road following-based wearable navigation device for blind people. IEEE Transactions on Consumer Electronics, 64(1), 136-143.

  4. Mekhalfi, M. L., Melgani, F., Zeggada, A., De Natale, F. G., Salem, M. A. M., & Khamis, A. (2016). Recovering the sight to blind people in indoor environments with smart technologies. Expert systems with applications, 46, 129-138.

  5. Hub, A., Diepstraten, J., & Ertl, T. (2003). Design and development of an indoor navigation and object identification system for the blind. ACM Sigaccess Accessibility and Computing, (77-78), 147-152.

  6. Yelamarthi, K., & Laubhan, K. (2015). Space perception and navigation assistance for the visually impaired using depth sensor and haptic feedback. Int. J. Eng. Res. Innov, 7(1), 56-62.

  7. Bourbakis, N., & Kavraki, D. (2005, October). A 2D vibration array for sensing dynamic changes and 3D space for blinds' navigation. In Fifth IEEE Symposium on Bioinformatics and Bioengineering (BIBE'05) (pp. 222-226). IEEE.

  8. Kunhoth, J., Karkar, A., Al-Maadeed, S., & Al-Attiyah, A. (2019). Comparative analysis of computer-vision and BLE technology based indoor navigation systems for people with visual impairments. International Journal of Health Geographics, 18(1), 29.

  9. Gupta, S., Arbeláez, P., Girshick, R., & Malik, J. (2015). Indoor scene understanding with rgb-d images: Bottom-up segmentation, object detection and semantic segmentation. International Journal of Computer Vision, 112(2), 133-149.

  10. Idrees, A., Iqbal, Z., & Ishfaq, M. (2015, June). An efficient indoor navigation technique to find optimal route for blinds using QR codes. In 2015 IEEE 10th Conference on Industrial Electronics and Applications (ICIEA) (pp. 690-695). IEEE.

  11. Kapoor, R., Bharathi, M. A., & Sushama, M. (2019). Deep convolutional neural network in smart assistant for blinds. In TENCON 2019 - 2019 IEEE Region 10 Conference (TENCON) (pp. 1697-1701). IEEE. doi: 10.1109/TENCON.2019.8929695.
