Portable Camera based Text Label Reading

DOI : 10.17577/IJERTV5IS020647

Ms. N. Naga Swathi Assistant Professor,

Electronics & Communication Department, Bapatla Engineering College,

Bapatla, India

Ms. U. Swathi Student,

Electronics & Communication Department, Bapatla Engineering College,

Bapatla, India

Mr. S. Surendra Student,

Electronics & Communication Department, Bapatla Engineering College,

Bapatla, India

Mr. M. Kowshik Student,

Electronics & Communication Department, Bapatla Engineering College,

Bapatla, India

Abstract: Technology is advancing rapidly, yet visually impaired people still cannot lead their lives confidently without depending on others. We therefore propose a system that helps them identify the text on a product held in the hand. The system works in three steps: first, an image of the object is captured; second, the required text is extracted from the image by a character recognition algorithm; finally, the extracted text is converted to audio and played to the user through a speaker. In this way the system reads out the text printed on the product package.

Keywords: Optical Character Recognition; RGB24; CCD


    Reading is essential in today's society. Printed text is everywhere, in the form of reports, receipts, bank statements, restaurant menus, classroom handouts, product packages, instructions on medicine bottles, and so on. While optical aids, video magnifiers and screen readers can help blind and visually impaired users access documents, few devices provide good access to common hand-held objects such as product packages and objects printed with text, such as prescription medication bottles. The ability of people who are blind or have significant visual impairments to read printed labels and product packages will enhance independent living and foster economic and social self-sufficiency, so we propose a system that is useful to blind people.


    Some existing systems show promise for portable use, but they cannot handle product labelling. For example, portable bar code readers, designed to help blind people identify different products in an extensive product database, can give blind users access to information about these products through speech and Braille. A major limitation is that it is very hard for blind users to find the position of the bar code and to point the bar code reader at it correctly. Some reading assistance systems, such as pen scanners, might be used in these and similar situations. Such systems integrate OCR software to provide scanning and text recognition, and some have integrated voice output. However, these systems are generally designed for, and perform best with, document images that have simple backgrounds, standard fonts, a small range of font sizes and well-organized characters, rather than commercial product boxes with multiple decorative patterns. Most existing OCR software cannot directly handle scene images with complex backgrounds.

    A number of portable reading assistants have been designed specifically for the visually impaired. "K-Mobile Reader" runs on a mobile phone and allows the user to read mail, receipts, fliers and many other documents. However, the document to be read must be nearly flat, placed on a clear, dark surface, and contain mostly text. In addition, "K-Mobile Reader" accurately reads black print on a white background, but has trouble recognizing coloured text or text on a coloured background. It cannot read text with complex backgrounds. Furthermore, these systems require the blind user to manually localize the regions of interest and the text on the object in most cases. Although a number of reading assistants have been designed specifically for the visually impaired, to our knowledge no existing reading assistant can read text from the kinds of challenging patterns and backgrounds found on many everyday commercial products, where text information can appear in various scales, fonts, colours and orientations.


    To overcome the problems identified in the problem definition, and to help blind people read text from the kinds of challenging patterns and backgrounds found on many everyday hand-held commercial products, we designed a camera-based assistive text reading framework that tracks the object of interest within the camera view and extracts printed text information from the object. The proposed algorithm can effectively handle complex backgrounds and multiple patterns, and can extract text information from both hand-held objects and nearby signage.

    In assistive reading systems for blind people, it is very difficult for the user to position the object of interest at the centre of the camera's view. As of now, there are still no acceptable solutions, so we approach the problem in stages. To make sure that the hand-held object appears in the camera view, this project uses a camera with a sufficiently wide angle to accommodate users who can aim only approximately. This can often cause other text objects to appear in the camera's view (for example, when shopping in a supermarket). To extract the hand-held object from the camera image, the system uses a motion-based method to obtain a region of interest (ROI) of the object.

    Automatically locating objects and text regions of interest in images captured against complex backgrounds is a difficult problem, because text in captured images is most likely surrounded by various background outliers ("noise"), and text characters usually appear in multiple scales, fonts and colours. Regarding text orientation, this project assumes that text strings in scene images keep an approximately horizontal alignment. Many algorithms have been developed to locate text regions in scene images. We divide them into two categories: rule-based and learning-based. To extract text information from complex backgrounds with multiple and variable text patterns, we propose a text localization algorithm that combines rule-based layout analysis with learning-based text classifier training, defining new feature maps based on stroke orientations and edge distributions. These, in turn, generate representative and discriminative text features for distinguishing text characters from background outliers.
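The motion-based ROI step described above can be illustrated with simple frame differencing. This is only a minimal sketch under stated assumptions, not the method used in the project: frames are taken to be small grayscale images stored as lists of rows, and the function name `motion_roi` and the `min_change` threshold are hypothetical.

```python
def motion_roi(prev_frame, next_frame, min_change=30):
    """Return (top, left, bottom, right), inclusive, of the pixels whose
    brightness changed by at least min_change between two grayscale frames.
    Returns None when nothing moved."""
    height, width = len(prev_frame), len(prev_frame[0])
    # Rows that contain at least one changed pixel.
    rows = [r for r in range(height)
            if any(abs(prev_frame[r][c] - next_frame[r][c]) >= min_change
                   for c in range(width))]
    # Columns that contain at least one changed pixel.
    cols = [c for c in range(width)
            if any(abs(prev_frame[r][c] - next_frame[r][c]) >= min_change
                   for r in range(height))]
    if not rows:
        return None
    return rows[0], cols[0], rows[-1], cols[-1]


# A static background and a second frame in which a small patch has changed.
background = [[10] * 6 for _ in range(5)]
moved = [row[:] for row in background]
for r in (1, 2):
    for c in (2, 3, 4):
        moved[r][c] = 200

print(motion_roi(background, moved))  # (1, 2, 2, 4)
```

Restricting later text localization to this bounding box is what keeps other text in the wide-angle view, such as neighbouring products on a shelf, out of the recognition stage.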


    The system is composed of three functional components:

    • Scene capture

    • Data processing

    • Audio output

    The image is captured using a portable camera. Live video is acquired from a webcam with the help of the OpenCV libraries. The webcam delivers frames in RGB24 format. Individual frames are separated from the video and passed to preprocessing.
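To make the RGB24 format concrete, the sketch below unpacks one packed frame buffer into rows of pixels. It assumes 3 bytes per pixel in R, G, B order, row-major layout and no row padding; the helper name `unpack_rgb24` is hypothetical. Note that OpenCV's own image arrays are normally in B, G, R order, so the channel order here is an assumption about the camera buffer, not about OpenCV.

```python
def unpack_rgb24(buffer, width, height):
    """Split a packed RGB24 buffer (3 bytes per pixel, row-major, no row
    padding) into a list of rows, each a list of (r, g, b) tuples."""
    expected = width * height * 3
    if len(buffer) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(buffer)}")
    rows = []
    for y in range(height):
        start = y * width * 3  # byte offset of the first pixel in row y
        rows.append([tuple(buffer[start + 3 * x: start + 3 * x + 3])
                     for x in range(width)])
    return rows


# A 2x2 frame: red, green on the first row; blue, dark gray on the second.
frame = bytes([255, 0, 0, 0, 255, 0,
               0, 0, 255, 9, 9, 9])
print(unpack_rgb24(frame, 2, 2))
```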

    The data processing component implements the proposed algorithms, including the following processes:

    • Object-of-interest detection, which selectively extracts the image of the object held by the blind user from the cluttered background or other neutral objects in the camera view.

    • Text localization, which finds the image regions containing text, and text recognition, which converts the image-based text information into readable codes.

    The audio output component informs the blind user of the recognized text codes in the form of speech or audio. A Bluetooth earpiece with a mini microphone or a headset is used for voice output.

    The steps followed in the proposed system are the following:

    1. Image acquisition:

      Images of the objects of interest are captured using a commonly available, low-cost camera.

    2. Gray scale conversion:

      The system must be robust to noisy conditions. To reduce noise, the input image is converted to grayscale. The purpose is to enhance and extract the useful information from the input for further processing.

    3. Edge detection:

      Edge detection is a mathematical method for identifying the points in an image at which the image brightness changes sharply. The process involves smoothing, finding gradients, non-maximum suppression and edge tracking.

    4. Thresholding:

      Thresholding is used for image segmentation: it turns the grayscale image into a binary image. The threshold operator splits the pixels into two groups and chooses the threshold according to the variances of those groups. It assumes a bimodal distribution of gray levels in the image.

    5. Automatic text extraction:

      An automatic text extraction algorithm is then applied, which eliminates the complex background according to the intensity of the image.

    6. Optical character recognition:

      Text recognition is performed by off-the-shelf OCR before informative words are output from the localized text regions. A text region labels the minimum rectangular area that accommodates the characters inside it, so the border of the text region touches the edge boundary of the text characters. However, OCR performs better if text regions are first assigned proper margin areas and binarized to segment the text characters from the background. We propose to use a template matching algorithm for OCR. The output of the OCR is a text file containing the product label (its name) in textual form. The audio output component then informs the blind user of the recognized text in the form of speech or audio.

    7. Product identification:

      The text in the OCR output file is matched against the product names saved in the database, and the matching product is identified.

    8. Final audio output containing text information:

      The database stores one audio file for each product. Each audio file contains the complete information about the corresponding product.
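Steps 2 and 4 above (grayscale conversion and thresholding) can be sketched in a few lines. The variance-based thresholding described in step 4 resembles Otsu's method, but the text does not name it, so treating it as Otsu's method is an assumption. The luminance weights are the common ITU-R BT.601 values, and all function names are hypothetical.

```python
def to_gray(rgb_rows):
    """Luminance-weighted grayscale conversion (BT.601 weights), 0..255."""
    return [[int(round(0.299 * r + 0.587 * g + 0.114 * b)) for r, g, b in row]
            for row in rgb_rows]


def otsu_threshold(gray_rows):
    """Return the gray level t that maximizes the between-class variance of
    the groups {v <= t} and {v > t}; this is equivalent to minimizing the
    within-class variance of the two groups."""
    hist = [0] * 256
    for row in gray_rows:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the dark group so far
    sum0 = 0  # sum of gray levels in the dark group so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def binarize(gray_rows, t):
    """Map pixels brighter than t to 1 and all others to 0."""
    return [[1 if v > t else 0 for v in row] for row in gray_rows]


# A tiny 2x2 colour image: black and white pixels on a checkerboard.
image = [[(0, 0, 0), (255, 255, 255)],
         [(255, 255, 255), (0, 0, 0)]]
gray = to_gray(image)
print(binarize(gray, otsu_threshold(gray)))  # [[0, 1], [1, 0]]
```

The bimodal assumption matters here: when the histogram has two clear peaks (text and background), a single global threshold separates them well, whereas decorative packaging with many intensity levels may need a different strategy.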

      Fig. 1. Image of project
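Steps 6 to 8 (template matching OCR, product identification and audio lookup) can be illustrated as follows. This is a toy sketch, not the project's implementation: it assumes each character has already been segmented and binarized to the same size as the templates, and the 3x3 templates, product names and audio file names are all invented for illustration.

```python
def match_score(glyph, template):
    """Fraction of pixels on which two equally sized binary images agree."""
    agree = sum(g == t
                for g_row, t_row in zip(glyph, template)
                for g, t in zip(g_row, t_row))
    return agree / (len(template) * len(template[0]))


def recognize(glyph, templates):
    """Return the label of the stored template that best matches the glyph."""
    return max(templates, key=lambda label: match_score(glyph, templates[label]))


def identify_product(ocr_text, product_names):
    """Return the first saved product name found in the OCR text
    (case-insensitive), or None if no name matches."""
    text = ocr_text.lower()
    for name in product_names:
        if name.lower() in text:
            return name
    return None


# Hypothetical 3x3 templates for three letters.
TEMPLATES = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
    "T": [[1, 1, 1], [0, 1, 0], [0, 1, 0]],
}

# An "L" with one pixel lost to noise still matches "L" best (8 of 9 pixels).
noisy_l = [[1, 0, 0], [1, 0, 0], [1, 1, 0]]
print(recognize(noisy_l, TEMPLATES))  # L

# Hypothetical database: product name -> audio file describing the product.
AUDIO_FILES = {"Aspirin": "aspirin.wav", "Green Tea": "green_tea.wav"}
product = identify_product("GREEN TEA 100 g", AUDIO_FILES)
print(product, AUDIO_FILES[product])  # Green Tea green_tea.wav
```

Matching the recognized text against a fixed list of saved names means that small OCR errors outside the product name do not affect identification, at the cost of only recognizing products already in the database.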


    1. Hardware system microcontroller:

This section forms the control unit of the whole project. It consists essentially of a microcontroller with its associated circuitry, such as the crystal with capacitors, the reset circuit, pull-up resistors (if needed) and so on. The microcontroller is the heart of the project because it controls the devices being interfaced and communicates with them according to the program written for it.







D. Speaker:

A speaker driver is an individual transducer that converts electrical energy into sound waves, usually as part of a loudspeaker, a television set or another electronic device. Sometimes the transducer itself is referred to as a speaker, especially when a single one is mounted in an enclosure or surface (as in a wall speaker or a car audio system). There are many different types of speaker drivers. The most common are woofers, midrange drivers and tweeters, and subwoofers are becoming very common. Less common types are supertweeters and rotary woofers.


In this paper, we proposed a prototype system that reads printed text on hand-held objects to assist blind people. To solve the common aiming problem for blind users, we proposed a technique for detecting the object of interest: the blind user simply holds the object in front of the camera for a few seconds. This process distinguishes the object of interest from the background and from other objects in the camera view; the text regions are then extracted from the image, and the text is converted into voice.









Fig. 2. Block diagram (block labels recoverable from the figure: power supply, voice module)

  1. Liquid crystal display (LCD):

    An LCD is a flat-panel electronic visual display that uses the light-modulating properties of liquid crystals. Liquid crystals do not emit light directly. LCDs are available to display arbitrary images or fixed images that can be shown or hidden, such as preset words, digits and 7-segment displays as in a digital clock. Both kinds use the same basic technology, except that arbitrary images are made up of a large number of small pixels, while the other displays have larger elements.

  2. USB camera:

USB cameras are imaging cameras that use USB 2.0 or USB 3.0 technology to transfer image data. They are designed to interface easily with dedicated computer systems by using the same USB technology found in most computers. The wide availability of USB in computer systems, together with the 480 Mb/s transfer rate of USB 2.0, makes USB cameras well suited to many imaging applications. A growing selection of USB 3.0 cameras is also available, with data transfer rates of up to 5 Gb/s. Edmund Optics offers a variety of USB cameras suited to many imaging needs. EO USB cameras are available with both CMOS and CCD sensor types, which makes them suitable for a wide range of applications. USB cameras work out of the box for quick installation. USB cameras connected to low-power USB ports, such as those on a laptop, may require external power.
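As a rough worked example of what these transfer rates mean for uncompressed RGB24 video, the sketch below computes the theoretical frame-rate ceiling of a link. The 640x480 resolution is an assumption (the text does not state the camera resolution), the function name is hypothetical, and the figures ignore USB protocol overhead, so real throughput is lower.

```python
def max_raw_fps(width, height, link_bits_per_s, bytes_per_pixel=3):
    """Upper bound on frames per second for uncompressed video over a link,
    ignoring all protocol overhead. RGB24 uses 3 bytes per pixel."""
    bits_per_frame = width * height * bytes_per_pixel * 8
    return link_bits_per_s / bits_per_frame


USB2_BITS_PER_S = 480_000_000    # USB 2.0 signalling rate, 480 Mb/s
USB3_BITS_PER_S = 5_000_000_000  # USB 3.0 signalling rate, 5 Gb/s

# One 640x480 RGB24 frame is 640 * 480 * 3 * 8 = 7,372,800 bits.
print(round(max_raw_fps(640, 480, USB2_BITS_PER_S), 1))  # 65.1
print(round(max_raw_fps(640, 480, USB3_BITS_PER_S), 1))  # 678.2
```

Even under these optimistic assumptions, USB 2.0 leaves limited headroom for uncompressed colour video at higher resolutions, which is one reason a modest capture resolution is a reasonable choice for a low-cost reading aid.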


Our future work will extend our localization algorithm to process text strings with fewer than three characters and to design more robust block patterns for text feature extraction. We will also extend our algorithm to handle non-horizontal text strings. In addition, we will address the significant human interface issues associated with text reading by blind users.


