Sign Language Recognition System

DOI : 10.17577/IJERTCONV2IS04054

Mayuresh Keni1, Shireen Meher2, Aniket Marathe3, Prof. Amruta Chintawar4, Prof. Shweta Ashtekar5

Department of Electronics Engineering

Ramrao Adik Institute of Technology, Dr. D.Y. Patil Vidyanagar

Nerul, Navi Mumbai-400706



    Speech and hearing impaired people are often cut off from ordinary communication with the rest of society, and people unfamiliar with sign language find it difficult to understand and communicate with them. They have to rely on an interpreter or on some form of visual communication. An interpreter will not always be available, and visual communication is often hard to understand. Sign language is the primary means of communication in the speech and hearing impaired community. Because most people outside that community do not know the grammar or meaning of the gestures that make up a sign language, its use is largely limited to the signers' families and the community itself. In this age of technology, it is essential to help these people communicate smoothly so that they feel part of society. Hence, an intelligent computer system needs to be developed and trained. Researchers have been working on the problem for some time, and the results are showing some promise. Technologies are being developed for speech recognition, but no real commercial product for sign recognition is available in the current market.


    Most research in this field uses a glove-based system, in which sensors such as potentiometers and accelerometers are attached to each finger. Based on their readings, the corresponding letter of the alphabet is displayed.

    Christopher Lee and Yangsheng Xu developed a glove-based gesture recognition system that could recognize 14 letters of the hand alphabet, learn new gestures, and update its model of each gesture online. Over the years, advanced glove devices have been designed, such as the Sayre Glove, the Dexterous Hand Master and the Power Glove. The main problem with glove-based systems is that they have to be recalibrated whenever a new user uses them. The connecting wires also restrict freedom of movement. Sign language recognition has also been implemented using image processing instead of gloves. The drawback of those systems was that the background had to be black, otherwise they would not work. Some of them also required color bands to be worn on the fingertips so that the image processing unit could identify the fingertips. We are implementing our project using image processing. The main advantage of our project is that it is not restricted to a black background: it can be used with any background. Our system also does not require color bands.


    In this paper we present a robust and efficient method of sign language detection. Instead of using data gloves, we perform the detection by image processing. The main advantage of image processing over data gloves is that the system does not need to be re-calibrated when a new user uses it. The skin pixels are filtered and the image is converted to binary. The system can be used against any background and is not restricted to a black or white background.


    Step 1: Take a picture of the hand to be tested using an IR webcam.

    Step 2: Filter the skin pixels from the image by converting their pixel values into hex values.

    Step 3: To convert the image into a binary image, the pixels with those hex values are made white and the rest of the image is made black.

    Step 4: From the binary image, generate the coordinates of the image.

    Step 5: Compare these coordinates with the coordinates stored in the database, using a pattern matching technique, to generate the output.

    Step 6: A pattern matching algorithm is needed for this purpose. (The algorithm is given below.)

    Step 7: If the pattern is matched, display the letter corresponding to the image.
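    Steps 4 and 5 can be illustrated with a minimal Python sketch. It assumes that "coordinates" means the (x, y) positions of the white pixels and that two coordinate sets are compared by their overlap (intersection over union). Both are assumptions made for illustration; the paper does not specify either, and the function names are hypothetical.

```python
def white_pixel_coordinates(binary_image):
    """Return the (x, y) coordinates of every white (255) pixel, row by row."""
    coords = []
    for y, row in enumerate(binary_image):
        for x, value in enumerate(row):
            if value == 255:
                coords.append((x, y))
    return coords


def coordinate_overlap(coords_a, coords_b):
    """Assumed similarity measure: shared coordinates / all coordinates."""
    set_a, set_b = set(coords_a), set(coords_b)
    if not set_a and not set_b:
        return 1.0  # two empty images are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


# Tiny 3x3 binary images standing in for a captured and a stored gesture.
captured = [
    [0, 255, 0],
    [255, 255, 0],
    [0, 255, 0],
]
stored = [
    [0, 255, 0],
    [255, 255, 0],
    [0, 255, 255],
]

captured_coords = white_pixel_coordinates(captured)
stored_coords = white_pixel_coordinates(stored)
overlap = coordinate_overlap(captured_coords, stored_coords)
print(captured_coords)
print(overlap)
```

    Here the two images share four of five white-pixel positions, so the overlap would be compared against whatever acceptance threshold the system uses.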


    The algorithm section shows the overall architecture and idea of the system. The image is captured using a webcam mounted on the shoulders of the speech and hearing impaired person. The captured image is sent to the computer, which processes it as explained below and displays the corresponding text. The captured image is in RGB form. It is first converted into grayscale and then into binary form. The X and Y coordinates of the image are calculated from the binary form of the image. These coordinates are then compared with the coordinates of the images in the database. The database is built beforehand by taking images of the gestures of the sign language, and a corresponding text is assigned to each gesture. When the coordinates of the captured image match those of a database image, the corresponding text is displayed.
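    The RGB to grayscale to binary conversion described above can be sketched in a few lines of Python. The luminance weights (0.299, 0.587, 0.114) are the standard ITU-R BT.601 weights; the threshold of 128 and the function names are assumptions for illustration, since the paper does not state which values were used.

```python
def to_grayscale(rgb_image):
    """Convert rows of (R, G, B) tuples to rows of 0-255 grey levels."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]


def to_binary(gray_image, threshold=128):
    """Map grey levels at or above the (assumed) threshold to 255, others to 0."""
    return [[255 if value >= threshold else 0 for value in row] for row in gray_image]


# A 2x2 RGB image: bright pixels on the left, dark pixels on the right.
rgb = [
    [(255, 255, 255), (0, 0, 0)],
    [(200, 150, 120), (30, 40, 50)],
]

gray = to_grayscale(rgb)
binary = to_binary(gray)
print(gray)
print(binary)
```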

    1. Camera Orientation, Interfacing and Image capturing

      The most important part of the project is the orientation of the camera. If the camera is not oriented properly, gestures may be misinterpreted and the output will be wrong. Hence the orientation of the camera should be set carefully. The camera is placed on the shoulders of the speech and hearing impaired person, in such a way that it faces in the same direction as the user's view.

    2. Image Processing

      The image capturing section only captures the image and sends it to the image processing section, which does the processing part of the project. The gesture captured through the webcam has to be properly processed so that it is ready to go through the pattern matching algorithm. The image processing is done in the following steps.

      1. RGB Color Recognition:

        Red, green and blue are the primary colors, from which all other colors are made. The gesture captured through the webcam is a color (RGB) image. This image cannot be used directly for comparison, because an algorithm to compare two RGB images would be very complex. We also have to remove the background from the captured image. For this we filter (recognize) the skin pixels and convert their RGB values to hex values.

      2. Binary Conversion:

      A binary image is an image that consists of just two colors, i.e. white and black, or equivalently just two grey levels. It is important to convert the image into binary so that the comparison of two images, i.e. the captured image and an image in the database, is easy. The skin pixels are made white, whereas the background is made black.
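      The two image processing steps above (skin filtering with hex values, then binary conversion) might look like the following Python sketch. The skin test is a widely quoted RGB threshold heuristic for skin under daylight, used here only as a stand-in; the paper does not give its own skin criterion. Reading "hex values" as `#RRGGBB` color codes is likewise an assumption, and all function names are hypothetical.

```python
def rgb_to_hex(r, g, b):
    """Format an RGB triple as a #RRGGBB hex code."""
    return f"#{r:02X}{g:02X}{b:02X}"


def is_skin(r, g, b):
    """Assumed rule of thumb for skin pixels; not the authors' criterion."""
    return (
        r > 95 and g > 40 and b > 20
        and max(r, g, b) - min(r, g, b) > 15
        and abs(r - g) > 15
        and r > g and r > b
    )


def filter_skin(rgb_image):
    """Return (hex codes of the skin pixels, binary image with skin = 255)."""
    skin_hex = []
    binary = []
    for row in rgb_image:
        binary_row = []
        for (r, g, b) in row:
            if is_skin(r, g, b):
                skin_hex.append(rgb_to_hex(r, g, b))
                binary_row.append(255)  # skin pixel becomes white
            else:
                binary_row.append(0)    # background becomes black
        binary.append(binary_row)
    return skin_hex, binary


# 2x2 test image: two skin-like tones and two background colors.
image = [
    [(224, 172, 105), (20, 60, 200)],
    [(10, 10, 10), (198, 134, 66)],
]

skin_hex, binary = filter_skin(image)
print(skin_hex)
print(binary)
```

      A fixed RGB rule like this is sensitive to lighting, so the thresholds would need tuning for the camera and environment actually used.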

    3. Pattern Matching and Text Conversion

      In this section the input image, which has been converted into binary form, is compared with the images in the database. Binary images consist of just two grey levels, so the captured image and a database image can be compared easily. Images in the database are also binary images. We use a comparison algorithm to compare the captured image with all images in the database. Each gesture is captured from more than 2 angles to increase the accuracy of the system. Pixels of the captured image are compared with pixels of a database image; if at least 90 percent of the pixel values match, the text is displayed on the LCD, otherwise the image is compared with the next image in the database. This process continues until a match is found. If no match is found, the image is discarded and the next image is considered for pattern matching.

    4. Pattern Matching Algorithm

      In this method, the input image is converted to a binary image, i.e. a black and white image. Since the image is black and white, each pixel value is either 0 (black) or 255 (white). Images in the database are also binary images. The captured image is compared with each image in the database by EX-NORing its pixels with the corresponding pixels of the database image, as shown in Table 1. If 90 percent or more of the pixel values match, i.e. the EX-NOR output is 1 for at least 90 percent of the pixels, the text is displayed on the LCD; otherwise the image is compared with the next image in the database. This process continues until a match is found. If no match is found, the image is discarded and the next image is considered for pattern matching.
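      The EX-NOR comparison with the 90 percent threshold can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes both images have the same dimensions and contain only the values 0 and 255.

```python
def xnor_match_ratio(captured, stored):
    """Fraction of pixel positions where the EX-NOR of the two images is 1."""
    same_shape = len(captured) == len(stored) and all(
        len(a) == len(b) for a, b in zip(captured, stored)
    )
    if not same_shape:
        raise ValueError("images must have the same dimensions")
    matches = 0
    total = 0
    for row_a, row_b in zip(captured, stored):
        for a, b in zip(row_a, row_b):
            # EX-NOR of the two 1-bit pixels: 1 when both are white or both black.
            matches += int((a == 255) == (b == 255))
            total += 1
    return matches / total if total else 0.0


def is_match(captured, stored, threshold=0.90):
    """True when at least 90 percent of the pixels agree."""
    return xnor_match_ratio(captured, stored) >= threshold


captured = [
    [255, 255, 0, 0, 255],
    [0, 255, 255, 0, 0],
]
close = [  # differs from `captured` in one of ten pixels
    [255, 255, 0, 0, 255],
    [0, 255, 255, 0, 255],
]
far = [  # differs from `captured` in three of ten pixels
    [0, 255, 0, 0, 255],
    [0, 0, 255, 0, 255],
]

print(xnor_match_ratio(captured, close), is_match(captured, close))
print(xnor_match_ratio(captured, far), is_match(captured, far))
```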

      Table 1: EX-NOR of corresponding pixels (1 = white, 0 = black)

      Captured Image    Database Image    Output
      0                 0                 1
      0                 1                 0
      1                 0                 0
      1                 1                 1

      The first step in any recognition system is the collection of relevant data. In this case the raw image information has to be processed to differentiate the skin of the hand (and various markers) from the background. Once the data has been collected, it is possible to use prior information about the hand (for example, the fingers are always separated from the wrist by the palm) to refine the data and remove as much noise as possible. This step is important because, as the number of gestures to be distinguished increases, the data collected has to be more and more accurate and noise free in order to permit recognition. The next step is to take the refined data and determine what gesture it represents.

      Any recognition system has to simplify the data to allow calculation in a reasonable amount of time. Obvious ways to simplify the data include translating, rotating and scaling the hand so that it is always presented to the recognition system with the same position, orientation and effective hand-camera distance.
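      The translation and scaling steps described above can be approximated with a short Python sketch: crop the binary image to the bounding box of the white pixels, then resize the crop to a fixed size by nearest-neighbour sampling. Rotation is not handled here, and the target size of 4 and the function names are assumptions for illustration only.

```python
def crop_to_hand(binary_image):
    """Crop to the bounding box of the white pixels (removes translation)."""
    rows = [y for y, row in enumerate(binary_image) if 255 in row]
    if not rows:
        return []  # no white pixels at all
    width = len(binary_image[0])
    cols = [x for x in range(width) if any(row[x] == 255 for row in binary_image)]
    return [row[cols[0]:cols[-1] + 1] for row in binary_image[rows[0]:rows[-1] + 1]]


def resize_nearest(image, width, height):
    """Nearest-neighbour resize to width x height (removes scale)."""
    src_h, src_w = len(image), len(image[0])
    return [
        [image[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def normalize(binary_image, size=4):
    """Bring a binary hand image to a fixed position and size."""
    cropped = crop_to_hand(binary_image)
    if not cropped:
        return [[0] * size for _ in range(size)]
    return resize_nearest(cropped, size, size)


# The same L-shaped blob, once small near the top-left, once large near the bottom-right.
small = [
    [0, 0, 0, 0, 0, 0],
    [0, 255, 0, 0, 0, 0],
    [0, 255, 255, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
large = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 255, 255, 0, 0],
    [0, 0, 255, 255, 0, 0],
    [0, 0, 255, 255, 255, 255],
    [0, 0, 255, 255, 255, 255],
]

print(normalize(small) == normalize(large))
```

      After normalization the two blobs are pixel-for-pixel identical, so a pixel-wise comparison no longer depends on where the hand is in the frame or how far it is from the camera.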














    5. Database

    A proper database of the gestures of the sign language is required, so that the images captured while communicating through this system can be compared against it. To build the database, we capture each gesture from more than 2 angles, which increases the accuracy of the system significantly. The more angles are captured, the better the accuracy, but the more memory is required. If the user's hand alignment differs from the one stored in the database for the same gesture, it would cause an error in the system. This error is avoided by taking pictures of the same gesture from more than 2 angles.
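    A database holding several views per gesture could be organised as in the Python sketch below: each label maps to a list of binary images, and the captured image is accepted as soon as any stored view matches at 90 percent or more. The dictionary layout, the labels and the function names are hypothetical, chosen only to illustrate the idea.

```python
def match_ratio(image_a, image_b):
    """Fraction of equal pixels between two same-sized binary images."""
    pairs = [
        (p, q)
        for row_a, row_b in zip(image_a, image_b)
        for p, q in zip(row_a, row_b)
    ]
    if not pairs:
        return 0.0
    return sum(p == q for p, q in pairs) / len(pairs)


def recognise(captured, database, threshold=0.90):
    """Return the first label with any stored view matching at >= threshold."""
    for label, views in database.items():
        for view in views:
            if match_ratio(captured, view) >= threshold:
                return label
    return None  # no match: the image is discarded


W, B = 255, 0
database = {
    # Each gesture is stored from more than one angle (toy 1x10 images here).
    "A": [[[W, W, W, W, W, B, B, B, B, B]], [[W, W, W, B, B, B, B, B, B, B]]],
    "B": [[[B, B, B, B, B, W, W, W, W, W]], [[B, B, B, B, B, B, B, W, W, W]]],
}

tilted_a = [[W, W, W, B, B, B, B, B, B, W]]  # 9 of 10 pixels match the second view of "A"
unknown = [[W, B, W, B, W, B, W, B, W, B]]   # matches no stored view well

print(recognise(tilted_a, database))
print(recognise(unknown, database))
```

    Storing more views per gesture makes the inner loop longer, which reflects the accuracy versus memory trade-off noted above.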


    In order to detect hand gestures, data about the hand will have to be collected. A decision has to be made as to the nature and source of the data. Two possible technologies to provide this information are:

    -A glove with sensors attached that measure the position of the finger joints.

    -An optical method.

    An optical method has been chosen, since this is more practical (many modern computers come with a camera attached), cost effective and has no moving parts, so is less likely to be damaged through use.


    Real Time Functioning.

    The output of the sign language recognition is displayed as text in real time. This makes the system more efficient and makes communication easier for hearing and speech impaired people. The images captured through the webcam are compared and the result of the comparison is displayed at the same time. This feature makes communication very simple and delay free.


    When the entire project is implemented on a Raspberry Pi, a very small yet powerful computer, the whole system becomes portable and can be taken anywhere. This frees the user from needing a desktop or laptop in order to communicate.

    No need of calibration.

    In sign language recognition using sensors attached to the hands, the system needs to be calibrated to the user's hand every time the user changes. This is not the case when the system is implemented using image processing: the output depends on the angles of the fingers and the wrist rather than on the size of the hand.

    Does not get damaged through use.

    As no special sensors are used in this system, the system is less likely to get damaged.


In future work, the proposed system can be developed and implemented on a Raspberry Pi. The image processing part should be improved so that the system can communicate in both directions, i.e. it should be capable of converting normal language to sign language and vice versa. We will try to recognize signs that include motion. Moreover, we will focus on converting a sequence of gestures into text, i.e. words and sentences, and then converting the text into audible speech.


Our project aims to make communication simpler for speech and hearing impaired people by introducing a computer into the communication path, so that sign language can be automatically captured, recognized, translated to text and displayed on an LCD. There are various methods for sign language conversion. Some use a wired electronic glove and others use a vision-based approach. Electronic gloves are costly, and one person cannot use another person's glove. In the vision-based approach, different techniques are used to recognize the captured gestures and match them with gestures in a database. Converting the RGB image to binary and matching it against the database using a comparison algorithm is a simple, efficient and robust technique. This technique is sufficiently accurate to convert sign language into text.


