Hand Gesture to Speech Translation for Assisting Deaf and Dumb

DOI : 10.17577/IJERTV6IS050572


  • Open Access
• Authors : Rajatha Prabhu, Harshitha B, Madhushree B, Dr. Nataraj K. R
  • Paper ID : IJERTV6IS050572
  • Volume & Issue : Volume 06, Issue 05 (May 2017)
  • DOI : http://dx.doi.org/10.17577/IJERTV6IS050572
  • Published (First Online): 30-05-2017
  • ISSN (Online) : 2278-0181
  • Publisher Name : IJERT
• License: This work is licensed under a Creative Commons Attribution 4.0 International License


Hand Gesture to Speech Translation for Assisting Deaf and Dumb

Rajatha Prabhu, Harshitha B, Madhushree B
Department of ECE, SJBIT, Bangalore, India

Dr. K. R. Nataraj
Guide and HOD, Department of ECE, SJBIT, Bangalore, India

Abstract- Communication is an integral part of life. About 360 million people in the world suffer from hearing impairment, 32 million of them children, and their lives are not as easy as those of people without this barrier. This paper presents a Sign Language Recognition system capable of recognizing hand gestures using MATLAB. The proposed technique has four modules: pre-processing and segmentation, feature extraction, gesture recognition, and gesture-to-voice conversion. Features such as eigenvalues and eigenvectors are extracted and used in recognition. The Principal Component Analysis (PCA) algorithm is applied for gesture recognition, and the recognized gesture is converted into text and voice format. The proposed technique helps to minimize the communication barrier between deaf-mute and normal people.

  1. INTRODUCTION

Sign language is a language which mainly uses manual communication to convey meaning, as opposed to acoustically conveyed sound patterns. This can involve simultaneously combining hand shapes, orientation and movement of the hands, arms or body, and facial expressions to express a speaker's thoughts. In order to facilitate communication between hearing-impaired and hearing people, sign language interpreters are usually used.

Such activities involve considerable effort on the part of the interpreter, as sign languages are distinct natural languages with their own syntax, different from any spoken language.

MATLAB (matrix laboratory) is a multi-paradigm numerical computing environment and fourth-generation programming language. A proprietary programming language developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of graphical user interfaces (GUIs), and interfacing with programs written in other languages. Using MATLAB we can process video with functions and system objects that read and write video files, perform feature extraction, motion estimation and object tracking, and display video.

  2. RELATED WORKS

Nasser H. Dardas et al. [1] consider an approach where the key features extracted are SIFT (Scale Invariant Feature Transform) key-points. They further construct a grammar over a sequence of hand postures for detecting dynamic gestures.

In [2] a basis for the use of Hidden Markov Models (HMMs) is established by drawing an analogy between speech recognition and gesture recognition. HMMs can be used to model time series data; here the movement of the hand along the coordinate axes is tracked and each direction is taken as a state. This paper uses a lexicon of forty gestures and achieves an accuracy of 95 percent. It also notes a disadvantage: as the lexicon grows, the need to describe the hand configuration along with the hand trajectory also grows, making the design of the HMM more complex and time consuming. This motivates a simpler way to describe dynamic gestures.

In [3] the system uses a built-in mobile camera for gesture acquisition and recognition; the acquired gesture is processed with the help of algorithms such as the HSV model (skin color detection), large blob detection, flood fill and contour extraction. The system is able to recognize one-handed sign representations of the standard alphabets (A-Z) and numeric values (0-9). The output of this system is very efficient, consistent and a close approximation in both gesture processing and speech synthesis.

The paper [4] focuses on a vision-based hand gesture recognition system, proposing a database-driven scheme based on a skin color model approach and a thresholding approach along with effective template matching using PCA. Initially, the hand region is segmented by applying a skin color model in the YCbCr color space. In the next stage, Otsu thresholding is applied to separate foreground and background. Finally, a template-based matching technique is developed using Principal Component Analysis (PCA) for recognition.

In [5], human-computer interaction (HCI) and sign language recognition (SLR), aimed at creating virtual reality or 3D gaming environments, helping deaf-mute people, etc., extensively exploit the use of hand gestures. Segmentation of the hand from the other body parts and the background is the primary need of any hand-gesture-based application system, but gesture recognition systems are usually plagued by segmentation problems, as well as by issues such as co-articulation and recognition of similar gestures.

The primary aim of the work in [6] is to design and implement a low-cost wired interactive glove, interfaced with a computer running MATLAB or Octave, with a high degree of accuracy for gesture recognition. The glove maps the orientation of the hand and fingers with the help of bend sensors, Hall effect sensors and an accelerometer. The data is then transmitted to the computer using automatic repeat request as an error-control scheme.

The algorithm devised in [7] is capable of extracting signs from video sequences under minimally cluttered and dynamic backgrounds using skin color segmentation. It distinguishes between static and dynamic gestures and extracts suitable feature vectors, which are classified using Support Vector Machines (SVM). Speech recognition is built upon the standard Sphinx module.

The paper [8] presents a Sign Language Recognition system capable of recognizing 26 gestures from the Indian Sign Language (ISL) using MATLAB. The proposed system has four modules: pre-processing and hand segmentation, feature extraction, sign recognition, and sign-to-text-and-voice conversion. Segmentation is done using image processing. Features such as eigenvalues and eigenvectors are extracted and used in recognition. The Principal Component Analysis (PCA) algorithm is used for gesture recognition, and the recognized gesture is converted into text and voice format.

The paper [9] presents an algorithm for hand gesture recognition using the Dynamic Time Warping methodology. The system consists of three modules: real-time detection of the face region and the two hand regions; tracking of the hands' trajectories, both in terms of direction among consecutive frames and distance from the centre of the frame; and gesture recognition based on analyzing variations in the hand locations relative to the centre of the face. The proposed technique overcomes not only the limitations of a glove-based approach but also most of the concerns of vision-based approaches, namely illumination conditions, background complexity and distance from the camera (up to two meters), by using Dynamic Time Warping, which finds the optimal alignment between the stored database and the query features; an improvement in recognition accuracy is observed compared to conventional methods.

In [10] a wireless data glove, a normal cloth driving glove fitted with flex sensors along the length of each finger and the thumb, is used. Mute people can use the glove to perform hand gestures, which are converted into speech so that normal people can understand their expression. A sign language usually provides signs for whole words; it can also provide signs for letters, to spell words that don't have a corresponding sign in that sign language. In this paper the flex sensor plays the major role: flex sensors are sensors whose resistance changes depending on the amount of flexion. The device recognizes the sign language alphabets and numbers, and a prototype is being developed to reduce the communication gap between differently-abled and normal people. The program is written in embedded C, and the Arduino software is used to observe the working of the program in the hardware circuitry, which is designed using a microcontroller and sensors.

  3. DESIGN AND IMPLEMENTATION

In this project we extract the skin color information from the video frames and use Principal Component Analysis (PCA) with the Euclidean distance as the classifier, using the eigenvalues and eigenvectors of the query image and the database images. The proposed methodology uses MATLAB to process the frames of the query video to detect the gesture; by indexing, the particular hand gesture is recognized, and the audio synthesizer gives the audio output for the detected action. Audio is pre-recorded for each particular hand gesture. Figure 1 shows the overview of the proposed system methodology for hand gesture recognition, providing an artificial voice for hearing-impaired and mute people.

Figure 1 Proposed Methodology

    1. Resize the images

The first step in the processing is to convert the query video frames into images and to resize them to a fixed dimension, say 280×280. Resizing the image is necessary because the captured image would be of larger size and would require more memory to store and process.

    2. Extraction of the skin pixels

The resized images are then used to extract the skin pixels from the image. The grey-world algorithm is applied before the skin pixel extraction is done.

      1. Grey-world algorithm

Color constancy is a technique for detecting color independently of the light source. The light source may add color casts to acquired images. One technique to solve this problem is to estimate the color of the predominant light and then, in the next stage, remove it. Once the color of the light in each channel is obtained, each color pixel is normalized by a scaling factor.

One of the most commonly used simple methodologies for estimating the color of the light is grey-world. This method provides good results in practice if the average scene color is grey.

      2. Grey-world assumptions

The Grey World Assumption is a white balance method that assumes that the scene, on average, is a neutral grey. The grey-world assumption holds good if the scene has a good distribution of colors. Under this assumption, the average reflected color is taken to be the color of the light. Hence, we estimate the illumination color cast by computing the arithmetic mean color and comparing it to grey. The grey-world algorithm provides an estimate of the illumination by computing the average of each channel of the image. One method of normalization uses the mean of the three channel averages as the illumination estimate of the image. To normalize channel i, each pixel value is scaled using equation (1),

I'_i(x, y) = I_i(x, y) × (avg / avg_i)        (1)

where avg_i is the channel mean and avg = (avg_R + avg_G + avg_B) / 3 is the illumination estimate.

Another method normalizes each channel to its maximum value: each pixel of channel i is scaled using equation (2), where m_i is the maximum pixel value of channel i,

I'_i(x, y) = I_i(x, y) × (255 / m_i)        (2)

3. RGB to YCbCr conversion

After the application of the grey-world algorithm the RGB images are converted to YCbCr, as medical investigation has proven that the human eye has variable sensitivity to brightness and color; hence the transformation of the red, green and blue colors to the YCbCr color space. Figure 2 shows the YCbCr color space.

Figure 2 YCbCr color space

Y signifies the luminance component, Cb the chrominance-blue component and Cr the chrominance-red component. The grayscale form is analogous to the Y component of the actual image. The Cb value is high in portions of the image having blue color, both Cb and Cr values are low in portions with green, and the Cr value is high in portions having shades of red. Medical research on the eye reports a count of about 120 million rods, which are highly sensitive compared to the 6-7 million cones. The rods are insensitive to color, whereas the cones provide the eye's sensitivity to color and are found to be situated close to the middle region.

The formulas to transform RGB to YCbCr are given below:

Y  = 0.3007R + 0.58593G + 0.11328B
Cb = 128 − 0.17187R − 0.33984G + 0.51171B
Cr = 128 + 0.51171R − 0.4296G − 0.08203B

The value range of Cb and Cr for skin-colored pixels is given by

Cb >= 77 & Cb <= 127 & Cr >= 133 & Cr <= 173

The given skin color tone information covers a wide range of skin-colored pixels.

Once the skin pixels are identified using the range specified above, those pixels are marked as white with an intensity of 255. Figure 3 shows the skin pixel extraction obtained.

Figure 3 (a), (b) Extraction of the skin pixels in the image
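The pre-processing chain above can be summarized in a short MATLAB sketch. This is a minimal illustration under our own assumptions: the file name and variable names are hypothetical, and the built-in rgb2ycbcr is used in place of the explicit transform formulas (its coefficients differ slightly from those given above).

    % Minimal sketch: grey-world normalization followed by YCbCr skin masking.
    img = im2double(imread('frame.jpg'));        % hypothetical input frame

    % Grey-world (equation (1)): scale each channel so its mean matches the global mean.
    avgRGB = squeeze(mean(mean(img, 1), 2));     % per-channel means [avgR; avgG; avgB]
    avg = mean(avgRGB);                          % illumination estimate
    for i = 1:3
        img(:, :, i) = img(:, :, i) * (avg / avgRGB(i));
    end

    % Convert to YCbCr and threshold with the skin-tone ranges from the paper.
    ycbcr = rgb2ycbcr(im2uint8(img));
    Cb = ycbcr(:, :, 2);
    Cr = ycbcr(:, :, 3);
    skinMask = Cb >= 77 & Cb <= 127 & Cr >= 133 & Cr <= 173;

    % Mark the detected skin pixels with the white intensity 255.
    skinMap = uint8(skinMask) * 255;
    imshow(skinMap);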

    3. Recognition of the gesture

      The first step in the recognition of the gesture is to specify the path of the database images and the query images obtained from the frames of the video.

      1. Principal Component Analysis (PCA)

Principal Component Analysis is a statistical method for investigating the correlations between a set of variables in order to find the underlying structure of those variables; it is closely related to factor analysis. It is a nonparametric analysis: the output is unique and does not depend on any hypothesis about the data distribution.

The tasks that PCA can perform include forecasting, redundancy elimination, data compression and feature extraction. Because PCA is a classical methodology that works well in the linear domain, applications such as signal processing, image processing, system control theory and communications, where linear models are suitable, are compatible with it. Principal component analysis decreases the dimensionality of a digital image while keeping the image information, providing a compact feature set, or compact representation, of the digital image. The aim of the PCA technique is to transform the gesture images into a set of characteristic feature images called eigengestures. For recognition, a query image is projected onto the lower-dimensional gesture space spanned by the eigengestures and then classified either by using a classifier or a statistical theorem.

        Figure 4 Example For PCA

The two principal components are defined as follows:

First principal component: the direction which maximizes the variability of the data when projected on that axis.

Second principal component: the direction, among those orthogonal to the first, maximizing the variability.

The principal components are the eigenvectors of A'A and the eigenvalues are the variances.
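As a small illustration of these definitions (our own toy example, not from the paper), the principal components of a 2-D data set can be read off from the eigenvectors of its covariance matrix in MATLAB:

    % Toy PCA example: principal directions of a 2-D point cloud.
    X = [2.5 2.4; 0.5 0.7; 2.2 2.9; 1.9 2.2; 3.1 3.0; 2.3 2.7];  % rows = samples
    A = X - mean(X, 1);                 % mean-centre the data
    C = (A' * A) / (size(A, 1) - 1);    % sample covariance matrix
    [V, D] = eig(C);                    % eigenvectors = principal directions
    [vars, order] = sort(diag(D), 'descend');
    pcs = V(:, order);                  % first column = first principal component
    disp(pcs); disp(vars);              % directions and the variances along them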

2. Eigenvectors and eigenvalues

Eigenvalues and eigenvectors have been used widely in matrix applications in engineering and science. Image processing, vibration analysis, control theory, electric circuits and quantum mechanics are among the areas of application. Many of the applications involving eigenvalues and eigenvectors amount to transforming a given matrix into a diagonal matrix.

When we obtain a set of data points we can decompose the set into eigenvectors and eigenvalues. Eigenvectors and eigenvalues exist in pairs: every eigenvector has a corresponding eigenvalue. An eigenvector is a direction, such as 45 degrees, vertical or horizontal; an eigenvalue is a number which specifies how much variance there is in the data in that direction, i.e., how the data is distributed along that line. The principal component considered is the eigenvector with the highest eigenvalue.

      3. Euclidean distance

The Euclidean distance is the straight-line distance between two pixels. Figure 5 illustrates the Euclidean distance metric.

        Figure 5 Illustration of the Euclidean distance metrics

Consider two points P and Q in two-dimensional Euclidean space, P with coordinates (p1, p2) and Q with coordinates (q1, q2). The line with endpoints P and Q is the hypotenuse of a right-angled triangle. The distance between the two points P and Q is given by the square root of the sum of the squares of the differences between the corresponding coordinates of the points.

By Euclidean geometry in two-dimensional space, the Euclidean distance between the two points a = (ax, ay) and b = (bx, by) is given by equation (3):

d(a, b) = √((ax − bx)² + (ay − by)²)        (3)

      4. Euclidean distance algorithm

The Euclidean distance algorithm computes the least distance between a set of column vectors in the codebook matrix and a given column vector x: it finds the column vector in the codebook that is nearest to x.

In one-dimensional space, the distance between two points x1 and x2 on a line is the absolute value of the difference between the two points, as given by equation (4):

d(x1, x2) = |x1 − x2|        (4)

In two-dimensional space, the distance between P = (p1, p2) and Q = (q1, q2) is given by equation (5):

d(P, Q) = √((p1 − q1)² + (p2 − q2)²)        (5)
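A short MATLAB sketch of this nearest-vector search (our own illustration; the codebook contents are hypothetical):

    % Find the codebook column nearest to x in Euclidean distance.
    codebook = [1 4 7; 2 5 8; 3 6 9];       % hypothetical codebook, one vector per column
    x = [4.2; 5.1; 5.9];                    % query column vector
    d = sqrt(sum((codebook - x).^2, 1));    % distance from x to every column
    [dmin, idx] = min(d);                   % nearest codebook vector and its distance
    fprintf('nearest column: %d (distance %.3f)\n', idx, dmin);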

      5. Euclidean function

The input source data is a feature class which is converted internally to a raster prior to the application of the Euclidean analysis. Figure 6 illustrates the Euclidean function.

        Figure 6 Euclidean Function

        Euclidean distance for Images:

An M × N image can be analyzed as a point in an MN-dimensional Euclidean space, called the image space. A basis e1, e2, …, eMN is adopted to make up a coordinate system for the image space, where e(kN+l) stands for an ideal point source with unit intensity at pixel (k, l). Hence an image x = (x1, x2, …, xMN), where x(kN+l) is the grey level at the (k, l)-th pixel, is represented as a point in the image space. The image whose grey levels are zero at all points is the origin of the image space.

The metric coefficients g_ij, i, j = 1, 2, …, MN, are given as

g_ij = <e_i, e_j> = |e_i| |e_j| cos θ_ij

where the pointed brackets indicate the scalar product and θ_ij is the angle between e_i and e_j. Note that if <e_i, e_i> = <e_j, e_j> = constant, that is, all the basis vectors have the same length, then g_ij depends only on the angle θ_ij. Given the metric coefficients, the Euclidean distance between two images x and y is given by

d²(x, y) = Σ_i Σ_j g_ij (x_i − y_i)(x_j − y_j) = (x − y)' G (x − y)

where G = (g_ij) is a symmetric MN × MN matrix.

For images of fixed size M by N, every MN-th order positive definite matrix G induces a Euclidean distance.

Calculating and comparing the Euclidean distances of the database images from the test image recognizes the hand gesture.
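A brief MATLAB illustration of this generalized distance (our own sketch; G is any symmetric positive definite metric matrix of the appropriate size, here simply the identity, which recovers the ordinary Euclidean distance):

    % Metric-weighted Euclidean distance between two flattened images.
    x = rand(16, 1); y = rand(16, 1);   % two images reshaped to MN-by-1 vectors (MN = 16 here)
    G = eye(16);                        % identity metric = ordinary Euclidean distance
    d = sqrt((x - y)' * G * (x - y));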

    4. Audio synthesizer

MATLAB supports an audio player through the built-in audioplayer function, which creates an audio object. It supports various input arguments: the sampling rate; the number of bits per sample for a floating-point signal, whose valid values are 8, 16 and 24 (16 by default); and an identifier specifying the selected audio output device, which is -1 for the default audio output device.

In our project we use pre-recorded audio for each particular hand gesture, and by using the index of the database image the action is recognized.

    5. Implementation

Figure 7 shows the overview of the project system.

      Figure 7 Overview of the project system

      1. Video frames to image conversion

In step 1 of the implementation, the video frames are converted to images by looping over the frames: the frame2im( ) function takes one input argument (the frame), and the imwrite( ) function takes three input arguments (the image to be written, the file name, and the format in which the image has to be written); this is done once per frame using for loops.

      2. Resize the image

Step 2 of the implementation reduces the size of the images obtained from step 1 to 280×280 in order to reduce the memory required to store and process them. The MATLAB function imresize( ) takes as input arguments the image and the size in terms of the number of rows and columns.
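A minimal sketch of steps 1 and 2 (our own illustration; the file names are hypothetical, and VideoReader/readFrame are used here in place of the frame2im( ) loop the paper describes):

    % Step 1: read a gesture video and write each frame out as an image.
    v = VideoReader('gesture.avi');           % hypothetical query video
    k = 0;
    while hasFrame(v)
        k = k + 1;
        imwrite(readFrame(v), sprintf('frame_%03d.jpg', k), 'jpg');
    end

    % Step 2: resize every stored frame to 280-by-280 pixels.
    for i = 1:k
        img = imread(sprintf('frame_%03d.jpg', i));
        imwrite(imresize(img, [280 280]), sprintf('frame_%03d.jpg', i), 'jpg');
    end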

      3. Generation of the skin map

Step 3 of the implementation reads the image using the imread( ) function, applies the grey-world algorithm and the RGB to YCbCr conversion, detects the pixels that are within the skin tone range, and replaces their intensity with the white pixel intensity 255, as sketched earlier.

      4. Hand gesture recognition function

        In Step 4 of the algorithm the hand gesture recognition function is called with the input argument specifying the path of the query image.

        The steps in this function are as follows:

Reorganise all the two-dimensional images in the database (the training images) into one-dimensional column vectors, then place these column vectors side by side to build a two-dimensional matrix. Compute eigenHG, m and A to extract the principal component analysis features:

1. m – the (MxN)x1 average of the images in the training database.

2. A – the (MxN)xP matrix of image vectors after the mean vector m has been subtracted from each vector.

3. eigenHG – the (MxN)xP' matrix of eigenvectors of the covariance matrix C of the training database X, where P' is the number of eigenvalues of C that best represent the feature set.

For an [MxN] matrix, the highest count of non-zero eigenvalues that its covariance matrix can possess is min[M-1, N-1].

Since the number of pixels in each image vector is large compared with the number of training images, the count of non-zero eigenvalues of C will be at most P-1, where P is the number of training images.

Compute the eigenvalues and eigenvectors of L = A'*A; its eigenvectors are linearly related to the eigenvectors of C.

The eigenvectors computed from the non-zero eigenvalues of C represent the feature set best. Kaiser's rule is used to identify the eigenvectors, that is, the principal components, to be considered: if a computed eigenvalue is greater than 1, the corresponding eigenvector is taken for the creation of eigenHG.

eigenHG = A * L_eig_vec

PCA features are extracted for the query image by computing eigenHG, A and m of the query image.

The comparison between two gestures is done by projecting the gesture images onto the gesture space and measuring the Euclidean distance between them.

Computing and comparing the Euclidean distances of all the projected test images from the projected training images lets us recognize the gesture.
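The recognition function can be condensed into the following MATLAB sketch. It follows the steps just described (mean subtraction, the small matrix L = A'*A, Kaiser's rule, projection, minimum Euclidean distance), but the folder layout and variable handling are our own assumptions:

    % Build the training matrix: each 280x280 database image becomes one column.
    files = dir('database/*.jpg');            % hypothetical database folder
    P = numel(files);
    X = zeros(280*280, P);
    for i = 1:P
        img = im2double(imread(fullfile('database', files(i).name)));
        if size(img, 3) == 3, img = rgb2gray(img); end
        X(:, i) = img(:);                     % 2-D image reshaped to a column vector
    end

    m = mean(X, 2);                           % (MxN)x1 mean image
    A = X - m;                                % mean-subtracted image vectors

    % Eigen decomposition of the small PxP matrix L = A'*A instead of the huge C = A*A'.
    [V, D] = eig(A' * A);
    keep = diag(D) > 1;                       % Kaiser's rule: keep eigenvalues > 1
    eigenHG = A * V(:, keep);                 % eigenvectors of C, the eigengestures

    % Project the training images and the query image onto the gesture space.
    trainProj = eigenHG' * A;
    q = im2double(imread('query.jpg'));       % hypothetical pre-processed query image
    if size(q, 3) == 3, q = rgb2gray(q); end
    qProj = eigenHG' * (q(:) - m);

    % Classify by minimum Euclidean distance in gesture space.
    d = sqrt(sum((trainProj - qProj).^2, 1));
    [~, idx] = min(d);                        % idx = index of the recognized gesture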

      5. Audio output

Step 5 of the algorithm uses the value returned by the hgrecog(test_img) function: using the index as the classifier, the audio for the recognized hand gesture is selected. The audioread( ) function, with the pre-recorded audio file as its input argument, reads the audio; the audioplayer( ) function then takes the audio data and sampling rate that were read and creates an audio object. The audio object is passed as the input argument to the play( ) function to obtain the audio output.
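A minimal sketch of this final step (our own illustration; the audio file names and the index-to-file mapping are assumptions):

    % Play the pre-recorded audio corresponding to the recognized gesture index.
    audioFiles = {'fire.wav', 'yes.wav', 'when.wav'};   % hypothetical recordings
    idx = hgrecog(test_img);                            % index of the recognized gesture
    [y, Fs] = audioread(audioFiles{idx});               % samples and sampling rate
    player = audioplayer(y, Fs);                        % create the audio object
    play(player);                                       % audio output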

4. RESULTS AND DISCUSSIONS

The output to be considered here is the voice output, as that is what the project aims for. In addition, we also obtain intermediate results. Once the test images obtained from the video are compared with the database images, we can see the recognized image of that particular action.

Light illumination, background colour and the distance between the hands and the camera are some of the factors to be considered while capturing video. The proposed methodology was designed and tested with a set of three actions: fire, yes and when, as shown in figure 8.

The input for feature extraction is the preprocessed gesture. The least Euclidean distance between the query and training images is computed and the gesture is recognized. The voice output is obtained by conversion of the recognized gesture.

The intermediate results obtained are the resized image, the image obtained after skin map generation, and the images after comparison. We show only the compared images, in figures 9, 10 and 11.

Once the correctly compared image, that is, the recognized image matching the test image, is obtained as shown in the figures, we get the corresponding voice output.

Figure 8 Actions for fire, yes, when respectively

Figure 9. Output for recognition of yes

Figure 10. Output for recognition of when

Figure 11. Output for recognition of fire

5. CONCLUSION

A MATLAB-based application executing hand gesture recognition using the Principal Component Analysis method was successfully implemented. The proposed technique gives text and audio output, which helps to reduce the communication gap between mute and hearing-impaired people and normal people. Through this project we have attempted to provide an artificial voice by recognizing hand gestures. Action recognition can also be used for human-computer interaction. Applicability will be greater if unfavorable environments are considered and the system is made robust to them. We also need to deal with co-articulation, that is, the accents of different people in different regions.

6. FUTURE SCOPE

The future scope of this work can be an apparatus developed as an aid for people without sight. Obstacles present in front of the user are captured using a camera. The computing equipment informs the user of the distance between the camera and the obstacle, or of the presence of any defects in the path; a clear description of any obstacles in the path can even be given with the help of the audio synthesizer. Any sign language can also be implemented using this project. Further, it can be improved to produce output for combinations of many words.

REFERENCES

1. Real-Time Hand Gesture Detection and Recognition Using Bag-of-Features and Support Vector Machine Techniques. Nasser H. Dardas and Nicolas D. Georganas. IEEE Transactions on Instrumentation and Measurement, 2011.

2. Visual Recognition of American Sign Language Using Hidden Markov Models. Thad Eugene Starner. Master's thesis, Massachusetts Institute of Technology, Cambridge, MA, 1995.

3. Hand-Gesture Recognition for Automated Speech Generation. Sunny Patel, Ujjayan Dhar, Suraj Gangwani, Rohit Lad, Pallavi Ahire. IEEE International Conference on Recent Trends in Electronics, Information and Communication Technology, May 20-21, 2016, India.

4. Static Vision Based Hand Gesture Recognition Using Principal Component Analysis. Mandeep Kaur Ahuja and Amardeep Singh. 3rd International IEEE Conference on MOOCs, Innovation and Technology in Education (MITE), 2015.

5. Hand Gesture Recognition of English Alphabets Using Artificial Neural Network. Sourav Bhowmick, Sushant Kumar and Anurag Kumar. IEEE 2nd International Conference on Recent Trends in Information Systems (ReTIS), 2015.

6. Smart Glove With Gesture Recognition Ability For The Hearing And Speech Impaired. Tushar Chouhan, Ankit Panse, Anvesh Kumar Voona and S. M. Sameer. IEEE Global Humanitarian Technology Conference – South Asia Satellite (GHTC-SAS), September 26-27, 2014.

7. Sign Language Recognition. Anup Kumar, Karun Thankachan and Mevin M. Dominic. 3rd International Conference on Recent Advances in Information Technology (RAIT), 2016.

8. Real Time Sign Language Recognition Using PCA. Shreyashi Narayan Sawant, M. S. Kumbhar. IEEE International Conference on Advanced Communication Control and Computing Technologies (ICACCCT), 2014.

  9. Vision Based Hand Gesture Recognition Using Dynamic Time Warping for Indian Sign Language. Washef Ahmed, Kunal Chanda, Soma Mitra. International Conference on Information Science (ICIS) 2016.

  10. Multiple Sign Language Translation into Voice Message. Hussana Johar R.B, Priyanka A, Revathi Amrut M S, Suchitha K, Sumana K J. International Journal of Engineering and Innovative Technology (IJEIT) Volume 3, Issue 10, April 2014.

11. Review in Sign Language Recognition Systems. M. Ebrahim Al-Ahdal and Nooritawati Md Tahir. IEEE Symposium on Computers & Informatics (ISCI), pp. 52-57, 2012.

12. Sign Language to Speech Converter Using Neural Networks. Mansi Gupta, Meha Garg, Prateek Dhawan. International Journal of Computer Science & Emerging Technologies, Volume 1, Issue 3, October 2010.

13. Embedded Based Hand Talk Assisting System for Deaf and Dumb. J. Thilagavathy, Dr. Sivanthi Murugan, S. Darwin. International Journal of Engineering Research & Technology (IJERT), Vol. 3, Issue 3, March 2014.

14. Hand Gesture Recognition Systems: A Survey. Arpita Ray Sarkar, G. Sanyal and S. Majumder. International Journal of Computer Applications, vol. 71, no. 15, May 2013.
