Persian Sign Gesture Translation to English Spoken Language on Smartphone



Muhammad Reza Jafari

Computer Science & Engineering Department Delhi Technological University (DTU)

Delhi, India

Dr. Vinod Kumar

Computer Science & Engineering Department Delhi Technological University (DTU)

Delhi, India

Abstract Hearing-impaired people and others with verbal challenges face difficulty communicating with society; sign language represents their communication, such as numbers or phrases. Communication becomes an even greater challenge with people from other countries who use different languages. Additionally, sign language differs from one country to another; learning one sign language does not mean learning them all. Translating a word from sign language to a spoken language is a challenge, and changing that word from the source language to another spoken language is an even bigger one. In such cases, two interpreters are needed: one from sign language to the source spoken language, and one from the source language to the target language.

There is ample research on sign recognition, yet this paper focuses on translating gestures from one language to another. In this study, we propose the smartphone as a platform for sign language recognition, because smartphones are available worldwide. Since smartphones are limited in computational power, we propose a client-server system in which most processing is done on the server side. The client is a smartphone application that captures images of sign gestures and sends them to a server. In turn, the server processes the data and returns the translated sign to the client. On the server side, where most of the recognition takes place, the background of the sign image is detected in the Hue, Saturation, Value (HSV) color space and set to black. The sign gesture is then segmented by detecting the largest connected component in the frame. The features extracted from the frame are its binary pixels, and a Convolutional Neural Network (CNN) is used to classify the sign images. After classification, the letter for a given sign is assigned, and the sequence of letters forms a word. The word is translated to the target language, in this case English, and the result is returned to the client application.

Keywords Sign Language, Gesture Recognition, Computer Vision, Image Processing, Machine Learning, hearing-impaired people, Convolutional Neural Network (CNN).

  1. INTRODUCTION

    A means of communication among people with verbal disability is sign language; it is used to represent what they want to share with one another, such as words, phrases, or numbers. Face and body gestures are also used. Hearing-impaired people have difficulty communicating because most people do not understand sign language. Addressing this gap has captured the interest of researchers.

    Like natural language, gesture-based communication differs per nation or region [1], and Persian Sign Language is no exception. Figure 1 shows the Persian Sign Language alphabet.

    Figure 1 Persian Sign Language Alphabets

    Much research has been done on sign language using different systems and devices, such as gloves with sensors [2] and more complex systems with cameras and Kinect devices that help capture acceleration and movement [3]. The majority of this research used a computer as the main platform, which is not practical to carry around.

    Gloves with embedded sensors for tracking hand gestures have attracted much of the sign language recognition research, yet these gloves cannot be used in humidity and rain. Further, they are not portable, because a computer is required wherever they are used.

    With the development of smartphone technology and the improvement of their computational capacity, a sign language recognition system is easier to deploy on smartphones, and the portability challenge is no longer a concern. Some researchers have proposed smartphone-based systems [4], yet smartphones still suffer from limited computational power [5]. Our proposal is a client-server approach, which works around the limited computational power of smartphones.

    In this paper we propose a client-server system that addresses these drawbacks using a CNN classifier, which has shown better performance [6][7]. The focus of this study is a Persian Sign Language (PSL) recognition system on a smartphone platform that can recognize Persian sign gestures, form a word, and translate that word to English.

    Gesture recognition applications include several stages: segmentation, feature extraction, and classification. The aim of segmentation is to eliminate the noise in the background, leaving only the Region of Interest (ROI), which is the main useful data in the picture. In the feature extraction stage, features are extracted from the ROI; these features can be edges, shapes, flows, corners, colors, textures, etc., and together they form the identity of each sign gesture. In the final stage, classification is performed, which is used to train the system and decide to which group of gestures a new sign belongs [8].
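The three stages above can be illustrated with a minimal, self-contained sketch; the tiny 2x2 "image", the threshold value, and the nearest-centroid classifier are stand-ins for illustration (the paper itself uses a CNN for classification):

```python
# Toy pipeline: segmentation -> feature extraction -> classification.

def segment(image, threshold=128):
    # Keep only pixels brighter than the threshold (the "ROI");
    # everything else becomes background (0).
    return [[1 if px > threshold else 0 for px in row] for row in image]

def extract_features(binary_image):
    # Flatten the binary mask into a feature vector.
    return [px for row in binary_image for px in row]

def classify(features, centroids):
    # Assign the label of the nearest centroid (squared Euclidean distance).
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: dist(features, centroids[label]))

image = [[200, 10], [220, 15]]        # toy 2x2 grayscale frame
mask = segment(image)                  # [[1, 0], [1, 0]]
features = extract_features(mask)      # [1, 0, 1, 0]
centroids = {"A": [1, 0, 1, 0], "B": [0, 1, 0, 1]}
print(classify(features, centroids))   # prints "A"
```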

    There is much work on sign language systems on the computer platform; however, very limited research has been done on mobile platforms, and past research on mobile phones has shown a big disadvantage in computation and resource limitations [9].

    The following sections review the existing literature and then present the proposed method, evaluation, conclusion, and future work.

  2. EXISTING LITERATURE

    Sign language has long been part of human communication. The usage of signs or body movements is not fixed to age, ethnicity, or gender [10]. Many researchers have proposed different approaches to sign language recognition.

    Paper [11] introduces a system that uses a mobile phone to recognize sign gestures. For skin detection it uses three color spaces: RGB, YCbCr, and HSI. For recognition it uses histogram matching followed by the ORB algorithm, achieving 69.41% accuracy.

    In paper [12], an Android system translating sign language is developed, using OpenCV for hand detection and K-NN for classification; the system detects gestures up to 50 cm away from the palm of the hand.

    S. M. Halawani [10] has proposed Arabic Sign Language Translation System (ArSL-TS). His model uses a smartphone to translate Arabic text into Arabic Sign language.

    In paper [8], a segmentation method that can recognize 32 static Persian sign gestures is presented. Their method uses the YCbCr color space, a Gaussian model, and Bayes' rule. Sign gestures are recognized with the help of radial distance and the Fourier transform for feature extraction, and Euclidean distance to find the similarity between a hand gesture and the training database. The accuracy of the system is 95.62%.

    Cheok Ming Jin [9] proposed a smartphone platform for American Sign Language (ASL) recognition. He implements Canny edge detection plus seeded region growing to segment the hand gesture in the picture; the Speeded Up Robust Features (SURF) algorithm is used for feature extraction and a Support Vector Machine (SVM) for classification. The accuracy of the system for 16 classes of ASL is 97.13%.

    In paper [13], static Persian Sign Language gesture recognition for some words is presented. It uses a digital camera for taking input pictures. The system uses the Discrete Wavelet Transform and a neural network for feature extraction and classification. The classification accuracy is 98.75%.

    Paper [14] presents a system to recognize static Persian sign gestures. A digital camera is used for taking input pictures; feature extraction and classification use the Wavelet Transform and a neural network. The accuracy of the system is 94.06%.

    Sakshi Lahoti [4] proposed a smartphone approach to recognize American Sign Language (ASL). The YCbCr color space is used for skin segmentation in pictures captured by a smartphone, HOG features are used for feature extraction, and finally SVM is used for classification; the accuracy of the system is 89.54%.

    Promila Haque [15] proposed a two-hand Bangla Sign Language recognition system with three phases (formation, training, and classification) that can recognize 26 sign gestures. Principal Component Analysis (PCA) is used to extract the principal components of the images, and K-Nearest Neighbors is used for classification. Using 104 test images, the system achieved 98.11% precision, 98.11% recall, and a 90.46% F-measure.

    In [6], a classification comparison between CNN and SVM shows that the CNN has better performance; its accuracy is 90%.

    In [7], the research experiments show that CNNs improve classification performance.

    Abbas Muhammad Zakariya [5] proposed an Arabic Sign Language (ArSL) recognition system based on a client-server approach in which the client is a smartphone. They use the HSV color space for background elimination and SVM for classification, achieving an accuracy of 92.5%.

  3. PROPOSED METHOD

    A client-server recognition system is developed as shown in figure 2. On the client side is a smartphone; the user interacts directly with the smartphone application. The Android application captures sign gesture pictures as input to the system and sends them through an Application Programming Interface (API) to the server. The server receives the picture from the client. After predicting and translating the text from the sign gesture, the server API sends it back to the client API, which is responsible for showing the text on the screen of the smartphone.

    Figure 2 Machine Translation from Persian Sign to English on Smartphones

    The client has two main responsibilities: it captures the sign gesture image as input for the server and displays the predicted text on the screen of the smartphone. The server has three main responsibilities: it preprocesses the input images, classifies them, and finally translates the predicted text to the target language.
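The client-server exchange described above might look like the following sketch. The JSON field names and the base64 encoding are assumptions for illustration, since the paper does not specify the API contract, and the recognizer is a stub standing in for the full pipeline:

```python
# Hypothetical server-side request handler for the sign recognition API.
import base64
import json

def recognize(image_bytes):
    # Stand-in for preprocessing + CNN classification + translation.
    return "water"

def handle_request(body_json):
    # Decode the uploaded image, run the (stubbed) recognition
    # pipeline, and return the predicted text as JSON.
    request = json.loads(body_json)
    image_bytes = base64.b64decode(request["image"])
    predicted = recognize(image_bytes)
    return json.dumps({"text": predicted})

# Client side: encode a captured frame and send it (transport omitted).
fake_frame = b"\x00" * 16
body = json.dumps({"image": base64.b64encode(fake_frame).decode()})
print(handle_request(body))   # {"text": "water"}
```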

    1. Smartphone Application

      A smartphone application is developed in Android Studio using the Volley library as the client to capture images and send them to the server for further processing, and afterwards to show the result to the user. The picture is saved in a server directory, from which the server reads the images. After processing the images into a Persian word and translating it to English, the results are sent back to the client. Finally, the smartphone application displays the results to the user. Some screenshots from the mobile application are shown in figure 5.

    2. Background Elimination

      The background of the input picture sent from the smartphone is detected and set to black. The picture is transformed from one color space to another, in this case from RGB to HSV, so that the skin color can be detected, and a series of dilations and erosions using an elliptical kernel is applied. The final frame is created by combining the effect of two masks, as shown in figure 3.


        Figure 3 (a) Raw Picture, (b) Picture after background elimination
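A hedged sketch of this background-elimination step, assuming the frame is already converted to HSV. Production code would typically use OpenCV (cv2.inRange, cv2.dilate and cv2.erode with an elliptical MORPH_ELLIPSE kernel); NumPy stand-ins keep the example self-contained, and the skin-tone bounds below are illustrative assumptions, not the paper's exact values:

```python
import numpy as np

def skin_mask(hsv, lo=(0, 40, 60), hi=(25, 255, 255)):
    # Keep pixels whose H, S, V all fall inside the skin-tone range.
    return np.all((hsv >= lo) & (hsv <= hi), axis=-1)

def dilate(mask):
    # 4-neighbour binary dilation (a crude stand-in for cv2.dilate).
    out = mask.copy()
    out[1:, :] |= mask[:-1, :]; out[:-1, :] |= mask[1:, :]
    out[:, 1:] |= mask[:, :-1]; out[:, :-1] |= mask[:, 1:]
    return out

def erode(mask):
    # Erosion is dilation of the complement.
    return ~dilate(~mask)

def black_out_background(hsv):
    # Clean the skin mask with dilation then erosion (morphological
    # closing), then set every non-skin pixel to black, as in the text.
    mask = erode(dilate(skin_mask(hsv)))
    out = hsv.copy()
    out[~mask] = 0
    return out

hsv = np.zeros((3, 3, 3), dtype=int)
hsv[:2, :2] = (10, 100, 150)          # a 2x2 "skin" patch
cleaned = black_out_background(hsv)   # background pixels become black
```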

    3. Segmentation

      The picture from the previous step, where the background was changed to black, is first converted into grayscale. Although the color of the original picture is lost, this step increases the robustness of the system to a variety of lighting conditions. Next, all non-black pixels are changed to white (binarization) while the rest remain black. The hand gesture is then segmented by removing all connected components in the picture except the largest one, which is the sign gesture, and the image is resized to 64*64 pixels. The whole process is shown in figure 4.


        Figure 4 (a) Binarize, (b) Segment hand gesture and resize
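The largest-connected-component step can be sketched as follows, using an iterative flood fill on a toy binary frame; the resize to 64*64 is omitted (OpenCV's cv2.resize would typically handle it):

```python
def largest_component(binary):
    # Label 4-connected components with an iterative flood fill and
    # return a mask containing only the largest one (the hand gesture).
    h, w = len(binary), len(binary[0])
    seen, best = set(), []
    for sy in range(h):
        for sx in range(w):
            if binary[sy][sx] and (sy, sx) not in seen:
                stack, comp = [(sy, sx)], []
                seen.add((sy, sx))
                while stack:
                    y, x = stack.pop()
                    comp.append((y, x))
                    for ny, nx in ((y+1, x), (y-1, x), (y, x+1), (y, x-1)):
                        if 0 <= ny < h and 0 <= nx < w and \
                           binary[ny][nx] and (ny, nx) not in seen:
                            seen.add((ny, nx))
                            stack.append((ny, nx))
                if len(comp) > len(best):
                    best = comp
    mask = [[0] * w for _ in range(h)]
    for y, x in best:
        mask[y][x] = 1
    return mask

frame = [[1, 0, 0, 1],
         [1, 0, 0, 1],
         [0, 0, 0, 1]]   # two blobs: size 2 (left) and size 3 (right)
print(largest_component(frame))   # keeps only the right-hand blob
```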

    4. Feature Extraction

      The sign language gesture images are normalized and scaled to 64*64 px, and the binary pixels of the image are used as features. We found that scaling to 64 pixels retains good enough features to classify the Persian Sign Language (PSL) gestures efficiently. Using 64*64 px gives 4096 features.
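A minimal sketch of the resulting feature vector, assuming a 64*64 binary image as input:

```python
import numpy as np

def to_features(binary_64x64):
    # Flatten the 64x64 binary image into a 4096-dimensional
    # feature vector, as described in the text.
    img = np.asarray(binary_64x64, dtype=np.float32)
    assert img.shape == (64, 64)
    return img.reshape(-1)

features = to_features(np.zeros((64, 64)))
print(features.shape)   # (4096,)
```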

    5. Classification

      A Convolutional Neural Network (CNN) is used to classify the sign gesture datasets extracted from the pictures. A CNN is a multilayered neural network with a special architecture designed to discover complex features in data; it is mostly used in image recognition, powering vision in robots and self-driving vehicles. A CNN has five main operations: 1. convolution, 2. ReLU, 3. pooling, 4. flattening, and 5. full connection.
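The convolution, ReLU, pooling, and flattening operations can be sketched in NumPy as below; the fully connected layer and training are omitted, and the 3*3 kernel is an arbitrary example, not the paper's architecture:

```python
import numpy as np

def conv2d(image, kernel):
    # Valid-mode 2D convolution (really cross-correlation, as in most
    # deep-learning libraries).
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y+kh, x:x+kw] * kernel)
    return out

def relu(x):
    # Zero out negative activations.
    return np.maximum(x, 0)

def max_pool(x, size=2):
    # Non-overlapping max pooling over size x size windows.
    h, w = x.shape[0] // size * size, x.shape[1] // size * size
    return x[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

image = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.array([[1., 0., -1.]] * 3)        # simple edge-detector example
feature_map = relu(conv2d(image, kernel))      # 4x4 after a 3x3 convolution
flat = max_pool(feature_map).ravel()           # 2x2 pooled, then flattened
print(flat.shape)                              # (4,)
```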

    6. Translation to English

      The letters recognized by the CNN classifier are placed in an array to construct a word; then a bilingual dictionary, in this case the googletrans Python library, translates the Persian word to an English word. The translated word is sent to the client (smartphone), as shown in figure 5.

      The words used for translation are combinations of 10 alphabetic letters: A, B, C, D, Gh, K, N, O, T, and Y.
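A minimal sketch of the letter-sequence-to-translation step. To stay self-contained, a tiny Persian-to-English dictionary stands in for the googletrans library, and "AB" spelling "ab" (Persian for water) is an illustrative assumption:

```python
# Hypothetical stand-in for the bilingual dictionary lookup.
PERSIAN_TO_ENGLISH = {"AB": "water", "NAN": "bread"}

def letters_to_translation(letters):
    # Join the classified letters into a word, then translate it;
    # unknown words fall through unchanged.
    word = "".join(letters)
    return PERSIAN_TO_ENGLISH.get(word, word)

print(letters_to_translation(["A", "B"]))   # prints "water"
```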


    Figure 5 (a) Input sign gesture, (b) Translation in English

  4. EVALUATION

    The model is evaluated for each of the 10 Persian Sign Language letters: A, B, C, D, Gh, K, N, O, T, and Y. We used a total of 2000 images to train the Convolutional Neural Network (CNN) classifier. To evaluate system performance, we split the images into 80% training and 20% testing, and we obtained an accuracy of 98%. Table 1 shows detailed precision, recall, and F-measures for each class.

    TABLE 1 PRECISION, RECALL, F-MEASURE

    Letter     Precision   Recall   F-Measure   Support
    A          0.98        1.00     0.99        40
    B          1.00        1.00     1.00        40
    C          1.00        1.00     1.00        40
    D          1.00        1.00     1.00        40
    GH         1.00        1.00     1.00        40
    K          1.00        0.88     0.93        40
    N          1.00        0.97     0.99        40
    O          0.87        0.97     0.92        40
    T          1.00        1.00     1.00        40
    Y          1.00        1.00     1.00        40
    Accuracy                        0.98        400
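The per-class precision, recall, and F-measure values reported in Table 1 can be computed as in the following sketch; the toy labels are illustrative, not the paper's data:

```python
def metrics(y_true, y_pred, cls):
    # Count true positives, false positives, and false negatives for
    # one class, then derive precision, recall, and F-measure.
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

y_true = ["A", "A", "A", "K", "K"]
y_pred = ["A", "A", "K", "K", "K"]
print(metrics(y_true, y_pred, "A"))   # precision 1.0, recall 2/3, F-measure 0.8
```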

  5. CONCLUSION AND FUTURE WORK

There are many computer-based systems for sign language recognition, as mentioned in this paper, but they are not practical because they cannot be carried around, while sign gesture recognition is needed at any time and any place to communicate with others. The best way to address this gap is the smartphone, which is portable, available, and easy to use.

In conclusion, this paper discusses a method for translating Persian Sign Language gestures to English text on smartphones. As stated earlier, the major problem with smartphones is computational power [5]; in this case a client-server system is proposed to overcome this constraint. To improve the performance of the system a CNN classifier is used, and to translate from Persian to English a bilingual dictionary is used.

In this research, features are extracted from sign gesture images by normalizing and rescaling the images to 64*64 pixels; for robustness of the system, binary pixels are used as features, and a CNN is used for classification. We used only 10 Persian Sign Language gestures, achieving an accuracy of 98%, which is better than any other work mentioned in this paper.

Future work will focus on improving the model to recognize more letters of the alphabet and to achieve higher accuracy.

REFERENCES

  1. Fakhr Kambiz. (1397/2018). Amozeshe zabane nasenawayan, ertebat ba afrade kar va lal [Teaching the language of the deaf: communicating with deaf and mute people]. The research team of Professor Fakhr.

  2. Mohandes, M., & Deriche, M. (2013, April). Arabic sign language recognition by decisions fusion using Dempster-Shafer theory of evidence. In 2013 Computing, Communications and IT Applications Conference (ComComAp) (pp. 90-94). IEEE.

  3. Jalilian, B., & Chalechale, A. (2014). Persian sign language recognition using radial distance and Fourier transform. Int. J. Image, Graphics and Signal Processing, 1, 40-46.

  4. Lahoti, S., Kayal, S., Kumbhare, S., Suradkar, I., & Pawar, V. (2018, July). Android based American Sign Language recognition system with skin segmentation and svm. In 2018 9th International Conference on Computing, Communication and Networking Technologies (ICCCNT) (pp. 1-6). IEEE.

  5. Zakariya, A. M., & Jindal, R. (2019, July). Arabic Sign Language Recognition System on Smartphone. In 2019 10th International Conference on Computing, Communication and Networking Technologies (ICCCNT) (pp. 1-5). IEEE.

  6. Shin, Y., & Balasingham, I. (2017, July). Comparison of hand-craft feature based SVM and CNN based deep learning framework for automatic polyp classification. In 2017 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) (pp. 3277-3280). IEEE.

  7. Li, Y., Zhang, H., Xue, X., Jiang, Y., & Shen, Q. (2018). Deep learning for remote sensing image classification: A survey. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 8(6), e1264.

  8. Jalilian, B., & Chalechale, A. (2014). Persian sign language recognition using radial distance and Fourier transform. Int. J. Image, Graphics and Signal Processing, 1, 40-46.

  9. Cheok Ming Jin, Zaid Omar, Mohamed Hisham Jaward. A Mobile Application of American Sign Language Translation via Image Processing Algorithms 2016 IEEE Region 10 Symposium (TENSYMP), Bali, Indonesia.

  10. S. M. Halawani. "Arabic Sign Language Translation System on Mobile Devices", International Journal of Computer Science and Network Security (IJCSNS).

  11. Mahesh M, Arvind Jayaprakash, & Geetha M. Sign language translator for mobile platforms. IEEE.

  12. Setiawardhana, Rizky Yuniar Hakkun, & Achmad Baharuddin. Sign language learning based on Android for deaf and speech impaired people. International Electronics Symposium (IES).

  13. Sarkaleh, A. K., Poorahangaryan, F., Zanj, B., & Karami, A. (2009, November). A Neural Network based system for Persian sign language recognition. In 2009 IEEE International Conference on Signal and Image Processing Applications (pp. 145-149). IEEE.

  14. Karami, A., Zanj, B., & Sarkaleh, A. K. (2011). Persian sign language (PSL) recognition using wavelet transform and neural networks. Expert Systems with Applications, 38(3), 2661-2667.

  15. Haque, P., Das, B., & Kaspy, N. N. (2019, February). Two-Handed Bangla Sign Language Recognition Using Principal Component Analysis (PCA) And KNN Algorithm. In 2019 International Conference on Electrical, Computer and Communication Engineering (ECCE) (pp. 1-4). IEEE.
