Sign Language Recognition using Machine Intelligence for Hearing Impaired Person

G. Nalina Keerthana; Sahana. T; Roshan Shabiha. A; Sana Zaffira

doi:10.17577/IJERTCONV10IS08009

ETEDM - 2022 (Volume 10 - Issue 08)

Open Access
Paper ID : IJERTCONV10IS08009
Published (First Online): 30-07-2022
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License

Mrs. G. Nalina Keerthana

Associate Professor

M.I.E.T. Engineering College, Trichy, India

Roshan Shabiha. A

Computer Science and Engineering

M.I.E.T. Engineering College, Trichy, India

Sahana. T

Computer Science and Engineering

M.I.E.T. Engineering College, Trichy, India

Sana Zaffira

Computer Science and Engineering

M.I.E.T. Engineering College, Trichy, India

Abstract: People with impaired speech and hearing use sign language as a form of communication. Sign language gestures serve as a non-verbal means of expressing emotions and thoughts to others. Conversing with people who have a hearing disability remains a major challenge, because most hearing people find these gestures difficult to understand. As a result, trained sign language interpreters are needed during medical and legal appointments and in educational and training sessions, and demand for these services has increased over the past few years. Systems that recognize different signs and convey their meaning to hearing people are therefore needed. To address this problem, we propose applying artificial intelligence to analyse the user's hand through finger detection. The proposed system is vision-based and designed for real-time environments. A deep learning algorithm, the convolutional neural network (CNN), classifies the sign and outputs the label of the recognized sign.

Keywords: Sign Language Recognition, Convolutional Neural Network, Image Processing, Segmentation.

INTRODUCTION

Machine learning is an application of artificial intelligence (AI). It enables a system to learn and improve automatically from experience without being explicitly programmed. Machine learning focuses on developing computer programs that can access data and use it to learn for themselves. The learning process begins with observations or data, such as examples, direct experience, or instruction, in order to find patterns in the data and make better decisions in the future based on the examples provided. The primary aim is to allow computers to learn automatically, without human intervention or assistance, and to adjust their actions accordingly. Machine learning algorithms are often categorized as supervised or unsupervised. Supervised algorithms require a data scientist or data analyst with machine learning skills to provide both the input and the desired output, and to furnish feedback about the accuracy of predictions during training. Once training is complete, the algorithm applies what it has learned to new data. Unsupervised algorithms do not need to be trained with desired outcome data. Instead, they review the data iteratively and arrive at conclusions on their own. Deep learning, which is built on neural networks, is used for more complex processing tasks such as image recognition, speech-to-text and natural language generation. These neural networks work by combing through millions of examples of training data and automatically identifying often subtle correlations between many variables. Once trained, the network can use its bank of associations to interpret new data. Such algorithms have only become feasible in the age of big data.

Supervised machine learning algorithms apply what has been learned in the past to new data, using labeled examples to predict future events. Starting from the analysis of a known training dataset, the learning algorithm produces an inferred function to make predictions about the output values. After sufficient training, the system is able to provide targets for any new input. The learning algorithm can also compare its output with the correct, intended output and find errors in order to modify the model accordingly.
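The supervised workflow described above (learn from labeled examples, predict targets for new inputs, compare with the intended outputs) can be sketched in a few lines. This is only an illustration: the nearest-centroid classifier, the toy data, and the names `fit_centroids` and `predict` are assumptions made here and are not taken from the paper.

```python
import numpy as np

def fit_centroids(X, y):
    """Learn one mean vector (centroid) per class from labeled examples."""
    classes = np.unique(y)
    centroids = np.array([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict(X, classes, centroids):
    """Assign each row of X to the class with the nearest centroid."""
    # Pairwise Euclidean distances, shape (n_samples, n_classes).
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dists, axis=1)]

# Toy labeled training set (assumed data): two features, labels 0 and 1.
X_train = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.3],
                    [1.0, 0.9], [0.8, 1.1], [1.1, 1.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])

classes, centroids = fit_centroids(X_train, y_train)

# Predict targets for new inputs, then compare with the intended outputs.
X_new = np.array([[0.05, 0.2], [0.9, 1.0]])
y_true = np.array([0, 1])
y_pred = predict(X_new, classes, centroids)
error_rate = float(np.mean(y_pred != y_true))
```

In a real system the error rate would feed back into model selection or further training; here it is only computed to show the comparison step.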

In contrast, unsupervised machine learning algorithms are used when the information used for training is neither classified nor labeled. Unsupervised learning studies how systems can infer a function that describes hidden structure in unlabeled data. The system does not figure out the right output; instead, it explores the data and draws inferences from datasets to describe that hidden structure.
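As one possible illustration of finding hidden structure in unlabeled data, the sketch below runs plain k-means clustering. The choice of k-means, the toy points, and the deterministic seeding from the first k points are assumptions made for this example only.

```python
import numpy as np

def kmeans(X, k, n_iter=20):
    """Plain k-means: alternate nearest-center assignment and center update."""
    # Assumption: the first k points seed the centers, which keeps the run deterministic.
    centers = X[:k].astype(float).copy()
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return labels, centers

# Unlabeled toy data (assumed): two loose groups, no class information given.
X = np.array([[0.0, 0.0], [5.0, 5.0], [0.2, 0.1],
              [5.1, 4.9], [0.1, 0.3], [4.8, 5.2]])
labels, centers = kmeans(X, k=2)
```

The algorithm is never told which group a point belongs to; the grouping emerges from the distances alone.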

Semi-supervised machine learning algorithms fall between supervised and unsupervised learning, since they use both labeled and unlabeled data for training, typically a small amount of labeled data and a large amount of unlabeled data. Systems that use this method can considerably improve learning accuracy. Semi-supervised learning is usually chosen when acquiring labeled data requires skilled and relevant resources, whereas acquiring unlabeled data generally does not require additional resources.
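A minimal sketch of one semi-supervised strategy, self-training, is given below: a model fitted on a small labeled set assigns pseudo-labels to a larger unlabeled pool and is then refitted on both. Self-training, the centroid model, and all data values are assumptions chosen for illustration; the paper does not name a specific semi-supervised method.

```python
import numpy as np

def nearest_centroid(X, centroids):
    """Return, for each row of X, the index of the closest centroid."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return np.argmin(dists, axis=1)

# Small labeled set (one example per class) and a larger unlabeled pool.
X_lab = np.array([[0.0, 0.0], [4.0, 4.0]])
y_lab = np.array([0, 1])
X_unl = np.array([[0.5, 0.4], [3.6, 3.9], [1.0, 1.2],
                  [3.0, 2.9], [0.2, 0.6], [4.2, 3.7]])

# Step 1: fit centroids on the labeled data only.
centroids = np.array([X_lab[y_lab == c].mean(axis=0) for c in (0, 1)])

# Step 2: pseudo-label the unlabeled pool with the current model.
pseudo = nearest_centroid(X_unl, centroids)

# Step 3: refit on labeled plus pseudo-labeled data.
X_all = np.vstack([X_lab, X_unl])
y_all = np.concatenate([y_lab, pseudo])
centroids = np.array([X_all[y_all == c].mean(axis=0) for c in (0, 1)])
```

Practical self-training usually keeps only confident pseudo-labels and repeats the loop several times; both refinements are omitted here for brevity.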

Reinforcement learning is a learning method in which an agent interacts with its environment by producing actions and discovering errors or rewards. Trial-and-error search and delayed reward are the most relevant characteristics of reinforcement learning. This method allows machines and software agents to automatically determine the ideal behavior within a specific context in order to maximize their performance. Simple reward feedback, known as the reinforcement signal, is required for the agent to learn which action is best.
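The trial-and-error idea can be sketched with an epsilon-greedy agent on a multi-armed bandit. This is a simplified setting assumed for illustration: rewards are immediate and fixed per action, so delayed reward is not modeled, and the reward values and the function name `run_bandit` are hypothetical.

```python
import random

def run_bandit(rewards, steps=500, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a bandit with one fixed reward per action."""
    rng = random.Random(seed)
    n_actions = len(rewards)
    q = [0.0] * n_actions      # estimated value of each action
    counts = [0] * n_actions   # how often each action was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action (trial and error).
            action = rng.randrange(n_actions)
        else:
            # Exploit: pick the action with the best current estimate.
            action = max(range(n_actions), key=lambda a: q[a])
        reward = rewards[action]   # the reinforcement signal
        counts[action] += 1
        # Running average of the rewards observed for this action.
        q[action] += (reward - q[action]) / counts[action]
    return q, counts

q, counts = run_bandit(rewards=[0.2, 0.8, 0.5])
best_action = max(range(len(q)), key=lambda a: q[a])
```

The agent is never told which action is best; it only receives the reward for the action it tried and gradually shifts towards the action with the highest estimate.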

Machine learning enables the analysis of massive quantities of data. While it generally delivers faster and more accurate results for identifying profitable opportunities or dangerous risks, it may also require additional time and resources to train properly. Combining machine learning with AI and cognitive technologies can make it even more effective at processing large volumes of information.
LITERATURE REVIEW
1. Sign Language Recognition Using Multiple Kernel Learning: A Case Study of Pakistan Sign Language
  
  Sign language is the means of communication and interaction for deaf people all around the world. This kind of communication is accomplished through hand gestures, facial expressions, or movements of the arm and body. A sign language recognition system aims to enable the deaf community to communicate with the rest of society. Sign language is a highly structured symbolic set that also supports human-computer interaction. It is a valuable communication tool, and every day millions of deaf people around the world use it to communicate and express their ideas. Such assistance enables and encourages deaf persons to integrate into society. The dataset is obtained from sign language videos. In a later step, four vision-based features are extracted. The extracted features are individually classified using multiple kernel learning (MKL) in a support vector machine (SVM). A voting scheme is adopted for the final recognition of Pakistan Sign Language (PSL). The performance of the proposed technique is measured in terms of accuracy, precision, recall, and F-score. The simulation results are promising compared with existing approaches.
2. ASL-3DCNN: American Sign Language Recognition Technique Using 3-D Convolutional Neural Networks
  
  Hand gestures are a way for people to express thoughts and feelings, and they reinforce the information delivered in daily conversation. Sign language is a structured form of hand gestures involving visual motions and signs, which is used as a communication system. For the deaf and speech-impaired community, sign language is a useful tool for daily interaction. It involves different parts of the body, namely the fingers, hands, arms, head, body and facial expression, to deliver information. However, sign language is not common among the hearing community, and few hearing people are able to understand it. This poses a genuine communication barrier between the deaf community and the rest of society, a problem that has yet to be fully solved. A growing number of emerging technologies, such as EMG, LMC, and Kinect, capture gesture information more readily. Common pre-processing methods are median and Gaussian filtering as well as downsizing of images prior to subsequent stages. Skin color segmentation is one of the most commonly used segmentation methods. Color spaces that are generally more robust to illumination conditions are CIE Lab, YCbCr and HSV. More recent research combines several other spatial features and modelling approaches to improve segmentation performance.
3. Comprehensive Study on Deep Learning-Based Methods for Sign Language Recognition
  
  Spoken languages make use of the vocal-auditory channel, as they are articulated with the mouth and perceived with the ear. All writing systems also derive from, or are representations of, spoken languages. Sign languages (SLs) are different, as they make use of the corporal-visual channel: they are produced with the body and perceived with the eyes. SLs are not international, and they are widely used by Deaf communities. They are natural languages, since they develop spontaneously wherever the Deaf have the opportunity to congregate and communicate. Sign Language Recognition (SLR) can be defined as the task of inferring the glosses performed by a signer from video captures. Even though there is a significant amount of work in the field of SLR, a complete experimental study is still lacking. A comparative experimental assessment of computer vision-based methods for sign language recognition is therefore conducted. By implementing the most recent deep neural network methods in this field, a thorough evaluation on multiple publicly available datasets is performed. The aim of the study is to provide insights into sign language recognition, focusing on mapping non-segmented video streams to glosses. For this task, two new sequence training criteria, known from the fields of speech and scene text recognition, are introduced. Furthermore, a variety of pre-training schemes is thoroughly discussed. Finally, a new RGB+D dataset for the Greek sign language is created.
4. Independent Sign Language Recognition with 3D Body, Hands, and Face Reconstruction
  
  This paper investigates the extraction of 3D body pose, face and hand features for the task of Sign Language Recognition (SLR). It uses SMPL-X, a contemporary parametric model that enables joint extraction of 3D body shape, face and hand information from a single image. These features are compared with OpenPose key-points, the most widely used method for extracting 2D skeleton parameters, and with features from raw RGB frames and their optical flow fed into a state-of-the-art deep learning architecture used in action and sign language recognition. The results show that this holistic 3D reconstruction leads to higher accuracy than recognition from raw RGB images and their optical flow fed into the state-of-the-art I3D-type network for 3D action recognition, and from 2D OpenPose skeletons fed into an RNN. The experiments revealed the superiority of SMPL-X features, owing to the detailed and qualitative feature extraction in the three aforementioned regions of interest. SMPL-X is also used to show the significance of combining all three regions for optimal results in SLR. Future work on 3D body, face and hand extraction for SLR includes further experiments on different independent datasets with more signers and varying environments. Furthermore, applying SMPL-X to continuous SLR, where facial expressions and body structure are even more crucial, would give further prominence to this method. Finally, applying SMPL-X to different action recognition tasks is an interesting experiment to examine how universal its success is.
5. Explicit Quaternion Krawtchouk Moment Invariants for Finger-Spelling Sign Language Recognition
  
  Sign language recognition (SLR) is an important aspect of human-computer interaction applications used by deaf and hearing-impaired people. Sign languages are gestural languages that use signs for communication without speaking. SLR is a challenging problem, mainly because of the underlying computer vision difficulties, such as the visual similarity of specific signs and the complex articulations of the hand. Typically, three components constitute a sign gesture: manual features, which are gestures made with the hands; non-manual features, such as facial expressions and body posture; and finger-spelling, in which words are spelt out letter by letter. The importance of finger-spelling becomes apparent when a concept lacks a specific sign, as with names, technical terms, or foreign words. Sign recognition is difficult because signs combine elements at different levels (words, facial expression, body posture and finger-spelling) to convey meaning. With the development of recent technologies such as the Kinect sensor, new opportunities have emerged in the field of human-computer interaction and sign language, making it possible to capture both RGB and depth (RGB-D) information. For feature extraction, traditional methods process the RGB and depth images independently. This work proposes a robust static finger-spelling recognition system based on quaternion algebra, which provides a more robust and holistic representation by fusing RGB images and depth information simultaneously. New sets of Quaternion Krawtchouk Moments (QKMs) and Explicit Quaternion Krawtchouk Moment Invariants (EQKMIs) are proposed for the first time. The proposed system is evaluated on three well-known finger-spelling datasets, and the results demonstrate the performance of the novel method compared with other methods in the literature under geometric distortion, noisy conditions and complex backgrounds, indicating that it could be highly effective for many other computer vision applications.
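Skin color segmentation in the YCbCr color space, mentioned in the second reviewed work, can be sketched as follows. The RGB-to-YCbCr conversion uses the standard BT.601 full-range coefficients. The chroma thresholds (Cb in 77..127, Cr in 133..173) are commonly quoted values treated here as an assumption that would need tuning per camera and lighting, and the helper names and synthetic image are hypothetical.

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """Convert an (H, W, 3) RGB image with values in 0..255 to YCbCr (BT.601, full range)."""
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return np.stack([y, cb, cr], axis=-1)

def skin_mask(rgb, cb_range=(77, 127), cr_range=(133, 173)):
    """Boolean mask of pixels whose chroma falls inside the assumed skin range."""
    ycbcr = rgb_to_ycbcr(rgb)
    cb, cr = ycbcr[..., 1], ycbcr[..., 2]
    return ((cb >= cb_range[0]) & (cb <= cb_range[1]) &
            (cr >= cr_range[0]) & (cr <= cr_range[1]))

def bounding_box(mask):
    """Smallest (top, bottom, left, right) box containing all True pixels, or None."""
    rows, cols = np.where(mask)
    if rows.size == 0:
        return None
    return int(rows.min()), int(rows.max()), int(cols.min()), int(cols.max())

# Synthetic 6x6 image (assumed data): blue background with a skin-toned 2x3 patch.
image = np.zeros((6, 6, 3), dtype=np.uint8)
image[...] = (0, 0, 255)
image[2:4, 1:4] = (220, 170, 140)

mask = skin_mask(image)
box = bounding_box(mask)
```

Only the chroma channels are thresholded, which is why this color space tends to be less sensitive to brightness changes than thresholding RGB directly. The bounding box gives a simple region of interest around the detected hand area.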
EXISTING SYSTEM

Sign language is widely used as a medium of communication by people who are deaf or speech-impaired. A sign language is composed of gestures formed by different hand shapes, movements and orientations, together with facial expressions. Around 466 million people worldwide have hearing loss, and 34 million of them are children. Deaf people have very little or no hearing ability and use sign language for communication. Different sign languages are used in different parts of the world, although they are far fewer in number than spoken languages. In existing systems, the lack of datasets, along with regional variation in sign language, has restrained efforts in finger gesture detection. The existing project aims to take a basic step towards bridging the communication gap between hearing people and deaf and speech-impaired people using Indian Sign Language. Effective extension to words and common expressions may not only help deaf and speech-impaired people communicate faster and more easily with the outside world, but also boost the development of autonomous systems for understanding and aiding them. Research on Indian Sign Language lags behind that on its American counterpart because it is hampered by the lack of standard datasets.
SYSTEM ARCHITECTURE
PROPOSED SYSTEM

Sign language is a gesture-based language that uses hand movements, hand orientation, and facial expression instead of acoustic sound patterns. It is not universal and has different patterns depending on the community. Because most individuals are not familiar with sign language, deaf and speech-impaired persons find it difficult to communicate without the aid of a translator of some sort, and they may feel excluded. Sign language recognition has become a commonly accepted approach to communication between deaf and speech-impaired people and hearing people. Recognition models are of two types: computer vision-based and sensor-based. In computer vision-based gesture recognition, a camera provides the input, and the input gestures undergo image processing before recognition. Following that, algorithms such as a region of interest algorithm and neural network approaches are used to recognize the processed gestures. The fundamental disadvantage of a vision-based sign language recognition system is that image acquisition is subject to numerous environmental factors, such as camera placement, background conditions, and sensitivity to lighting. However, it is more convenient and cost-effective than employing sensors and trackers to collect data. For greater accuracy, neural network methods and statistical sequence models such as the Hidden Markov Model can be combined with camera data.
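The recognition step, in which a neural network classifies the processed gesture image, might look roughly like the forward pass below. This is a sketch under stated assumptions and not the authors' network: the paper does not give an architecture, so the layer sizes, the five hypothetical sign labels, the random 28x28 stand-in image, and the untrained random weights are all placeholders.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D cross-correlation of a single-channel image with one kernel."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2x2(fmap):
    """Non-overlapping 2x2 max pooling; odd trailing rows or columns are dropped."""
    h, w = fmap.shape[0] // 2 * 2, fmap.shape[1] // 2 * 2
    blocks = fmap[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

def softmax(z):
    z = z - np.max(z)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def cnn_forward(image, kernels, dense_w, dense_b):
    """conv -> ReLU -> max pool for each kernel, then one dense softmax layer."""
    features = [max_pool2x2(np.maximum(conv2d(image, k), 0.0)).ravel()
                for k in kernels]
    x = np.concatenate(features)
    return softmax(dense_w @ x + dense_b)

rng = np.random.default_rng(0)
n_classes = 5                          # hypothetical number of sign labels
image = rng.random((28, 28))           # stand-in for a cropped grayscale hand region
kernels = rng.normal(size=(4, 3, 3))   # four untrained 3x3 filters
feat_len = 4 * 13 * 13                 # 28-3+1 = 26, pooled to 13x13, four maps
dense_w = rng.normal(scale=0.01, size=(n_classes, feat_len))
dense_b = np.zeros(n_classes)

probs = cnn_forward(image, kernels, dense_w, dense_b)
predicted_label = int(np.argmax(probs))
```

With random weights the predicted label is meaningless. In practice the filters and dense weights would be learned from labeled sign images, typically with a deep learning framework rather than hand-written loops.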
CONCLUSION

The ability to see, listen, talk, and respond appropriately to events is one of the most valuable gifts a human being can have. However, some people do not have this ability. People get to know one another by sharing their ideas, thoughts, and experiences with those around them. There are several ways to accomplish this, the most common of which is speech.

Through speech, people can convey their thoughts persuasively and understand each other. Our initiative intends to close the communication gap by including a low-cost computer in the communication chain, allowing sign language to be captured, recognised, and translated into speech for the benefit of hearing- and speech-impaired individuals. An image processing technique is employed in this paper to recognise hand gestures. The application presents a modern, integrated system designed for hearing-impaired people. A camera-based region of interest aids in collecting the user's data. Each gesture is treated as significant in its own right.
