- Open Access
- Authors : S. Shashi Pravardhan , Ch. Vamsi , T. Balaji Kumar , Md Saad Ur Rahman, Chandarapu Vijay Kumar
- Paper ID : IJERTV11IS060247
- Volume & Issue : Volume 11, Issue 06 (June 2022)
- Published (First Online): 30-06-2022
- ISSN (Online) : 2278-0181
- Publisher Name : IJERT
- License: This work is licensed under a Creative Commons Attribution 4.0 International License
Hand Gesture Recognition for Mute People Using Machine Learning
Shashi Pravardhan1, Ch. Vamsi2, T. Balaji Kumar 3, Md Saad Ur Rahman4, Chandarapu Vijaya Kumar5
Tech (IV-CSE), Department of Computer Science and Engineering1, 2, 3, 4
Associate Professor, Department of Computer Science and Engineering5, Ace Engineering College, Hyderabad, Telangana, India
Abstract: Physical, sensory, or intellectual disability affects around 500 million individuals globally. Physical and social barriers frequently obstruct their ability to participate fully in society and to enjoy equal rights and opportunities. Many people who are deaf or hard of hearing use sign language. Sign language transmits meaning through gestures rather than sounds: the shape of the hands, the movement of the hands, arms, and body, posture, and facial expressions together represent the speaker's thoughts, and words and phrases are communicated to the audience through signs. The goal of this project is to create a system that detects hand gestures with good accuracy, given that the pattern recognition system receives its input from the hand, and then recognizes the pattern and displays it in text format. OpenCV, Python, and MediaPipe are among the most widely used platforms for software development, and by employing them we can create human-computer interaction.
Keywords: Physical, Hand Gestures, Sign Language, OpenCV, MediaPipe, Python, Hand Posture Recognition, Human-Computer Interaction
A gesture is a sign of physical or emotional expression, comprising both body and hand movements. Gestures fall into two types: static and dynamic. A static gesture conveys a sign through the posture of the hand or body, while a dynamic gesture sends its signal through movement of the body or hand. Gesture recognition determines user intent by identifying these gestures or movements of the body or body parts. Researchers have worked for decades to enhance hand gesture detection technology, and many applications benefit greatly from it, including sign language recognition, augmented and virtual reality, sign language interpretation for the impaired, and robot control. With the new generation of gesture interface technology, the relevance of gesture recognition has grown rapidly.

Sign language is primarily a language for deaf-mute people who are unable to communicate with others through spoken language. Even though they can see, communicating with hand signals becomes problematic when a common hand sign language is not followed. A standard sign language contains a well-defined collection of signs and their meanings, which makes it simple to comprehend. Sign languages communicate through different motions, usually of the hand, and differ from spoken languages in several ways. Because hand gestures transmit clearer messages and can be made spontaneously, they are used more frequently for interaction with people and machines than other body movements, such as those of the head and eyes. Deaf and dumb individuals are becoming more outgoing these days and, unlike in the past, no longer rely on others for communication. As a result, it is critical that the general population around them be able to comprehend what they are trying to say through sign language.
For such persons, we created this hand gesture recognition program, which allows anybody to understand sign language even without knowing the meanings of the signs. As a result, there will be no communication barrier for the deaf and dumb.
Due to the rise of speech and gesture controls, many electronic gadgets around us can now be operated without being touched. Voice controls recognize the words a person utters; with gesture controls, we move a hand in a precise pattern to direct the device to perform a specific activity. A great deal of research has gone into developing fast and effective hand gesture detection systems. Many experiments in identifying hand gestures have been conducted, and such programs are now used as components of other software, for example to operate a gadget with hand gestures. A hand gesture can be recognized from an image in a variety of ways.
The deaf and dumb communicate via sign language, which is difficult to decipher for those unfamiliar with it. There is therefore a need for a device that can translate gestures into text and voice, which would be a significant step toward allowing deaf and dumb people to communicate with the broader population. Sign language recognition requires hand gesture classification. For American Sign Language (ASL) finger-spelling alphabets and digits, a method for static hand gesture classification has been developed. The system uses skin-color-based segmentation, which requires little post-processing, and extracts the averages of the central moments of order 2 to 9 as the features describing the hand gestures. A neural network classifier was used for recognition and provided good classification results of 73.68 percent with a minimal feature vector of 8 features. The workflow of hand gesture identification begins by detecting the hand region in the original images from the input devices. Various features are then extracted to describe the hand gesture, and finally recognition is achieved by comparing the similarity of the feature data. Normal cameras, stereo cameras, and ToF (time-of-flight) cameras are among the input devices that provide the original visual information.
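The moment-based features described above can be sketched with NumPy. This is a minimal illustration assuming a binary hand mask as input; the function names and the toy mask are ours, not the paper's implementation.

```python
import numpy as np

def central_moment(mask, p, q):
    """Central moment mu_pq of a binary image mask."""
    ys, xs = np.nonzero(mask)
    x_bar, y_bar = xs.mean(), ys.mean()
    return np.sum((xs - x_bar) ** p * (ys - y_bar) ** q)

def moment_features(mask, max_order=9):
    """Average the central moments of each order 2..max_order,
    yielding the 8-element feature vector described in the text."""
    feats = []
    for order in range(2, max_order + 1):
        moments = [central_moment(mask, p, order - p) for p in range(order + 1)]
        feats.append(float(np.mean(moments)))
    return feats

# Toy "hand" mask: a filled square region
mask = np.zeros((8, 8), dtype=np.uint8)
mask[2:6, 2:6] = 1
print(len(moment_features(mask)))  # 8 features, one per order 2..9
```

Averaging the moments of each order collapses the many mu_pq values into one number per order, which is how the text arrives at a feature vector of only 8 elements.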
Stereo and ToF cameras also provide depth information, which makes it simple to separate the hand region from the background using the depth map. To comprehend the static and dynamic hand movements used in communication, the various hand gestures must be recognized and classified efficiently. Systems such as the Kinect, hand movement sensors, connecting electrodes, and accelerometers are used to collect static and dynamic hand motions. These motions are processed with hand gesture recognition methods such as multivariate fuzzy decision trees, Hidden Markov Models (HMM), dynamic time warping, latent regression forests, support vector machines, and surface electromyography. The captured movements are analyzed for occlusion and near-finger engagement in order to identify the correct gesture and classify it under proper lighting conditions, while intermittent gestures are ignored. Powerful algorithms such as HMMs are required to identify only the intended gesture in real time. The efficacy of the classified gestures is then evaluated against standard training and test datasets such as sign language alphabets and the KTH dataset. Technologies such as sign language recognition, robotics, television control, rehabilitation, and music orchestration rely heavily on hand gesture recognition. Gesture is the most rudimentary method of communication between humans. Today, in the era of modern innovation, gesture recognition affects the world in a variety of ways, from assisting physically challenged people to robotic control and virtual reality scenarios. Human hand motions provide a natural and beneficial nonverbal strategy for communicating with a computer interface. Hand gestures are large body movements of the hands, arms, or fingers.
Hand gestures carry distinct levels of evidence, ranging from static gestures with a complex basis to dynamic gestures that express human emotion and connect with computers or humans. The hand alone serves as input to the machine; no intermediary medium is required for verbal interchange or gesture identification. This study suggests a deep convolutional neural network to classify hands quickly.
Several types of hardware and sensors are available for detecting hand gestures, but these approaches are neither accurate nor well suited to the task. The use of several sensors and pieces of hardware makes such systems difficult to operate. In addition, existing systems have relied on stereo cameras, which are more expensive and consume more resources. The current hand recognition software is extremely sluggish and does not generate accurate results. This is a significant flaw in the existing system, and many consumers have issues with the software they are using.
In this project, we propose converting hand gestures into text for deaf and dumb persons. The main goal is to recognize hand gestures, detect them, and display the results as text. The end user makes hand gestures in front of the camera; our application identifies the gestures as the user makes them and converts them into text in real time. The video obtained from the camera unit is presented on the screen, and within that video, alongside the hand, the required output is displayed. The project serves as a translator for the deaf and dumb. It solves a number of issues, including the need for a human translator, and allows deaf and dumb individuals to express themselves. We use the camera to identify hand motions. To detect these motions with a camera, we must first isolate the hand region, deleting any unwanted parts of the video sequence collected by the camera. After segmenting the hand region, we count the fingers visible to the camera to direct the program based on the finger count. As a result, the entire problem can be handled in five steps: first, locate and segment the hand region in the video stream; second, count the number of fingers and measure the size of the palm in the segmented hand region; third, match the segmented section against the available dataset; fourth, choose the most accurate data from the dataset and apply a weight for the next comparison, enabling quicker data selection; and finally, translate the gathered data into text and display it.
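The five steps above can be sketched as a schematic pipeline. The gesture vocabulary, data structures, and function names below are illustrative assumptions for exposition, not the paper's dataset or implementation; the segmentation and feature steps are left as stubs.

```python
from dataclasses import dataclass

# Illustrative gesture vocabulary (hypothetical, not the paper's dataset)
GESTURE_TEXT = {0: "fist", 2: "peace", 5: "open palm"}

@dataclass
class HandFeatures:
    finger_count: int
    palm_size: float  # e.g. area of the segmented palm region

def segment_hand(frame):
    """Step 1 (stub): locate and segment the hand region in the frame."""
    ...

def extract_features(hand_region) -> HandFeatures:
    """Step 2 (stub): count visible fingers and measure palm size."""
    ...

def match_dataset(features: HandFeatures) -> str:
    """Steps 3-5: match against the gesture vocabulary and emit text."""
    return GESTURE_TEXT.get(features.finger_count, "unknown")

print(match_dataset(HandFeatures(finger_count=5, palm_size=120.0)))  # open palm
```

Keeping the steps as separate functions mirrors the decomposition in the text: each stage can be tested or replaced (for example, swapping skin-color segmentation for a depth-based one) without touching the rest of the pipeline.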
Python is a high-level, interpreted, general-purpose programming language. Its design philosophy emphasizes code readability with the use of significant indentation. Python is dynamically typed and garbage-collected. It supports multiple programming paradigms, including structured (particularly procedural), object-oriented and functional programming. It is often described as a "batteries included" language due to its comprehensive standard library.
CONVOLUTIONAL NEURAL NETWORK:
A convolutional neural network (CNN/ConvNet) is a type of deep neural network used to evaluate visual images in deep learning. When we think about neural networks, we usually think of matrix multiplications, but this isn't the case with ConvNet. It employs a method known as Convolution. Convolution is a mathematical operation on two functions that yields a third function that explains how the form of one is changed by the other.
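The convolution operation at the heart of a ConvNet can be illustrated with a minimal NumPy sketch: a naive "valid" 2-D convolution written for clarity rather than speed, applied with a simple edge-detecting kernel of our choosing.

```python
import numpy as np

def conv2d(image, kernel):
    """Naive 'valid' 2-D convolution: slide the flipped kernel over the image."""
    k = np.flipud(np.fliplr(kernel))  # true convolution flips the kernel
    kh, kw = k.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * k)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)  # toy 4x4 "image"
edge = np.array([[1.0, -1.0]])                    # horizontal difference kernel
print(conv2d(image, edge).shape)  # (4, 3)
```

Each output value is the sum of an element-wise product between the kernel and one patch of the image, which is exactly the "third function" the definition above refers to; a CNN simply learns the kernel values instead of fixing them by hand.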
OpenCV is a large open-source library for image processing, computer vision, and machine learning. OpenCV supports Python, C++, Java, and other programming languages, and it can analyze photos and videos to recognize objects, people, and even human handwriting. When paired with other libraries, such as NumPy, a highly efficient library for numerical operations, its capabilities expand further, since any operation NumPy can perform may be combined with OpenCV.
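Because OpenCV represents images as NumPy arrays, the two libraries compose naturally. As a minimal illustration (using NumPy alone, with a tiny array standing in for the grayscale frame that `cv2.imread` would return):

```python
import numpy as np

# Stand-in for a grayscale frame (values 0-255, dtype uint8), as
# cv2.imread(path, cv2.IMREAD_GRAYSCALE) would return.
frame = np.array([[10, 200, 30],
                  [220, 40, 250],
                  [5, 180, 90]], dtype=np.uint8)

# Pure-NumPy thresholding, equivalent in effect to cv2.threshold
# with THRESH_BINARY at a threshold of 127:
mask = (frame > 127).astype(np.uint8) * 255
print(mask[0])  # bright pixels become 255, dark pixels 0
```

This interchangeability is what the paragraph refers to: a frame captured by OpenCV can be thresholded, masked, or counted with plain NumPy expressions and then handed straight back to OpenCV functions.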
Face and hand landmarks are detected using the mediapipe Python package. To detect the full set of facial and hand landmarks, we use a Holistic model from the mediapipe solutions. We also look at how to access individual landmarks on the face and hands, which may be used in computer vision applications such as sign language recognition and drowsiness detection, among others.
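MediaPipe reports 21 (x, y, z) landmarks per hand, indexed 0-20 (0 = wrist; 4, 8, 12, 16, 20 = thumb through pinky fingertips; 6, 10, 14, 18 = the index-through-pinky PIP joints). A simple way to use these landmarks, sketched below with synthetic coordinates rather than real MediaPipe output, is to count raised fingers: a raised finger's tip sits above its PIP joint, i.e. has a smaller normalized y, since image y grows downward.

```python
# MediaPipe hand landmark indices for the four non-thumb fingers
FINGER_TIPS = [8, 12, 16, 20]   # index, middle, ring, pinky fingertips
FINGER_PIPS = [6, 10, 14, 18]   # the corresponding PIP joints

def count_raised_fingers(landmarks):
    """landmarks: list of 21 (x, y) pairs in normalized image coordinates."""
    return sum(1 for tip, pip in zip(FINGER_TIPS, FINGER_PIPS)
               if landmarks[tip][1] < landmarks[pip][1])

# Synthetic landmark list (not real MediaPipe output): index and middle raised.
lm = [(0.5, 0.9)] * 21
lm[8], lm[6] = (0.5, 0.2), (0.5, 0.5)     # index tip above its PIP joint
lm[12], lm[10] = (0.5, 0.25), (0.5, 0.5)  # middle tip above its PIP joint
print(count_raised_fingers(lm))  # 2
```

The thumb is deliberately omitted here because its tip moves sideways rather than upward, so a y-comparison alone misclassifies it; a fuller version would compare the thumb tip's x against its IP joint depending on handedness.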
Fetch Camera Image
Process Image with Mediapipe
Get Hand Landmarks
Process the Landmark
Get Gesture Prediction
Fig1: Flow Chart
Fig2: Call me
In this research, we examined deep learning-based hand gesture recognition approaches that use a camera. We have included a taxonomy of the approaches, along with a basic block diagram showing the primary groupings and their benefits and drawbacks. We employ a variety of techniques, including a CNN; MediaPipe is another, used to build the processing graphs. The program is built on the Python platform and also uses OpenCV (Open Source Computer Vision Library), a collection of about 2,500 optimized algorithms. By merging Python with OpenCV, we created this project for mute and deaf people, sometimes known as physically disabled people, who will be able to converse without difficulty using this program. Hand gesture recognition will have a very broad reach in the future as the number of users grows every day: people seeking an alternative to relying on another person to convey their feelings will be willing to use software like this, so this initiative will be of great assistance to them.
We would like to thank our guide, Associate Prof. Mr. Ch. Vijay Kumar, and Associate Prof. Mrs. Soopari Kavitha, for their continuous support and guidance, which enabled us to complete our project successfully. We are also extremely grateful to Dr. M. V. Vijaya Saradhi, Head of the Department of Computer Science and Engineering, Ace Engineering College, for his support and invaluable time.