DOI : https://doi.org/10.5281/zenodo.18924021
- Open Access

- Authors : Pooja Kajale, Shaikh Fija, Geeta Zine, Sharannya Shrigadi, Rajnandini Mhaske
- Paper ID : IJERTV15IS030059
- Volume & Issue : Volume 15, Issue 03 , March – 2026
- Published (First Online): 09-03-2026
- ISSN (Online) : 2278-0181
- Publisher Name : IJERT
- License:
This work is licensed under a Creative Commons Attribution 4.0 International License
Deep Learning-Based Real-Time Sign Language Recognition and Bi-Directional Communication Systems
Pooja Kajale
Department of CSD Engineering Dr. Vithalrao Vikhe Patil COE, Ahilyanagar, India
Shaikh Fija
Department of CSD Engineering Dr. Vithalrao Vikhe Patil COE, Ahilyanagar, India
Geeta Zine
Department of CSD Engineering Dr. Vithalrao Vikhe Patil COE, Ahilyanagar, India
Sharannya Shrigadi
Department of CSD Engineering Dr. Vithalrao Vikhe Patil COE Ahilyanagar, India
Rajnandini Mhaske
Department of CSD Engineering Dr. Vithalrao Vikhe Patil COE Ahilyanagar, India
Abstract – Effective communication between hearing and hearing-impaired individuals is often limited by the absence of a shared language, creating barriers in educational, social, and professional environments. To address this challenge, this work presents a real-time Sign Language Recognition and Text/Speech Conversion System driven by deep learning. The proposed framework operates in two interactive modes to facilitate seamless bi-directional communication. In the first mode, hand gestures performed in front of a camera are captured through computer vision techniques using MediaPipe or OpenCV. A Convolutional Neural Network (CNN) then classifies the detected gestures, instantly displaying the corresponding text and generating spoken output via Text-to-Speech (TTS) synthesis. In the second mode, input provided as text or speech is converted into sign language through rendered sign animations or static gesture visuals, enabling effective communication with hearing-impaired users. Experimental evaluation demonstrates that the proposed CNN model achieves a notable classification accuracy of 96%, ensuring reliable and consistent gesture recognition performance. By integrating real-time vision-based gesture detection with AI-driven sign language generation, the system significantly reduces communication gaps and promotes accessible, inclusive interaction. This work highlights the potential of combining deep learning, computer vision, and natural language processing to empower the deaf and hard-of-hearing community through enhanced communication support [7], [15]. Comparative evaluation demonstrates that the proposed approach outperforms existing CNN-based and transfer-learning methods on the same gesture set.
Index Terms: Sign Language Recognition, Convolutional Neural Network (CNN), Deep Learning, Gesture Classification, Text-to-Speech, MediaPipe, Computer Vision.
- Introduction
Communication is essential to human interaction, yet millions of hearing-impaired individuals still face barriers because sign language is not widely known. Existing solutions, such as interpreters or static learning materials, are often costly, inconvenient, or impractical, leaving hearing-impaired people with limited support in real-time academic, professional, and public communication [1], [2], [3]. This creates a strong need for intelligent, automated, and inclusive communication technologies.
Advances in artificial intelligence (AI) and deep learning offer powerful opportunities to address this gap. Computer vision and convolutional neural networks (CNNs) have shown excellent performance in gesture and image recognition tasks [4], [5]. When combined with natural language processing (NLP), speech-to-text (STT), and text-to-speech (TTS) systems, they enable real-time translation between sign language, text, and speech on common devices like cameras and microphones [6], [7], [8].
Recent studies report CNN-based models achieving over 95% accuracy for isolated sign recognition tasks [4], [9], [11], with transfer learning methods such as VGG16 further boosting real-time performance [9], [10]. Integrated frameworks for sign-to-speech and speech-to-sign translation have also advanced communication for ASL and ISL users [11], [12], [15]. However, challenges remain, particularly in recognizing continuous gestures, incorporating facial expressions, and ensuring robustness under varying environments [13], [18].
This study aims to overcome these limitations by developing a deep-learning-based, bidirectional sign language recognition and conversion system. The system captures hand gestures via a camera, classifies them using CNN-based models, and converts them into text and speech [14], [21]. It also performs reverse translation, converting text or speech into sign images or animations, creating a seamless two-way communication platform [15]. Designed for accessibility, the system requires minimal hardware and delivers real-time, highly accurate interaction.
A. Key Contributions
To the best of our knowledge, this is among the first lightweight ISL-focused systems offering real-time bidirectional communication with system-level evaluation. The main contributions of this research are summarized as follows:
- We propose a real-time bi-directional Sign Language Communication Framework that integrates vision-based gesture recognition and text/speech-to-sign translation within a unified pipeline, enabling seamless interaction between hearing and hearing-impaired users.
- A lightweight CNN architecture optimized for real-time Indian Sign Language (ISL) recognition is developed, achieving high accuracy while maintaining low computational overhead suitable for real-world deployment.
- We introduce a custom ISL dataset consisting of 36 gesture classes collected under unconstrained conditions, incorporating illumination and orientation variations to improve generalization.
- An end-to-end system-level evaluation is performed, analyzing not only classification accuracy but also real-time performance, confusion patterns, and bidirectional communication latency.
- Unlike prior works focusing solely on sign-to-text conversion, the proposed system supports two-way communication, enabling both gesture-to-speech and speech/text-to-sign translation.
- Literature Survey
Sign language recognition has gained significant attention with advances in assistive technologies and accessibility research. Early studies highlighted the role of technology in supporting communication for individuals with disabilities [1], while recent work explored broader accessibility challenges in sign language technologies [2] and emphasized global disability statistics and social inclusion perspectives [3]. Deep learning-based approaches have emerged as dominant solutions for sign-to-text and speech translation, where CNN-based frameworks demonstrate strong accuracy in gesture classification and multimodal communication systems [4], [8], [19], [23]. Transfer learning architectures, particularly VGG-based models, have shown effective performance improvements in real-time sign translation for ASL [6], whereas comprehensive reviews highlight advancements in deep neural approaches and the integration of non-manual cues [7], [18]. Additional studies explored vision-based classification techniques for human-computer interaction, applying deep learning to enhance gesture recognition accuracy [8]. Hybrid CNN-LSTM networks combined with attention mechanisms further improved temporal modeling for continuous sign sequences [9], [21].
Although significant progress has been achieved in sign language recognition using CNN-based and hybrid deep learning models, most existing studies primarily focus on unidirectional communication, translating signs into text or speech. Limited attention has been given to integrated bi-directional systems that support natural interaction between hearing and hearing-impaired individuals.
Furthermore, several reported works rely on computationally intensive architectures such as deep transfer learning or transformer-based models, which restrict real-time deployment on low-resource devices. Many studies also evaluate performance on controlled datasets, lacking robustness analysis under real-world conditions.
In contrast, the present work emphasizes a computationally efficient CNN-based approach, combined with a system-level design that supports real-time bidirectional communication, making it more suitable for practical assistive applications.
- Proposed Methodology
The proposed methodology introduces a two-way Sign Language Recognition and Text/Speech Conversion System designed to enable seamless communication between hearing and hearing-impaired individuals. The system integrates deep learning, computer vision, and natural language processing to support real-time gesture-to-text/speech conversion and reverse text/speech-to-sign translation. The complete workflow of the system is illustrated in Fig. 1.
Fig. 1. Workflow of the Proposed Sign Language Recognition and Text/Speech Conversion System
- Data Acquisition
The system supports two modes of input:
- Gesture Acquisition Mode: Real-time hand gesture data is captured using a camera, where frameworks such as MediaPipe Hands and OpenCV are used for landmark detection, hand tracking, and extraction of spatial features [17], [19]. If a custom dataset is used, gesture images are organized into multiple classes representing static hand signs.
- Text/Speech Input Mode: Users provide input through a keyboard or a microphone, and any speech input is converted into text.
- Data Preprocessing
Raw gesture inputs undergo preprocessing to enhance image quality and prepare features for classification:
- Hand region extraction is performed using MediaPipe-based landmark detection to isolate the hand and remove background noise.
- The gesture frames are resized (e.g., 128 × 128 or 224 × 224) and normalized to maintain consistent pixel intensity.
- Gaussian filtering is applied to reduce noise present in real-time video frames.
- Feature extraction is carried out by obtaining landmark coordinates (21 or more key points) that represent finger joints and palm structure.
- To enhance model robustness, data augmentation techniques such as rotation, scaling, translation, flipping, and other geometric variations are applied to increase dataset diversity and reduce overfitting [19], [22].
- CNN-Based Gesture Classification
A Convolutional Neural Network (CNN) is employed for automated feature extraction and recognition of hand gestures. The architecture consists of:
- Multiple convolutional layers with ReLU activation for hierarchical feature extraction.
- Max-pooling layers for spatial downsampling.
- Batch normalization layers to stabilize learning.
- Fully connected dense layers.
- A Softmax classifier to categorize gesture classes.
The classifier supports:
- Static Gesture Recognition: Alphabet signs, numbers, and common vocabulary.
- Dynamic Gesture Recognition: Optional continuous sign recognition using frame sequences.
The recognized gesture is displayed as text and converted into speech using a Text-to-Speech (TTS) module.
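A minimal Keras sketch of the architecture described above is shown below. The filter counts and dense-layer size are assumptions, since the paper does not report the exact layer configuration; only the layer types (Conv+ReLU, max-pooling, batch normalization, dense, Softmax) follow the text.

```python
# Sketch of the described CNN; layer sizes are assumed, not the
# authors' exact configuration.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_gesture_cnn(input_shape=(128, 128, 3), num_classes=36):
    return models.Sequential([
        layers.Input(shape=input_shape),
        # Convolution + ReLU blocks for hierarchical feature extraction
        layers.Conv2D(32, 3, activation="relu", padding="same"),
        layers.BatchNormalization(),           # stabilizes learning
        layers.MaxPooling2D(),                 # spatial downsampling
        layers.Conv2D(64, 3, activation="relu", padding="same"),
        layers.BatchNormalization(),
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, activation="relu", padding="same"),
        layers.BatchNormalization(),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(256, activation="relu"),  # fully connected layer
        layers.Dropout(0.5),                   # regularization
        layers.Dense(num_classes, activation="softmax"),
    ])
```

For dynamic gestures, the same convolutional backbone could feed frame sequences into a recurrent head, as in the CNN-LSTM literature cited above.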
- Reverse Translation: Text/Speech to Sign Output
In the reverse communication mode, the system converts spoken or typed sentences into sign language:
- Speech input is transcribed into text.
- The text undergoes tokenization and mapping to sign units.
- Corresponding sign animations, gesture images, or 2D avatar movements are displayed to visually represent the message [12], [15].
This allows hearing-impaired users to understand spoken or written communication through visual sign output.
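The tokenization-and-mapping step can be sketched with the standard library alone. The `signs/` directory layout and the per-character finger-spelling fallback are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of the text-to-sign mapping step; the gesture
# directory and file naming scheme are assumptions for illustration.
import re

GESTURE_DIR = "signs"  # assumed folder: one image per letter/digit

def text_to_sign_sequence(text):
    """Tokenize text and map each letter/digit to a sign image path.

    Words without a dedicated sign are finger-spelled character by
    character using the 36-class (A-Z, 0-9) gesture set.
    """
    tokens = re.findall(r"[a-z0-9]", text.lower())
    return [f"{GESTURE_DIR}/{t}.png" for t in tokens]
```

A richer system would map whole words to animation clips first and fall back to finger-spelling only for out-of-vocabulary terms.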
- Model Training and Optimization
The CNN classifier is trained using the categorical cross-entropy loss function and optimized using the Adam optimizer [19], [21]. Hyperparameters such as learning rate, batch size, and number of epochs are fine-tuned experimentally. Techniques such as dropout regularization and early stopping are adopted to prevent overfitting. Additionally, k-fold cross-validation is performed to evaluate model robustness across different subsets.
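The training configuration might be expressed as below. The learning rate follows the experimental setup reported later in the paper; the early-stopping patience value is an assumption.

```python
# Sketch of the described training setup: Adam optimizer, categorical
# cross-entropy, and early stopping. Patience value is an assumption.
import tensorflow as tf

def make_training_setup():
    optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
    loss = tf.keras.losses.CategoricalCrossentropy()
    callbacks = [
        tf.keras.callbacks.EarlyStopping(
            monitor="val_loss", patience=5, restore_best_weights=True
        )
    ]
    return optimizer, loss, callbacks
```

These objects would be passed to `model.compile(...)` and `model.fit(..., callbacks=callbacks)` on the augmented training set.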
- Evaluation Metrics
The system performance is evaluated using the following metrics:
- Accuracy: A high accuracy score indicates that the model is performing well across most gesture classes and is reliable for real-time communication.
- Precision: In the context of sign language, high precision means the system rarely misclassies other gestures as the target gesture, reducing false alarms.
- Recall: For sign language recognition, high recall ensures that the model does not miss important gestures, which is crucial for smooth and complete communication.
- F1-score: In sign language applications, where some gestures may appear more or less frequently, the F1-score ensures fair measurement of model performance.
- Confusion Matrix: For a sign language system, the confusion matrix is essential to understand real-world reliability and per-gesture accuracy [18].
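As a worked illustration of these metrics (on toy labels, not the paper's experimental data), per-class precision, recall, and F1 can all be derived from the confusion matrix:

```python
# Worked numpy example of the evaluation metrics above; the labels
# used in any example run are illustrative, not experimental results.
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """cm[t, p] counts samples of true class t predicted as class p."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def per_class_metrics(cm):
    """Precision, recall, and F1 per class from a confusion matrix."""
    tp = np.diag(cm).astype(float)
    precision = tp / np.maximum(cm.sum(axis=0), 1)  # column sums: predicted
    recall = tp / np.maximum(cm.sum(axis=1), 1)     # row sums: actual
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    return precision, recall, f1
```

Macro-averaging these per-class values gives the single-number scores typically reported in the performance tables.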
- System Integration
The complete system integrates deep learning and user-interface components for real-time usability. The backend CNN model is deployed using Python Flask or FastAPI with RESTful APIs, enabling seamless communication between the model and the interface. The frontend interface, developed using ReactJS or Android, supports webcam input, speech input, gesture visualization, and output display. A Text-to-Speech module generates audio responses, while the sign animation module displays gesture-based outputs. A database is optionally included to store gesture mappings, logs, and user interactions, ensuring efficient data management and retrieval.
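One way such a backend endpoint could look in Flask is sketched below. The route name, JSON schema, and the stubbed classifier are assumptions for illustration, not the authors' API.

```python
# Hypothetical Flask prediction endpoint; the route, payload shape,
# and stub classifier are illustrative assumptions.
from flask import Flask, jsonify, request

app = Flask(__name__)

def classify_gesture(landmarks):
    # Placeholder for CNN inference on extracted hand landmarks.
    return "A"

@app.route("/predict", methods=["POST"])
def predict():
    data = request.get_json(force=True)
    label = classify_gesture(data.get("landmarks", []))
    # The frontend would display this text and feed it to the TTS module.
    return jsonify({"text": label})
```

In the real system, `classify_gesture` would load the trained CNN once at startup and run inference on each incoming frame's landmark features.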
- Workflow Summary
The overall workflow of the proposed system is summarized as follows (Figure 1):
- Real-time camera input → Hand landmark extraction (MediaPipe/OpenCV) → Preprocessing → CNN-based gesture classification → Text and speech output generation.
- Text or speech input → ASR/Text processing → Sign mapping → Sign animation or avatar-based visual communication.
- All recognition outputs, processed text, and sign mappings are stored in the system database and made accessible through the integrated user interface for real-time interaction and future reference.
- System Architecture and Mathematical Approach
- System Architecture
The proposed system architecture for the Real-Time Sign Language Recognition and Bi-directional Communication Framework follows a modular design to ensure scalability, reliability, and smooth interaction between hearing and non-hearing users [15], [18]. It consists of two main subsystems, the Sign Language Recognition (SLR) module and the Text/Speech-to-Sign module, managed by a central System Orchestrator. Each subsystem functions independently but exchanges data through well-defined APIs for efficient integration and real-time communication.
In the SLR subsystem, a live video stream is captured from the user's camera. MediaPipe/OpenCV-based detectors extract hand regions, landmarks, and gesture features from each frame. This preprocessing step stabilizes the input, reduces noise, normalizes spatial differences, and generates high-quality features for model inference. These features are then passed to a deep learning model that performs gesture classification in real time. The predicted text is sent to the orchestrator, which may also forward it to a text-to-speech module for audio output.
The Text/Speech-to-Sign subsystem supports communication from hearing individuals to sign-language users. Speech input is captured via a microphone and processed using a speech-to-text engine. The resulting text undergoes normalization and semantic mapping in the orchestrator, linking linguistic tokens with predefined sign-language motion sequences. A 3D virtual avatar then animates the mapped gestures, enabling intuitive visual communication. The subsystem also accepts typed text as an alternative input.
This modular architecture improves system robustness, simplifies maintenance, and allows each service (recognition, translation, and animation) to scale independently. It is compatible with desktops, laptops, and embedded platforms. Overall, the design enables inclusive, real-time communication by bridging the gap between hearing and non-hearing communities. Figure 2 presents the high-level system architecture.
- Mathematical Approach
This section describes the mathematical formulation of the proposed Bi-Directional Sign Language Recognition and Conversion System. The pipeline consists of gesture representation, preprocessing, CNN-based classification, loss optimization, text-to-speech mapping, and reverse speech/text-to-sign conversion.
- Gesture Representation and Detection: Let the input video stream from the camera be represented as a sequence of n frames
V = {F1, F2, ..., Fn} (1)
where each frame Fi ∈ R^(H×W×C) denotes the i-th image with height H, width W, and C color channels.
Using hand-detection algorithms such as MediaPipe or OpenCV, the hand region is extracted from each frame
Hi = D(Fi) (2)
where D(·) represents the hand-detection function and Hi is the cropped hand region.
The hand landmarks (keypoints) detected from the hand region are represented as
Li = {(xj, yj, zj) | j = 1, 2, ..., K} (3)
where K is the number of detected landmarks (e.g., K = 21 for the standard MediaPipe hand model) and (xj, yj, zj) denote the normalized 3D coordinates of each landmark.
- Pre-processing: Each frame undergoes a preprocessing transformation P(·)
Hi = P(Fi) (4)
which includes resizing, denoising, thresholding, and hand-region cropping.
To enhance robustness, data augmentation is modeled as a random transformation Tk
Hi = Tk(Hi), Tk ∈ Taug (5)
where Taug includes rotation, flipping, brightness, and scaling operations.
For landmark sequences, temporal smoothing is applied
L̄i = S(L_{i:i+Δ}) (6)
where S is a moving-window smoothing operator.
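The moving-window smoothing operator S can be sketched as a simple moving average over the landmark sequence; the window size below is an assumption.

```python
# Numpy sketch of a moving-window smoothing operator for landmark
# sequences; the window size is an illustrative assumption.
import numpy as np

def smooth_landmarks(seq, window=3):
    """Average each frame's landmark vector with its neighbors.

    seq has shape (n_frames, n_coords); edges are padded by repetition
    so the output keeps the same length as the input.
    """
    seq = np.asarray(seq, dtype=float)
    kernel = np.ones(window) / window
    pad = (window // 2, window - 1 - window // 2)
    padded = np.pad(seq, (pad, (0, 0)), mode="edge")
    # Apply the moving average independently per coordinate
    return np.stack(
        [np.convolve(padded[:, j], kernel, mode="valid")
         for j in range(seq.shape[1])],
        axis=1,
    )
```

Smoothing of this kind suppresses per-frame landmark jitter before the features reach the classifier.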
- Gesture Classification Using CNN: A Convolutional Neural Network (CNN) extracts visual features
zi = Eθ(Hi), zi ∈ R^d (7)
where Eθ denotes the convolution, ReLU activation, and pooling layers parameterized by θ.
The CNN output is passed through a fully connected layer, and the class probabilities for the C gesture classes are computed using Softmax
ŷi = softmax(W zi + b) (8)
Fig. 2. System Architecture of the Proposed Real-Time Sign Language Recognition and Bi-directional Communication System.
The predicted gesture label is
ĉi = argmax_j ŷij (9)
- Loss Function: The model is trained using categorical cross-entropy loss
L = −(1/N) Σ_i Σ_j yij log(ŷij) (10)
where yij is the one-hot encoded ground-truth label [19], [23].
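A small numpy illustration of the softmax, argmax, and cross-entropy steps (the logit values below are made up for the example):

```python
# Numpy illustration of softmax probabilities, argmax prediction, and
# categorical cross-entropy; the logits are illustrative, not model output.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))  # numerically stable
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(y_onehot, y_prob):
    # Mean negative log-likelihood of the true class
    return -np.mean(np.sum(y_onehot * np.log(y_prob + 1e-12), axis=1))

logits = np.array([[2.0, 0.5, -1.0]])     # one sample, three classes
probs = softmax(logits)                   # class probabilities, sum to 1
pred = int(np.argmax(probs, axis=1)[0])   # predicted class index
```

Minimizing this loss drives the predicted distribution toward the one-hot ground truth for each gesture class.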
- Text-to-Speech (TTS) Mapping: Once a gesture is recognized and converted to text T, the TTS system generates audio.
Let the text sequence be
T = {w1, w2, ..., wm} (11)
where wi denotes each word.
- Speech/Text-to-Sign Conversion: For reverse communication, speech is first converted to text. Each word wi is then mapped to a gesture representation
gi = fs(wi) (12)
where fs(·) is a lookup or neural generator that outputs gesture keypoints. The final sign animation sequence is
G = {g1, g2, ..., gm} (13)
- Results and Performance Analysis
- Dataset Overview
The proposed real-time Sign Language Recognition and Bidirectional Communication System was evaluated using a custom Indian Sign Language (ISL) dataset consisting of 36 gesture classes, including alphabets (A–Z) and digits (0–9). Data augmentation techniques such as rotation, brightness adjustment, and horizontal flipping were applied to improve robustness against variations in lighting and hand orientation [22]. To verify the reliability of performance gains, multiple training runs were conducted with different random initializations. The proposed model consistently achieved accuracy improvements of 1.5–3% over baseline methods, indicating stable and statistically meaningful performance.
TABLE II
PERFORMANCE COMPARISON WITH EXISTING METHODS
- Dataset Description and Collection Protocol
The experimental evaluation was conducted using a custom Indian Sign Language (ISL) dataset created specifically for this study. The dataset comprises 36 distinct gesture classes, including 26 alphabet signs (A–Z) and 10 numerical signs (0–9) [11], [16].
Each gesture class contains approximately 120 samples, resulting in a total dataset size of 4,320 images. Gesture samples were collected from six different participants, ensuring inter-signer variability. The dataset includes variations in illumination, background clutter, hand orientation, and scale to simulate real-world conditions.
Prior to training, the dataset was divided into training (70%), validation (15%), and testing (15%) subsets, ensuring that samples from each class were uniformly distributed. Data augmentation techniques such as rotation, horizontal flipping, and brightness adjustment were applied to mitigate overfitting.
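The per-class 70/15/15 split can be sketched with the standard library alone; the fixed seed and the list-of-samples representation are assumptions for illustration.

```python
# Stdlib-only sketch of the per-class 70/15/15 split described above;
# the seed and in-memory sample representation are assumptions.
import random

def split_class(samples, seed=0):
    """Shuffle one class's samples and split 70% / 15% / 15%."""
    rng = random.Random(seed)   # fixed seed for reproducibility
    s = samples[:]
    rng.shuffle(s)
    n = len(s)
    n_train = round(0.70 * n)
    n_val = round(0.15 * n)
    return s[:n_train], s[n_train:n_train + n_val], s[n_train + n_val:]
```

Applying the split per class, as here, keeps every gesture uniformly represented across the three subsets.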
TABLE I
DATASET COMPOSITION FOR SIGN LANGUAGE GESTURE CATEGORIES
- Training and Validation Performance
Fig. 3 presents the training accuracy and loss curves of the proposed CNN model across multiple epochs. The accuracy curve shows a steady and consistent rise, indicating that the model progressively enhances its ability to recognize sign language gestures. At the same time, the loss curve initially exhibits minor fluctuations but gradually decreases, confirming that the network effectively reduces classification errors as training continues. The overall trends in both curves demonstrate stable learning behavior, proper convergence, and reliable performance improvement of the CNN model throughout the training process.
Fig. 3. Training accuracy and loss curves of the proposed CNN model.
- Confusion Matrix Analysis
Fig. 4 illustrates the confusion matrix of the proposed CNN model for sign language gesture classification. The strong diagonal dominance indicates that the model achieves consistently high accuracy across most gesture classes, with very few misclassifications. The minimal off-diagonal values further confirm that the network effectively distinguishes visually similar signs. Overall, the confusion matrix demonstrates the robustness and reliability of the proposed model in recognizing a wide range of sign language gestures.
TABLE III
CLASSIFICATION PERFORMANCE METRICS
Fig. 5. Real-Time System Output Showing Gesture Detection and Sign Conversion
Fig. 4. Confusion matrix of the proposed CNN model on the test dataset.
- System Output
The real-time outputs of the proposed system are illustrated in Fig. 5. During testing, the system accurately detects hand gestures, extracts key landmarks, and displays the predicted sign with minimal delay. The recognized gesture is also converted to speech, while the reverse module translates text or speech into the corresponding sign images or animations, demonstrating smooth two-way communication.
- Ablation Study
An ablation study was conducted to evaluate the impact of individual system components on overall recognition performance. Initially, the CNN model was trained without data augmentation, resulting in a noticeable drop in classification accuracy. Incorporating data augmentation improved robustness and reduced misclassification of visually similar gestures. Additionally, experiments were performed without MediaPipe-based landmark extraction, using raw image inputs alone. The results demonstrated reduced accuracy and higher confusion rates, highlighting the effectiveness of landmark-guided preprocessing. These findings confirm that each component of the proposed pipeline contributes meaningfully to system performance.
TABLE IV
ABLATION STUDY OF SYSTEM COMPONENTS
- Experimental Setup
The average end-to-end system latency, including gesture capture, preprocessing, classification, and output rendering, was measured at approximately 120–150 ms, which is acceptable for real-time human-computer interaction. All experiments were conducted on a system equipped with an Intel Core i7 processor, 16 GB RAM, and an NVIDIA GTX 1650 GPU. The CNN model was implemented using TensorFlow and trained for 50 epochs with a batch size of 32 and a learning rate of 0.001. Early stopping was employed based on validation loss to prevent overfitting. Real-time inference was performed at an average rate of 22–25 frames per second (FPS), enabling smooth interaction without noticeable latency.
- Conclusion
This study presented a real-time, bi-directional sign language communication system that combines deep learning, computer vision, and speech processing to address accessibility challenges faced by the hearing-impaired community. Through extensive experimentation on a custom ISL dataset, the proposed CNN-based model demonstrated superior accuracy and robustness compared to existing approaches, while maintaining real-time performance suitable for deployment on consumer-grade hardware [7], [15].
The system's robustness is ensured through rigorous dataset preparation and augmentation, resulting in high accuracy across diverse environments. Future enhancements aim to expand the scope by integrating regional sign language dialects, developing mobile applications, and incorporating non-manual cues, such as facial expressions, to further refine recognition precision. Ultimately, this research not only demonstrates the potential of AI-driven assistive technology but also promotes social inclusion, ensuring equal participation and independence for the deaf community in daily interactions.
All gesture data used in this study were collected with informed consent from participants. No personally identifiable information was recorded. The experimental procedures comply with ethical research standards for human-centered AI systems. Model configurations, training parameters, and preprocessing steps are described in detail to ensure reproducibility.
Despite promising results, the current system is limited to isolated gesture recognition and does not fully capture complex grammatical structures or non-manual cues such as facial expressions. Additionally, the dataset size, while sufficient for validation, can be further expanded to include a broader range of signers and regional variations.
- LIMITATIONS AND FUTURE WORK
Future research directions focus on enhancing the system's portability and robustness to maximize real-world impact. A primary objective is the migration of the current framework into a standalone mobile application, enabling ubiquitous, on-the-go communication for users without reliance on desktop infrastructure. Additionally, the scope extends to incorporating depth-based 3D motion tracking to capture complex dynamic gestures and expanding the training corpus to include diverse, multilingual sign language datasets. Further refinements will target the generation of high-fidelity, realistic avatars to ensure more natural visual feedback, thereby establishing a comprehensive and inclusive assistive communication ecosystem.
- Acknowledgement
We would like to express our heartfelt gratitude to Prof. Prachi G. Dhavane for her valuable guidance, continuous encouragement, and insightful suggestions throughout the course of this project. Her support played a crucial role in shaping the quality and direction of this work. We also thank the faculty members of the Department of Computer Science and Design Engineering, Padmashri Dr. Vitthalrao Vikhe Patil College of Engineering, Ahilyanagar, for providing the necessary resources and academic support.
References
- P. K. Ray, Assistive Technologies for Persons with Disabilities, Cham, Switzerland: Springer, pp. 119–150, 2021.
- M. Bragg et al., "The accessibility of sign language technologies," Proc. ACM Human-Computer Interaction, vol. 5, no. CSCW2, pp. 1–29, 2021.
- World Health Organization, "Deafness and hearing loss," WHO Fact Sheet, 2023.
- P. Duraisamy, A. Abinayasrijanani, M. A. Candida, and P. D. Babu, "Transforming sign language into text and speech through deep learning technologies," Indian J. Sci. Technol., vol. 16, no. 23, pp. 1731–1740, 2023.
- S. Rangate, R. Sawant, T. Gund, H. Mule, K. Vhatkar, and A. Dhepe, "Implementation of multilingual sign language and speech conversion system using deep learning," Implementation Research, D. Y. Patil Institute of Technology, 2023.
- S. Thakar, S. Shah, B. Shah, and A. V. Nimkar, "Sign language to text conversion in real time using transfer learning," arXiv preprint arXiv:2204.11267, 2022.
- B. A. Al Abdullah, G. A. Amoudi, and H. S. Alghamdi, "Advancements in sign language recognition: A comprehensive review and future prospects," IEEE Access, vol. 12, pp. 35567–35585, 2024.
- A. K. Jain and S. P. Kumar, "Deep learning based sign language recognition for human-computer interaction," Multimedia Tools and Applications, vol. 81, pp. 18623–18647, 2022.
- Y. Cao et al., "Sign language recognition based on CNN-LSTM with attention mechanism," IEEE Access, vol. 9, pp. 94689–94698, 2021.
- K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition (VGG16)," arXiv preprint arXiv:1409.1556, 2015.
- N. Kumar and R. Singh, "Indian sign language recognition using deep learning," Procedia Computer Science, vol. 192, pp. 431–440, 2021.
- T. Shanableh et al., "Automated translation of Arabic sign language: A deep learning approach," Applied Sciences, vol. 12, no. 7, pp. 3564–3578, 2022.
- D. Camgoz, S. Hadfield, O. Koller, and R. Bowden, "Neural sign language translation," Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), pp. 7784–7793, 2018.
- H. Cooper et al., "Deep learning-based isolated sign recognition for continuous sign language translation," Pattern Recognition Letters, vol. 146, pp. 107–114, 2021.
- M. A. Kadir, R. J. Green, and F. Tian, "Automatic sign language recognition: A survey," IEEE Trans. Human-Machine Systems, vol. 52, no. 5, pp. 681–693, 2022.
- S. K. Bhuyan, A. K. Sarma, and P. K. Deka, "A vision-based Indian Sign Language recognition using deep learning," Journal of Ambient Intelligence and Humanized Computing, vol. 12, no. 6, pp. 6203–6215, 2021.
- M. S. Rautaray and A. Agrawal, "Vision-based hand gesture recognition for human-computer interaction: A survey," Artificial Intelligence Review, vol. 43, no. 1, pp. 1–54, 2015.
- R. Rastgoo, K. Kiani, and S. Escalera, "Sign language recognition: A deep survey," Expert Systems with Applications, vol. 164, p. 113794, 2021.
- A. Oyedotun and A. Khashman, "Deep learning in vision-based static hand gesture recognition," Neural Computing and Applications, vol. 28, no. 12, pp. 3941–3951, 2017.
- D. Chai, S. Wong, and M. J. Nixon, "Dynamic hand gesture recognition using convolutional neural networks," Pattern Recognition Letters, vol. 119, pp. 82–89, 2019.
- A. K. Singh and R. K. Yadav, "Real-time American Sign Language recognition using CNN and LSTM," Procedia Computer Science, vol. 171, pp. 465–474, 2020.
- S. V. Nandhini and M. Manikandan, "Deep learning-based Indian Sign Language recognition with computer vision," International Journal of Advanced Computer Science and Applications, vol. 12, no. 5, pp. 212–218, 2021.
- M. Pigou, S. Dieleman, P.-J. Kindermans, and B. Schrauwen, "Sign language recognition using convolutional neural networks," Lecture Notes in Computer Science (LNCS), Springer, 2015.
