GestureFlow : A Machine Learning Based Gesture Control System

Avni Saxena; Divyanshi Vishnoi; Adnan Salmani; Adit Verma; Amit Saxena

doi:10.17577/IJERTCONV14IS050029

IIRA 5.0 - 2026 (Volume 14 - Issue 05)

Authors : Avni Saxena, Divyanshi Vishnoi, Adnan Salmani, Adit Verma, Amit Saxena
Paper ID : IJERTCONV14IS050029
Volume & Issue : Volume 14, Issue 05, IIRA 5.0 (2026)
Published (First Online) : 24-05-2026
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License


Avni Saxena¹, Divyanshi Vishnoi², Adnan Salmani³, Adit Verma⁴, Amit Saxena⁵

¹,²,³,⁴,⁵ Moradabad Institute of Technology, Moradabad

¹avni19092002@gmail.com, ²vishnoidivyanshi30@gmail.com, ³writersalmani@gmail.com, ⁴vadit2632@gmail.com, ⁵er.amitsaxena79@gmail.com

Abstract

GestureFlow is a machine learning-based system that lets users control their computer hands-free using gestures and voice commands. By integrating computer vision and speech recognition technologies, it offers a more affordable and user-friendly alternative to traditional input devices such as keyboards and mice. The system uses OpenCV to recognize gestures in real time and Python's SpeechRecognition library to interpret voice commands. GestureFlow is particularly beneficial for individuals with mobility impairments, giving them greater independence in operating computers. The system is structured into multiple modules, including a gesture recognition module, a voice command recognition module, and a web interface for real-time feedback and customization. A convolutional neural network (CNN) is used to classify gestures, while voice commands are processed using modern speech-to-text algorithms. GestureFlow has potential applications in accessibility, gaming and smart home automation, making human-computer interaction more intuitive and inclusive. Performance testing has shown high accuracy in recognizing predefined gestures and voice commands, supporting the system's effectiveness.

Keywords:

Machine Learning, Gesture Recognition, Voice Command, Human-Computer Interaction (HCI), Computer Vision, Speech Recognition, Convolutional Neural Network (CNN), Real-Time Processing, Smart Home Automation, Artificial Intelligence (AI).

Introduction

Technological progress has significantly transformed human-computer interaction (HCI), making devices more intelligent and intuitive. From the early days of computing, when users relied on the command line, to the modern era of touch screens and voice assistants, HCI has continually evolved to meet growing demands for convenience and accessibility. Despite this progress, traditional input devices such as keyboards, mice and touch screens remain the primary means of interaction. Although these devices are effective for most users, they pose significant challenges for individuals with disabilities and for those who work in hands-free settings, such as automotive or industrial environments.

GestureFlow offers an alternative interface that uses machine learning to recognize gestures and process voice commands. Integrating both modalities aims to provide an inclusive, hands-free computing experience. Gesture recognition systems use computer vision to translate human hand movements into commands, while voice recognition systems convert spoken language into machine-understandable instructions. Unlike existing systems that specialize in either gesture or voice recognition, GestureFlow offers an integrated solution that improves the user experience. For example, doctors can control medical devices in sterile environments using gestures, while gamers can enjoy more immersive interactions. Similarly, smart home automation systems can be operated with simple hand gestures and voice commands, reducing reliance on physical interfaces.


GestureFlow uses a standard webcam for visual input and a microphone for audio input, eliminating the need for expensive hardware such as Leap Motion or Kinect sensors. The gesture classification models are built using convolutional neural networks (CNNs), which are known for their accuracy in image recognition tasks. Python's SpeechRecognition library is used to process voice commands, to provide robust recognition even in noisy environments. In addition, the web interface gives users real-time feedback on recognized gestures and voice commands and lets them customize and monitor the system. This paper examines the development, implementation and evaluation of GestureFlow. The following sections cover the problem statement, literature review, proposed system, methodology, implementation, results, and conclusions. The aim is to demonstrate the feasibility and effectiveness of GestureFlow as a practical solution for human-computer interaction, while identifying areas for future research and improvement. Through GestureFlow, we envision a future where human-computer interaction is not only accessible to everyone but also more intuitive and immersive. The system's modularity and adaptability allow it to be deployed across different domains, giving users seamless control over their digital environment through a natural combination of gestures and voice commands.
Problem Statement

Traditional input methods pose significant challenges for users with limited mobility. Keyboards and mice require fine motor skills, limiting accessibility for individuals with physical disabilities. Existing gesture and voice recognition systems often work separately and lack the benefits of a unified interface. In addition, specialized hardware solutions are expensive and impractical for extended use. GestureFlow addresses these problems by providing a comprehensive, cost-effective and customizable solution. By integrating gesture and voice recognition through machine learning, it reduces reliance on traditional input devices.
Literature Review

The implementation can be divided into four main steps: 1. image enhancement and segmentation, 2. orientation detection, 3. feature extraction, and 4. classification [1]. Although this work covered more than four gesture categories, its main limitation was that skin color changes rapidly under different lighting conditions. For example, hand regions may go undetected under insufficient lighting, while non-skin regions of a similar color may be confused with hand regions [2].

The gesture recognition system described here includes three main steps: 1. segmentation, 2. feature representation, and 3. recognition technique. The system recognizes hand gestures by modeling the hand in the spatial domain, using a variety of 2D and 3D geometric and non-geometric models. It used the fuzzy C-means algorithm, achieving an accuracy of 85.83%. Its main drawbacks are that it cannot detect spatio-temporal gestures, i.e. gestures involving movement, and that it cannot classify images with complex backgrounds, i.e. scenes that contain other objects besides the hand [3].

This study focuses on detecting hand gestures through several procedures, including data collection, preprocessing, and segmentation. An appropriate input device must be selected for data acquisition; options include data gloves, markers, and bare hands captured by a webcam or a Kinect 3D sensor. The limitations of this work are its sensitivity to changes in lighting, rotation, alignment and scaling, and its need for special hardware, which is very expensive [4].

The system implementation is divided into three phases. A limitation is that the algorithms used are less effective than neural networks. In addition, the set of gestures considered is very small, so only a few gestures can be recognized. The system architecture consists of: 1. image capture, 2. hand region segmentation, and 3. a distance transform-based gesture recognition method [5].

The limitations of this system are: 1. the number of recognized gestures is small, and 2. the recognized gestures were not used to control any application [6].

This implementation uses three main algorithms: 1. the Viola-Jones algorithm, 2. the convex hull algorithm, and 3. an AdaBoost-based learning algorithm. The work was carried out by training on many features that represent local contour sequences. A limitation of this system is that it requires two sets of images for training: a positive set containing the target object and a negative set that does not contain it [7].

The system implementation consists of three components, the first being hand recognition. The methodology is as follows: 1. The input image is preprocessed, and the hand detector attempts to extract the hand from the input. 2. A CNN recognizes gestures from the processed images, while a Kalman filter estimates the hand position. 3. The detection and estimation results are sent to a control center, which decides which actions to take. One limitation of this system is that it only recognizes static gestures [8].

This implementation detects hand gestures using Java and neural networks. It is divided into two modules: 1. A detection module written in Java, which removes the video background and converts the video to the HSB color space. 2. A prediction module, which uses a convolutional neural network: the input image is captured in Java, passed to the neural network, and compared with the images in the dataset. One limitation of this system is that connecting the Java and Python modules requires additional programming to bridge the two.
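The Kalman-filter position estimate mentioned above can be illustrated in its simplest form. The following is a minimal one-dimensional sketch, not the cited system's implementation: it assumes a constant-position (random-walk) model for a hand x-coordinate, and the noise variances `process_var` and `measurement_var` are illustrative values chosen for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Kalman1D:
    """Scalar Kalman filter with a constant-position (random-walk) motion model."""

    process_var: float = 1.0        # assumed variance of true hand motion per frame
    measurement_var: float = 25.0   # assumed variance of detector noise (pixels^2)
    estimate: Optional[float] = None
    error_var: float = 0.0

    def update(self, measurement: float) -> float:
        if self.estimate is None:
            # First observation: accept it, with the measurement's uncertainty.
            self.estimate = measurement
            self.error_var = self.measurement_var
            return self.estimate
        # Predict: position stays put, uncertainty grows by the process noise.
        prior_var = self.error_var + self.process_var
        # Update: the Kalman gain weighs the new measurement against the prediction.
        gain = prior_var / (prior_var + self.measurement_var)
        self.estimate += gain * (measurement - self.estimate)
        self.error_var = (1.0 - gain) * prior_var
        return self.estimate


# Hypothetical noisy x-coordinates (pixels) of a hand held roughly still.
noisy_x: List[float] = [100, 108, 95, 103, 99, 110, 97]
kf = Kalman1D()
smoothed = [kf.update(x) for x in noisy_x]
print([round(s, 1) for s in smoothed])
```

A larger `measurement_var` trusts the detector less and smooths more, at the cost of reacting more slowly to real hand movement.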

Proposed System

GestureFlow consists of four key modules:

1. Gesture recognition module: captures hand gestures using a webcam and processes them using OpenCV and TensorFlow. Machine learning models classify gestures in real time.
2. Voice command recognition module: uses the Python SpeechRecognition library to interpret voice commands. The commands are mapped to predefined actions.
3. Web application: developed using HTML, CSS and JavaScript, it gives users real-time feedback on recognized gestures and voice commands. Users can customize command mappings.
4. Integration and action execution: ensures that the appropriate system actions are carried out based on the recognized gestures and commands.
Methodology
1. Data Collection: To ensure accurate model training, a comprehensive dataset of hand gestures and voice commands will be collected. The dataset will include a wide range of gestures representing different commands, captured from users of different age groups, skin tones and backgrounds. Similarly, voice command samples will be collected across different accents, languages and noise environments. Each sample will be labeled and annotated using tools such as LabelImg for images and Audacity for audio files. The dataset will be split into training, validation and test subsets to ensure robust model performance.
2. Model Training: Convolutional neural networks (CNNs) will be used to recognize hand gestures because of their strong performance in image classification. A pretrained CNN such as MobileNetV2 or InceptionNet will be fine-tuned on the gesture dataset, using transfer learning to speed up training and improve accuracy. For voice recognition, speech samples will be processed using a Python speech recognition library, and the model will be tuned through supervised learning to ensure reliable command recognition even in noisy environments. Validation accuracy will be checked regularly to optimize both models.
3. Real-time processing: OpenCV will handle real-time video processing by capturing frames from the webcam and converting them to grayscale for computational efficiency. The frames will then be resized and passed to the CNN for classification. At the same time, the speech recognition module captures audio and converts it to text using speech-to-text algorithms. The models will run in parallel, reducing response time and providing seamless real-time feedback. Optimized TensorFlow inference helps keep prediction latency low.
4. Integration: The interface is built with PyQt, a Python library for creating GUI applications using the Qt framework. It is installed with pip install PyQt5. A typical application creates a QApplication instance, designs windows using QMainWindow or QWidget, adds widgets such as QLabel or QPushButton, and starts the event loop with app.exec_(). For drag-and-drop UI design, Qt Designer can be used, and its .ui files converted to Python with pyuic5. PyQt is well suited to building modern, cross-platform desktop applications.
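Step 1 splits the annotated dataset into training, validation and test subsets. A minimal sketch of a stratified split follows, assuming the samples are `(file_path, label)` pairs; the file names and the 70/15/15 proportions are hypothetical, not figures from this paper. Splitting per label keeps every gesture represented in each subset, and the fixed seed makes the split reproducible.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[str, str]  # (file_path, label)


def stratified_split(
    samples: Sequence[Sample],
    val_frac: float = 0.15,
    test_frac: float = 0.15,
    seed: int = 42,
) -> Dict[str, List[Sample]]:
    """Split samples into train/val/test while preserving per-label proportions."""
    by_label: Dict[str, List[Sample]] = defaultdict(list)
    for sample in samples:
        by_label[sample[1]].append(sample)

    rng = random.Random(seed)  # seeded generator so the split is reproducible
    splits: Dict[str, List[Sample]] = {"train": [], "val": [], "test": []}
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        n_test = round(len(group) * test_frac)
        n_val = round(len(group) * val_frac)
        splits["test"].extend(group[:n_test])
        splits["val"].extend(group[n_test:n_test + n_val])
        splits["train"].extend(group[n_test + n_val:])
    return splits


# Hypothetical dataset: 100 images for each of three gestures.
samples = [(f"{g}_{i:03d}.jpg", g) for g in ("fist", "open_palm", "thumbs_up") for i in range(100)]
parts = stratified_split(samples)
print({name: len(items) for name, items in parts.items()})  # {'train': 210, 'val': 45, 'test': 45}
```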
  
  Overall, GestureFlow's methodology ensures accurate recognition, real-time responsiveness, and an intuitive user experience, making it a comprehensive solution for hands-free computer control.
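The methodology runs the gesture and speech models in parallel. One way to structure this, sketched below under stated assumptions, is a producer/consumer design: each recognizer runs in its own thread and pushes results onto a shared `queue.Queue`, which the integration logic consumes as events arrive. The recognizers here are stand-ins that replay fixed lists of labels (hypothetical data); in a real pipeline they would wrap the OpenCV capture loop and the speech recognition loop.

```python
import queue
import threading
import time
from typing import Iterable, List, Optional, Tuple

Event = Tuple[str, str]   # (modality, recognized label or phrase)
DONE = None               # sentinel a producer sends when its input ends


def producer(modality: str, outputs: Iterable[str],
             events: "queue.Queue[Optional[Event]]", delay: float) -> None:
    """Stand-in for a recognizer loop (camera + CNN, or microphone + speech-to-text)."""
    for item in outputs:
        time.sleep(delay)               # simulated per-frame / per-utterance latency
        events.put((modality, item))
    events.put(DONE)


def run_pipeline(gestures: List[str], phrases: List[str]) -> List[Event]:
    events: "queue.Queue[Optional[Event]]" = queue.Queue()
    workers = [
        threading.Thread(target=producer, args=("gesture", gestures, events, 0.01)),
        threading.Thread(target=producer, args=("voice", phrases, events, 0.02)),
    ]
    for w in workers:
        w.start()
    handled: List[Event] = []
    active = len(workers)
    while active:
        event = events.get()            # blocks until either modality produces a result
        if event is DONE:
            active -= 1
            continue
        handled.append(event)           # the integration module would dispatch here
    for w in workers:
        w.join()
    return handled


handled = run_pipeline(["fist", "thumbs_up", "open_palm"], ["open browser", "zoom in"])
print(handled)  # interleaving of the two modalities depends on thread timing
```

Threads are a reasonable fit when both loops spend much of their time waiting on camera frames or microphone audio; if CNN inference became the bottleneck, a separate process would be an alternative worth evaluating.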
Implementation

GestureFlow is designed as a modular system with four main components that work in synchronization to deliver a seamless and interactive user experience:
1. Gesture Recognition: The system captures live webcam input using OpenCV and processes the video frames using a Convolutional Neural Network (CNN) model. The CNN classifies gestures in real time, recognizing predefined actions such as pointing, swiping, or signaling stop. The system's training data includes various hand gestures collected from diverse users to ensure accuracy across different hand shapes, sizes, and skin tones.
2. Voice Recognition: Microphone input is captured using Python's SpeechRecognition library, which performs speech-to-text conversion. Noise reduction and natural language processing (NLP) techniques help ensure accurate voice command recognition, even in noisy environments. The recognized commands are then mapped to specific system actions, providing a hands-free interaction experience.
3. Web App: A user-friendly web application acts as the interface for GestureFlow. Built using PyQt, it displays real-time feedback on recognized gestures and voice commands. Users can configure custom actions by assigning specific gestures or voice commands to desired tasks. The web app also offers diagnostic information, allowing users to monitor system performance and adjust settings as needed.
4. Action Execution: Once gestures and voice commands are recognized, the corresponding actions are executed through system-level integrations. These actions may include opening applications, controlling multimedia playback, adjusting system volume, or navigating through files. Using APIs and background services, GestureFlow aims to execute commands with minimal latency.
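Because the CNN classifies every frame, a held gesture can produce the same label many times per second, and a single misclassified frame could trigger an unwanted action. The paper does not describe how this is handled; the sketch below shows one common approach, a sliding-window majority vote with a confidence threshold and a cooldown. All parameter values (`window`, `min_votes`, `min_conf`, `cooldown`) and the sample stream are illustrative assumptions.

```python
from collections import Counter, deque
from typing import Deque, List, Optional, Tuple


class GestureDebouncer:
    """Turns a noisy per-frame label stream into discrete, rate-limited gesture events."""

    def __init__(self, window: int = 7, min_votes: int = 5,
                 min_conf: float = 0.8, cooldown: int = 15) -> None:
        self.history: Deque[Optional[str]] = deque(maxlen=window)
        self.min_votes = min_votes      # frames in the window that must agree
        self.min_conf = min_conf        # below this, a prediction counts as "no gesture"
        self.cooldown = cooldown        # frames to wait after firing before firing again
        self.frames_since_fire = cooldown

    def update(self, label: str, confidence: float) -> Optional[str]:
        self.history.append(label if confidence >= self.min_conf else None)
        self.frames_since_fire += 1
        if self.frames_since_fire < self.cooldown:
            return None
        votes = Counter(lbl for lbl in self.history if lbl is not None)
        if not votes:
            return None
        top, count = votes.most_common(1)[0]
        if count < self.min_votes:
            return None
        self.frames_since_fire = 0
        self.history.clear()
        return top


# Hypothetical classifier output: a held fist, then a low-confidence OK sign.
stream: List[Tuple[str, float]] = [("fist", 0.95)] * 20 + [("ok_sign", 0.6)] * 5
debouncer = GestureDebouncer()
fired = []
for i, (label, conf) in enumerate(stream):
    gesture = debouncer.update(label, conf)
    if gesture is not None:
        fired.append((i, gesture))
print(fired)  # [(4, 'fist'), (19, 'fist')]
```

The trade-off is added latency: with `min_votes=5`, a gesture fires only after five confident frames (about 0.17 s at an assumed 30 fps), and a held gesture repeats only once the cooldown has elapsed.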

Some of the supported gestures and their mapped actions are:

Thumbs Up: Increase brightness
Thumbs Down: Decrease brightness
Open Palm: Pause and play media
Fist: Open Notepad
Peace Sign: Open cmd
OK Sign: Open browser
L Sign (Thumb + Index Finger): Open File Explorer
Rock Sign: Open music player
Both Hands Open: Maximize window
Clap Gesture: Take a screenshot
Hands in a Triangle Shape: Open Task Manager
Both Hands Making a 'V' Shape (Victory Pose): Launch voice assistant
Hands Moving Apart (Like a Zoom-Out Gesture): Zoom out
Hands Moving Closer (Like a Zoom-In Gesture): Zoom in
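The gesture-to-action table above, together with the voice-command mapping and user customization described for the voice module and web application, can be modeled as a small registry. This is a hypothetical sketch: the action names, gesture labels such as `fist` and `ok_sign`, and the 0.75 fuzzy-match cutoff are assumptions, and the handlers return strings instead of launching programs. Voice transcripts are matched with the standard library's `difflib.get_close_matches` so that minor speech-to-text errors still resolve to the intended command.

```python
import difflib
from typing import Callable, Dict, Optional


class CommandRegistry:
    """Maps gesture labels and spoken phrases to named, user-remappable actions."""

    def __init__(self) -> None:
        self.handlers: Dict[str, Callable[[], str]] = {}
        self.gestures: Dict[str, str] = {}   # gesture label -> action name
        self.phrases: Dict[str, str] = {}    # normalized phrase -> action name

    def register_action(self, action: str, handler: Callable[[], str]) -> None:
        self.handlers[action] = handler

    def bind_gesture(self, gesture: str, action: str) -> None:
        # Re-binding overwrites the old entry, which is how a user customizes a mapping.
        self.gestures[gesture] = action

    def bind_phrase(self, phrase: str, action: str) -> None:
        self.phrases[phrase.lower().strip()] = action

    def resolve_gesture(self, gesture: str) -> Optional[str]:
        return self.gestures.get(gesture)

    def resolve_phrase(self, transcript: str, cutoff: float = 0.75) -> Optional[str]:
        # Fuzzy matching tolerates small speech-to-text errors ("brouser" vs "browser").
        hits = difflib.get_close_matches(transcript.lower().strip(), list(self.phrases),
                                         n=1, cutoff=cutoff)
        return self.phrases[hits[0]] if hits else None

    def execute(self, action: Optional[str]) -> Optional[str]:
        handler = self.handlers.get(action) if action is not None else None
        return handler() if handler is not None else None


# Hypothetical action names; the handlers only return strings instead of launching programs.
registry = CommandRegistry()
registry.register_action("open_notepad", lambda: "launching Notepad")
registry.register_action("open_browser", lambda: "launching browser")
registry.bind_gesture("fist", "open_notepad")
registry.bind_gesture("ok_sign", "open_browser")
registry.bind_phrase("open browser", "open_browser")

print(registry.execute(registry.resolve_gesture("fist")))         # launching Notepad
print(registry.execute(registry.resolve_phrase("Open brouser")))  # launching browser
registry.bind_gesture("fist", "open_browser")                      # user remaps the fist gesture
print(registry.execute(registry.resolve_gesture("fist")))         # launching browser
```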

Results and Discussion

Initial testing shows GestureFlow achieving over 95% accuracy in gesture recognition under well-lit conditions. Voice command recognition remains reliable in moderately noisy environments. Real-time feedback from the web app lets users easily track their inputs and make adjustments.
Conclusion and Future Scope

GestureFlow provides an accessible and intuitive computer control system by integrating gesture recognition and voice command technology. Thanks to its robust architecture and flexible design, it is suitable for various applications, including accessibility aids, gaming and smart home automation. As with any system, however, there is room for improvement and expansion.

One of the primary areas for future improvement is the robustness of the system under low lighting conditions. While GestureFlow works well in well-lit environments, its accuracy can decrease in poor lighting. Advanced image preprocessing techniques, such as histogram equalization, adaptive thresholding and low-light image enhancement algorithms, can alleviate these challenges. In addition, integrating infrared sensors or depth cameras can further improve recognition accuracy in dark environments.
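Histogram equalization, a standard low-light preprocessing technique of the kind proposed above, redistributes pixel intensities using the cumulative histogram so that a dark, low-contrast frame spans the full intensity range. The sketch below works on plain nested lists so that it is self-contained; in practice OpenCV's `cv2.cvtColor` and `cv2.equalizeHist` would be the natural choice. The grayscale step uses the ITU-R BT.601 weights (0.299, 0.587, 0.114), which are also the weights OpenCV documents for RGB-to-gray conversion, and the sample frame is hypothetical.

```python
from typing import List, Tuple

Gray = List[List[int]]                      # 8-bit grayscale image as rows of ints (0..255)
RGB = List[List[Tuple[int, int, int]]]


def to_grayscale(img: RGB) -> Gray:
    """Luma conversion with ITU-R BT.601 weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row] for row in img]


def equalize_histogram(img: Gray, levels: int = 256) -> Gray:
    """Global histogram equalization via the cumulative distribution function (CDF)."""
    hist = [0] * levels
    for row in img:
        for v in row:
            hist[v] += 1
    cdf: List[int] = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)
    total = cdf[-1]
    cdf_min = next((c for c in cdf if c > 0), 0)   # CDF at the darkest occupied level
    if total == cdf_min:                           # empty or single-intensity image
        return [row[:] for row in img]
    # Map each level so the occupied intensities spread over the full 0..levels-1 range.
    lut = [max(0, round((c - cdf_min) * (levels - 1) / (total - cdf_min))) for c in cdf]
    return [[lut[v] for v in row] for row in img]


# A tiny, hypothetical under-exposed frame: all values crowd into 10..20.
dark = [[10, 12, 12, 15], [10, 15, 20, 20]]
print(equalize_histogram(dark))  # [[0, 85, 85, 170], [0, 170, 255, 255]]
```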

In addition, GestureFlow can be extended to specialized applications, including augmented reality (AR) and virtual reality (VR). By providing gesture and voice control in immersive environments, it can let users navigate digital worlds with greater ease and realism. Healthcare facilities, educational simulations and remote collaboration platforms could all benefit from this technology.

Finally, cloud-based storage and processing can increase the scalability of GestureFlow. Cloud infrastructure would allow users to access their personalized gesture and voice command settings across multiple devices. In addition, the machine learning models could be continuously updated to improve performance and adapt to users' evolving needs.

In conclusion, GestureFlow is a promising solution for inclusive and intuitive human-computer interaction. Its continued development will not only address current limitations but also unlock new possibilities in accessibility, productivity and immersive experiences. Through continued research and user feedback, GestureFlow has the potential to become a leading HCI innovation, empowering users around the world.
References

[1] M. Panwar and P. Singh Mehra, "Hand gesture recognition for human computer interaction," 2011 International Conference on Image Information Processing, Shimla, pp. 1-7, 2011.

[2] Rafiqul Zaman Khan and Noor Adnan Ibraheem, "Comparative Study of Hand Gesture Recognition System," International Conference of Advanced Computer Science & Information Technology, 2012.

[3] Arpita Ray Sarkar, G. Sanyal and S. Majumder, "Hand Gesture Recognition Systems: A Survey," International Journal of Computer Applications, vol. 71, no. 15, pp. 25-37, May 2013.

[4] Manjunath A. E., Vijaya Kumar B. P. and Rajesh H., "Comparative Study of Hand Gesture Recognition Algorithms," International Journal of Research in Computer and Communication Technology, vol. 3, no. 4, April 2014.

[5] Dnyanada R. Jadhav and L. M. R. J. Lobo, "Navigation of PowerPoint Using Hand Gestures," International Journal of Science and Research (IJSR), 2015.

[6] Ruchi Manish Gurav and Premanand K. Kadbe, "Real time finger tracking and contour detection for gesture recognition using OpenCV," IEEE Conference, Pune, India, May 2015.

[7] Pei Xu, "A Real-time Hand Gesture Recognition and Human-Computer Interaction System," Research Paper, Department of Electrical and Computer Engineering, University of Minnesota, April 2017.

[8] P. Suganya, R. Sathya and K. Vijayalakshmi, "Detection and Recognition of Gestures to Control the System Applications by Neural Networks," International Journal of Pure and Applied Mathematics, vol. 118, no. 10, pp. 399-405, January 2018.
