DOI : 10.17577/IJERTCONV14IS040029
- Open Access

- Authors : Meenakshi Yadav, Sukhdanshi Varma, Simran Arya, Sheelu, Sonam
- Paper ID : IJERTCONV14IS040029
- Volume & Issue : Volume 14, Issue 04, ICTEM 2.0 (2026)
- Published (First Online) : 24-05-2026
- ISSN (Online) : 2278-0181
- Publisher Name : IJERT
- License:
This work is licensed under a Creative Commons Attribution 4.0 International License
Real-Time Air Gesture Recognition Using Computer Vision for Touchless Human-Computer Interaction
Meenakshi Yadav1, Sukhdanshi Varma2, Simran Arya3, Sheelu4, Sonam5
Department of Computer Science & Engineering
Moradabad Institute of Technology
1meenakshiyadav2309@gmail.com, 2sukhdanshi@gmail.com, 3sarya0558@gmail.com, 4shelukumari91@gmail.com, 5sonamsingh9045267458@gmail.com
Abstract
The Air Gesture system presents a touchless human-computer interaction (HCI) framework that enables users to control computer operations through dynamic hand gestures performed in mid-air. The system removes dependency on conventional input devices such as a mouse, keyboard, or touchscreen by utilizing real-time hand gesture recognition through a monocular webcam. This approach provides a low-cost and hardware-efficient alternative for natural interaction.
The proposed system employs computer vision and machine learning techniques for hand detection, tracking, and gesture classification. Key stages include video frame acquisition, hand region segmentation, feature extraction based on hand landmarks and motion trajectories, and gesture recognition using trained machine learning and deep learning models. Temporal information is captured to accurately recognize dynamic gestures, ensuring robustness against variations in hand orientation, scale, and illumination. The recognized gestures are mapped to system-level commands such as cursor movement, clicking, scrolling, virtual typing, drawing, and multimedia control.
Real-time inference is achieved with low latency, enabling smooth and responsive interaction. Experimental evaluations demonstrate reliable gesture recognition performance under diverse environmental conditions. The system is hygienic, scalable, and user-friendly, making it well suited for applications in smart classrooms, interactive presentations, public interfaces, and assistive technologies. The proposed air gesture-based interaction model highlights the potential of vision-driven intelligent interfaces in advancing next-generation human-computer interaction systems.
Keywords: Air Gesture Recognition, Human-Computer Interaction, Computer Vision, Hand Landmark Detection, Touchless Interface, Real-Time Systems
Introduction
Human-computer interaction (HCI) has undergone significant evolution, transitioning from traditional physical input devices such as keyboards and mice to more natural and intuitive interaction paradigms [1]. With increasing demand for touchless and hygienic interfaces, air gesture-based interaction has gained attention as an effective solution for contactless human-computer communication [2]. Air gesture technology allows users to interact with computing systems through hand movements performed in mid-air, without requiring physical contact or wearable devices [3].
Recent advancements in computer vision and machine learning have enabled real-time hand detection and tracking using conventional cameras [4]. Vision-based air gesture systems typically employ hand landmark detection, finger position analysis, and motion tracking to recognize predefined gestures and map them to computer operations [5]. Such systems provide flexible and user-friendly alternatives to conventional input methods and have shown promising applications in smart classrooms, presentations, virtual reality environments, and assistive technologies [6].
Literature Survey
Early research in gesture recognition primarily relied on specialized hardware devices such as data gloves, magnetic sensors, and depth cameras to capture hand movements and gestures. While these systems provided accurate motion tracking, they significantly increased system cost, computational complexity, and user discomfort, thereby limiting their widespread adoption in real-world applications [7]. Moreover, the dependency on dedicated hardware reduced portability and restricted usage to controlled environments.
To overcome these limitations, vision-based gesture recognition approaches using standard RGB cameras have gained substantial attention. These methods leverage advances in computer vision and machine learning to detect and interpret hand gestures without requiring additional sensors. Vision-based systems are cost-effective, portable, and easier to deploy, making them suitable for a wide range of applications including human-computer interaction, virtual environments, and assistive technologies [8].
Recent studies emphasize the use of landmark-based hand tracking techniques, which represent the hand using key anatomical points such as fingertips, joints, and palm centers. Among these approaches, MediaPipe Hand Tracking has emerged as a robust and efficient framework capable of detecting 21 three-dimensional hand landmarks with high accuracy and low latency in real time. MediaPipe employs a lightweight deep learning pipeline that combines palm detection and hand landmark regression, enabling stable tracking even under varying lighting conditions and complex backgrounds [9].
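For orientation, the 21 landmarks can be addressed by fixed indices. The sketch below lists them in the order documented for MediaPipe Hands. The `Point3D` container and the `fingertips` helper are illustrative names introduced here; they are not part of the paper or of the MediaPipe API.

```python
from enum import IntEnum
from typing import Dict, NamedTuple, Sequence


class HandLandmark(IntEnum):
    """Landmark indices in the order documented for MediaPipe Hands."""
    WRIST = 0
    THUMB_CMC = 1
    THUMB_MCP = 2
    THUMB_IP = 3
    THUMB_TIP = 4
    INDEX_FINGER_MCP = 5
    INDEX_FINGER_PIP = 6
    INDEX_FINGER_DIP = 7
    INDEX_FINGER_TIP = 8
    MIDDLE_FINGER_MCP = 9
    MIDDLE_FINGER_PIP = 10
    MIDDLE_FINGER_DIP = 11
    MIDDLE_FINGER_TIP = 12
    RING_FINGER_MCP = 13
    RING_FINGER_PIP = 14
    RING_FINGER_DIP = 15
    RING_FINGER_TIP = 16
    PINKY_MCP = 17
    PINKY_PIP = 18
    PINKY_DIP = 19
    PINKY_TIP = 20


class Point3D(NamedTuple):
    """Hypothetical container for one landmark."""
    x: float  # normalized to [0, 1] across the image width
    y: float  # normalized to [0, 1] down the image height
    z: float  # depth relative to the wrist; smaller means closer to the camera


FINGERTIPS = (
    HandLandmark.THUMB_TIP,
    HandLandmark.INDEX_FINGER_TIP,
    HandLandmark.MIDDLE_FINGER_TIP,
    HandLandmark.RING_FINGER_TIP,
    HandLandmark.PINKY_TIP,
)


def fingertips(hand: Sequence[Point3D]) -> Dict[str, Point3D]:
    """Pick the five fingertip landmarks out of a full 21-point hand."""
    if len(hand) != len(HandLandmark):
        raise ValueError("expected 21 landmarks")
    return {tip.name: hand[tip] for tip in FINGERTIPS}
```

Naming the indices once keeps later geometric rules readable, since a rule can refer to `HandLandmark.INDEX_FINGER_TIP` rather than the bare number 8.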
Several researchers have demonstrated that landmark-based hand tracking significantly improves gesture recognition accuracy and responsiveness compared to traditional feature-based methods. These techniques allow precise modeling of finger movements and hand orientation, making them particularly effective for real-time air gesture recognition and touchless interaction systems [10], [11]. Consequently, landmark-driven vision-based approaches have become a preferred solution for developing scalable, user-friendly, and efficient gesture recognition frameworks.
Methodology
Objective
The primary objective of this project is to design and implement a real-time air gesture recognition system that enables users to control computer functions without physical contact. The system aims to provide an intuitive and touchless human-computer interaction mechanism by capturing live video input through a standard webcam, detecting and tracking hand landmarks, recognizing predefined gestures, and executing corresponding system-level commands. Emphasis is placed on achieving high accuracy, low latency, and ease of use while maintaining compatibility with commonly available hardware.
Gesture Identification Process
The gesture identification process is based on analyzing the spatial relationships and relative positions of hand landmarks detected in each video frame. Using a landmark-based hand tracking model, key points such as fingertips, joints, and palm center are extracted to represent hand posture and motion. Simple geometric computations, including Euclidean distance, relative angles, and positional thresholds, are applied to determine the state of individual fingers (extended or folded).
Specific gestures are defined using these spatial features. For instance, cursor movement is controlled by continuously tracking the position of the index fingertip, while clicking actions are triggered by pinching gestures, detected when the distance between the thumb and index fingertips falls below a predefined threshold. This rule-based approach enables efficient and interpretable gesture recognition without requiring complex model training.
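The geometric rules described above can be sketched in a few lines. This is a minimal illustration and not the authors' implementation. It assumes landmarks arrive as 21 `(x, y)` pairs in normalized image coordinates using the MediaPipe Hands index order. The "farther from the wrist than the PIP joint" test for an extended finger, the palm-length normalization, and the `ratio_threshold` value of 0.25 are all assumptions made for this example.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in normalized image coordinates

# Indices follow the MediaPipe Hands landmark order.
WRIST = 0
THUMB_TIP = 4
INDEX_PIP = 6
INDEX_TIP = 8
MIDDLE_MCP = 9


def distance(a: Point, b: Point) -> float:
    """Euclidean distance between two 2D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def is_finger_extended(hand: Sequence[Point], tip: int, pip: int) -> bool:
    """Assumed rule: a finger is extended when its tip lies farther
    from the wrist than its PIP joint does."""
    return distance(hand[tip], hand[WRIST]) > distance(hand[pip], hand[WRIST])


def is_pinching(hand: Sequence[Point], ratio_threshold: float = 0.25) -> bool:
    """Thumb-index pinch test. The fingertip distance is divided by the
    wrist-to-middle-MCP length so the threshold does not depend on how
    far the hand is from the camera (an assumption of this sketch)."""
    palm = distance(hand[WRIST], hand[MIDDLE_MCP])
    if palm == 0.0:
        return False
    return distance(hand[THUMB_TIP], hand[INDEX_TIP]) / palm < ratio_threshold
```

Dividing by a palm length is one way to keep a single threshold usable at different distances from the camera; a fixed absolute threshold would also fit the description in the text.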
System Responsiveness
To ensure real-time performance, the system processes each video frame independently and performs gesture recognition with minimal computational overhead. The use of lightweight computer vision operations and optimized hand landmark detection enables smooth interaction with low latency. This design allows the system to function efficiently even on devices without dedicated graphics processing units (GPUs). As a result, the proposed system delivers responsive and seamless user interaction, making it suitable for real-time applications such as presentations, smart classrooms, and assistive interfaces.
Features of the System
The proposed Air Gesture system provides multiple touchless interaction features that enable users to control various computer functions efficiently and intuitively using hand gestures. The key features of the system are described below:
Virtual Keyboard
The virtual keyboard feature allows users to type text without physical contact with a keyboard. Hand gestures are used to select and confirm characters displayed on a virtual on-screen keyboard. This feature is particularly useful in public or shared environments where hygiene is a concern, as well as for users with physical disabilities.
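One way to realize key selection is a simple hit test of the fingertip position against an on-screen grid. The sketch below is hypothetical: the three-row layout, the key size, the `top` offset, and the function names `key_at` and `type_if_confirmed` are assumptions for illustration, and the confirmation flag stands in for whatever confirm gesture the system uses.

```python
from typing import Optional, Sequence

# Assumed layout: three rows of letter keys, left-aligned.
ROWS = ("QWERTYUIOP", "ASDFGHJKL", "ZXCVBNM")


def key_at(
    x: float,
    y: float,
    rows: Sequence[str] = ROWS,
    key_w: float = 0.1,
    key_h: float = 0.1,
    top: float = 0.2,
) -> Optional[str]:
    """Return the key under a normalized fingertip position, or None.

    x and y are in [0, 1]; the keyboard starts at height `top` and each
    key occupies a key_w by key_h cell.
    """
    if x < 0.0 or y < top:
        return None
    row_idx = int((y - top) // key_h)
    if row_idx >= len(rows):
        return None
    col_idx = int(x // key_w)
    if col_idx >= len(rows[row_idx]):
        return None
    return rows[row_idx][col_idx]


def type_if_confirmed(text: str, x: float, y: float, confirm: bool) -> str:
    """Append the hovered key only when the confirm gesture is active."""
    key = key_at(x, y)
    if confirm and key is not None:
        return text + key
    return text
```

Separating hover (`key_at`) from confirmation keeps accidental hovering from producing text, which matches the select-then-confirm behaviour described above.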
Virtual Mouse
The virtual mouse enables complete cursor control through hand movements. Cursor navigation is achieved by tracking the index finger position, while clicking, dragging, and scrolling actions are performed using predefined gestures such as pinching or finger combinations. This feature effectively replaces a conventional mouse, offering a natural and flexible mode of interaction.
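Cursor navigation from a fingertip needs two small steps: mapping the normalized camera coordinate to screen pixels, and damping landmark jitter. The sketch below shows one possible approach. The `margin` parameter (which lets the user reach screen edges without moving the hand to the edge of the frame), the exponential smoothing, and the names `to_screen` and `CursorSmoother` are assumptions of this example and are not described in the paper.

```python
from typing import Optional, Tuple


def to_screen(
    x_norm: float,
    y_norm: float,
    screen_w: int,
    screen_h: int,
    margin: float = 0.1,
) -> Tuple[int, int]:
    """Map a normalized fingertip position to pixel coordinates.

    Only the inner part of the camera frame (inset by `margin` on each
    side) is mapped to the full screen; positions outside it are clamped.
    """
    if not 0.0 <= margin < 0.5:
        raise ValueError("margin must be in [0, 0.5)")
    span = 1.0 - 2.0 * margin
    u = min(max((x_norm - margin) / span, 0.0), 1.0)
    v = min(max((y_norm - margin) / span, 0.0), 1.0)
    return round(u * (screen_w - 1)), round(v * (screen_h - 1))


class CursorSmoother:
    """Exponential moving average over cursor positions."""

    def __init__(self, alpha: float = 0.3) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self._pos: Optional[Tuple[float, float]] = None

    def update(self, x: float, y: float) -> Tuple[float, float]:
        """Blend the new position with the previous smoothed one."""
        if self._pos is None:
            self._pos = (x, y)
        else:
            px, py = self._pos
            self._pos = (px + self.alpha * (x - px), py + self.alpha * (y - py))
        return self._pos
```

A smaller `alpha` gives a steadier cursor at the cost of lag, so the value would need tuning against the responsiveness goals stated in the methodology.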
Virtual Painter
The virtual painter feature allows users to draw or write on a digital canvas using finger gestures. By tracking the movement of the index finger, the system enables freehand drawing, sketching, or annotation. This feature is useful for educational purposes, presentations, and creative applications.
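Freehand drawing from a tracked fingertip can be modelled as a list of strokes, where each stroke is the sequence of points collected while the pen is down. The sketch below is illustrative only. The `Canvas` class and the `pen_down` flag are hypothetical; the paper does not say which gesture lifts the pen.

```python
from typing import List, Tuple

Point = Tuple[int, int]  # pixel position of the index fingertip


class Canvas:
    """Collects fingertip positions into separate strokes."""

    def __init__(self) -> None:
        self.strokes: List[List[Point]] = []
        self._drawing = False

    def update(self, point: Point, pen_down: bool) -> None:
        """Feed one frame: extend the current stroke or lift the pen."""
        if not pen_down:
            self._drawing = False
            return
        if not self._drawing:
            self.strokes.append([])  # pen just went down: start a new stroke
            self._drawing = True
        stroke = self.strokes[-1]
        if not stroke or stroke[-1] != point:
            stroke.append(point)  # skip repeated identical positions

    def segments(self) -> List[Tuple[Point, Point]]:
        """Line segments to render, never joining separate strokes."""
        return [
            (stroke[i], stroke[i + 1])
            for stroke in self.strokes
            for i in range(len(stroke) - 1)
        ]
```

Keeping strokes separate avoids drawing a stray line between the point where the pen was lifted and the point where it was put down again.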
Virtual Brightness Control
The virtual brightness control feature enables users to adjust the screen brightness using hand gestures. By varying the distance or orientation between specific fingers, the system increases or decreases brightness levels in real time. This touchless control enhances user convenience and reduces dependency on physical controls.
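Mapping a finger distance to a brightness level is essentially a clamped linear interpolation. The sketch below assumes the distance is measured in normalized image units. The calibration bounds `d_min` and `d_max` and the function name `distance_to_level` are illustrative values chosen for this example, not figures from the paper.

```python
def distance_to_level(
    d: float,
    d_min: float = 0.03,
    d_max: float = 0.25,
    out_min: float = 0.0,
    out_max: float = 100.0,
) -> float:
    """Linearly map a finger distance to an output level, with clamping.

    Distances at or below d_min give out_min; distances at or above
    d_max give out_max.
    """
    if d_max <= d_min:
        raise ValueError("d_max must be greater than d_min")
    t = (d - d_min) / (d_max - d_min)
    t = min(max(t, 0.0), 1.0)
    return out_min + t * (out_max - out_min)
```

With the assumed bounds, a distance halfway between `d_min` and `d_max` yields a level of about 50 percent, and anything outside the bounds saturates at 0 or 100.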
Virtual Volume Control
The virtual volume control feature allows users to adjust system audio levels through simple hand gestures. Gesture-based control, such as increasing or decreasing the distance between fingers, is mapped to volume adjustment, enabling seamless multimedia interaction without physical input devices.
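Because landmark positions jitter slightly from frame to frame, a directly mapped volume would flutter. One possible remedy, shown here as an assumption rather than the paper's method, is to move the volume in fixed steps and ignore changes smaller than one step. The `SteppedLevel` class and its default step of 5 are hypothetical.

```python
class SteppedLevel:
    """Holds a 0-100 level that only changes in multiples of `step`
    and ignores target changes smaller than one step (a dead band)."""

    def __init__(self, initial: int = 50, step: int = 5) -> None:
        if step <= 0:
            raise ValueError("step must be positive")
        self.level = initial
        self.step = step

    def update(self, target: float) -> int:
        """Feed the raw target level for this frame; return the held level."""
        target = min(max(target, 0.0), 100.0)
        if abs(target - self.level) >= self.step:
            # Snap to the nearest multiple of step.
            self.level = self.step * round(target / self.step)
        return self.level
```

The raw target passed to `update` would come from whatever distance-to-level mapping the system uses; only the held value would be sent to the audio subsystem.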
Virtual Keyboard
The virtual keyboard module allows users to select keyboard keys using hand movement.
Figure 1: Virtual Keyboard using Hand Gestures
Virtual Mouse
The virtual mouse module allows users to control the cursor using hand movement.
- Cursor Movement: Index fingertip
- Left Click: Thumb and index finger
- Right Click: Thumb and middle finger
- Scroll: Vertical finger motion
Figure 2: Virtual Mouse using Hand Gestures
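The gesture-to-action mapping listed above can be sketched as two small rules: which finger the thumb is pinching decides the button, and vertical fingertip motion between frames decides the scroll amount. This is an illustrative sketch under assumptions. The absolute `pinch_threshold`, the `dead_zone`, the `gain`, and the function names are hypothetical, and landmark indices follow the MediaPipe Hands order.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in normalized image coordinates

THUMB_TIP, INDEX_TIP, MIDDLE_TIP = 4, 8, 12


def _dist(a: Point, b: Point) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])


def classify_click(
    hand: Sequence[Point], pinch_threshold: float = 0.05
) -> Optional[str]:
    """Return "left", "right", or None for the current frame.

    The thumb pinching the index finger means a left click; the thumb
    pinching the middle finger means a right click. If both are within
    the threshold, the closer fingertip wins.
    """
    d_index = _dist(hand[THUMB_TIP], hand[INDEX_TIP])
    d_middle = _dist(hand[THUMB_TIP], hand[MIDDLE_TIP])
    if min(d_index, d_middle) >= pinch_threshold:
        return None
    return "left" if d_index <= d_middle else "right"


def scroll_amount(
    prev_y: float, curr_y: float, dead_zone: float = 0.02, gain: float = 100.0
) -> int:
    """Convert vertical fingertip motion between two frames to scroll steps.

    Image y grows downward, so moving the finger up gives a positive
    result. Motion smaller than dead_zone is ignored.
    """
    dy = prev_y - curr_y
    if abs(dy) < dead_zone:
        return 0
    return round(dy * gain)
```

In a complete system the returned values would be forwarded to an OS-level input library; that step is omitted here because the paper does not name one.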
Virtual Painter
The virtual painter feature allows users to draw in the air by tracking fingertip movement.
Figure 3: Painter Using Hand Gestures
Virtual Brightness Control
Screen brightness is adjusted by measuring the distance between the thumb and index finger.
Figure 4: Gesture-Based Brightness Control
Virtual Volume Control
System audio volume is controlled dynamically using finger distance gestures.
Figure 5: Gesture-Based Volume Control
Applications
The proposed air gesture recognition system can be applied in a wide range of real-world scenarios requiring touchless interaction. In smart classrooms, the system enables gesture-based control of presentations, digital boards, and learning content, enhancing interactivity and hygiene. It is also suitable for touch-free public systems such as kiosks, information terminals, and ticketing machines, where minimizing physical contact is essential.
The system supports assistive technologies by providing an alternative interaction mechanism for users with physical impairments, improving accessibility and ease of use. Additionally, it can be utilized in creative and design applications, including digital drawing and sketching through virtual painter functionality. The system is further applicable to interactive presentations, allowing seamless control of multimedia content using intuitive hand gestures. These applications demonstrate the effectiveness of air gesture-based interfaces for modern touchless human-computer interaction environments.
Advantages
The proposed air gesture recognition system offers several advantages over traditional input methods.
- It enables touchless and hygienic interaction, reducing the need for physical contact with devices.
- The system operates using a standard webcam, requiring no additional hardware, which lowers deployment complexity.
- It is a low-cost and portable solution, making it accessible for widespread use.
- The system is easy to use and user-friendly, allowing intuitive interaction without extensive user training.
Limitations
Although the proposed air gesture recognition system demonstrates effective real-time performance, certain limitations remain. The accuracy of hand detection and gesture recognition may degrade under poor or uneven lighting conditions and cluttered backgrounds. Additionally, rapid or abrupt hand movements can occasionally result in gesture misclassification due to motion blur or temporary loss of landmark tracking. These factors may affect system robustness in uncontrolled environments.
Future Scope
Future work will focus on enhancing the robustness and scalability of the air gesture recognition system. Advanced deep learning models, such as convolutional and recurrent neural networks, can be integrated to improve gesture classification accuracy under varying lighting conditions and complex backgrounds. Temporal modeling of gestures may further reduce misclassification caused by rapid hand movements.
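Short of the neural temporal models mentioned above, a much simpler form of temporal filtering is a majority vote over the last few per-frame labels. The sketch below is a stand-in technique chosen for illustration, not the convolutional or recurrent approach the authors propose. The `GestureDebouncer` class, the window length, and the vote threshold are assumptions.

```python
from collections import Counter, deque
from typing import Deque, Optional


class GestureDebouncer:
    """Reports a gesture only when it dominates the recent frames."""

    def __init__(self, window: int = 5, min_votes: int = 4) -> None:
        if not 1 <= min_votes <= window:
            raise ValueError("need 1 <= min_votes <= window")
        self.min_votes = min_votes
        self._recent: Deque[Optional[str]] = deque(maxlen=window)

    def update(self, label: Optional[str]) -> Optional[str]:
        """Add this frame's raw label; return the stable label or None."""
        self._recent.append(label)
        winner, votes = Counter(self._recent).most_common(1)[0]
        return winner if votes >= self.min_votes else None
```

A single misclassified frame caused by motion blur is then outvoted by its neighbours, at the cost of a delay of a few frames before a new gesture is reported.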
The system can also be optimized for edge and embedded platforms to support deployment on low-power devices. Additionally, extending the framework to support multimodal interaction, including voice commands and depth sensing, can improve usability and reliability. Expanding the gesture vocabulary and incorporating user-adaptive learning are further directions to enhance personalization and application scope in real-world environments.
Conclusion
This paper presented an air gesture recognition system that demonstrates the effective use of computer vision techniques as an alternative to traditional input devices. By enabling touchless interaction through a standard webcam, the system provides an efficient, intuitive, and cost-effective approach to human-computer interaction. The proposed framework supports real-time gesture recognition and control of multiple system functions without the need for additional hardware. Overall, the results highlight the potential of air gesture-based interfaces to enhance accessibility, usability, and hygiene in modern interactive computing environments.
References
[1] S. A. Brewster, "The evolution of human-computer interaction," IEEE Computer Graphics and Applications, vol. 24, no. 1, pp. 44-45, 2004.
[2] Y. Wu and T. S. Huang, "Vision-based gesture recognition: A review," International Gesture Workshop, Springer, pp. 103-115, 1999.
[3] J. O. Wobbrock, H. H. Aung, B. Rothrock, and B. A. Myers, "Maximizing the guessability of symbolic input," Proc. SIGCHI Conference on Human Factors in Computing Systems, pp. 1869-1878, 2005.
[4] Z. Cao, G. Hidalgo, T. Simon, S.-E. Wei, and Y. Sheikh, "OpenPose: Realtime multi-person 2D pose estimation using part affinity fields," IEEE TPAMI, vol. 43, no. 1, pp. 172-186, 2021.
[5] P. Molchanov, S. Gupta, K. Kim, and J. Kautz, "Hand gesture recognition with 3D convolutional neural networks," CVPR Workshops, pp. 1-7, 2015.
[6] R. Rautaray and A. Agrawal, "Vision based hand gesture recognition for human computer interaction: A survey," Artificial Intelligence Review, vol. 43, no. 1, pp. 1-54, 2015.
[7] L. Dipietro, A. M. Sabatini, and P. Dario, "A survey of glove-based systems and their applications," IEEE Transactions on Systems, Man, and Cybernetics, vol. 38, no. 4, pp. 461-482, 2008.
[8] S. Mitra and T. Acharya, "Gesture recognition: A survey," IEEE Transactions on Systems, Man, and Cybernetics, vol. 37, no. 3, pp. 311-324, 2007.
[9] F. Zhang, V. Bazarevsky, A. Vakunov, et al., "MediaPipe Hands: On-device real-time hand tracking," arXiv preprint arXiv:2006.10214, 2020.
[10] J. P. Wachs, M. Kölsch, H. Stern, and Y. Edan, "Vision-based hand-gesture applications," Communications of the ACM, vol. 54, no. 2, pp. 60-71, 2011.
[11] P. Molchanov, S. Gupta, K. Kim, and J. Kautz, "Hand gesture recognition with 3D convolutional neural networks," IEEE CVPR, 2016.
