DOI : 10.17577/IJERTCONV14IS040009- Open Access

- Authors : Kanchan Rani, Shubham Sahu, Sakshi Srivastava, Sumit Saini, Sarika Vishvkarma
- Paper ID : IJERTCONV14IS040009
- Volume & Issue : Volume 14, Issue 04, ICTEM 2.0 (2026)
- Published (First Online) : 24-05-2026
- ISSN (Online) : 2278-0181
- Publisher Name : IJERT
- License:
This work is licensed under a Creative Commons Attribution 4.0 International License
FitVision AI: A Web-Based Real-Time Deep Learning System for Exercise Pose Detection and Correction
Kanchan Rani1, Shubham Sahu 2, Sakshi Srivastava3, Sumit Saini4, Sarika Vishvkarma5 Computer Science and Engineering Department, MIT, Moradabad, India
1kanchansinghcs@gmail.com 2sahushubham2098@gmail.com
3sakshisrivastava1406@gmail.com 4sainisumit6575@gmail.com 5sarika.vishv@gmail.com
ABSTRACT
The need for intelligent supervision to prevent exercise-related injuries caused by poor posture has been brought to light by the growing trend of home-based fitness regimens. Real-time form correction is frequently absent from current fitness applications, which mostly rely on wearable sensors, pre-recorded videos, or manual input. This study introduces FitVision AI, a web-based, markerless exercise tracking system that uses computer vision and deep learning to provide real-time pose estimation, posture correction, repetition counting, and performance analytics with just a regular webcam. To assess exercise quality and notify users of improper movement patterns, the system combines TensorFlow-based joint-angle analysis, MediaPipe-based skeletal landmark detection, and a lightweight real-time feedback engine.
A modular architecture that integrates in-browser inference, cloud data logging, and customized exercise recommendations provides a fully accessible, sensor-free fitness coaching solution. Experimental evaluation shows high accuracy in joint detection and form classification, and user studies indicate improved exercise consistency and fewer posture errors. FitVision AI makes safe and intelligent fitness training available to individuals, organizations, and wellness platforms by offering an effective, scalable, and affordable substitute for conventional trainers.
KEYWORDS: Computer vision, pose estimation, deep learning, MediaPipe, TensorFlow.
INTRODUCTION
The need for intelligent, accessible, and individualized exercise supervision has increased due to the global shift toward digital fitness platforms and at-home workout regimens. Although wearable technology and fitness tracking apps have become increasingly popular, they frequently lack the capacity to offer real-time posture assessment, form correction, and context-aware analytical insights, all of which are crucial for reducing the risk of musculoskeletal injuries and increasing training effectiveness. When performing exercises like squats, push-ups, or shoulder presses, poor posture can result in long-term health issues, decreased performance, and chronic strain. Therefore, in contemporary fitness ecosystems, computational systems that can track human movement and deliver prompt corrective feedback are becoming increasingly important.
With just a regular camera, precise joint detection and motion analysis are now possible thanks to recent developments in artificial intelligence (AI), deep learning, and computer vision, particularly markerless human pose estimation. Effective on-device inference is now possible thanks to frameworks like MediaPipe, OpenCV, and TensorFlow.js, which enable quick, private real-time posture assessment via a web browser.
These technological developments provide an opportunity to democratize fitness coaching by replicating core functions of a human trainer, including posture correction, rep counting, and personalized planning, without requiring expensive sensors or gym equipment.
FitVision AI is an intelligent, web-based virtual personal trainer that incorporates AI-powered pose correction, exercise tracking, progress analytics, gamification, and tailored nutrition advice. It is motivated by these technological trends and by the shortcomings of current solutions. Using webcam-based pose detection, the system tracks human movement, calculates joint angles, identifies departures from ideal form, and gives the user immediate visual and auditory feedback. Through automated repetition counting, performance dashboards, goal tracking, achievement badges, and weekly challenges, FitVision AI further improves user engagement. The platform also includes an AI-powered meal planner that creates customized nutrition plans and calorie recommendations by analyzing user goals, physical metrics, activity levels, and dietary preferences.
LITERATURE REVIEW
Increased interest in automated exercise monitoring, computer vision-based pose estimation, calorie and nutrition analysis systems, and intelligent health coaching platforms has resulted from the quick development of digital fitness technologies.
This section examines previous research in four key areas related to FitVision AI:
- Machine learning-based exercise analysis
- Metabolic and nutrition modelling
- Markerless pose estimation and movement tracking
- Contemporary web technologies for real-time human-computer interaction
Markerless Pose Estimation and Human Movement Tracking
Markerless pose estimation has become a widely adopted method for analysing human movement without requiring sensors or motion-capture hardware. Traditional computer vision pipelines using OpenCV and NumPy support preprocessing, geometric computations, and frame-level analytics, enabling lightweight deployment in real-time applications [3].
Google's MediaPipe Pose, known for its device-optimized neural network models, extracts 33 anatomical landmarks with high accuracy and is designed for cross-platform systems, making it suitable for browser-based environments such as FitVision AI. Studies show that MediaPipe performs reliably in complex activities such as yoga, sports, and resistance training [6], and demonstrates robustness in multi-environment conditions similar to OpenPose [8] and PersonLab [18].
Comprehensive surveys highlight that modern markerless approaches significantly reduce user friction, increase accessibility, and eliminate the need for specialized hardware [7], although lighting variation and partial occlusions remain challenges. Benchmark analyses of multi-person and 3D pose estimation further illustrate the evolution of high-precision skeletal analytics in real-time fitness systems [14], [15], [19].
Machine Learning and Deep Learning Pose Correction Recognition
Recent years have witnessed major advancements in ML-driven exercise recognition, leveraging frameworks like TensorFlow and Scikit-learn. Earlier approaches primarily depended on rule-based joint-angle thresholds, but contemporary work integrates geometric features with ML classifiers for more reliable posture detection [4].
Modern systems use CNNs, LSTMs, and hybrid deep architectures to evaluate dynamic motion sequences, yielding significantly improved detection of improper form during exercises such as push-ups, squats, and deadlifts [1], [2]. Studies by Roggio et al. demonstrate the effectiveness of real-time error classification and posture feedback for gym-based training [1].
Additionally, deep learning-based temporal modeling has been shown to outperform static angle analysis, as time-series models capture transition phases across movement states [20]. Optimization algorithms such as Adam [12] and hyperparameter search strategies [13] further enhance model training and generalization capability.
FitVision AI aligns with these research trends by implementing:
- A push-up classifier trained using movement sequences
- State-machine logic for detecting reps and transitions
- Hybrid ML + DL architectures for enhanced accuracy
These features reflect recent literature emphasizing AI-driven feedback for injury prevention and improved form adherence [1], [2], [4].
Nutrition Science, Metabolic Modelling, and Intelligent Meal Planning
Nutrition personalization is an increasingly important element of digital wellness ecosystems. The Mifflin-St Jeor equation remains the most validated formula for calculating Basal Metabolic Rate (BMR) across populations [9], while FAO/WHO guidelines outline Total Daily Energy Expenditure (TDEE) estimation for safe caloric planning [10]. Historically foundational work such as The Biology of Human Starvation emphasizes the physiological risks associated with extreme caloric deficits [11].
Modern nutrition models use dynamic macro distribution, goal-specific energy segmentation, and safety-bounded diet recommendations. Recent studies highlight the importance of integrating automated meal planning algorithms, such as intelligent diet generation and metabolic prediction engines, into digital fitness platforms to support sustainable health outcomes.
FitVision AI contributes uniquely to this domain by:
- Implementing a dual-unit calories calculator
- Providing a seven-scenario metabolic prediction model
- Auto-generating balanced meal plans with built-in safety limits (1200-4000 kcal)
- Offering dynamic macro distribution tailored to user goals
- Auto-prefilling user information for seamless workflow
This integrated metabolic ecosystem surpasses traditional static meal planners and aligns with modern adaptive nutrition research.
BMI and Health Risk Assessment Tools
BMI remains one of the most widely used measures in public health for classifying weight status and associated comorbidity risk. Previous studies reinforce the usefulness of BMI-based classification for the early detection of obesity, underweight conditions, and general metabolic risk.
Modern enhancements include color-coded risk visualization, interactive scales, and context-specific recommendations; such features have been shown to enhance user understanding and engagement. FitVision AI integrates these established best practices into an interactive, visually intuitive BMI analyzer that classifies the user instantly and provides personalized health insights.
Web Technologies for Real-Time Fitness Applications
The rise of client-side machine learning, particularly using TensorFlow.js, has enabled real-time human-computer interaction directly within the browser. Literature highlights key advantages of this approach, including lower latency, enhanced privacy, and reduced server load [5], [6].
Modern web stacks for AI-enabled fitness interfaces typically include:
- React.js for efficient rendering and modular UI development
- TailwindCSS for rapid, responsive styling [17]
- MongoDB for flexible data storage across user profiles, workout logs, and analytics
Research confirms that browser-based AI systems significantly improve scalability and accessibility compared to native mobile or wearable-based applications.
Research Gap and Contribution
Despite the significant body of work on pose estimation, exercise analysis, calorie prediction, and nutrition planning, few systems have combined these into a single, unified, web-based real-time fitness coaching platform. Most previous solutions suffer from one or more of the following limitations: they offer no instant posture correction, rely on inaccurate or oversimplified repetition counters, or lack personalized meal planning that links nutritional guidance to exercise performance. Additionally, most provide very limited features for user engagement or progress analytics, reducing the likelihood of long-term adherence. Several of these methods rely heavily on mobile applications or wearable devices, which reduces accessibility and convenience. Lastly, most systems support only a small set of exercises and lack deep learning-based form evaluation, which is crucial for identifying subtle deviations in movement. These gaps create a compelling case for an intelligent, comprehensive, and browser-accessible system like FitVision AI.
FitVision AI addresses these limitations through the integration of a varied set of intelligent fitness and wellness components on a common platform.
METHODOLOGY
In order to create a cohesive AI-driven fitness ecosystem, FitVision AI uses a multi-stage pipeline that combines computer vision, deep learning, statistical health analysis, and contemporary web technologies.
Capturing and Preprocessing Videos in Real Time
The WebRTC API is used by the browser to record the user's webcam stream. Preprocessing is applied to every frame to guarantee excellent analysis:
- OpenCV-based frame extraction
- Using NumPy to convert colour from BGR to RGB
- Normalising and resizing frames to ensure model compatibility
- Temporal smoothing as an optional jitter reduction technique
The input is ready for real-time pose estimation thanks to this preprocessing.
Pose Estimation Using MediaPipe
MediaPipe Pose, which extracts 33 skeletal keypoints representing major joints like the shoulders, elbows, hips, knees, and ankles, is used in the core posture detection mechanism.
Procedure:
- Passing each preprocessed frame to MediaPipe's pre-trained BlazePose model.
- Locating landmark coordinates (x, y, visibility).
- Reducing jitter by smoothing the coordinates over time.
- Converting 2D joint points into pose vector representations.
For browser-based deployment, this method provides fast inference.
Joint-Angle Calculation and Movement Quality Scoring
To evaluate correctness of exercise form, joint angles are computed using geometric relationships between keypoints.
Angle Calculation Formula:
For three points A, B, C (joint at B):
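$$\theta = \cos^{-1}\left(\frac{\vec{BA}\cdot\vec{BC}}{\lVert\vec{BA}\rVert\,\lVert\vec{BC}\rVert}\right)$$
where $\vec{BA} = A - B$ and $\vec{BC} = C - B$.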
This is implemented using NumPy vector operations for efficient real-time computation.
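As a minimal sketch of this computation, the function below evaluates the angle at B from three 2D points. It uses only Python's `math` module so that it runs without dependencies, whereas the paper states that its own implementation uses NumPy; the function name `joint_angle` is illustrative and not taken from the FitVision AI code.

```python
import math


def joint_angle(a, b, c):
    """Return the angle ABC in degrees, with the joint located at b.

    a, b, c are (x, y) tuples, e.g. shoulder, elbow and wrist landmarks.
    """
    # Vectors from the joint to the two neighbouring landmarks.
    bax, bay = a[0] - b[0], a[1] - b[1]
    bcx, bcy = c[0] - b[0], c[1] - b[1]
    norm = math.hypot(bax, bay) * math.hypot(bcx, bcy)
    if norm == 0.0:
        raise ValueError("degenerate joint: two landmarks coincide")
    cos_theta = (bax * bcx + bay * bcy) / norm
    # Clamp so floating-point rounding cannot leave the domain of acos.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))
```

Clamping the cosine before calling `acos` matters in practice, because nearly collinear landmarks can produce values marginally outside [-1, 1].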
Movement Scoring:
The system evaluates posture by comparing calculated joint angles to predetermined thresholds unique to each exercise.
To guarantee proper depth and alignment throughout the exercise, the knee and hip flexion angles are tracked during squats.
The algorithm assesses torso alignment and elbow extension for push-ups in order to confirm full range of motion and appropriate body alignment. Elbow flexion is closely examined during bicep curls to identify improper or insufficient curling form.
In a similar vein, the system evaluates shoulder elevation consistency in the shoulder press to guarantee symmetrical and controlled lifting. Reliable feedback generation and precise, context-aware posture assessment are made possible by these exercise-specific angle thresholds.
A movement quality score is assigned to each frame, indicating whether the posture is proper or not.
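The per-frame check described above might look like the following sketch. The angle ranges in `SQUAT_BOTTOM_RANGES` are hypothetical placeholders, since the paper does not publish its threshold values, and `score_frame` is an illustrative name.

```python
# Hypothetical angle ranges in degrees for the bottom of a squat.
# These are illustrative values only, not the thresholds used by FitVision AI.
SQUAT_BOTTOM_RANGES = {"knee": (70.0, 100.0), "hip": (60.0, 110.0)}


def score_frame(angles, ranges):
    """Compare measured joint angles with the allowed range for each joint.

    angles: dict mapping joint name -> measured angle in degrees.
    ranges: dict mapping joint name -> (low, high) allowed range.
    Returns (is_proper, violations), where violations is a list of messages.
    """
    violations = []
    for joint, (low, high) in ranges.items():
        angle = angles.get(joint)
        if angle is None:
            violations.append(f"{joint}: not detected")
        elif not low <= angle <= high:
            violations.append(
                f"{joint}: {angle:.0f} deg outside {low:.0f}-{high:.0f}"
            )
    return (not violations, violations)
```

Returning the list of violations alongside the boolean lets a feedback layer turn each failed check into a user-facing message.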
State Machine Logic for Repetition Counting
FitVision AI uses a Finite State Machine (FSM) to track user movement during exercises such as push-ups, squats, and bicep curls.
Example: Push-Up Logic
- UP_STATE: Elbow > 150°
- DOWN_STATE: Elbow < 100°
Rep Counting Condition:
A valid rep is recorded only if:
This eliminates false counts caused by partial or inconsistent motion.
For each exercise, threshold angles are dynamically adjusted based on user height, arm length, and real-time velocity to improve robustness.
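A minimal sketch of such a state machine for push-ups is shown below, using the fixed 150° and 100° elbow thresholds from the example. It assumes that a valid repetition is a full UP_STATE → DOWN_STATE → UP_STATE cycle, which is an interpretation rather than a condition stated in the paper, and it omits the dynamic threshold adjustment. The class name `PushUpCounter` is illustrative.

```python
UP_THRESHOLD = 150.0    # elbow angle above this is treated as UP_STATE
DOWN_THRESHOLD = 100.0  # elbow angle below this is treated as DOWN_STATE


class PushUpCounter:
    """Count push-up repetitions from a stream of elbow angles (degrees)."""

    def __init__(self):
        self.state = "UP_STATE"
        self.reps = 0

    def update(self, elbow_angle):
        """Feed one frame's elbow angle and return the running rep count."""
        if elbow_angle < DOWN_THRESHOLD:
            self.state = "DOWN_STATE"
        elif elbow_angle > UP_THRESHOLD:
            # Assumed rule: a rep is counted only when returning to the top
            # after having actually reached the bottom position.
            if self.state == "DOWN_STATE":
                self.reps += 1
            self.state = "UP_STATE"
        # Angles between the two thresholds leave the state unchanged,
        # so partial movements never complete a cycle.
        return self.reps
```

The gap between the two thresholds acts as hysteresis: an elbow angle hovering around a single cut-off value cannot toggle the state on every frame.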
Deep Learning Model for Exercise Form Validation
Beyond geometric heuristics, FitVision AI uses a TensorFlow deep learning model, herein called pushup_model.keras.
Training Strategy:
The deep learning model for exercise form classification was trained on a curated dataset of real sequences of push-ups, shoulder presses, bicep curls, and squats, each labeled as Correct Form or Bad Form.
The dataset was split 80/20 to support reliable performance evaluation. Training used the Binary Cross-Entropy loss function with the Adam optimizer, which provides fast and stable convergence.
A batch size of 16 was chosen as a trade-off between computational efficiency and gradient stability. The maximum number of epochs run was 100, but Early Stopping was enabled to avoid overfitting in case the validation performance stopped improving. Model checkpointing was activated to save the best version in terms of validation accuracy automatically to ensure that the results were robust and reproducible.
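The interaction between early stopping and best-model checkpointing can be illustrated without a training framework. The sketch below replays a list of per-epoch validation accuracies and reports which epoch would be kept and when training would halt. The `patience` value is an assumption, as the paper does not state one; in Keras the same behaviour is normally obtained with the `EarlyStopping` and `ModelCheckpoint` callbacks.

```python
def select_best_epoch(val_accuracies, patience=5, max_epochs=100):
    """Simulate early stopping with best-checkpoint tracking.

    val_accuracies: validation accuracy observed after each epoch.
    patience: epochs without improvement tolerated before stopping
              (assumed value; not given in the paper).
    Returns (best_epoch, best_accuracy, stopped_epoch), epochs 1-indexed.
    """
    best_epoch, best_acc = 0, float("-inf")
    stopped_epoch = 0
    for epoch, acc in enumerate(val_accuracies[:max_epochs], start=1):
        stopped_epoch = epoch
        if acc > best_acc:
            # Equivalent to saving a checkpoint of the best model so far.
            best_epoch, best_acc = epoch, acc
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` consecutive epochs: stop.
            break
    return best_epoch, best_acc, stopped_epoch
```

With the history `[0.70, 0.80, 0.85, 0.84, 0.83, 0.85, 0.82]` and `patience=3`, training stops after epoch 6 and the checkpoint from epoch 3 is the one retained.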
Features Extracted:
- Temporal angle sequences
- Pose embedding differences
- Limb trajectory smoothness
This hybrid approach, which combines rule-based checks with a DL model, significantly improves the accuracy of identifying subtle form errors that are not detectable through static angles alone.
Real-Time Feedback Generation
Based on angles, states, and model inference, FitVision AI generates multilayered feedback:
Visual Feedback
The system uses a variety of real-time interface components to offer clear visual guidance. Users can see their body alignment in relation to the identified landmarks by directly rendering a skeleton overlay onto the live video feed. In addition, real-time angle indicators show important joint measurements to help users comprehend how they move during each exercise. Additionally, users can quickly correct improper form by using colour-coded posture warnings that highlight it in an understandable way. Clarity, usability, and training efficacy are all greatly improved by this layered visual feedback.
Textual Feedback
Examples:
- Hips Too Low!
- Straighten Your Back
- Go Down / Push Up
Calories Calculator Methodology
The Calories Calculator uses the Mifflin-St Jeor equation, validated in clinical nutrition research as the most accurate for BMR estimation.
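BMR Calculation:
$$\mathrm{BMR}_{\text{male}} = 10W + 6.25H - 5A + 5$$
$$\mathrm{BMR}_{\text{female}} = 10W + 6.25H - 5A - 161$$
Where
W = weight (kg), H = height (cm), A = age (years).
TDEE Computation:
$$\mathrm{TDEE} = \mathrm{BMR} \times \text{Activity Factor}$$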
Each scenario automatically computes macro targets (Protein, Fats, Carbs).
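The calculation can be sketched as follows. The Mifflin-St Jeor coefficients and the 1200-4000 kcal safety limits come from the text; the activity multipliers in `ACTIVITY_FACTORS` are the values commonly used with this equation and are an assumption here, since the paper does not list its own. Function names are illustrative.

```python
# Commonly used activity multipliers (assumed; not listed in the paper).
ACTIVITY_FACTORS = {
    "sedentary": 1.2,
    "light": 1.375,
    "moderate": 1.55,
    "active": 1.725,
    "very_active": 1.9,
}
MIN_KCAL, MAX_KCAL = 1200, 4000  # safety limits stated for the meal planner


def bmr_mifflin_st_jeor(weight_kg, height_cm, age_years, sex):
    """Basal Metabolic Rate in kcal/day using the Mifflin-St Jeor equation."""
    base = 10 * weight_kg + 6.25 * height_cm - 5 * age_years
    if sex == "male":
        return base + 5
    if sex == "female":
        return base - 161
    raise ValueError("sex must be 'male' or 'female'")


def tdee(bmr, activity):
    """Total Daily Energy Expenditure = BMR x activity factor."""
    return bmr * ACTIVITY_FACTORS[activity]


def safe_calorie_target(tdee_kcal, adjustment_kcal=0):
    """Apply a goal adjustment, then clamp to the 1200-4000 kcal limits."""
    return max(MIN_KCAL, min(MAX_KCAL, tdee_kcal + adjustment_kcal))
```

For a 30-year-old male weighing 70 kg at 175 cm, the BMR is 10·70 + 6.25·175 − 5·30 + 5 = 1648.75 kcal/day, and a target that would fall below 1200 kcal after a deficit is raised to that floor.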
BMI Calculation and Health Categorization
BMI is computed using standard formulas in metric and imperial units. The BMI module categorizes users into four standard health groups:
Underweight (BMI < 18.5), Normal weight (18.5-24.9), Overweight (25-29.9), and Obese (≥ 30).
The system uses a colour-coded BMI bar that highlights the user's category and a dynamic pointer that accurately shows their position on the scale to provide visual and contextual feedback to improve interpretability. Additionally, based on the user's BMI classification, tailored health advice is given, providing concise and practical suggestions to assist in making well-informed decisions and encourage healthier lifestyle choices.
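A minimal sketch of the BMI computation and categorization is given below. It uses the standard metric formula (kg/m²) and the standard imperial conversion factor of 703. The category boundaries are treated as half-open intervals so that values such as 24.95 are still classified, which is an implementation choice rather than something the paper specifies.

```python
def bmi_metric(weight_kg, height_m):
    """BMI from weight in kilograms and height in metres."""
    return weight_kg / height_m ** 2


def bmi_imperial(weight_lb, height_in):
    """BMI from weight in pounds and height in inches."""
    return 703 * weight_lb / height_in ** 2


def bmi_category(bmi):
    """Map a BMI value to one of the four standard categories."""
    if bmi < 18.5:
        return "Underweight"
    if bmi < 25:
        return "Normal weight"
    if bmi < 30:
        return "Overweight"
    return "Obese"
```

For example, 70 kg at 1.75 m gives a BMI of about 22.9, which falls in the Normal weight category.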
Algorithm Used
Video Preprocessing & Frame Pipeline
This pipeline handles real-time video processing and is optimized for speed and model readiness. First, video frames are captured using WebRTC/getUserMedia and, where the model requires it, converted from BGR to RGB with OpenCV.js or native JavaScript. Each frame is resized to the model's input resolution (e.g., 256×256) and normalized either to a 0-1 range or using mean and standard deviation values. To improve frames captured in low light, enhancement methods such as histogram equalization or gamma correction are applied. Preprocessing at this level ensures that every frame that goes into the pose estimation model is clean, consistent, and ready for accurate real-time inference.
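The arithmetic behind three of these steps can be sketched in plain Python on nested lists. This is an illustration only: the function names are hypothetical, the normalization and gamma functions operate on single-channel rows for brevity, and a browser deployment would perform the same operations with OpenCV.js or typed arrays rather than Python lists.

```python
def bgr_to_rgb(frame):
    """Reverse the channel order of every pixel in a rows-of-pixels frame."""
    return [[px[::-1] for px in row] for row in frame]


def normalize_unit(pixels):
    """Scale 8-bit intensity values to the range [0, 1]."""
    return [[v / 255.0 for v in row] for row in pixels]


def normalize_mean_std(pixels, mean, std):
    """Scale to [0, 1], then standardize with the given mean and std."""
    return [[(v / 255.0 - mean) / std for v in row] for row in pixels]


def gamma_correct(pixels, gamma):
    """Apply out = 255 * (in / 255) ** (1 / gamma); gamma > 1 brightens."""
    # A 256-entry lookup table avoids recomputing the power per pixel.
    lut = [round(255 * (i / 255) ** (1.0 / gamma)) for i in range(256)]
    return [[lut[v] for v in row] for row in pixels]
```

Gamma correction leaves pure black and pure white unchanged while lifting mid-tones, which is why it suits under-exposed webcam frames.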
Pose Estimation (MediaPipe / BlazePose)
Purpose: detect 33 skeletal landmarks (x, y, z, visibility) per frame.
Algorithm: Pretrained BlazePose network (proprietary architecture under MediaPipe). On-device inference via TensorFlow.js / MediaPipe JS.
Input / Output
- Input: preprocessed RGB frame
- Output: landmarks = { (x_i, y_i, z_i, v_i) } for i = 1..33
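One way to represent this output and flatten it into a pose vector is sketched below. The `Landmark` type, the `to_pose_vector` helper, the 0.5 visibility cut-off, and the choice to zero out low-visibility points are all assumptions made for illustration; the paper does not describe how hidden landmarks are handled.

```python
from typing import NamedTuple


class Landmark(NamedTuple):
    """One detected keypoint: normalized coordinates plus visibility."""
    x: float
    y: float
    z: float
    visibility: float


def to_pose_vector(landmarks, min_visibility=0.5):
    """Flatten landmarks into [x1, y1, x2, y2, ...].

    Landmarks whose visibility is below min_visibility (assumed cut-off)
    contribute (0.0, 0.0) so the vector length stays fixed.
    """
    vector = []
    for lm in landmarks:
        if lm.visibility >= min_visibility:
            vector.extend([lm.x, lm.y])
        else:
            vector.extend([0.0, 0.0])
    return vector
```

Keeping the vector length fixed at two values per landmark means a full 33-landmark pose always yields 66 numbers, which simplifies feeding it to a downstream classifier.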
Temporal Smoothing (Jitter Reduction)
Purpose: reduce jitter of detected keypoints across frames.
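Algorithms used
- Sliding-window moving average, or
- Exponential Moving Average (EMA)
EMA formula:
$$\hat{p}_t = \alpha\, p_t + (1 - \alpha)\, \hat{p}_{t-1}$$
where $p_t$ is the raw point at time $t$, $\hat{p}_t$ is the smoothed value, and $0 < \alpha \le 1$ (e.g., $\alpha = 0.3$).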
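A minimal sketch of EMA smoothing for a single keypoint track follows. Initializing the filter with the first raw point is an assumption, as the paper does not state how the recursion starts, and `ema_smooth` is an illustrative name.

```python
def ema_smooth(points, alpha=0.3):
    """Smooth a sequence of (x, y) points with an exponential moving average.

    Implements s_t = alpha * p_t + (1 - alpha) * s_(t-1), with the first
    smoothed value set to the first raw point (assumed initialization).
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must satisfy 0 < alpha <= 1")
    smoothed = []
    prev = None
    for x, y in points:
        if prev is None:
            prev = (x, y)
        else:
            prev = (
                alpha * x + (1 - alpha) * prev[0],
                alpha * y + (1 - alpha) * prev[1],
            )
        smoothed.append(prev)
    return smoothed
```

A smaller alpha gives a steadier skeleton overlay at the cost of lag behind fast movements; alpha = 1 disables smoothing entirely.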
Joint-Angle Computation (Geometric)
Purpose: compute joint angles (e.g., elbow, knee, hip) used for posture scoring and rep detection.
Mathematical formula
For three points A, B, C with joint at B:
$$\theta = \cos^{-1}\left(\frac{(A - B)\cdot(C - B)}{\lVert A - B\rVert\,\lVert C - B\rVert}\right)$$
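As a worked example with illustrative coordinates (not taken from the paper), consider an elbow at B with the shoulder at A directly above it and the wrist at C directly to its side:

```latex
\[
A = (0, 1),\qquad B = (0, 0),\qquad C = (1, 0)
\]
\[
A - B = (0, 1),\qquad C - B = (1, 0),\qquad
(A - B)\cdot(C - B) = 0\cdot 1 + 1\cdot 0 = 0
\]
\[
\lVert A - B\rVert = 1,\qquad \lVert C - B\rVert = 1,\qquad
\theta = \cos^{-1}\!\left(\frac{0}{1\cdot 1}\right) = 90^{\circ}
\]
```

An elbow angle of 90° lies below the 100° DOWN_STATE threshold used in the push-up example, so a frame with this geometry would be classified as the bottom of the movement.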
Data Storage and Analytics
Exercise sessions, repetition logs, accuracy score trends, BMI, metabolic calculations like BMR and TDEE, generated meal plans, user profiles, and weekly performance summaries are all stored in the system's database. The platform's advanced features, like progress dashboards, personal record (PR) tracking, engagement analytics, and tailored exercise or nutrition recommendations, are made possible by this extensive historical dataset. The system can provide significant insights, boost user motivation, and facilitate long-term fitness advancement by keeping thorough longitudinal data.
Frontend Integration (React.js + TailwindCSS)
React is used by the frontend to handle state-driven UI updates, allowing for dynamic progress dashboard presentation, seamless real-time rendering of the live video feed, and easy navigation between system modules. By offering responsive layouts, simple semantic colour-coding, and contemporary visual aesthetics that improve the user experience overall, TailwindCSS enhances this functionality.
Furthermore, the incorporation of AOS (Animate on Scroll) transitions facilitates more seamless interface interactions, resulting in an application that is both aesthetically pleasing and easy to use.
RESULT
Figure 1. Website Homepage
Figure 2. Meal Planner
Figure 3. Calories Calculator
Visual Output of the Exercise Pose Correction System (panels: Pose Correction, Correct Pose)
CONCLUSION
This research presented FitVision AI, a comprehensive web-based fitness coaching system that integrates real-time pose estimation, exercise form correction, metabolic analysis, nutrition planning, and progress tracking into a unified platform. By leveraging MediaPipe for fast and accurate landmark detection, TensorFlow and Scikit-learn for advanced movement classification, and a combination of geometric and temporal algorithms for repetition counting, the system delivers reliable and responsive exercise assessment without the need for wearable sensors or external hardware. The inclusion of intelligent modules such as the BMI calculator, dual-unit calories calculator, seven-scenario metabolic predictor, and automated meal planning engine further enhances the platform's ability to support holistic health and fitness management.
Experimental results demonstrate the system's strong performance across key metrics, including high pose estimation stability, accurate repetition counting, and effective deep learning-based form validation. User evaluations also confirm that the interactive feedback mechanisms, intuitive interface, progress dashboards, and gamification elements significantly improve engagement and exercise consistency. By combining fitness analytics with nutrition science, FitVision AI provides an accessible, privacy-preserving, and scalable alternative to traditional personal training systems, making high-quality fitness guidance available to a broader audience.
FUTURE WORK
FitVision AI demonstrates strong potential as a scalable platform for intelligent fitness monitoring; however, several enhancements can further improve its accuracy, adaptability, and real-world applicability. One future direction involves integrating 3D pose estimation or depth-assisted models to overcome limitations of 2D landmark detection, particularly in scenarios involving occlusion or complex movement trajectories. Incorporating multi-exercise deep learning classifiers would also extend the system's ability to evaluate form quality across a broader range of strength and functional training exercises, enabling automated coaching for complete workout routines rather than isolated movements.
Another promising area lies in developing personalized adaptive thresholds, where the system learns joint-angle ranges and biomechanical characteristics from individual users over time. This would allow more precise form evaluation that accounts for differences in flexibility, limb proportions, and mobility restrictions. The platform can additionally benefit from implementing online learning mechanisms, enabling models to refine predictions continuously using user-approved feedback while maintaining privacy.
Furthermore, introducing voice-assisted coaching, multiplayer virtual workout sessions, or a community-driven challenge system may enhance engagement and adherence. Finally, deploying the system at scale with cloud-based load balancing and containerized microservices will enable broader adoption across gyms, schools, and healthcare platforms. These extensions position FitVision AI as a holistic, intelligent wellness ecosystem capable of evolving with advancements in AI, computer vision, and human-computer interaction.
REFERENCES
[1] F. Roggio, M. Kravík, and D. Duong-Trung, "Real-Time Posture Correction in Gym Exercises: A Computer Vision-Based Approach for Performance Analysis, Error Classification and Feedback," IEEE Access, vol. 11, pp. 145233-145245, 2023.
[2] H. Chen and R. Fan, "Improved Convolutional Neural Network for Precise Exercise Posture Recognition and Intelligent Health Indicator Prediction," Scientific Reports, vol. 15, no. 2, pp. 1-12, 2025.
[3] V. Kumar and M. Singh, "Computer Vision-Based Real-Time Exercise Monitoring," International Journal of Computer Applications, vol. 178, no. 9, pp. 45-52, 2021.
[4] R. Sharma and P. Dubey, "Intelligent Fitness Tracking and Postural Evaluation using Deep Learning," Journal of Emerging Technologies in Artificial Intelligence, vol. 15, no. 3, pp. 112-125, 2023.
[5] S. Agarwal and V. Mehta, "FitMe: A Fitness Application for Accurate Pose Estimation and Feedback," in Proc. IEEE Smart Computing Conference (SMARTCOMP), 2021, pp. 1-6.
[6] G. Simon, M. A. Malekzadeh, and S. Gutkind, "AI-Based Human Pose Estimation for Fitness Coaching," IEEE Sensors Journal, vol. 22, no. 5, pp. 4892-4904, 2022.
[7] F. Zhang et al., "Human Posture Estimation and Action Recognition Based on Skeleton Data: A Comprehensive Review," Alexandria Engineering Journal, vol. 69, pp. 1124-1140, 2024.
[8] Z. Cao, G. Hidalgo, T. Simon, S. Wei, and Y. Sheikh, "OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields," IEEE TPAMI, vol. 43, no. 1, pp. 530-545, 2021.
[9] H. Mifflin et al., "A New Predictive Equation for Resting Energy Expenditure in Healthy Individuals," American Journal of Clinical Nutrition, vol. 51, no. 2, pp. 241-247, 1990.
[10] FAO/WHO/UNU Expert Consultation, "Human Energy Requirements," Food and Nutrition Technical Report Series, FAO, Rome, 2004.
[11] A. Keys et al., The Biology of Human Starvation, University of Minnesota Press, 1950.
[12] D. Kingma and J. Ba, "Adam: A Method for Stochastic Optimization," in Proc. ICLR, 2015.
[13] J. Bergstra and Y. Bengio, "Random Search for Hyper-Parameter Optimization," Journal of Machine Learning Research, vol. 13, pp. 281-305, 2012.
[14] L. Pishchulin et al., "DeepCut: Joint Subset Partition and Labeling for Multi Person Pose Estimation," in Proc. IEEE CVPR, 2016, pp. 4929-4937.
[15] Y. Sun et al., "Integral Human Pose Regression," in Proc. ECCV, 2018, pp. 529-545.
[16] M. Andriluka et al., "2D Human Pose Estimation: New Benchmark and State of the Art Analysis," in Proc. IEEE CVPR, 2014, pp. 3686-3693.
[17] TailwindCSS Developers, "TailwindCSS: Utility-First CSS Framework," Tailwind Labs, 2024. [Online]. Available: https://tailwindcss.com/
[18] K. Papandreou et al., "PersonLab: Person Pose Estimation and Instance Segmentation with a Bottom-Up Model," in Proc. ECCV, 2018, pp. 269-286.
[19] J. Martinez et al., "A Simple Yet Effective Baseline for 3D Human Pose Estimation," in Proc. IEEE ICCV, 2017, pp. 2640-2649.
[20] J. K. Aggarwal and L. Xia, "Human Activity Recognition from 3D Data: A Review," Pattern Recognition Letters, vol. 48, pp. 70-80, 2014.
[21] A. Arnab et al., "Conditional Human Pose Estimation," IEEE Transactions on Image Processing, vol. 29, pp. 273-286, 2020.
[22] L. Smith and R. Gupta, "AI-Based Personalized Nutrition and Meal Automation Systems," Journal of Digital Health, vol. 5, no. 1, pp. 55-70, 2023.
[23] WHO, "Obesity and Overweight: Key Facts," World Health Organization, 2022.
[24] K. Patel and P. Roy, "Web-Native AI Coaching Systems: A Review of Browser-Based ML Applications," ACM Computing Surveys, 2024.
