FitVision AI: A Web-Based Real-Time Deep Learning System for Exercise Pose Detection and Correction

Kanchan Rani; Shubham Sahu; Sakshi Srivastava; Sumit Saini

doi:10.17577/IJERTCONV14IS040009

ICTEM 2.0 -2026 (Volume 14 - Issue 04)

Authors : Kanchan Rani, Shubham Sahu, Sakshi Srivastava, Sumit Saini, Sarika Vishvkarma
Paper ID : IJERTCONV14IS040009
Volume & Issue : Volume 14, Issue 04, ICTEM 2.0 (2026)
Published (First Online) : 24-05-2026
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License


Kanchan Rani¹, Shubham Sahu², Sakshi Srivastava³, Sumit Saini⁴, Sarika Vishvkarma⁵

Computer Science and Engineering Department, MIT, Moradabad, India

¹kanchansinghcs@gmail.com, ²sahushubham2098@gmail.com, ³sakshisrivastava1406@gmail.com, ⁴sainisumit6575@gmail.com, ⁵sarika.vishv@gmail.com

ABSTRACT

The need for intelligent supervision to prevent exercise-related injuries caused by poor posture has been brought to light by the growing trend of home-based fitness regimens. Real-time form correction is frequently absent from current fitness applications, which mostly rely on wearable sensors, pre-recorded videos, or manual input. This study introduces FitVision AI, a web-based, markerless exercise tracking system that uses computer vision and deep learning to provide real-time pose estimation, posture correction, repetition counting, and performance analytics with just a regular webcam. To assess exercise quality and notify users of improper movement patterns, the system combines TensorFlow-based joint-angle analysis, MediaPipe-based skeletal landmark detection, and a lightweight real-time feedback engine.

A modular architecture that integrates in-browser inference, cloud data logging, and customized exercise recommendations provides a fully accessible, sensor-free fitness coaching solution. User studies indicate improved exercise consistency and fewer posture errors, and experimental evaluation shows high accuracy in joint detection and form classification. FitVision AI makes safe and intelligent fitness training available to individuals, organizations, and wellness platforms by offering an effective, scalable, and affordable substitute for conventional trainers.

KEYWORDS: Computer vision, pose estimation, deep learning, MediaPipe, TensorFlow.

INTRODUCTION

The need for intelligent, accessible, and individualized exercise supervision has increased due to the global shift toward digital fitness platforms and at-home workout regimens. Although wearable technology and fitness tracking apps have become increasingly popular, they frequently lack the capacity to offer real-time posture assessment, form correction, and context-aware analytical insights, all of which are crucial for reducing the risk of musculoskeletal injuries and increasing training effectiveness. When performing exercises like squats, push-ups, or shoulder presses, poor posture can result in long-term health issues, decreased performance, and chronic strain. Therefore, in contemporary fitness ecosystems, computational systems that can track human movement and deliver prompt corrective feedback are becoming increasingly important.

With just a regular camera, precise joint detection and motion analysis are now possible thanks to recent developments in artificial intelligence (AI), deep learning, and computer vision, particularly markerless human pose estimation. Effective on-device inference is now possible thanks to frameworks like MediaPipe, OpenCV, and TensorFlow.js, which enable quick, private real-time posture assessment via a web browser.

These technological developments provide an opportunity to democratize fitness coaching by replicating core functions of a human trainer, including posture correction, rep counting, and personalized planning, without requiring expensive sensors or gym equipment.

Driven by these technological trends and the shortcomings of current solutions, this paper presents FitVision AI, an intelligent, web-based virtual personal trainer that incorporates AI-powered pose correction, exercise tracking, progress analytics, gamification, and tailored nutrition advice. Using webcam-based pose detection, the system tracks human movement, calculates joint angles, identifies departures from ideal form, and gives the user immediate visual and auditory feedback. Through automated repetition counting, performance dashboards, goal tracking, achievement badges, and weekly challenges, FitVision AI further improves user engagement. The platform also includes an AI-powered meal planner that creates customized nutrition plans and calorie recommendations by analyzing user goals, physical metrics, activity levels, and dietary preferences.
LITERATURE REVIEW

Increased interest in automated exercise monitoring, computer vision-based pose estimation, calorie and nutrition analysis systems, and intelligent health coaching platforms has resulted from the quick development of digital fitness technologies.

This section examines previous research in four key areas related to FitVision AI:
1. Machine learning-based exercise analysis
2. Metabolic and nutrition modelling
3. Markerless pose estimation and movement tracking
4. Contemporary web technologies for real-time human-computer interaction.
Despite the significant body of work on pose estimation, exercise analysis, calorie prediction, and nutrition planning, few systems have combined these into a single, unified, web-based real-time fitness coaching platform. Most previous solutions suffer from one or more of the following limitations: they offer no instant posture correction, rely on inaccurate or highly oversimplified repetition counters, or lack personalized meal planning that ties nutritional counseling to exercise performance. Additionally, most provide very limited features for user engagement or progress analytics, reducing the chances of long-term adherence. Several of these methods rely heavily on mobile applications or wearable devices, which reduces accessibility and convenience. Lastly, most systems support only a small set of exercises and lack deep learning-based form evaluation, which is crucial for identifying subtle deviations in movement. These gaps create a compelling case for an intelligent, comprehensive, and browser-accessible system like FitVision AI.

FitVision AI addresses these limitations through the integration of a varied set of intelligent fitness and wellness components on a common platform.
METHODOLOGY

In order to create a cohesive AI-driven fitness ecosystem, FitVision AI uses a multi-stage pipeline that combines computer vision, deep learning, statistical health analysis, and contemporary web technologies.
1. Capturing and Preprocessing Videos in Real Time
  
  The WebRTC API is used by the browser to record the user's webcam stream. Preprocessing is applied to every frame to guarantee excellent analysis:
  1. OpenCV-based frame extraction
  2. Using NumPy to convert colour from BGR to RGB
  3. Normalising and resizing frames to ensure model compatibility
  4. Temporal smoothing as an optional jitter reduction technique
  The input is ready for real-time pose estimation thanks to this preprocessing.
2. Pose Estimation Using MediaPipe
  
  MediaPipe Pose, which extracts 33 skeletal keypoints representing major joints like the shoulders, elbows, hips, knees, and ankles, is used in the core posture detection mechanism.
  
  Procedure:
  1. Passing each frame to the pre-trained MediaPipe BlazePose model.
  2. Locating landmark coordinates (x, y, visibility).
  3. Reducing jitter by smoothing temporal coordinates.
  4. Converting 2D joint points into pose vector representations.
  For browser-based deployment, this method provides fast inference.
3. Joint-Angle Calculation and Movement Quality Scoring
  
  To evaluate correctness of exercise form, joint angles are computed using geometric relationships between keypoints.
  
  Angle Calculation Formula:
  
  For three points A, B, C (joint at B):
  
  $$\theta = \cos^{-1}\left( \frac{(A - B) \cdot (C - B)}{\lVert A - B \rVert \, \lVert C - B \rVert} \right)$$
  
  This is implemented using NumPy vector operations for efficient real-time computation.
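As a rough illustration of the joint-angle step, the following minimal sketch computes the angle at joint B from three keypoints using NumPy vector operations. The function name `joint_angle` and the clipping of the cosine to [-1, 1] (a guard against floating-point round-off) are assumptions made for this example and are not taken from the FitVision AI code base.

```python
import numpy as np


def joint_angle(a, b, c):
    """Return the angle ABC in degrees, with the joint located at b.

    a, b, c are 2D or 3D keypoint coordinates (any array-like).
    """
    a, b, c = (np.asarray(p, dtype=float) for p in (a, b, c))
    ba = a - b  # vector from the joint to the first neighbour
    bc = c - b  # vector from the joint to the second neighbour
    denom = np.linalg.norm(ba) * np.linalg.norm(bc)
    if denom == 0.0:
        raise ValueError("keypoints must not coincide with the joint")
    # Clip to the valid arccos domain to absorb round-off error.
    cos_theta = np.clip(np.dot(ba, bc) / denom, -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_theta)))


if __name__ == "__main__":
    # Hypothetical shoulder, elbow, wrist coordinates forming a right angle.
    print(joint_angle((1.0, 0.0), (0.0, 0.0), (0.0, 1.0)))
```

In practice the three inputs would be the landmark coordinates of, for example, shoulder, elbow, and wrist for an elbow angle.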
  
  Movement Scoring:
  
  The system evaluates posture by comparing calculated joint angles to predetermined thresholds unique to each exercise.
  
  To guarantee proper depth and alignment throughout the exercise, the knee and hip flexion angles are tracked during squats.
  
  The algorithm assesses torso alignment and elbow extension for push-ups in order to confirm full range of motion and appropriate body alignment. Elbow flexion is closely examined during bicep curls to identify improper or insufficient curling form.
  
  In a similar vein, the system evaluates shoulder elevation consistency in the shoulder press to guarantee symmetrical and controlled lifting. Reliable feedback generation and precise, context-aware posture assessment are made possible by these exercise-specific angle thresholds.
  
  A movement quality score is assigned to each frame, indicating whether the posture is proper or not.
4. State Machine Logic for Repetition Counting
  
  FitVision AI uses a Finite State Machine (FSM) to track user movement during exercises such as push-ups, squats, and bicep curls.
  
  Example: Push-Up Logic
  - UP_STATE: Elbow > 150°
  - DOWN_STATE: Elbow < 100°
    
    Rep Counting Condition:
    
    A valid rep is recorded only if the movement completes the full state sequence UP_STATE → DOWN_STATE → UP_STATE.
    
    This eliminates false counts caused by partial or inconsistent motion.
    
    For each exercise, threshold angles are dynamically adjusted based on user height, arm length, and real-time velocity to improve robustness.
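The push-up logic can be sketched as a two-state machine. This is a simplified illustration rather than the system's actual implementation: the class name `PushUpCounter`, the choice to start in the up position, and the rule that a rep is counted on the return from down to up are assumptions, and the thresholds are fixed at the 150° and 100° values quoted in the text instead of being adjusted dynamically.

```python
from dataclasses import dataclass

UP_THRESHOLD = 150.0    # degrees; elbow angle above this is UP_STATE (from the text)
DOWN_THRESHOLD = 100.0  # degrees; elbow angle below this is DOWN_STATE (from the text)


@dataclass
class PushUpCounter:
    """Counts a rep only after a full UP -> DOWN -> UP cycle (assumed rule)."""

    state: str = "UP"  # assumption: the user starts with arms extended
    reps: int = 0

    def update(self, elbow_angle: float) -> int:
        """Feed one per-frame elbow angle and return the rep count so far."""
        if self.state == "UP" and elbow_angle < DOWN_THRESHOLD:
            self.state = "DOWN"
        elif self.state == "DOWN" and elbow_angle > UP_THRESHOLD:
            self.state = "UP"
            self.reps += 1
        return self.reps


def count_reps(angles):
    """Run a fresh counter over a sequence of elbow angles."""
    counter = PushUpCounter()
    for angle in angles:
        counter.update(angle)
    return counter.reps


if __name__ == "__main__":
    print(count_reps([160, 120, 90, 120, 160]))  # one full cycle
```

Angles that fall between the two thresholds leave the state unchanged, which is what prevents a partial movement from being counted.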
5. Deep Learning Model for Exercise Form Validation
  
  Beyond geometric heuristics, FitVision AI uses a TensorFlow deep learning model, herein called pushup_model.keras.
  
  Training Strategy:
  
  The deep learning model for exercise form classification was trained on a curated dataset of real sequences of push-ups, shoulder presses, bicep curls, and squats, each labeled as Correct Form or Bad Form.
  
  The dataset was split 80/20 for reliable performance evaluation. Training used the Binary Cross-Entropy loss function with the Adam optimizer, which provides fast and stable convergence.
  
  A batch size of 16 was chosen as a trade-off between computational efficiency and gradient stability. The maximum number of epochs run was 100, but Early Stopping was enabled to avoid overfitting in case the validation performance stopped improving. Model checkpointing was activated to save the best version in terms of validation accuracy automatically to ensure that the results were robust and reproducible.
  
  Features Extracted:
  
  - Temporal angle sequences
  - Pose embedding differences
  - Limb trajectory smoothness
  This hybrid approach, which combines rule-based checks with a deep learning model, improves the identification of subtle form errors that static angles alone cannot detect.
6. Real-Time Feedback Generation
  
  Based on angles, states, and model inference, FitVision AI generates multilayered feedback:
  1. Visual Feedback
    
    The system uses a variety of real-time interface components to offer clear visual guidance. Users can see their body alignment in relation to the identified landmarks by directly rendering a skeleton overlay onto the live video feed. In addition, real-time angle indicators show important joint measurements to help users comprehend how they move during each exercise. Additionally, users can quickly correct improper form by using colour-coded posture warnings that highlight it in an understandable way. Clarity, usability, and training efficacy are all greatly improved by this layered visual feedback.
  2. Textual Feedback
    
    Examples:
    - Hips Too Low!
    - Straighten Your Back
    - Go Down / Push Up
7. Calories Calculator Methodology
  
  The Calories Calculator uses the Mifflin-St Jeor equation, validated in clinical nutrition research as the most accurate for BMR estimation.
  
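  BMR Calculation:
  
  $$\text{BMR}_{\text{male}} = 10W + 6.25H - 5A + 5$$
  
  $$\text{BMR}_{\text{female}} = 10W + 6.25H - 5A - 161$$
  
  where W = weight (kg), H = height (cm), A = age (years).
  
  TDEE Computation:
  
  $$\text{TDEE} = \text{BMR} \times \text{Activity Factor}$$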
  
  Each scenario automatically computes macro targets (Protein, Fats, Carbs).
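A minimal sketch of the BMR and TDEE calculation follows. The Mifflin-St Jeor coefficients are the standard published ones; the activity multipliers in `ACTIVITY_FACTORS` are commonly used values assumed here for illustration, since the text does not list the factors or scenarios that FitVision AI actually uses. The function names are also hypothetical.

```python
# Assumed, commonly used activity multipliers; not specified in the text.
ACTIVITY_FACTORS = {
    "sedentary": 1.2,
    "light": 1.375,
    "moderate": 1.55,
    "active": 1.725,
    "very_active": 1.9,
}


def bmr_mifflin_st_jeor(weight_kg, height_cm, age_years, sex):
    """Basal metabolic rate in kcal/day from the Mifflin-St Jeor equation."""
    base = 10.0 * weight_kg + 6.25 * height_cm - 5.0 * age_years
    if sex == "male":
        return base + 5.0
    if sex == "female":
        return base - 161.0
    raise ValueError("sex must be 'male' or 'female'")


def tdee(bmr, activity="moderate"):
    """Total daily energy expenditure: BMR scaled by an activity factor."""
    return bmr * ACTIVITY_FACTORS[activity]


if __name__ == "__main__":
    bmr = bmr_mifflin_st_jeor(70, 175, 30, "male")
    print(bmr, tdee(bmr, "sedentary"))
```

For a 70 kg, 175 cm, 30-year-old male, the base term is 700 + 1093.75 - 150 = 1643.75, so the BMR is 1648.75 kcal/day.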
8. BMI Calculation and Health Categorization
  
  BMI is computed using standard formulas in metric and imperial units. The BMI module categorizes users into four standard health groups:
  
  Underweight (BMI < 18.5), Normal weight (18.5–24.9), Overweight (25–29.9), and Obese (≥ 30).
  
  The system uses a colour-coded BMI bar that highlights the user's category and a dynamic pointer that accurately shows their position on the scale to provide visual and contextual feedback to improve interpretability. Additionally, based on the user's BMI classification, tailored health advice is given, providing concise and practical suggestions to assist in making well-informed decisions and encourage healthier lifestyle choices.
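The BMI computation and categorization can be sketched as below. The metric formula (kg divided by metres squared) and the imperial conversion factor of 703 are the standard ones; the function names are hypothetical, and treating the category boundaries as continuous (for example, any value below 25 is Normal weight) is an assumption made to cover values that fall between the rounded ranges listed above.

```python
def bmi_metric(weight_kg, height_m):
    """BMI from weight in kilograms and height in metres."""
    return weight_kg / height_m ** 2


def bmi_imperial(weight_lb, height_in):
    """BMI from weight in pounds and height in inches (factor 703)."""
    return 703.0 * weight_lb / height_in ** 2


def bmi_category(bmi):
    """Map a BMI value to one of the four standard categories."""
    if bmi < 18.5:
        return "Underweight"
    if bmi < 25.0:
        return "Normal weight"
    if bmi < 30.0:
        return "Overweight"
    return "Obese"


if __name__ == "__main__":
    value = bmi_metric(70, 1.75)
    print(round(value, 1), bmi_category(value))
```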
9. Algorithm Used
  1. Video Preprocessing & Frame Pipeline
    
    This is the pipeline for real-time video processing, optimized for speed and model readiness. First, video frames are captured using WebRTC/getUserMedia and then converted from BGR to RGB with OpenCV.js or native JavaScript, depending on model requirements. Each frame is resized to the model's input resolution (e.g., 256×256) and normalized either into a 0-1 range or using mean and standard deviation values. To improve clarity under low-light conditions, enhancement methods such as histogram equalization or gamma correction are applied. Preprocessing at this level ensures that every frame that goes into the pose estimation model is clean, consistent, and ready for accurate real-time inference.
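To make the preprocessing steps concrete, here is a minimal NumPy-only sketch of channel reordering, resizing, normalization to the 0-1 range, and optional gamma correction. It is an illustration under several assumptions: the function name `preprocess_frame` is hypothetical, nearest-neighbour sampling stands in for whatever interpolation the real pipeline uses, and the browser-side system would perform these steps in JavaScript rather than Python.

```python
import numpy as np


def preprocess_frame(frame_bgr, size=256, gamma=1.0):
    """Convert a BGR uint8 frame (H, W, 3) to RGB float32 (size, size, 3) in [0, 1]."""
    frame_bgr = np.asarray(frame_bgr)
    if frame_bgr.ndim != 3 or frame_bgr.shape[2] != 3:
        raise ValueError("expected an (H, W, 3) colour frame")
    rgb = frame_bgr[:, :, ::-1]  # reverse the channel axis: BGR -> RGB
    h, w = rgb.shape[:2]
    # Nearest-neighbour resize: pick one source row/column per output pixel.
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    resized = rgb[rows][:, cols]
    scaled = resized.astype(np.float32) / 255.0  # normalize to the 0-1 range
    if gamma != 1.0:
        # Gamma correction; gamma > 1 brightens dark frames.
        scaled = np.power(scaled, 1.0 / gamma)
    return scaled.astype(np.float32)


if __name__ == "__main__":
    dummy = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in for a webcam frame
    print(preprocess_frame(dummy).shape)
```

Mean and standard deviation normalization, the alternative mentioned above, would replace the division by 255 with a per-channel subtraction and scaling.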
  2. Pose Estimation (MediaPipe / BlazePose)
    
    Purpose: detect 33 skeletal landmarks (x, y, z, visibility) per frame.
    
    Algorithm: Pretrained BlazePose network (proprietary architecture under MediaPipe). On-device inference via TensorFlow.js / MediaPipe JS.
    
    Input / Output
    - Input: preprocessed RGB frame
    - Output: landmarks = { (x_i, y_i, z_i, v_i) } for i=1..33
  3. Temporal Smoothing (Jitter Reduction)
    
    Purpose: reduce jitter of detected keypoints across frames.
    
    Algorithms used
    - Sliding-window moving average OR
    - Exponential Moving Average (EMA)
      
      EMA formula:
      
      $$s_t = \alpha x_t + (1 - \alpha)\, s_{t-1}$$
      
      where $x_t$ is the raw point at time t, $s_t$ is the smoothed value, and $0 < \alpha \le 1$ (e.g., $\alpha = 0.3$).
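The exponential moving average can be sketched for a single keypoint track as follows. Each new smoothed point is the weighted sum of the current raw point (weight alpha) and the previous smoothed point (weight 1 minus alpha). The function name `ema_smooth` and the choice to seed the filter with the first raw sample are assumptions for this example; the default alpha of 0.3 is the example value given in the text.

```python
def ema_smooth(points, alpha=0.3):
    """Smooth a sequence of coordinate tuples with an exponential moving average."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    prev = None
    for point in points:
        if prev is None:
            # Assumption: initialise the filter with the first raw sample.
            prev = tuple(float(v) for v in point)
        else:
            prev = tuple(alpha * v + (1.0 - alpha) * s for v, s in zip(point, prev))
        smoothed.append(prev)
    return smoothed


if __name__ == "__main__":
    # Hypothetical (x, y) positions of one landmark over three frames.
    print(ema_smooth([(0.0, 0.0), (10.0, 20.0), (10.0, 20.0)], alpha=0.5))
```

A smaller alpha suppresses more jitter but makes the smoothed landmark lag further behind fast movements, so the value trades stability against responsiveness.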
  4. Joint-Angle Computation (Geometric)
  Purpose: compute joint angles (e.g., elbow, knee, hip) used for posture scoring and rep detection.
  
  Mathematical formula
  
  For three points A, B, C with joint at B:
  
  $$\theta = \cos^{-1}\left( \frac{(A - B) \cdot (C - B)}{\lVert A - B \rVert \, \lVert C - B \rVert} \right)$$
10. Data Storage and Analytics
  
  Exercise sessions, repetition logs, accuracy score trends, BMI, metabolic calculations like BMR and TDEE, generated meal plans, user profiles, and weekly performance summaries are all stored in the system's database. The platform's advanced features, like progress dashboards, personal record (PR) tracking, engagement analytics, and tailored exercise or nutrition recommendations, are made possible by this extensive historical dataset. The system can provide significant insights, boost user motivation, and facilitate long-term fitness advancement by keeping thorough longitudinal data.
11. Frontend Integration (React.js + TailwindCSS)

React is used by the frontend to handle state-driven UI updates, allowing for dynamic progress dashboard presentation, seamless real-time rendering of the live video feed, and easy navigation between system modules. TailwindCSS enhances this functionality by offering responsive layouts, simple semantic colour-coding, and contemporary visual aesthetics that improve the overall user experience.

Furthermore, the incorporation of AOS (Animate on Scroll) transitions facilitates more seamless interface interactions, resulting in an application that is both aesthetically pleasing and easy to use.

RESULT

Figure 1. Website Homepage

Figure 2. Meal Planner

Figure 3. Calories Calculator

Visual Output of the Exercise Pose Correction System (panels: Pose Correction, Correct Pose)
CONCLUSION

This research presented FitVision AI, a comprehensive web-based fitness coaching system that integrates real-time pose estimation, exercise form correction, metabolic analysis, nutrition planning, and progress tracking into a unified platform. By leveraging MediaPipe for fast and accurate landmark detection, TensorFlow and Scikit-learn for advanced movement classification, and a combination of geometric and temporal algorithms for repetition counting, the system delivers reliable and responsive exercise assessment without the need for wearable sensors or external hardware. The inclusion of intelligent modules such as the BMI calculator, dual-unit calories calculator, seven-scenario metabolic predictor, and automated meal planning engine further enhances the platform's ability to support holistic health and fitness management.

Experimental results demonstrate the system's strong performance across key metrics, including high pose estimation stability, accurate repetition counting, and effective deep learning-based form validation. User evaluations also indicate that the interactive feedback mechanisms, intuitive interface, progress dashboards, and gamification elements improve engagement and exercise consistency. By combining fitness analytics with nutrition science, FitVision AI provides an accessible, privacy-preserving, and scalable alternative to traditional personal training systems, making high-quality fitness guidance available to a broader audience.
FUTURE WORK

FitVision AI demonstrates strong potential as a scalable platform for intelligent fitness monitoring; however, several enhancements can further improve its accuracy, adaptability, and real-world applicability. One future direction involves integrating 3D pose estimation or depth-assisted models to overcome limitations of 2D landmark detection, particularly in scenarios involving occlusion or complex movement trajectories. Incorporating multi-exercise deep learning classifiers would also extend the system's ability to evaluate form quality across a broader range of strength and functional training exercises, enabling automated coaching for complete workout routines rather than isolated movements.

Another promising area lies in developing personalized adaptive thresholds, where the system learns joint-angle ranges and biomechanical characteristics from individual users over time. This would allow more precise form evaluation that accounts for differences in flexibility, limb proportions, and mobility restrictions. The platform can additionally benefit from implementing online learning mechanisms, enabling models to refine predictions continuously using user-approved feedback while maintaining privacy.

Furthermore, introducing voice-assisted coaching, multiplayer virtual workout sessions, or a community-driven challenge system may enhance engagement and adherence. Finally, deploying the system at scale with cloud-based load balancing and containerized microservices will enable broader adoption across gyms, schools, and healthcare platforms. These extensions position FitVision AI as a holistic, intelligent wellness ecosystem capable of evolving with advancements in AI, computer vision, and human-computer interaction.
