DOI : 10.17577/IJERTCONV14IS020089- Open Access

- Authors : Dhiraj Bholashankar Vishwakarma, Shreya Upendra Chauhan
- Paper ID : IJERTCONV14IS020089
- Volume & Issue : Volume 14, Issue 02, NCRTCS – 2026
- Published (First Online) : 21-04-2026
- ISSN (Online) : 2278-0181
- Publisher Name : IJERT
- License:
This work is licensed under a Creative Commons Attribution 4.0 International License
Machine Learning Based Cardiovascular Disease Prediction using Classification Algorithms
Dhiraj Bholashankar Vishwakarma
Department Of Computer Science
DR. D. Y. Patil, Arts, Commerce & Science College Pimpri, Pune, India.
Shreya Upendra Chauhan
Department Of Computer Science
-
Mahila P.G. College Tamsa Marg, Mirzapur, Akbarpur Ambedkar Nagar, Uttar Pradesh, India.
Abstract – Cardiovascular disease is one of the leading causes of mortality worldwide, making early detection and risk prediction essential for preventive healthcare. Machine Learning techniques have shown significant potential in predicting cardiovascular disease using clinical and lifestyle related data. This research paper presents a comparative study of major classification algorithms for cardiovascular disease prediction. The study evaluates Logistic Regression, Support Vector Machine, K Nearest Neighbor, Decision Tree, and Random Forest using a publicly available heart disease dataset. The models are assessed based on accuracy, precision, recall, F1 score, and overall predictive performance. Experimental results indicate that ensemble based methods such as Random Forest achieve improved classification accuracy compared to traditional models. The paper highlights the importance of selecting appropriate Machine Learning algorithms for medical risk prediction and discusses their strengths, limitations, and practical applications in clinical decision support systems.
Keywords: Machine Learning, Cardiovascular Disease, Classification Algorithms, Logistic Regression, Random Forest, Healthcare Analytics, Disease Prediction
-
INTRODUCTION
Cardiovascular disease continues to pose a major global health challenge. Early identification of high risk individuals can significantly reduce mortality through preventive interventions.
Traditional diagnostic approaches rely on clinical expertise and manual evaluation, which may be time consuming and prone to variability.
Machine Learning provides data driven solutions capable of identifying hidden patterns within medical datasets. Classification algorithms can analyze patient attributes such as age, cholesterol level, blood pressure, heart rate, and lifestyle factors to predict the likelihood of cardiovascular disease.
This paper aims to:
Implement multiple classification algorithms for disease prediction. Compare their predictive performance. Identify the most effective model for cardiovascular risk assessment.
-
BACKGROUND: MACHINE LEARNING IN HEALTHCARE
Machine Learning has become an essential tool in healthcare analytics. Supervised learning techniques are widely used for disease classification tasks. These models learn from labeled datasets to predict outcomes for new patient data.
In cardiovascular prediction systems, structured clinical datasets are used to train algorithms that can classify patients into risk categories. Proper preprocessing, feature selection, and evaluation are critical for achieving reliable results.
-
CLASSIFICATION ALGORITHMS USED
-
-
Logistic Regression
Logistic Regression is a statistical classification algorithm used for binary outcomes.
Advantages:1. Simple and interpretable.
2. Computationally efficient.
Limitations: 1.Limited performance for complex nonlinear relationships.
2.Best Use Case: Baseline medical prediction model.
-
Support Vector Machine
Support Vector Machine constructs optimal hyperplanes for classification.
Advantages:1.Effective in high dimensional data.
2. Good generalization performance.
Limitations: 1.Sensitive to parameter tuning. Best Use Case: Complex classification tasks.
-
K Nearest Neighbor
K Nearest Neighbor classifies data based on similarity measures.
Advantages: 1. Simple implementation.
2. No training phase.
Limitations: 1.Computationally expensive for large datasets. Best Use Case: Small to medium sized datasets.
-
Decision Tree
Decision Tree models classify data using hierarchical rule based splits.
Advantages:1.Easy to interpret.
-
Handles nonlinear relationships.
Limitations: 1.Prone to overfitting.
Best Use Case: Rule based clinical systems.
-
-
Random Forest
Random Forest is an ensemble learning technique that combines multiple decision trees.
Advantages: 1.High accuracy
-
Reduces overfitting
-
Handles missing data effectively Limitations: 1.Less interpretable than simple models. Best Use Case: Medical risk prediction
-
COMPARATIVE ANALYSIS
-
RESULTS AND DISCUSSION
The comparative evaluation indicates that Random Forest achieves superior predictive performance due to its ensemble structure and ability to reduce overfitting. Support Vector Machine also demonstrates strong classification capability but requires careful parameter tuning. Logistic Regression provides a reliable baseline model with good interpretability.
The results confirm that Machine Learning algorithms can effectively assist in early cardiovascular disease prediction and support clinical decision making.
-
CHALLENGES
-
Imbalanced medical datasets. Risk of overfitting.
-
Need for feature selection.
-
Limited interpretability of ensemble models. Data privacy concerns.
-
-
FUTURE WORK
-
Integration with deep learning techniques.
-
Hybrid ensemble models.
-
Explainable AI methods for medical transparency.
-
Real time hospital decision support systems.
-
-
CONCLUSION
This study presents a comparative analysis of classification algorithms for cardiovascular disease prediction. The findings demonstrate that ensemble based methods such as Random Forest outperform traditional models in predictive accuracy. Machine Learning techniques show strong potential as assistive tools for early disease detection and preventive healthcare analytics. Future research should focus on improving interpretability and
real world deployment in clinical environments.
REFERENCES
-
UCI Machine Learning Repository, Heart Disease Dataset.
-
Detrano, R. et al., International application of a new probability algorithm for the diagnosis of coronary artery disease.
-
Breiman, L., Random Forests, Machine Learning Journal.
-
Cortes, C. and Vapnik, V., Support Vector Networks.
-
Hosmer, D. and Lemeshow, S., Applied Logistic Regression.
