
Evaluating the Performance and Predictive Power of Machine Learning Classifiers in Cardiovascular Disease Risk Prediction using Biosensor Datasets

DOI : 10.17577/IJERTCONV13IS06045

 


Babalola Abayomi
Federal Polytechnic, Ile-Oluji, Nigeria
abababalola@fedpolel.edu.ng

Akingbade Kayode
Federal University of Technology, Akure, Nigeria
kfakingbade@futa.edu.ng

Brendan Ubochi
Department of Electrical Electronics, Federal University of Technology, Akure, Nigeria
bcubochi@futa.edu.ng

Faiyaz Ahamad
Department of Computer Science and Engineering, Integral University, Lucknow
faiyaz@iul.ac.in

Manish Tripathi
Computer Science and Engineering Department, Integral University, Lucknow
mmt@iul.ac.in

Syed Belal Hassan
Medical Superintendent, Integral Institute of Medical Sciences & Research
sbelal@iul.ac.in

Abstract: Wearable smart sensors have become a game-changer for real-time health monitoring, particularly in the management of cardiovascular disease (CVD). This work aims to build a wearable system driven by machine learning for continuous monitoring of vital signs, including oxygen saturation, blood pressure, and heart rate. By combining real-time data collection, preprocessing, and predictive modelling, the system supports early diagnosis and risk assessment of CVD. The performance of numerous machine learning classifiers was evaluated using accuracy, precision, recall, and AUC-ROC. According to the results, Gradient Boosting and XGBoost displayed excellent prediction performance. Because it offers non-invasive, cost-effective, and efficient cardiovascular monitoring, wearable technology has promising potential for preventive healthcare. Future studies will focus on streamlining clinical application and extending the system to additional physiological indicators.

Keywords: Wearable Sensors, Cardiovascular Disease Monitoring, Machine Learning, Smart Healthcare, Predictive Analytics

  1. Introduction

    Cardiovascular diseases (CVDs) are a significant global health concern, responsible for one-third of all estimated deaths in 2019. The risk factors include physical inactivity, alcohol consumption, diabetes, smoking, and hypertension. Low educational attainment, pollution, socioeconomic disparities, and insufficient access to healthcare exacerbate the financial burden of CVD [2]. Conventional pharmaceutical interventions, including statins, ACE inhibitors, and β-blockers, are crucial, yet they require careful oversight because of possible adverse effects [7]. Emerging approaches such as digital health technology, regenerative medicine, and gene therapy [7] provide promising avenues for cardiovascular disease management.

    Traditional risk prediction systems are constrained by data quality, model fairness, and inductive reasoning. Wearable biosensors enable predictive analytics [14] by providing real-time health monitoring for CVD prediction, which supports continuous assessment. The early identification of possible health hazards that these sensors allow makes individualised healthcare treatment possible. Reliable health monitoring must nevertheless address issues including data privacy, sensor accuracy, and battery life.

    Mathematical approaches and machine learning have demonstrated great potential in improving risk evaluation and supporting quick interventions [6]. Models such as XGBH and ensemble learning have achieved high accuracy in predicting CVD risk. Deep learning models such as CNNs and ALSTM extract intricate patterns from large datasets, which enhances prediction accuracy. Real-time apps and web-based dashboards make these models readily usable by clinicians and patients because they enable real-time risk assessment. Artificial intelligence (AI) has transformed healthcare by improving diagnosis, enabling customized treatment regimens, and supporting predictive analytics. Early disease identification through predictive analytics encourages patients to take proactive health measures. Machine learning algorithms can detect high-risk patients, decrease healthcare expenditures, and enhance diagnostic accuracy. Nonetheless, obstacles such as data sparsity and interpretability remain a concern. Unlike previous studies that primarily relied on static clinical data, this work integrates real-time biosensor data with advanced machine learning models to improve cardiovascular disease risk prediction. The study evaluates multiple classifiers on wearable sensor datasets and identifies optimal models for real-time CVD monitoring.

  2. Literature Review
    1. Cardiovascular Disease Risk Prediction

      Recent machine learning methods have outperformed conventional models such as the Framingham Risk Score in evaluating cardiovascular disease risk. The Framingham Risk Score (FRS) uses clinical analysis to assess cardiovascular risk, but its predictive value is low, with C-statistics of around 0.6565 [5]. Machine learning models, including random forests and deep learning, did better than conventional methods, with accuracy rates of up to 98.57% [4] and AUCs of up to 0.92 [8]. Machine learning algorithms, particularly AutoPrognosis, can identify new risk factors, especially for vulnerable groups such as diabetics [1]. Machine learning techniques also improve early identification of risk by using biosensor data, such as blood pressure and heart rate monitoring [6]. Data population, processing power, and model interpretability remain the main issues [8]. These can be addressed by combining machine learning with conventional techniques in a hybrid approach that improves patient outcomes and individualised risk assessments. Complex algorithms such as decision trees and neural networks are revolutionizing healthcare by enhancing diagnostics, clinical decision-making, and patient outcomes [16][8]. Biosensors that jointly analyze vital signs such as heart rate and oxygen saturation using machine learning help to detect cardiovascular disease in real time, which improves predictive accuracy [5]. Furthermore, privacy, data integrity, and regulatory compliance raise concerns that call for further research and validation [12]. Early intervention and preventive treatment could be revolutionized by wearable biosensors coupled with machine learning [10]. While studies such as [5] and [4] have shown high prediction accuracy using machine learning models, challenges such as data bias, real-time processing constraints, and model interpretability remain unaddressed. This study aims to bridge these gaps by integrating wearable sensor data with ensemble learning techniques.

    2. Machine Learning in Healthcare

      Machine learning models assist clinical decision-making by predicting outcomes such as death, cardiac arrest, stroke, and acute kidney injury from patient health records [3][15]. Data privacy, ease of model interpretation, algorithmic robustness, and integration into clinical workflows [9] remain very important. Training and education would help to close the gap in ML knowledge among medical practitioners. Causal machine learning (CML) offers promising directions for clinical decision support. Through the noninvasive transfer of real-time physiological data, wearable biosensors are revolutionizing health monitoring by enabling the early identification and treatment of diseases, including cardiovascular disease (CVD). These tools detect biomarkers in biofluids such as saliva, sweat, and interstitial fluid [10], and can be used in place of traditional methods. Some machine learning classifiers, such as random forests, support vector machines, and neural networks, are excellent at making accurate predictions about cardiovascular disease [5]. However, reliance on clinical databases, data accuracy, and feature selection limit the credibility of machine learning models. Comprehensive cohort validation studies are essential to guarantee clinical acceptance and dependability [12].

      Risk prediction models for cardiovascular disease (CVD), including ATP-III, Framingham, Pooled Cohort Equations, and SCORE, often produce conflicting results, which leads to differences in clinical practice. Advances in machine learning (ML), best shown by AutoPrognosis, increase predictive accuracy by including non-traditional variables, surpassing conventional models and refining risk categorization, particularly in diabetes [1][3]. Deep learning models such as CNNs and RNNs analyze complex medical data for prompt anomaly detection (Wu, 2024). Among machine learning methods, random forests and gradient boosting routinely outperform tools including QRISK3 and ASCVD scores [5]. Still, challenges in applying these models to specific populations call for either recalibration or the development of new models [11]. Models including SCORE2 and LIFE-CVD2 address these difficulties, while multimodal deep learning improves precision medicine (Bayappu et al, 2024).

  3. Methodology
    1. System Architecture

      To achieve the objective of this research, the system architecture for cardiovascular disease (CVD) risk prediction integrates biosensors and machine learning models to enhance early detection and CVD risk assessment. As Figure 1 shows, the biosensors continuously collect vital signs data such as heart rate, blood pressure, and oxygen saturation. These data are then processed and analyzed using various machine learning classifiers such as Random Forest and Gradient Boosting. The classifiers are assessed for performance and predictive power, and they generate risk scores from historical and real-time patient data so that the model best suited to the datasets can be adopted for prediction. The central system, Machine Learning and Biosensors in CVD Risk Prediction, acts as a bridge between data collection and predictive analytics, ensuring efficient health monitoring. By leveraging machine learning, the system provides accurate CVD risk predictions, enabling healthcare professionals to take proactive measures in patient care. Real-time monitoring and early warning alerts improve CVD prevention, making this approach a valuable tool for personalized healthcare and for reducing cardiovascular-related mortality.

      Figure 1: Machine learning Classifier System architecture

    2. Datasets

      The biosensor data used in this research form a structured dataset collected from sources such as Ondo State Teaching Hospital and General Hospital Ile-Oluji. Each record was taken from a patient using various biosensor devices, including cuff-based digital BP monitors, ECG sensors, digital weighing scales, pulse oximeters, and infrared temperature sensors. The dataset also records important demographic information such as age, weight, height, and BMI (calculated). The target variable, which lays the foundation for the predictive analysis, takes the value 1 for the presence of a cardiovascular condition and 0 for its absence. This collection makes dynamic, real-time cardiovascular health monitoring possible and helps to uncover early disease diagnosis and risk trends. Combining physiological and lifestyle data supports more accurate diagnosis, customized treatment plans, and proactive healthcare management, which help to prevent cardiovascular disease and improve patient outcomes. Further validation and integration into healthcare systems would expand the clinical importance of such datasets.
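The dataset description lists BMI as a calculated field. A minimal sketch of that derivation is given here, assuming weight is recorded in kilograms and height in metres; the record layout and field names are hypothetical and are not the study's actual schema.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in metres squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / (height_m ** 2)


# Hypothetical patient record; the keys are illustrative assumptions.
record = {"age": 54, "weight_kg": 81.0, "height_m": 1.80, "target": 1}

# Derive the calculated BMI column from the two measured columns.
record["bmi"] = round(bmi(record["weight_kg"], record["height_m"]), 2)
print(record["bmi"])  # 25.0
```

If height were stored in centimetres instead, it would need to be divided by 100 before calling the function.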

    3. Machine Learning Classifiers

      Before training, the dataset was preprocessed in three steps: mean imputation to handle missing values, Min-Max scaling to normalize features, and IQR-based outlier removal. Fourteen machine learning models were used for the analysis, of which Gradient Boosting, XGBoost, and Random Forest were chosen for their interpretability, robustness against overfitting, and ability to manage imbalanced datasets. These models provide a more comprehensive explanation for clinical decision-making than deep learning models. GridSearchCV with cross-validation was used to optimize hyperparameters and so obtain the best performance from each model.
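The three preprocessing steps named above can be sketched for a single feature column as follows. This is an illustration in plain Python rather than the authors' pipeline: the helper names, the 1.5 multiplier for the IQR fence, the use of `None` for missing values, and the sample heart-rate readings are all assumptions.

```python
from statistics import mean, quantiles


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def iqr_keep_mask(values, k=1.5):
    """True where a value lies inside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [low <= v <= high for v in values]


# Hypothetical heart-rate column with one missing value and one extreme reading.
heart_rate = [72, 75, None, 70, 74, 73, 71, 76, 190]

imputed = impute_mean(heart_rate)      # step 1: mean imputation
scaled = min_max_scale(imputed)        # step 2: Min-Max scaling
mask = iqr_keep_mask(scaled)           # step 3: IQR-based outlier filter
cleaned = [v for v, keep in zip(scaled, mask) if keep]
print(len(heart_rate), "->", len(cleaned), "rows kept")
```

The steps run in the order the text gives them. One consequence worth noting is that the extreme reading influences both the imputed mean and the scaling range before it is removed; filtering outliers first would be a reasonable variation.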

      1. Using Classifiers and Evaluation Metrics for Cardiovascular Disease (CVD) Prediction

        To predict cardiovascular disease (CVD), machine learning classifiers are trained on patient data such as blood pressure, heart rate, cholesterol levels, and other risk factors. The predictions these classifiers produce are evaluated using performance metrics to ensure reliability. Table 1 outlines how the classifiers and their mathematical foundations link to the evaluation metrics in CVD prediction.

        Table 1: Classifier equations and associated evaluation metrics
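The body of Table 1 does not survive in this text version. For orientation, the standard textbook definitions of the evaluation metrics named in this paper are given below; they are not a reconstruction of the authors' table. The symbols are introduced here: TP, TN, FP, and FN are the counts of true positives, true negatives, false positives, and false negatives, and s(x) is the score a classifier assigns to a case x, with x⁺ a positive case and x⁻ a negative one.

```latex
\begin{align}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Precision} &= \frac{TP}{TP + FP} \\
\text{Recall}    &= \frac{TP}{TP + FN} \\
F_1              &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \\
\text{AUC-ROC}   &= P\left( s(x^{+}) > s(x^{-}) \right)
\end{align}
```

The last line is the ranking interpretation of AUC-ROC: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.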

        1. Comparative Analysis of the Evaluation Performance

          The ensemble models (XGBoost and Gradient Boosting) performed strongly on the evaluation metrics compared with traditional models such as Logistic Regression and Decision Tree, achieving an AUC-ROC above 0.98. This aligns with the findings of Dalal et al. (2023) [4] but outperforms their baseline accuracy of 92.1%.

          Table 2: Performance Metrics of Various Machine Learning Models for Cardiovascular Disease Risk Prediction
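Since AUC-ROC is the headline metric in this comparison, a short sketch of how it can be computed may help. It uses the pairwise-ranking formulation (the fraction of positive/negative pairs in which the positive case gets the higher score, ties counting half). The function name, labels, and scores are made up for illustration and are unrelated to the study's data.

```python
def auc_roc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = CVD present) and predicted risk scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
print(round(auc_roc(labels, scores), 3))  # 8 of 9 pairs ranked correctly
```

The double loop is quadratic in the number of cases, which is fine for a sketch; library implementations typically use a sort-based method that gives the same value.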

        2. Ethical Considerations and Study Limitations

      This work acknowledges potential biases in the dataset used for the analysis, including demographic anomalies and sensor variability. In addition, ethical considerations such as patient data privacy and adherence to HIPAA/GDPR regulations remain germane for real-world deployment. Future research should incorporate diverse datasets and federated learning approaches to mitigate bias.

      As Table 2 shows, Linear Discriminant Analysis (LDA) and Ridge Classifier had the highest accuracy (0.946058) and F1-scores (0.943231) in CVD risk prediction, which supports considering them for this task. The Random Forest model, with an accuracy of 0.921162, demonstrated the highest precision (0.970874) but the lowest recall (0.862069). Its positive predictions are therefore highly reliable, which makes it a good choice for CVD prediction, although it may miss some true cases, as depicted in Figure 3.
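The precision/recall trade-off described for Random Forest can be made concrete with a worked example. The paper does not report a confusion matrix, so the counts below are hypothetical: they are one set of values that happens to reproduce the accuracy, precision, and recall quoted above to six decimal places, and they should not be read as the study's actual test-set counts.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Hypothetical counts (an assumption, not data from the paper).
rf = classification_metrics(tp=100, fp=3, fn=16, tn=122)

for name, value in rf.items():
    print(f"{name}: {value:.6f}")
```

Under these assumed counts, only 3 of 103 positive predictions are wrong (high precision), while 16 of 116 true cases go undetected (lower recall), which is the pattern the text describes.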

      Figure 2: Comparison of PCA and t-SNE Projections for Data Visualization

      Figure 3: Classifier Performance (AUC-ROC Score)

      The machine learning models applied to this classification task were evaluated using accuracy, precision, recall, F1-score, and AUC-ROC. PCA and t-SNE were used as dimensionality reduction methods to improve visualization and feature selection (Figure 2). The outcomes showed that ensemble techniques such as Random Forest, Gradient Boosting, and XGBoost performed well, with Random Forest striking a good compromise between recall and precision. Gradient Boosting and XGBoost improved accuracy by combining weak learners. Among traditional classifiers, Linear Discriminant Analysis (LDA) and Ridge Classifier achieved the highest accuracy. Naïve Bayes, K-Nearest Neighbours (KNN), and Decision Tree also performed well. Neural Networks (MLP) and Support Vector Machines (SVM), however, showed diminished recall, indicating a trade-off in sensitivity compared with the other models.
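To illustrate what the PCA projection mentioned above does, the sketch below finds the first principal component of a small two-feature dataset by power iteration on the sample covariance matrix and projects the data onto it. It is a teaching example in plain Python, not the authors' procedure: the function names, the starting vector, and the heart-rate and systolic-pressure values are assumptions. t-SNE is not sketched because it is an iterative non-linear method that does not reduce to a few lines.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (unit vector of greatest variance, mean-centred rows)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Arbitrary non-symmetric start; power iteration fails only if the start
    # is exactly orthogonal to the leading eigenvector.
    v = [1.0 / (j + 1) for j in range(d)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v, centered


def project(centered, component):
    """Coordinate of each centred row along the given component."""
    return [sum(c[j] * component[j] for j in range(len(component)))
            for c in centered]


# Hypothetical (heart rate, systolic pressure) pairs that rise together.
rows = [[60, 110], [70, 120], [80, 130], [90, 140]]
component, centered = first_principal_component(rows)
projection = project(centered, component)
print([round(x, 4) for x in component])
print([round(x, 2) for x in projection])
```

Because the two features in this toy data move in lockstep, a single component captures all of the variance, which is why one coordinate per record is enough to separate the records here.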

  4. Results and Discussion

    We evaluated several machine learning models against standard benchmarks. Using performance measures including recall, accuracy, precision, F1-score, and AUC-ROC, we assessed how well these models classified cases at the class level. As shown in Figure 2, we used t-SNE and Principal Component Analysis (PCA) to reduce dimensionality, which enhanced the interpretability and visualization of the data. These techniques clarified the data distribution and the separation of the classes, and so supported feature selection and model optimization. Ensemble learning methods, including Random Forest, Gradient Boosting, and XGBoost, gave good classification results. Because Random Forest balances precision and recall, it is a consistent choice for prediction tasks. Gradient Boosting and XGBoost improved classification accuracy further by combining multiple weak learners, which can also improve the model's general robustness. Linear Discriminant Analysis (LDA) and Ridge Classifier obtained the best accuracy among conventional classifiers, supporting their efficacy in handling structured data. The performance of Naïve Bayes, K-Nearest Neighbours (KNN), and Decision Tree was also notable.

  5. Conclusions

The machine learning classifiers were evaluated using accuracy, precision, recall, F1-score, and AUC-ROC. PCA and t-SNE were adopted for dimensionality reduction, which improved visualization of the results and supported feature selection. The findings showed that ensemble techniques such as Random Forest, Gradient Boosting, and XGBoost performed strongly, with Random Forest providing a reasonable balance between recall and precision. Gradient Boosting and XGBoost, which combine weak learners, improved accuracy further. Of the conventional classifiers, Ridge Classifier and Linear Discriminant Analysis (LDA) attained the best accuracy. Other models such as Naïve Bayes, K-Nearest Neighbours (KNN), and Decision Tree also performed well. Support Vector Machines (SVM) and Neural Networks (MLP) showed reduced recall compared with the other models, indicating a trade-off in sensitivity.

Recommendation

This work investigated the use of wearable smart sensors and machine learning for real-time cardiovascular disease monitoring. The system monitors vital signs and thereby improves early identification and intervention. In predicting CVD risk, the Gradient Boosting, XGBoost, and Random Forest models showed higher accuracy. Still, problems including sensor quality, data variability, and real-time processing call for further optimization. Sensor calibration, clinical studies, and electronic health record integration should be the main topics of future studies.

References

  1. Alaa, A., Bolton, T., Di Angelantonio, E., Rudd, J., & Van Der Schaar, M. (2019). Cardiovascular disease risk prediction using automated machine learning: A prospective study of 423,604 UK Biobank participants. PLoS ONE, 14. https://doi.org/10.1371/journal.pone.0213653
  2. Berner-Rodoreda, A., Kanyama, C., Supady, A., & Bärnighausen, T. (2023). Cardiovascular Diseases (pp. 157-162). Springer International Publishing. https://doi.org/10.1007/978-3-031-33851-9_24
  3. Chen, P., Liu, Y., & Peng, L. (2019). How to develop machine learning models for healthcare. Nature Materials, 18, 410-414. https://doi.org/10.1038/s41563-019-0345-0

  4. Dalal, S., Goel, P., Onyema, E., Alharbi, A., Mahmoud, A., Algarni, M., & Awal, H. (2023). Application of Machine Learning for Cardiovascular Disease Risk Prediction. Computational Intelligence and Neuroscience. https://doi.org/10.1155/2023/9418666
  5. Liu, T., Krentz, A. J., Lü, L., & Curcin, V. (2024). Machine learning based prediction models for cardiovascular disease risk using electronic health records data: systematic review and meta-analysis. European Heart Journal. https://doi.org/10.1093/ehjdh/ztae080
  6. Weng, S., Reps, J., Kai, J., Garibaldi, J., & Qureshi, N. (2017). Can machine-learning improve cardiovascular risk prediction using routine clinical data. PLoS ONE, 12. https://doi.org/10.1371/journal.pone.0174944

  7. Narkhede, M. K., Pardeshi, A., Bhagat, R., & Dharme, G. (2024). Review on Emerging Therapeutic Strategies for Managing Cardiovascular Disease. Current Cardiology Reviews. https://doi.org/10.2174/011573403X299265240405080030
  8. Ordikhani, M., Abadeh, S., Prugger, C., Hassannejad, R., Mohammadifard, N., & Sarrafzadegan, N. (2022). An evolutionary machine learning algorithm for cardiovascular disease risk prediction. PLoS ONE, 17. https://doi.org/10.1371/journal.pone.0271723
  9. Qayyum, A., Qadir, J., Bilal, M., & Al-Fuqaha, A. (2020). Secure and Robust Machine Learning for Healthcare: A Survey. IEEE Reviews in Biomedical Engineering, 14, 156-180. https://doi.org/10.1109/RBME.2020.3013489

  10. Raj, S., Agarwal, A., Tripathi, S., & Gupta, N. (2024). Prediction and Analysis of Digital Health Records, Geonomics, and Radiology Using Machine Learning. 24-43.
  11. Kim, J., Campbell, A., Ávila, B., & Wang, J. (2019). Wearable biosensors for healthcare monitoring. Nature Biotechnology, 37, 389-406. https://doi.org/10.1038/s41587-019-0045-y
  12. Raymond, D. A., Kumar, P., & Goureshettiwar, P. (2024). Intergration of Wearable Biosensors and Data Analytics for Remote Health Monitoring. 707-713. https://doi.org/10.1109/icoici62503.2024.10696310
  13. Sanchez, P., Voisey, J., Xia, T., Watson, H., O’Neil, A., & Tsaftaris, S. (2022). Causal machine learning for healthcare and precision medicine. Royal Society Open Science, 9. https://doi.org/10.1098/rsos.220638
  14. Sayed-Ahmed, M. Z., Limkar, S., El-Bahkiry, H. S., Alam, N., & Amin, S. T. (2024). Mathematical Modelling and Deep Learning Techniques for Predicting Cardiovascular Disease. Panamerican Mathematical Journal. https://doi.org/10.52783/pmj.v34.i4.1880

  15. Shamout, F., Zhu, T., & Clifton, D. (2020). Machine Learning for Clinical Outcome Prediction. IEEE Reviews in Biomedical Engineering, 14, 116-126. https://doi.org/10.1109/RBME.2020.3007816
  16. Zhou, X. K. (2024). A study of machine learning applications in healthcare. Applied and Computational Engineering, 102(1), 128-133. https://doi.org/10.54254/2755-2721/102/20241057