A Review on Prediction of Cardiovascular Disease on Machine Learning Techniques

DOI : 10.17577/IJERTCONV10IS04026


Nivyamol P Varghese M.Tech, CSE Dept.

Mangalam College of Engineering Ettumanoor, Kottayam, India

Dr. Sahaya Kingsly Prof., CSE Dept

Mangalam College of Engineering Ettumanoor, Kottayam, India

Abstract: Cardiovascular diseases are considered among the most life-threatening syndromes, with the highest mortality rate globally. According to recent statistics, about 84 million people in the USA suffer from some form of cardiovascular disease, causing approximately 2,200 deaths a day, an average of one death every 40 seconds. Almost one out of every three deaths results from cardiovascular disease. In this work, we use the MaLCaDD framework to obtain high accuracy with high precision. The framework is validated on three benchmark datasets (Framingham, Heart Disease, and Cleveland), where accuracies of 99.1%, 98.0%, and [...]% are achieved, respectively, using the KNN and logistic regression methods. Finally, the comparative analysis shows that MaLCaDD predictions are more accurate (with a reduced set of features) than existing state-of-the-art methods. Therefore, MaLCaDD is highly reliable and can be applied in a real environment for the early diagnosis of cardiovascular diseases.

Keywords- Prediction; Cardiovascular Diseases; MaLCaDD framework; HRFLM; Decision tree; Random forest; K-nearest neighbors; Machine Learning.


It is difficult to identify heart disease because of several contributory risk factors such as diabetes, high blood pressure, high cholesterol, abnormal pulse rate, and many other factors. Various techniques in data mining and neural networks have been employed to determine the severity of heart disease among humans. The severity of the disease is classified using methods such as the K-Nearest Neighbor algorithm (KNN), Decision Trees (DT), the Genetic Algorithm (GA), and Naive Bayes (NB). The nature of heart disease is complex, and hence the disease must be handled carefully. Not doing so may affect the heart or cause premature death. The perspectives of medical science and data mining are used for discovering various kinds of metabolic syndromes. Data mining with classification plays a significant role in the prediction of heart disease and in data investigation.
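As an illustration of the simplest of the classifiers named above, the following is a minimal KNN sketch in plain Python. The toy records, the two scaled features, and the function name `knn_predict` are assumptions made for this example and are not taken from any of the reviewed papers.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training record
    distances = [
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    ]
    distances.sort(key=lambda pair: pair[0])
    # Majority vote over the labels of the k closest records
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy records: (age, resting blood pressure), both scaled to [0, 1]
train_X = [(0.2, 0.3), (0.25, 0.35), (0.3, 0.2), (0.8, 0.9), (0.75, 0.8), (0.9, 0.85)]
train_y = [0, 0, 0, 1, 1, 1]  # 0 = healthy, 1 = sick

prediction = knn_predict(train_X, train_y, (0.7, 0.75), k=3)
print(prediction)
```

Because KNN relies on distances, features measured on different scales (for example mg/dl and mm Hg) would normally be rescaled first, which is why the toy values are already in [0, 1].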

Various methods have been used for knowledge abstraction, applying known data mining techniques to the prediction of heart disease. In this line of work, numerous studies have been carried out to produce prediction models using not only distinct techniques but also combinations of two or more techniques. These amalgamated techniques are commonly known as hybrid methods.

Neural networks have been introduced using heart rate time series. This method uses various clinical records for prediction, such as Left bundle branch block (LBBB), Right bundle branch block (RBBB), Atrial fibrillation (AFIB), Normal Sinus Rhythm (NSR), Sinus bradycardia (SBR), Atrial flutter (AFL), Premature Ventricular Contraction (PVC), and Second degree block (BII), to determine the exact condition of the patient with respect to heart disease. The dataset is classified with a radial basis function network (RBFN), where 70% of the data is used for training and the remaining 30% is used for classification.

A Computer-Aided Decision Support System (CADSS) has also been introduced in the field of medicine and research. In previous work, the use of data mining techniques in the healthcare industry has been shown to take less time for the prediction of disease, with more accurate results. The diagnosis of heart disease using the GA has been proposed; for experimental validation, the well-known Cleveland dataset, collected from the UCI machine learning repository, is used, and the results are reported to compare favorably with several known supervised learning techniques. The powerful evolutionary algorithm Particle Swarm Optimization (PSO) has been introduced, and some rules are generated for heart disease. The rules were applied randomly with encoding techniques, which results in an overall improvement in accuracy. Heart disease is predicted based on attributes such as pulse rate, sex, age, and many others.

Over time, cardiovascular diseases have become very common and are overstretching the healthcare systems of nations. The effects of heart disease and its risk factors may show up in individuals as raised blood pressure, raised blood glucose, raised blood lipids, and overweight and obesity. These intermediate risk factors can be measured in primary care facilities and indicate an increased risk of heart attack, stroke, heart failure, and other complications. ML algorithms with neural networks have also been introduced, whose results are reported to be more accurate and reliable.
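The 70%/30% division of the data mentioned for the RBFN study can be sketched as a shuffled hold-out split. The function name, the fixed seed, and the placeholder records below are assumptions for illustration; the source does not say how the split was drawn.

```python
import random


def train_test_split(records, train_fraction=0.7, seed=0):
    """Shuffle a copy of `records` and cut it into training and test parts."""
    shuffled = list(records)
    # A seeded generator keeps the split reproducible between runs
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical stand-in for 100 patient records
records = list(range(100))
train, test = train_test_split(records)
print(len(train), len(test))
```

Shuffling before cutting avoids an ordered file (for example, all healthy patients first) leaking its ordering into the split.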

The common challenges in the prediction of cardiovascular diseases are improving accuracy and handling missing values in the data, which strongly affect the accuracy of the model. A further problem is class imbalance. Similarly, the features of the dataset largely determine the accuracy and computational complexity of the machine learning process.
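Two of the challenges above, missing values and class imbalance, can be sketched with very simple stand-ins: column-mean imputation and random oversampling of the minority class. These particular techniques, the function names, and the toy rows are assumptions for illustration; the source does not state which methods the reviewed papers used.

```python
import random
from statistics import mean


def impute_mean(rows):
    """Replace None entries with the mean of the observed values in that column."""
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        col_means.append(mean(observed))
    return [[col_means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def oversample_minority(rows, labels, seed=0):
    """Randomly duplicate rows of smaller classes until every class matches the largest."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical rows: (resting blood pressure, serum cholesterol); None = missing
rows = [[120, 200], [None, 240], [140, None], [130, 220], [150, 260]]
labels = [0, 0, 0, 0, 1]

clean = impute_mean(rows)
balanced_rows, balanced_labels = oversample_minority(clean, labels)
print(clean[1], balanced_labels.count(0), balanced_labels.count(1))
```

In practice both steps should be fitted on the training portion only, so that information from the test records does not leak into the model.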

The rest of the paper is organized as follows. Section I presents the introduction. Section II presents a literature review of approaches to the prediction of cardiovascular disease. Section III provides the details of previous research models, their performance, and their comparison. Section IV concludes the paper.


There is ample related work in the fields directly related to this paper. First, we consider the 2018 paper by Ambedkar and Phalnikar [1]. In that paper, data analysis plays a significant role in handling the large amount of data in healthcare. Previous medical research was based on handling and assimilating large amounts of hospital data rather than on prediction. Because an enormous amount of data is accumulating in the biomedical and healthcare field, accurate analysis of medical data becomes beneficial for earlier detection of disease and for patient care. However, the accuracy decreases when the medical data is partly missing. To overcome the problem of missing medical data, the authors perform data cleaning and imputation to transform the incomplete data into complete data. They work on heart disease prediction on the basis of the dataset with the help of the Naive Bayes and KNN algorithms. To extend this work, they propose disease risk prediction using structured data, with a convolutional neural network-based unimodal disease risk prediction (CNN-UDRP) algorithm. The prediction accuracy of the CNN-UDRP algorithm reaches more than 65%. Moreover, this system answers disease-related questions that people face in their lives.

Fig 1. Block diagram of CNN-UDRP, Ambedkar and Phalnikar [1]

The second related work is a well-known paper by Mienye and Sun [2]. The prediction of heart disease is a challenge in clinical machine learning. Early detection of people at risk of the disease is vital in preventing its progression. The paper proposes a deep learning approach to achieve improved prediction of heart disease. An enhanced stacked sparse autoencoder network (SSAE) is developed to achieve efficient feature learning. The network consists of multiple sparse autoencoders and a softmax classifier. Additionally, in deep learning models, the algorithm's parameters need to be optimized appropriately to obtain efficient performance. Hence, the authors propose a particle swarm optimization (PSO) based technique to tune the parameters of the stacked sparse autoencoder. Optimization by the PSO improves the feature learning and classification performance of the SSAE. Meanwhile, the multilayer architecture of autoencoders usually leads to internal covariate shift, a problem that affects the generalization ability of the network; hence, batch normalization is introduced to prevent this problem. The experimental results show that the proposed method effectively predicts heart disease, obtaining classification accuracies of 0.973 and 0.961 on the Framingham and Cleveland heart disease datasets, respectively, thereby outperforming other machine learning methods and comparable studies.
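To make the role of PSO in parameter tuning concrete, the following is a minimal, generic PSO sketch in plain Python. It is not the implementation from [2]: the objective `validation_error` is a hypothetical smooth stand-in for a real validation loss, and the two tuned quantities, their bounds, and the coefficient values are assumptions chosen for illustration.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=100,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Minimize `objective` over a box given as a list of (low, high) bounds."""
    rng = random.Random(seed)
    dim = len(bounds)
    positions = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    personal_score = [objective(p) for p in positions]
    best_idx = min(range(n_particles), key=lambda i: personal_score[i])
    global_best = personal_best[best_idx][:]
    global_score = personal_score[best_idx]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity blends momentum, pull to own best, pull to swarm best
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + c_personal * r1 * (personal_best[i][d] - positions[i][d])
                    + c_global * r2 * (global_best[d] - positions[i][d])
                )
                lo, hi = bounds[d]
                # Keep the particle inside the search box
                positions[i][d] = min(hi, max(lo, positions[i][d] + velocities[i][d]))
            score = objective(positions[i])
            if score < personal_score[i]:
                personal_score[i] = score
                personal_best[i] = positions[i][:]
                if score < global_score:
                    global_score = score
                    global_best = positions[i][:]
    return global_best, global_score


def validation_error(params):
    """Hypothetical stand-in for a validation loss, smallest at (0.01, 0.05)."""
    learning_rate, sparsity = params
    return (learning_rate - 0.01) ** 2 + (sparsity - 0.05) ** 2


best_params, best_error = pso_minimize(validation_error, [(0.0001, 0.1), (0.0, 0.5)])
print(best_params, best_error)
```

In a real tuning run, each call to the objective would train and validate a network, so the number of particles and iterations would be limited by training cost rather than set freely as here.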

Fig 2. Block diagram of XG boost classifier, Mienye [2]

In 2020, Samira diouzi and Saini [3] studied machine learning-based identification of patients with cardiovascular disease, which has long been one of the important clinical problems. According to the World Health Organization, heart disease is the first of the ten leading causes of death. Correct and early identification is an essential step in rehabilitation and treatment. To diagnose heart defects, it is necessary to implement a system capable of predicting the existence of heart disease. The main motivation of that article is to develop an effective intelligent medical system based on machine learning techniques, to aid in identifying a patient's heart condition and to guide a doctor in making a correct diagnosis of whether or not a patient has cardiovascular disease. Using multiple data processing techniques, the authors address the problem of missing data as well as the problem of imbalanced data in the publicly available UCI Heart Disease dataset and the Framingham dataset. Furthermore, they use machine learning to select the most effective algorithm for predicting cardiovascular disease. In this work, an adaptive neuro-fuzzy inference system (ANFIS) is used to predict the disease. The speed and accuracy of the proposed algorithm are reported to be higher than those of the other intelligent methods used. The proposed method, with a 10% increase in accuracy during experimentation, performs better than previous intelligent methods. Other methods for the detection of this disease have been proposed, including computing-based methods such as fuzzy algorithms for pattern recognition in feature extraction, the SVM algorithm, the genetic algorithm, RBF neural networks, MLP neural networks, and methods based on the Bayesian model.

A new method for predicting diabetes using the adaptive neuro-fuzzy inference system (ANFIS) is also proposed. Development and evaluation of the model were performed using real data sets. Gaussian membership functions and hybrid algorithms were used for network training. The required model was created by preparing the required data on diabetes from existing data. According to the results, the model is able to predict diabetes successfully. The method proposed in this study is considered a new approach to diagnosing the disease that is faster and more accurate than previous methods. Different metrics, including accuracy, sensitivity, F-measure, and precision, were used to test the system, demonstrating that the proposed approach performs well.

The last two papers are comprehensive studies by Wilson Tang and Wang [4] and by Swathy and Saruladha [5]. In the former, a comprehensive search strategy was designed and executed within the MEDLINE, Embase, and Scopus databases from database inception through March 15, 2019 [4]. The primary outcome was a composite of the predictive ability of ML algorithms for coronary artery disease, heart failure, stroke, and cardiac arrhythmias. Of 344 total studies identified, 103 cohorts, with a total of 3,377,318 individuals, met the inclusion criteria. For the prediction of coronary artery disease, boosting algorithms had a pooled area under the curve (AUC) of 0.88 (95% CI 0.84-0.91), and custom-built algorithms had a pooled AUC of 0.93 (95% CI 0.85-0.97). For the prediction of stroke, support vector machine (SVM) algorithms had a pooled AUC of 0.92 (95% CI 0.81-0.97), boosting algorithms had a pooled AUC of 0.91 (95% CI 0.81-0.96), and convolutional neural network (CNN) algorithms had a pooled AUC of 0.90 (95% CI 0.83-0.95). Swathy and Saruladha [5] similarly compare most of these algorithms.
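Since the meta-analysis reports its results as AUC values, a short sketch of how an AUC can be computed for a single classifier may help. The function below uses the pairwise (Mann-Whitney) definition of AUC; the labels and scores are hypothetical toy values, and pooling across studies, as done in [4], is a separate statistical step not shown here.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = disease present, scores are predicted risks
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]

auc = roc_auc(labels, scores)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the pooled values above should be read.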


Predictable attribute

1. Diagnosis (value Healthy: < 50% diameter narrowing (no heart disease); value Sick: > 50% diameter narrowing (has heart disease))

Key attribute

1. PatientID - Patient's identification number

Input attributes

  • Sex (value 1: Male; value 0: Female)

  • Chest Pain Type (value 1: typical type 1 angina; value 2: typical type angina; value 3: non-angina pain; value 4: asymptomatic)

  • Fasting Blood Sugar (value 1: > 120 mg/dl; value 0: < 120 mg/dl)

  • Restecg - resting electrographic results (value 0: normal; value 1: having reversible defect)

  • Trest Blood Pressure (mm Hg on admission to the hospital)

  • Serum Cholesterol (mg/dl)

  • Thalach - maximum heart rate achieved

Predictable attribute: Diagnosis

  • Value Healthy: no heart disease

  • Value Sick: has heart disease

Reduced input attributes:

  • Type - Chest pain type

  • Rbp - Resting blood pressure

  • Eia - Exercise-induced angina

  • Oldpk - Old peak

  • Vsl - No. of vessels colored

  • Thal - Maximum heart rate achieved

Fig 3. Cleveland Dataset
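The value codes in the attribute listing can be made explicit in a small record type. The sketch below is an illustration only: the class name, field names, and helper functions are hypothetical, and the listing does not say how a narrowing of exactly 50% or a fasting blood sugar of exactly 120 mg/dl is coded, so both boundary choices here are assumptions.

```python
from dataclasses import dataclass

# Value codes follow the attribute listing; all names are hypothetical.
SEX_CODES = {"male": 1, "female": 0}


@dataclass
class ClevelandRecord:
    patient_id: int            # key attribute, not used as a model input
    sex: int                   # 1 = male, 0 = female
    fasting_blood_sugar: int   # 1 if > 120 mg/dl, otherwise 0
    resting_bp: float          # mm Hg on admission to the hospital
    serum_cholesterol: float   # mg/dl
    max_heart_rate: float      # Thalach


def encode_fasting_blood_sugar(mg_per_dl):
    # Assumption: exactly 120 mg/dl is coded as 0
    return 1 if mg_per_dl > 120 else 0


def diagnosis_from_narrowing(narrowing_percent):
    # Assumption: exactly 50% narrowing is labeled Healthy
    return "Sick" if narrowing_percent > 50 else "Healthy"


record = ClevelandRecord(
    patient_id=1,
    sex=SEX_CODES["male"],
    fasting_blood_sugar=encode_fasting_blood_sugar(135),
    resting_bp=130.0,
    serum_cholesterol=250.0,
    max_heart_rate=150.0,
)
label = diagnosis_from_narrowing(62)
print(record, label)
```

Keeping `patient_id` as a separate key field makes it easy to exclude it from the feature vector, since an identifier carries no predictive information.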










| Dataset / technique | Description | Limitation |
|---|---|---|
| Cleveland dataset | Model emphasis on distinct combinations of parameters with ML classification techniques. | Accuracy is less than 90% for the results. |
| Cleveland database | Predicts cardiovascular diseases with a small number of attributes. | Most difficult to reach results via algorithms. |
| CNN and Adaboost | The CNN algorithm achieved the highest accuracy. As a deep learning method, it can be extended with feature selection algorithms. | High error values and hard to evaluate the interpretation. |
| Yolo v5 and deep [...] | A visual VFM analysis method based on the YOLO model and the ultrasonic equation. Positioning and tracking of the myocardial wall are realized based on the YOLO model and IBM. The proposed VFM method provides a new basis for the evaluation of cardiac function. | The results of the experiments on the evaluation of [...] apical long-axis view show that the proposed method not only improves the accuracy of [...] but also provides a new evaluation basis for cardiac function. |
| [...] | A comparative analysis of algorithms on the Cleveland dataset has been performed with good accuracy. This work has used ensemble algorithms such as bagging, boosting, and [...]. | [...] selection has [...] interpretation is difficult to find out and this leads to more chances to find out the more [...] that implies the results have only a small [...] compared to another one. |

Table 1: Comparative study of prediction of cardiovascular diseases based on previous research.

In the compared work, prediction models are developed with selected features and are compared on accuracy, classification error, precision, F-measure, sensitivity, and specificity. The highest accuracy is achieved by the HRFLM classification method in comparison with existing methods.
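The metrics named above all follow from the four counts of a binary confusion matrix. The sketch below uses the standard definitions; the counts passed in are hypothetical and do not come from any of the reviewed papers.

```python
def safe_div(numerator, denominator):
    """Return 0.0 instead of raising when a denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp, fp, tn, fn):
    """Standard binary metrics from true/false positive/negative counts."""
    accuracy = safe_div(tp + tn, tp + fp + tn + fn)
    precision = safe_div(tp, tp + fp)
    sensitivity = safe_div(tp, tp + fn)   # also called recall
    specificity = safe_div(tn, tn + fp)
    f_measure = safe_div(2 * precision * sensitivity, precision + sensitivity)
    return {
        "accuracy": accuracy,
        "classification_error": 1.0 - accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": f_measure,
    }


# Hypothetical counts for 100 patients
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting sensitivity and specificity alongside accuracy matters for imbalanced medical data, where a model can reach high accuracy while missing most sick patients.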

The UCI dataset is divided into 8 datasets based on classification rules. Each dataset is then classified and processed in Python. The classification rules are generated after data pre-processing is done, and the results are produced by applying these rules to each dataset. After pre-processing, the three best ML techniques are chosen and the results are generated. DT, RF, and LM are applied to the various datasets to find the best classification method. The results show that RF and LM are the best. The error rate for dataset 4 is high (20.9%) compared to the other datasets. The LM method is the best for that dataset (9.1%) compared to the DT and RF methods. The RF method is therefore combined with LM in a proposed hybrid method to improve the results.
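The text does not describe how RF and LM are combined in the hybrid method. One common way to combine two classifiers, shown here purely as an assumption, is to average their predicted probabilities (soft voting) and threshold the result. The probabilities, weights, and function names below are hypothetical.

```python
def hybrid_predict(rf_probs, lm_probs, rf_weight=0.5, threshold=0.5):
    """Weighted average of two models' positive-class probabilities, then threshold."""
    if len(rf_probs) != len(lm_probs):
        raise ValueError("both models must score the same records")
    combined = [
        rf_weight * p_rf + (1.0 - rf_weight) * p_lm
        for p_rf, p_lm in zip(rf_probs, lm_probs)
    ]
    return [1 if p >= threshold else 0 for p in combined]


def error_rate(predictions, truth):
    """Fraction of records whose predicted label differs from the true label."""
    wrong = sum(1 for p, t in zip(predictions, truth) if p != t)
    return wrong / len(truth)


# Hypothetical probabilities of heart disease from a random forest and a linear model
rf_probs = [0.9, 0.4, 0.2, 0.6]
lm_probs = [0.7, 0.7, 0.6, 0.3]
truth = [1, 1, 0, 0]

predictions = hybrid_predict(rf_probs, lm_probs)
print(predictions, error_rate(predictions, truth))
```

In this toy case each model alone misclassifies at least one record while the average does not, but that outcome depends on the chosen numbers and is not a general guarantee.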


Identifying how to process raw healthcare data on heart conditions will help in the long-term saving of human lives and in the early detection of abnormalities in heart conditions. Machine learning techniques have been used to process raw data and provide new insight into heart disease. Heart disease prediction is challenging and very important in the medical field. However, the mortality rate can be drastically reduced if the disease is detected at an early stage and preventive measures are adopted as soon as possible. Further extension of this study is highly desirable to direct investigations toward real-world datasets instead of only theoretical approaches and simulations. The proposed hybrid HRFLM approach combines the characteristics of Random Forest (RF) and the Linear Method (LM). HRFLM proved to be accurate in the prediction of heart disease. Future work may explore diverse combinations of machine learning techniques to obtain better prediction. Furthermore, new feature selection methods can be developed to gain a broader understanding of the significant features and to increase the performance of heart disease prediction.


[1] S. Ambedkar and R. Phalnikar, Disease Risk Prediction by Using Convolutional Neural Network, Fourth International Conference on Computing Communication Control and Automation (ICCUBEA), 2018.

[2] Ibomoiye I Mienye and Yanxia Sun, Improved Heart Disease Prediction Using Particle Swarm Optimization Based Stacked Sparse Autoencoder, IEEE Access, vol. 6, 2018, pp. 1-5.

[3] Samira diouzi and B. S. Saini, Detection of coronary heart disease by reduced features and extreme learning machine, Medicine and Pharmacy Reports, vol. 91, pp. 166-175, 2020.

[4] W. H. Wilson Tang and Zhen Wang, Machine learning prediction in cardiovascular diseases: a meta-analysis, Springer, Cham, 2020.

[5] M. Swathy and K. Saruladha, A comparative study of classification and prediction of Cardio-Vascular Diseases (CVD) using Machine Learning and Deep Learning techniques, in Proc. 11th Int. Conf. Hum. Syst. Interact. (HSI), July 2021, pp. 1-8.

[6] D. C. Yadav and S. Pal, Prediction of heart disease using feature selection and random forest ensemble method, Int. J. Pharmaceutical Res., vol. 12, no. 4, 2020.

[7] N. Kumar and K. Sikamani, Prediction of chronic and infectious diseases using machine learning classifiers: A systematic approach, Int. J. Intell. Eng. Syst., vol. 13, no. 4, pp. 11-20, 2020.

[8] K. Uyar and A. Ilhan, Diagnosis of heart disease using genetic algorithm based trained recurrent fuzzy neural networks, Procedia Comput. Sci., vol. 120, pp. 588-593, Jan. 2017.

[9] L. Wang, W. Zhoug, Deep ensemble detection of congestive heart failure using short-term RR intervals, IEEE Access, vol. 7, pp. 69559-69574, 2019.

[10] G. O. Rashmi and U. M. A. Kumar, Machine learning methods for heart disease prediction, Int. J. Eng. Adv. Technol., vol. 8, no. 5S, pp. 220-223, May 2019.

[11] F. Miao, Y.-P. Cai, Y.-X. Zhang, X.-M. Fan, and Y. Li, Predictive modeling of hospital mortality for patients with heart failure by using an improved random survival forest, IEEE Access, vol. 6, pp. 7244-7253, 2018.

[12] A. U. Haq, J. P. Li, M. H. Memon, S. Nazir, and R. Sun, A hybrid intelligent system framework for the prediction of heart disease using machine learning algorithms, Mobile Inf. Syst., vol. 2018, pp. 1-21, Dec. 2018.

[13] W. Wiharto, H. Kusnanto, and H. Herianto, Hybrid system of tiered multivariate analysis and artificial neural network for coronary heart disease diagnosis, Int. J. Electr. Comput. Eng., vol. 7, no. 2, p. 1023, Apr. 2017.

[14] C. A. Cheng and H. W. Chiu, An artificial neural network model for the evaluation of carotid artery stenting prognosis using a national-wide database, in Proc. 39th Annu. Int. Conf. IEEE Eng. Med. Biol. Soc. (EMBC), Jul. 2017, pp. 2566-2569.

[15] A. A. Shetty and C. Naik, Different data mining approaches for predicting heart disease, Int. J. Innov. Sci. Eng. Technol., vol. 5, pp. 277-281, May 2016.

[16] K. Padmavathi and K. S. Ramakrishna, Classification of ECG signal during atrial fibrillation using autoregressive modeling, Procedia Comput. Sci., vol. 46, pp. 53-59, Jan. 2015.

[17] A. M. D. Silva, Feature Selection, vol. 13. Berlin, Germany: Springer, 2015, pp. 1-13.

[18] A. M. De Silva and P. H. W. Leong, Grammar-Based Feature Generation for Time-Series Prediction. Berlin, Germany: Springer, 2015.
