Robust Machine Learning Algorithm for Heart Disease Prediction

DOI : 10.17577/IJERTCONV10IS12014



Prajwal D C1, Manjunath Y2, Megha A3, Nimitha B L4, Prof. Rashmi K T5

Sri Krishna Institute of Technology,1,2,3,4 CSE Department, B'lore-560090, India.

Sri Krishna Institute of Technology 5 Faculty CSE Department , B'lore-560090, India.

Abstract:- Cardiovascular (heart) disease is one of the leading causes of death worldwide. Deaths from heart disease have become a major concern: approximately one person loses their life to it every minute. The main challenge today is therefore to predict the occurrence of the disease at an early stage. To address this, machine learning can be applied in healthcare, where it enables early and accurate detection of disease. In this project, the likelihood of heart disease is predicted. The datasets used contain clinical parameters as attributes and are processed in Python using a machine learning algorithm, Random Forest. In this approach, data from previous patients is used to predict the condition of a new patient at an early stage, in order to prevent deaths and save lives. The prediction system is implemented with the Random Forest algorithm, a robust machine learning algorithm, and reads the patient dataset as a CSV file. After the dataset is processed, the prediction is carried out and the resulting heart disease risk level is produced. The advantages of the proposed system are higher accuracy and execution speed, a higher success rate, and considerable flexibility.

Keywords: Heart attack disease, prediction, datasets, Machine Learning, Random Forest, CSV files, Algorithm.


    Heart disease impairs the functioning of the heart. According to a WHO survey, more than 1 crore people have been affected by coronary illness and have lost their lives. The biggest challenge in practice is predicting the onset of the disease early, given the huge volume of records and medical history involved. Previously, it was not possible to diagnose the disease accurately and provide medication for every patient [3].

    Numerous researchers have attempted to build a model capable of predicting heart disease at an early stage, but none has produced an ideal model; each proposed system has its own drawbacks. Among existing systems, Shen et al. first developed a system based on self-assessment. In this system the user enters all the symptoms he or she is experiencing, and the result is predicted from them. This study relies on data collected through the SAQ.

    Chen et al. proposed a method to predict heart disease using Vector Quantization, an artificial intelligence technique for classification and prediction. The neural network is trained using backpropagation to evaluate the prediction system, and approximately 79% accuracy is achieved in the testing phase. The drawbacks of these systems are that making effective use of data collected from past records is time-consuming and that the accuracy is low.

    To overcome this, we implement the Random Forest algorithm to achieve accurate results in a short time. Machine learning has become a priority in many modern applications, including the healthcare sector. Prediction is a field where machine learning plays a vital role; our subject is predicting heart disease risk by processing the patient dataset together with the data of the user whose risk is to be predicted.

    Coronary artery disease is predicted on the basis of certain risk factors: age, gender, blood pressure, cholesterol level, family history of coronary illness, blood sugar level, smoking, alcohol consumption, being overweight, heart rate, and chest pain.


Every proposed system built with diverse algorithms has its own set of drawbacks.

The dataset we use contains NaN values, which are handled during preprocessing. Non-numeric values must be converted to numeric values because the program cannot process them, and the NaN values are replaced by the mean of the column [2].
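The cleaning step described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' code; the helper names `to_number` and `impute_column_means` and the sample records are hypothetical.

```python
import math
from statistics import mean


def to_number(value):
    """Convert a raw cell to float; anything unparseable becomes NaN."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return math.nan


def impute_column_means(rows):
    """Replace each NaN with the mean of the observed values in its column."""
    numeric = [[to_number(v) for v in row] for row in rows]
    col_means = []
    for j in range(len(numeric[0])):
        observed = [row[j] for row in numeric if not math.isnan(row[j])]
        col_means.append(mean(observed) if observed else math.nan)
    return [
        [col_means[j] if math.isnan(v) else v for j, v in enumerate(row)]
        for row in numeric
    ]


# Hypothetical records (age, resting blood pressure, cholesterol), invented for illustration.
raw_rows = [
    ["63", "145", "233"],
    ["37", "", "250"],    # missing blood pressure
    ["41", "130", "NA"],  # non-numeric cholesterol
    ["56", "120", "236"],
]
clean_rows = impute_column_means(raw_rows)
for row in clean_rows:
    print(row)
```

Mean imputation keeps every record in the dataset, at the cost of shrinking the variance of the imputed column slightly.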

Various algorithms are available, such as KNN and related methods. All of these are machine learning algorithms, and the data placed in the training set is used to train them [2].

The main difference among decision tree algorithms is how they select the best attribute to split on. In an entropy-based scheme, the tree root is chosen to reduce entropy, using information gain: first, the information gain of every attribute in the dataset is calculated to choose the tree root, and the chosen attribute is then used to partition the data [7].
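To make the root-selection rule concrete, the sketch below computes entropy and information gain for categorical attributes and picks the attribute with the largest gain. It is an illustrative example only; the attribute names and the four records are invented, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting the data on one attribute."""
    total = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def choose_root(rows, labels):
    """Pick the attribute with the highest information gain as the tree root."""
    return max(rows[0].keys(), key=lambda a: information_gain(rows, labels, a))


# Invented toy records, for illustration only.
rows = [
    {"chest_pain": "yes", "smoker": "yes"},
    {"chest_pain": "yes", "smoker": "no"},
    {"chest_pain": "no", "smoker": "yes"},
    {"chest_pain": "no", "smoker": "no"},
]
labels = ["disease", "disease", "healthy", "healthy"]

root = choose_root(rows, labels)
print("root attribute:", root)
```

In this toy data `chest_pain` separates the two classes completely while `smoker` carries no information, so the former is selected as the root.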

KNN is one of the most basic and effective classification methods. It is typically used when reliable parametric estimates of the probability densities are unknown or difficult to determine, and KNN classification is used to handle this kind of problem. The best value of K is chosen after processing.
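One common way to choose K, shown here as an assumption since the paper does not state its procedure, is to try several candidate values and keep the one with the best leave-one-out accuracy. The sketch below uses only the standard library; the data points are invented and assumed to be already scaled.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    neighbours = sorted(
        zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query)
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(X, y, k):
    """Classify each point using all the other points and report the hit rate."""
    hits = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        hits += knn_predict(rest_X, rest_y, X[i], k) == y[i]
    return hits / len(X)


def best_k(X, y, candidates):
    """Return the candidate K with the highest leave-one-out accuracy."""
    return max(candidates, key=lambda k: leave_one_out_accuracy(X, y, k))


# Invented, pre-scaled two-feature records: label 1 = disease, 0 = no disease.
X = [(0.30, 0.20), (0.35, 0.25), (0.32, 0.22), (0.38, 0.21),
     (0.70, 0.80), (0.75, 0.78), (0.72, 0.85), (0.68, 0.79)]
y = [0, 0, 0, 0, 1, 1, 1, 1]

k = best_k(X, y, candidates=[1, 3, 5, 7])
print("chosen K:", k)
print("prediction for a new record:", knn_predict(X, y, (0.71, 0.80), k))
```

Odd candidate values are used so that a two-class vote cannot tie.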

K-means clustering is an unsupervised learning algorithm in which the data is not labeled and class names are unknown. The main goal is to find groups in the given data. The algorithm assigns the data to K groups iteratively. These groups are defined by their similarities, and each of the K groups is represented by its centroid. When a new value is presented, the k-means algorithm assigns it to one of the K groups based on its similarity; because the centroid represents the group, the new value is placed in a group according to its nearest centroid [7].
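The iterative assignment and centroid update can be sketched as below. This is a minimal standard-library illustration under stated assumptions: the initial centroids are passed in explicitly to keep the run deterministic, and the points are invented.

```python
import math


def assign(points, centroids):
    """Index of the nearest centroid for each point."""
    return [
        min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        for p in points
    ]


def update(points, labels, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for c, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == c]
        if members:
            new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
        else:
            new_centroids.append(old)  # an empty group keeps its previous centroid
    return new_centroids


def k_means(points, initial_centroids, max_iter=100):
    """Alternate assignment and update until the centroids stop moving."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iter):
        new_centroids = update(points, assign(points, centroids), centroids)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assign(points, centroids)


# Invented two-feature points forming two visible groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
centroids, labels = k_means(points, initial_centroids=[(1.0, 1.0), (5.0, 5.0)])

# A new value joins the group whose centroid is nearest.
new_group = assign([(4.9, 5.3)], centroids)[0]
print("labels:", labels, "new point goes to group", new_group)
```

In practice the initial centroids are usually chosen at random or with a seeding heuristic, and the result can depend on that choice.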

When a new instance is given to the prediction system, it classifies the new instance and produces its class label [8].

  1. Perumal et al. [12] developed a heart disease prediction model using the Cleveland dataset of 303 data instances through feature standardization and feature reduction using PCA, where they identified and used seven principal components to train the ML classifiers. They concluded that LR and SVM gave nearly similar accuracy values (86% and 84%, respectively), compared with 69.5% for k-NN.

  2. Pavithra et al. [14] proposed a new hybrid feature selection technique combining random forest, AdaBoost, and linear correlation (HRFLC), using the UCI dataset of about 280 instances to predict heart disease. Eleven (11) features were selected using filter, wrapper, and embedded methods; an improvement of 1.5-2% was found in the accuracy of the hybrid model.

  3. Mohan et al. [15] developed an effective hybrid random forest with a linear model to improve the accuracy of heart disease prediction using the Cleveland dataset with 296 records and 13 features. The RF and LM methods gave the best error rates.

Abbreviations and Acronyms

CSV - Comma-Separated Values

CVD - Cardiovascular Disease


      1. Input Data Set

        The user enters the input.

      2. Data preprocessing

        It is a critical stage in the data mining process that involves transforming or discarding data before it is used, in order to ensure or improve performance.

        Data preprocessing is an important step in building a machine learning model. Initially the data may not be clean or in the format the model requires, which can lead to misleading results. During preprocessing we transform the data into the required format. It is used to deal with noise, duplicates, and missing values in the dataset. Importing the dataset, splitting the dataset, feature scaling, and other activities are part of data preprocessing. Preprocessing is required to improve the model's accuracy.

      3. Split dataset into train phase and test phase

        The train-test split is used to evaluate the performance of machine learning algorithms in prediction-based applications. It is a quick and simple procedure that lets us compare the predictions of our own machine learning model with the known results.

        Fig-2: Data preprocessing stages

      4. Train the model

        Using the Random Forest algorithm, the input is compared with the information contained in the existing dataset.

        The Random Forest algorithm is a powerful machine learning algorithm that uses a supervised learning technique.

        "Random Forest is a classifier that uses a large number of decision trees on various subsets of a dataset and takes the average to improve predictive accuracy."

      5. Validate

        Model validation refers to the process of confirming that the model actually achieves its intended purpose. In most cases, this involves confirming that the model is predictive under the conditions of its intended use.

      6. Output or result

      Dataset collection is the gathering of data containing patient details. The attribute selection process chooses the attributes useful for predicting heart disease. After the available data resources are identified, they are further selected, cleaned, and transformed into the desired form. The classification techniques described are then applied to the preprocessed data to predict heart disease, and the accuracy measure compares the accuracy of the different classifiers.

      Fig-3: Procedure of Random Forest Algorithm
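The six steps above can be sketched end to end with scikit-learn. This is an illustrative sketch rather than the authors' implementation: the data is synthetic, the three features and the labelling rule are invented stand-ins for a real patient dataset, and the hyperparameters are arbitrary.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)

# Step 1: input dataset (synthetic stand-in for patient records; an assumption).
n = 400
age = rng.integers(30, 80, size=n).astype(float)
cholesterol = rng.normal(240, 40, size=n)
max_heart_rate = rng.normal(150, 20, size=n)
# Hypothetical labelling rule, used only to give the model something to learn.
risk = 0.04 * (age - 55) + 0.01 * (cholesterol - 240) - 0.02 * (max_heart_rate - 150)
y = (risk + rng.normal(0, 0.3, size=n) > 0).astype(int)
X = np.column_stack([age, cholesterol, max_heart_rate])
X[rng.random(X.shape) < 0.05] = np.nan  # simulate missing values

# Step 3: split into a training phase and a test phase.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Steps 2 and 4: mean imputation (preprocessing) followed by a Random Forest.
model = make_pipeline(
    SimpleImputer(strategy="mean"),
    RandomForestClassifier(n_estimators=200, random_state=0),
)
model.fit(X_train, y_train)

# Steps 5 and 6: validate on held-out data and report the result.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Held-out accuracy: {accuracy:.2f}")
```

Placing the imputer inside the pipeline means the column means are learned from the training split only, which avoids leaking information from the test split.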


      Our aim is to develop an application for heart disease prediction using a powerful ML algorithm, Random Forest. A CSV file is provided as input. After the operation completes successfully, the result is predicted and displayed.
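A minimal sketch of this CSV-in, prediction-out flow is given below. It is an assumption-laden illustration, not the authors' application: the column names, the `read_patients` helper, and all records are invented, and the CSV content is held in strings so the example is self-contained (a real application would read from a file).

```python
import csv
import io

from sklearn.ensemble import RandomForestClassifier

# Hypothetical column names; a real dataset would define its own.
FEATURES = ["age", "resting_bp", "cholesterol", "max_heart_rate"]


def read_patients(csv_text, target=None):
    """Parse CSV text into feature rows and, if a target column is named, labels."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    X = [[float(row[name]) for name in FEATURES] for row in rows]
    y = [int(row[target]) for row in rows] if target else None
    return X, y


# Invented historical records (target 1 = heart disease, 0 = no heart disease).
history_csv = """age,resting_bp,cholesterol,max_heart_rate,target
63,145,233,150,1
67,160,286,108,1
62,140,268,129,1
58,150,254,120,1
37,130,204,187,0
41,120,199,172,0
44,120,210,178,0
35,110,182,163,0
"""

# Invented new patients whose outcome is to be predicted.
new_csv = """age,resting_bp,cholesterol,max_heart_rate
70,170,300,105
30,105,170,190
"""

X_hist, y_hist = read_patients(history_csv, target="target")
X_new, _ = read_patients(new_csv)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_hist, y_hist)

predictions = model.predict(X_new)
probabilities = model.predict_proba(X_new)[:, 1]
for record, label, prob in zip(X_new, predictions, probabilities):
    print(record, "-> predicted class", int(label), f"(estimated risk {prob:.2f})")
```

Eight records are far too few for a usable model; the point is only to show how a CSV record flows through to a displayed prediction.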

      Fig-1: Architecture/Methodology Diagram


      In today's world, predicting heart disease is a major challenge. If the patient/user is unable to reach a medical specialist, he or she can use this application to get a diagnosis simply by entering the report values. The result can also help the user decide whether to consult a doctor.


We would like to thank Dr. Shantharam Nayak for his valuable suggestions, expert advice, and moral support in the process of preparing this paper.


[1] M. Snehith Raja, M. Anurag et al., "Machine Learning Based Heart Disease Prediction System", ICCCI 2021, Jan 27-29, 2021, Coimbatore, India.

[2] Kaan Uyar et al., "Diagnosis of heart disease using genetic algorithm based trained recurrent fuzzy neural networks" in B.V ICTASC,


[3] Theresa Princy R. and J. Thomas, "Human Heart Disease Prediction System using Data Mining Techniques", © IEEE ICCPCT, 2016.

[4] Beant Kaur and Williamjeet Singh, "Review on Heart Disease Prediction System using Data Mining Techniques", © IJRITCC, vol. 2, no. 10, pp. 3003-08, 2014.

[5] M.M. Kirmani and S.I. Ansarullah, "Prediction of heart disease using decision tree a data mining technique", IJCSN Int. J. Comput. Sci. Netw., 5(6), 885-892, 2016.

[6] Tahira Mahboob, Rida Irfan, Bazelah Ghaffar et al., "Evaluating ensemble prediction of coronary heart disease using receiver operating characteristics", © 2017 IEEE.

[7] Ammar Asjad Raja, Irfan-ul-Haq, Madiha Guftar, Tamim Ahmed Khan, "Intelligence syncope Disease Prediction Framework using DM- techniques", FTC 2016 Future Technologies Conference 2016.

[8] M.A. Jabbar, B.L. Deekshatulu, and Priti Chandra, "Intelligent heart disease prediction system using random forest and evolutionary approach", Journal of Network and Innovative Computing, Vol. 4, pp. 174-184, 2016.

[10] N. Friedman, D. Geiger, and M. Goldszmidt, "Bayesian network classifiers", Machine Learning, 1997.

[11] Ayon Dey, Jyoti Singh, N. Singh, "Analysis of supervised machine learning algorithms for heart disease prediction".

[12] Perumal, R. Early Prediction of Coronary Heart Disease from Cleveland Dataset using Machine Learning Techniques. Int. J. Adv. Sci. Technol. 2020, 29, 4225-4234.

[13] Gupta, A.; Kumar, R.; Arora, H.S.; Raman, B. MIFH: A Machine Intelligence Framework for Heart Disease Diagnosis. IEEE Access 2019, 8, 14659-14674.

[14] Pavithra, V.; Jayalakshmi, V. Hybrid feature selection technique for prediction of cardiovascular diseases. Mater. Today Proc. 2021, 22, 660-670.

[15] Mohan, S.; Thirumalai, C.; Srivastava, G. Effective Heart Disease Prediction Using Hybrid Machine Learning Techniques. IEEE Access 2019, 7, 81542-81554.