Ensembling of SVM and Decision Tree for Prediction of Heart Disease

DOI : 10.17577/IJERTV10IS100119


R. Chandini

Department of Computer Science and Engineering

G. Narayanamma Institute of Technology & Science, Hyderabad, Telangana, India

Dr. K. Venugopal Rao

Department of Computer Science and Engineering

G. Narayanamma Institute of Technology & Science, Hyderabad, Telangana, India

    Abstract: Cardiovascular disease is a major cause of mortality in the present-day lifestyle. Heart disease is responsible for deaths in all age groups and is common among both males and females. One way to address this problem is prediction: a patient's predicted health status is forwarded to doctors, who can then start treatment much sooner, which yields better results. Machine learning techniques are used for this purpose. Nowadays, machine learning plays an important role in the medical domain, including the development of new medical procedures, the handling of patient data and records, and the treatment of diseases. Machine learning holds considerable potential for the diagnosis of various diseases. This research focuses on heart disease diagnosis using machine learning algorithms and an ensemble of SVM and Decision Tree.

    Keywords: Machine learning algorithms, Decision tree, Support Vector Machine (SVM), Heart disease, Ensemble Learning Model.

    1. INTRODUCTION

      The healthcare sector is an important industry that offers value-based care to millions of people; it has been an early adopter of technological advances and has benefited greatly from them. Technology plays an important role in patient care, billing, and medical records, and it allows healthcare specialists to develop alternative staffing models, pursue IP capitalization, provide smart healthcare, and reduce administrative and supply costs. Nowadays, machine learning plays an important role in the healthcare industry, including the development of new medical procedures, the handling of patient data and records, and the treatment of diseases. In the healthcare sector, machine learning helps to analyze different data points and outcomes, provide risk scores, and allocate resources precisely, among other applications.

      Machine learning algorithms can detect patterns associated with diseases and health conditions by studying thousands of healthcare records and other patient data. The strength of machine learning in the healthcare sector is its ability to process datasets too large for human analysis and to reliably convert that analysis into clinical insights that aid physicians in planning and providing care, ultimately leading to better outcomes, lower costs of care, and increased patient satisfaction. One of the serious issues in medical care is the rising number of heart disease patients. Heart disease is one of the most prevalent diseases and can lead to death, disability, and economic hardship for the patients who suffer from it. According to World Health Organization (WHO) reports, 17.5 million people worldwide die each year due to CAD (Coronary Artery Disease).

      Common types of heart disease include heart failure, hypertensive heart disease, coronary artery disease, heart murmurs, congenital heart disease, pulmonary stenosis, cardiomyopathy, and rheumatic heart disease, and they can be caused by many factors. Because digital technologies are growing rapidly, healthcare centers store huge amounts of data in their databases, and this data is very complex and challenging to analyze. Data mining techniques and machine learning algorithms therefore play a vital role in analyzing the data held by healthcare centers (hospitals and other medical facilities). Prediction is especially useful in healthcare centers where clinicians lack the necessary knowledge and skill or where no specialists are available; in such settings, clinicians may make decisions on their own that give poor results and can lead to patient deaths. Heart disease prediction is used for automatic diagnosis of the disease and helps healthcare centers provide a sufficient quality of service to save lives. Prediction techniques help stakeholders, particularly specialists, to make accurate and reasonable decisions when treating patients.

      The healthcare sector has been an early adopter of technological advances and has benefited greatly from them. Nowadays, machine learning plays an important role in health-related fields, including the handling of patient data and records and the treatment of disease. The incidence of heart disease, liver disease, and kidney disease is increasing rapidly. In today's busy lifestyle, people eat fast food during lunch breaks and return straight to sedentary work, which leads to a lack of exercise and reduced activity; these factors have pushed the rates of heart disease, liver disease, and kidney disease to an unfortunately high level.

    2. LITERATURE SURVEY

      In this section, we discuss previous work on heart disease.

      A. Work related to heart disease:

      Dun et al. [1] tried various machine learning and deep learning techniques for detecting heart disease and also performed hyperparameter tuning to increase the accuracy of the results. Neural networks achieved a high accuracy of 78.3 percent; the other models were logistic regression, SVM, and ensemble techniques such as Random Forest. To reduce the cardiovascular features, Singh et al. [2] used generalized discriminant analysis for extracting nonlinear features and a binary classifier, an extreme learning machine, to reduce overfitting and increase the training speed; the ranking method used was Fisher. The accuracy achieved was 100 percent for detecting coronary heart disease. Yaghouby et al. [3] classified arrhythmias using heart rate variability (HRV). Asl et al. [4] used Gaussian discriminant analysis to reduce the HRV signal features to 15, and 100 percent precision was achieved using the SVM classifier. Rajagopal and Ranganathan [5] used five different unsupervised dimensionality reduction techniques (linear and nonlinear), with a neural network as the classifier for cardiac arrhythmia. Zhang et al. [6] used a PCA-based AdaBoost algorithm for detecting breast cancer. Negi et al. [7] combined uncorrelated discriminant analysis with PCA to select the best features for controlling upper limb motions, with good results. Avendaño-Valencia et al. [8] applied PCA techniques to time-frequency representations of heart sounds to reduce the features and increase performance. Kamencay et al. [9] tried a new method for medical images based on PCA-KNN with scale-invariant features, reaching an accuracy of 83.6 percent when trained on 200 images. Ratnasari et al. [10] used a gray-level threshold of 150 based on PCA and ROI to reduce the features of X-ray images. Heart disease can be fatal and should not be taken lightly. It occurs more often in males than in females, as discussed further by Harvard Health Publishing [11].

    3. PROPOSED SYSTEM

      This system aims to present end users with a personalized healthcare system with the assistance of data mining techniques. The research framework consists of five main steps:

      • Data Collection: The data collection process involves selecting appropriate data for analysis and obtaining effective knowledge by applying diverse data mining techniques.

      • Pre-processing: Pre-processing handles missing values, either by replacing them or by removing them from the dataset.

      • Feature extraction: This is the process of finding input features for a predictive model, which involves removing irrelevant features that don't contribute to the model.

      • Classification: Classification is performed using machine learning algorithms, including the SVM and Decision Tree algorithms.

      • Prediction of disease: From the given data, the system predicts whether or not the user has the disease.
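The pre-processing and feature extraction steps above could be sketched as follows. This is a minimal illustration using scikit-learn on a tiny synthetic array, not the paper's dataset or code; the median imputation strategy, the ANOVA F-score ranking, and `k=2` are assumptions chosen for the example.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer

# Synthetic stand-in for patient records (NOT the paper's data):
# 6 rows, 3 numeric features, missing values marked as NaN.
X = np.array([
    [63.0, 145.0, 1.0],
    [41.0, np.nan, 0.0],
    [57.0, 130.0, 1.0],
    [np.nan, 120.0, 1.0],
    [62.0, 140.0, 0.0],
    [45.0, 110.0, 0.0],
])
y = np.array([1, 0, 1, 0, 1, 0])  # 1 = disease, 0 = no disease

# Option 1: remove every row that contains a missing value.
complete_rows = ~np.isnan(X).any(axis=1)
X_dropped, y_dropped = X[complete_rows], y[complete_rows]

# Option 2: replace each missing value with the median of its column.
X_imputed = SimpleImputer(strategy="median").fit_transform(X)

# Feature selection: keep the k features with the highest ANOVA F-score
# and discard the rest as less relevant to the label.
selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X_imputed, y)

print("rows kept after dropping:", X_dropped.shape[0])
print("selected feature indices:", selector.get_support(indices=True))
```

Removing rows is simple but discards data, which matters for small medical datasets; replacement keeps every record at the cost of introducing estimated values.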

    4. METHODOLOGY

      The methodology for this system is as follows:

      • The dataset is collected from online resources: the UCI repository and Kaggle.

      TABLE I. DATASET INFORMATION

      | Attribute | Description | Value |
      |---|---|---|
      | Age | Age in years | Continuous |
      | Gender | Male or Female | 1 = Male, 0 = Female |
      | Cp | Chest Pain Type | 1 = typical angina, 2 = atypical angina, 3 = non-anginal pain, 4 = asymptomatic |
      | Trestbps | Resting Blood Pressure (in mmHg) | Continuous |
      | Chole | Serum Cholesterol (in mg/dl) | Continuous |
      | FBS | Fasting Blood Sugar | 1 = > 120 mg/dl, 0 = <= 120 mg/dl |
      | Restcg | Resting electrocardiographic results | 0 = normal, 1 = having ST-T wave abnormality, 2 = left ventricular hypertrophy |
      | Thalach | Maximum heart rate achieved | Continuous |
      | Exang | Exercise induced angina | 1 = yes, 0 = no |
      | Old peak | ST depression induced by exercise relative to rest | Continuous |
      | Slope | Slope of the peak exercise ST segment | 1 = upsloping, 2 = flat, 3 = downsloping |
      | Ca | Number of major vessels colored by fluoroscopy | 0-3 |
      | Thal | Thallium scan | 3 = normal, 6 = fixed defect, 7 = reversible defect |

      • Pre-processing is then done to handle the missing values in the dataset.

      • The data reduction step is repeated until high accuracy is achieved.

      • Models are developed using machine learning algorithms.

      • The models are evaluated using performance evaluation measures.

      • After evaluation, the model with the best accuracy is selected.

      • The selected model is used to analyze the heart disease dataset and make predictions.
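The develop, evaluate, and select steps in this list might look like the sketch below. It assumes scikit-learn, uses synthetic data with 13 features in place of the attributes in Table I, and picks an 80/20 split, an RBF kernel, and a depth-4 tree purely for illustration; none of these settings are stated in the paper.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary-classification data (NOT the UCI/Kaggle heart data).
X, y = make_classification(n_samples=300, n_features=13, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Candidate models; hyperparameters are illustrative assumptions.
candidates = {
    "SVM": SVC(kernel="rbf"),
    "Decision Tree": DecisionTreeClassifier(max_depth=4, random_state=0),
}

# Train each model and record its accuracy on the held-out test set.
scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

# Select the model with the highest test accuracy and reuse it for prediction.
best_name = max(scores, key=scores.get)
best_model = candidates[best_name]
predictions = best_model.predict(X_test)

print(scores, "->", best_name)
```

Selecting on a single test split is the simplest reading of the steps; cross-validation would give a less split-dependent comparison.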

      Fig.1. Methodology

      MACHINE LEARNING MODEL BUILDING

      The main purpose of this study is to build machine learning models and find the best-performing machine learning algorithm.

      1. Decision Tree:

        Decision tree learning is a supervised machine learning algorithm and one of the most widely used techniques for classification. Its classification accuracy is competitive with other methods, and it is very efficient. It has a flowchart-like tree structure and works on rules and conditions. It mainly consists of attributes, branches, and terminal nodes.

        Fig.2. Decision Tree Classification
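To make the "rules and conditions" view concrete, the following sketch fits a shallow tree and prints its learned rules. It assumes scikit-learn and synthetic data; the depth limit of 2 and the four generic features are illustrative choices, not settings from the paper.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic data with 4 features (NOT the paper's dataset).
X, y = make_classification(
    n_samples=200, n_features=4, n_informative=2, n_redundant=0, random_state=1
)

# A shallow tree keeps the printed rule set short and readable.
tree = DecisionTreeClassifier(max_depth=2, random_state=1)
tree.fit(X, y)

# Each internal node tests one attribute against a threshold (a branch);
# each terminal node (leaf) assigns a class label.
rules = export_text(tree)
print(rules)
```

Each printed line starting with `|---` is either a threshold test on one feature or, at a leaf, the predicted class.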

      2. SVM:

        Support Vector Machines (SVMs) are a class of very popular supervised machine learning algorithms that can be used for building both regression and classification models. SVMs can perform well on both linearly separable and non-linearly separable datasets. The working of SVM is illustrated below:

        Fig.3. Support Vector Machine
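The difference between linearly and non-linearly separable data can be illustrated with a small experiment. This sketch assumes scikit-learn and uses a synthetic "two circles" dataset, which no straight line can separate; the kernel choices are for illustration and the paper does not state which kernel it used.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings of points: a classic non-linearly separable dataset.
X, y = make_circles(n_samples=200, noise=0.05, factor=0.4, random_state=0)

# A linear kernel can only draw a straight decision boundary.
linear_svm = SVC(kernel="linear").fit(X, y)

# An RBF kernel implicitly maps the points to a space where the rings
# can be separated, giving a curved boundary in the original space.
rbf_svm = SVC(kernel="rbf").fit(X, y)

linear_acc = linear_svm.score(X, y)
rbf_acc = rbf_svm.score(X, y)
print(f"linear kernel: {linear_acc:.2f}, RBF kernel: {rbf_acc:.2f}")
```

On data like this the RBF kernel fits the rings far better than the linear kernel, which is the practical meaning of handling non-linearly separable datasets.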

      3. Ensemble Learning:

        Ensemble learning is a machine learning technique that combines several base models to produce one predictive model with better performance. The ensemble methods considered here are supervised machine learning algorithms. Combining several models helps to improve the results of the machine learning algorithms.

        Fig.4. Ensemble Learning

        An ensemble is a set of classifiers constructed with a given algorithm; each new example is classified by combining the predictions of every classifier in the ensemble. These predictions can be combined by taking the average (for regression tasks) or the majority vote (for classification tasks), as described by Breiman [12], or by taking more complex combinations [13][14][15]. Ensemble methods are able to improve the predictive performance of many base classifiers. In this paper, we consider the ensemble learning technique and apply it to the SVM and Decision Tree algorithms.
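The two basic combination rules, majority vote and averaging, are simple enough to write out directly. The sketch below uses only the Python standard library; the function names and the sample predictions are hypothetical and serve only to illustrate the rules.

```python
from collections import Counter
from statistics import mean


def majority_vote(predictions):
    """Return the class label predicted by the most base classifiers.

    Ties are broken in favour of the label that appears first in the list.
    """
    return Counter(predictions).most_common(1)[0][0]


def average(predictions):
    """Return the mean of the base models' numeric predictions (regression)."""
    return mean(predictions)


# Hypothetical outputs of three base classifiers for one patient
# (1 = disease, 0 = no disease).
votes = [1, 0, 1]
label = majority_vote(votes)

# Hypothetical outputs of three base regressors for one example.
estimates = [2.0, 3.0, 4.0]
value = average(estimates)

print(label, value)
```

With an even number of base classifiers a tie is possible, so the tie-breaking rule becomes part of the model's behaviour and should be chosen deliberately.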

        • Voting Classifier:

      A voting ensemble, or majority voting ensemble, is an ensemble machine learning model that combines the predictions from multiple models. This technique is used to improve model performance, with the aim of achieving better performance than any single model used in the ensemble.

      Fig.5. Majority Voting Classifier
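One way to build such a voting ensemble from an SVM and a decision tree is scikit-learn's `VotingClassifier`. The sketch below is an assumption about how this could be done, not the authors' implementation: the data is synthetic, and the hard-voting mode, feature scaling, RBF kernel, and tree depth are illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary-classification data (NOT the paper's heart disease data).
X, y = make_classification(n_samples=300, n_features=13, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Base models. The SVM is wrapped with a scaler because SVMs are
# sensitive to feature scale; the tree needs no scaling.
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
tree = DecisionTreeClassifier(max_depth=4, random_state=42)

# Hard voting: each base model casts one vote per sample.
ensemble = VotingClassifier(
    estimators=[("svm", svm), ("tree", tree)],
    voting="hard",
)
ensemble.fit(X_train, y_train)

y_pred = ensemble.predict(X_test)
ensemble_accuracy = ensemble.score(X_test, y_test)
print(f"ensemble accuracy on synthetic data: {ensemble_accuracy:.3f}")
```

With only two base models, hard voting produces a tie whenever they disagree, and scikit-learn resolves ties by picking the class that comes first in sorted label order. Soft voting (`voting="soft"`, which averages predicted probabilities) is a common alternative in that situation. An ensemble is not guaranteed to beat its best member on every dataset or split.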

    5. RESULTS

      Table II shows the prediction accuracy for heart disease.

      TABLE II. ACCURACY FOR PREDICTIVE MODEL

      | Algorithm | SVM | Decision Tree | Ensemble Learning |
      |---|---|---|---|
      | Heart disease | 82.3 | 81 | 84.1 |

    6. CONCLUSION AND FUTURE WORK

This project proposes a system to identify the best machine learning model for disease prediction. In this paper we have studied different machine learning algorithms. We analysed different attributes of different patients and measured the prediction accuracy of different machine learning algorithms; the focus is to establish a healthcare prediction system using machine learning algorithms, specifically Support Vector Machine and Decision Tree. Support Vector Machine and Decision Tree are used to predict heart disease, and an ensemble of SVM and Decision Tree is also built. From the experimental results, this work concludes that the Ensemble Model is the best algorithm because it has the highest accuracy (84.1, compared with 82.3 for SVM and 81 for Decision Tree). The system predicts heart disease when an unknown sample is given as input by the user, so doctors can attend to more patients and the workload of medical personnel can be reduced.

In the future, application developers should work together with healthcare professionals and researchers to deliver disease-related apps that improve the healthcare system.

REFERENCES

  1. B. Dun, E. Wang, and S. Majumder, "Heart disease diagnosis on medical data using ensemble learning," 2016.

  2. R. S. Singh, B. S. Saini, and R. K. Sunkaria, "Detection of coronary artery disease by reduced features and extreme learning machine," Medicine and Pharmacy Reports, vol. 91, no. 2, pp. 166-175, 2018.

  3. F. Yaghouby, A. Ayatollahi, and R. Soleimani, "Classification of cardiac abnormalities using reduced features of heart rate variability signal," World Applied Science Journal, vol. 6, no. 11, pp. 1547-1554, 2009.

  4. B. M. Asl, S. K. Setarehdan, and M. Mohebbi, "Support vector machine-based arrhythmia classification using reduced features of heart rate variability signal," Artificial Intelligence in Medicine, vol. 44, no. 1, pp. 51-64, 2008.

  5. R. Rajagopal and V. Ranganathan, "Evaluation of effect of unsupervised dimensionality reduction techniques on automated arrhythmia classification," Biomedical Signal Processing and Control, vol. 34, pp. 1-8, 2017.

  6. D. Zhang, L. Zou, X. Zhou, and F. He, "Integrating feature selection and feature extraction methods with deep learning to predict clinical outcome of breast cancer," IEEE Access, vol. 6, pp. 28936-28944, 2018.

  7. S. Negi, Y. Kumar, and V. M. Mishra, "Feature extraction and classification for EMG signals using linear discriminant analysis," in Proceedings of the 2016 2nd International Conference on Advances in Computing, Communication, & Automation (ICACCA) (Fall), IEEE, Bareilly, India, September 2016.

  8. D. Avendano-Valencia, F. Martinez-Tabares, D. Acosta-Medina, I. Godino-Llorente, and G. Castellanos-Dominguez, "TFR-based feature extraction using PCA approaches for discrimination of heart murmurs," in Proceedings of the 2009 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 5665-5668, IEEE, Minneapolis, MN, USA, September 2009.

  9. P. Kamencay, R. Hudec, M. Benco, and M. Zachariasova, "Feature extraction for object recognition using PCA-KNN with application to medical image analysis," in Proceedings of the 2013 36th International Conference on Telecommunications and Signal Processing (TSP), pp. 830-834, IEEE, Rome, Italy, July 2013.

  10. N. R. Ratnasari, A. Susanto, I. Soesanti, and Maesadji, "Thoracic X-ray features extraction using thresholding-based ROI template and PCA-based features selection for lung TB classification purposes," in Proceedings of the 2013 3rd International Conference on Instrumentation, Communications, Information Technology and Biomedical Engineering (ICICI-BME), pp. 65-69, IEEE, Bandung, Indonesia, November 2013.

  11. Harvard Medical School, "Throughout life, heart attacks are twice as common in men than women," 2020, https://www.health.harvard.edu/heart-health/throughout-life-heart-attacks-are-twice-as-common-in-men-than-women.

  12. L. Breiman, "Bagging predictors," Machine Learning, 1996.

  13. T. Ho, J. Hull, and S. Srihari, "Decision combination in multiple classifier systems," IEEE Trans. on Pattern Anal. and Mach. Intell., vol. 16, 1994.

  14. J. Kittler, M. Hatef, R. Duin, and J. Matas, "On combining classifiers," IEEE Trans. on Pattern Anal. and Mach. Intell., 1998.

  15. D. Gorgevik and D. Cakmakov, "Combining SVM Classifiers for Handwritten Digit Recognition," in Proceedings of 16th Int. Conference on Pattern Recognition, ICPR2002, Vol.3, SII.8p, IEEE Computer Society, Quebec City, Canada, 2002.
