Using Multivariate Linear Regression to Estimate the Probability of Having a Heart Attack

Neel Adwani

doi:10.17577/IJERTV8IS110095

Volume 08, Issue 11 (November 2019)

Open Access
Paper ID : IJERTV8IS110095
Published (First Online): 14-11-2019
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License

Parameters used: Age, Cholesterol Levels

Neel Adwani

First Year, BTech Computer Science with Specialization in Artificial Intelligence and Machine Learning

University of Petroleum and Energy Studies

Dehradun, India

Abstract: Heart attacks caused by high cholesterol levels are a growing problem in the health industry. Machine learning can be of great use for problems like this when it is put into practice. To estimate the probability of having a heart attack, I have written a multivariate linear regression algorithm, which is a machine learning technique.

Keywords: Cholesterol; Age; Machine Learning; Linear Regression

INTRODUCTION

Linear regression is an approach of plotting data on a graph and drawing the straight line that best fits the data. Using that trend, the next value can be predicted from the parameters of the line (theta), that is, its intercept and slope. In univariate linear regression, one input (x) is fed into the program and the model is trained against the known values of y. A new value of x can then be entered to predict the value of y at that point.
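As a minimal sketch of the univariate case, the Octave snippet below fits a straight line to a tiny made-up dataset and predicts y for a new x. The data values are illustrative assumptions, and the fit uses Octave's backslash operator (a direct least-squares solve) rather than the gradient descent used later in this paper.

```octave
% Univariate linear regression on made-up data (illustrative values only).
x = [1; 2; 3; 4; 5];          % single input feature
y = [3; 5; 7; 9; 11];         % targets, constructed so that y = 1 + 2*x

X = [ones(length(x), 1) x];   % prepend a column of ones for the intercept
theta = X \ y;                % least-squares fit: theta(1) = intercept, theta(2) = slope

x_new = 6;
y_pred = [1 x_new] * theta;   % predicted y at the new input
printf("intercept = %.2f, slope = %.2f, prediction = %.2f\n", theta(1), theta(2), y_pred);
```

Because the made-up targets lie exactly on a line, the fitted intercept and slope recover that line and the prediction at x = 6 falls on it.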

In statistics, linear regression is a linear approach to modeling the relationship between a scalar response (or dependent variable) and one or more explanatory variables (or independent variables). The case of one explanatory variable is termed univariate linear regression. For more than one explanatory variable, the method is termed multiple linear regression. This term is distinct from multivariate linear regression, where multiple correlated dependent variables are predicted, rather than a single scalar variable.

Multivariate Linear Regression is a technique in which multiple inputs are given, denoted by X = (x1, x2, x3, ..., xn), and the value of y is fed to train the model. Using the training dataset, a graph is plotted and the value of y can be further predicted by multiplying X with theta.
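In matrix form, the prediction step described above can be written as follows. The notation is an assumption chosen to match the code in this paper: m training examples, n input features (here n = 2, age and cholesterol level), and a leading column of ones in X so that the first entry of theta acts as the intercept.

```latex
\[
X =
\begin{bmatrix}
1 & x^{(1)}_1 & \cdots & x^{(1)}_n \\
\vdots & \vdots & & \vdots \\
1 & x^{(m)}_1 & \cdots & x^{(m)}_n
\end{bmatrix},
\qquad
\theta =
\begin{bmatrix}
\theta_0 \\ \theta_1 \\ \vdots \\ \theta_n
\end{bmatrix},
\qquad
\hat{y} = X\theta .
\]
```

For a single input written as a column vector x = (1, x1, ..., xn)', the same prediction reduces to the scalar product theta' * x.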
SOFTWARE USED
1. GNU Octave
  
  GNU Octave is open-source software featuring a high-level programming language, also named Octave, that is largely compatible with MATLAB. The Octave language is an interpreted programming language. It is a structured programming language (similar to C) and supports many common C standard library functions, as well as certain UNIX system calls and functions. However, it does not support passing arguments by reference. Octave programs consist of a list of function calls or a script. The syntax is matrix-based and provides various functions for matrix operations. It supports various data structures and permits object-oriented programming. Its syntax is very similar to MATLAB, and careful programming of a script allows it to run on both Octave and MATLAB. Because Octave is made available under the GNU General Public License, it can be freely changed or modified. The program runs on Microsoft Windows and on most Unix and Unix-like operating systems, including macOS.
2. KAGGLE
  
  Kaggle is a website that hosts a large number of datasets freely available for research purposes. It is an online community of data scientists and machine learners, owned by Google. Kaggle allows users to find and publish datasets, explore and build models in a web-based data science environment, work with other data scientists and machine learning engineers, and enter competitions to solve data science challenges. Kaggle got its start by offering machine learning competitions and now also offers a public data platform, a cloud-based workbench for data science, and short-form AI education.
ALGORITHM
1. Input the age and the cholesterol level.
2. Predict the probability by multiplying the transpose of theta with the column vector x = [1 age ch_level]'.

Figure 2: Graph 1

Figure 4: Convergence Plot

Figure 5: Probability of having a heart attack at age 18 with cholesterol level 56
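CODE

% Load the dataset: columns 1 and 2 are the inputs, column 3 is the target y.
data = load('heartdata.txt');

X = data(:, 1:2);
y = data(:, 3);
a = data(:, 1);
b = data(:, 2);
m = length(y);

% Plot each input against y.
figure(1); plot(a, y, 'bo');
figure(2); plot(b, y, 'ro');

% featureNormalize and gradientDescentMulti are helper functions that are not listed in this paper.
[X, mu, sigma] = featureNormalize(X);
X = [ones(m, 1) X];

alpha = 0.001;
num_iters = 4000;
theta = zeros(3, 1);

[theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters);

% Convergence plot of the cost over the iterations.
figure(3);
plot(1:numel(J_history), J_history, 'xy', 'LineWidth', 2);

age = input("Enter your Age: ");
ch_level = input("Enter your Cholesterol Level: ");

% theta was fitted on normalized features, so the inputs are scaled with the
% training mu and sigma (assumed here to be 1x2 row vectors) before predicting.
x = [1, ([age ch_level] - mu) ./ sigma]';

Chances_of_Heart_Attack = (theta' * x) / 100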
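The code above calls featureNormalize and gradientDescentMulti without listing them. The sketch below shows one plausible way such helpers could be written, assuming z-score normalization and batch gradient descent on the squared-error cost; it is a hypothetical stand-in, not the paper's actual implementation. The small dataset is made up so that the script runs without heartdata.txt, and the learning rate and iteration count are likewise illustrative.

```octave
1;  % tells Octave this file is a script, so functions can be defined inline

% Hypothetical stand-in for featureNormalize: z-score each column.
function [X_norm, mu, sigma] = featureNormalize(X)
  mu = mean(X);            % 1 x n row vector of column means
  sigma = std(X);          % 1 x n row vector of column standard deviations
  sigma(sigma == 0) = 1;   % avoid division by zero for constant columns
  X_norm = (X - mu) ./ sigma;
end

% Hypothetical stand-in for gradientDescentMulti: batch gradient descent
% on the cost J = (1/(2m)) * sum((X*theta - y).^2).
function [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters)
  m = length(y);
  J_history = zeros(num_iters, 1);
  for iter = 1:num_iters
    err = X * theta - y;                        % residuals, m x 1
    theta = theta - (alpha / m) * (X' * err);   % update all parameters at once
    J_history(iter) = sum((X * theta - y) .^ 2) / (2 * m);
  end
end

% Made-up data: [age, cholesterol level, target]. Illustrative values only.
data = [30 180 10; 40 200 20; 50 220 35; 60 260 55; 70 280 70];
X = data(:, 1:2);
y = data(:, 3);
m = length(y);

[X, mu, sigma] = featureNormalize(X);
X = [ones(m, 1) X];
[theta, J_history] = gradientDescentMulti(X, y, zeros(3, 1), 0.1, 500);

% New inputs are scaled with the same mu and sigma before predicting.
x = [1, ([45 210] - mu) ./ sigma]';
prediction = theta' * x
```

Returning mu and sigma from the normalization step matters because the fitted theta only makes sense for inputs on the same scale as the training features. Defining functions after a leading `1;` is Octave-specific; in MATLAB the functions would need to go at the end of the script or in separate files.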

FUTURE SCOPE

This algorithm can be used for various purposes in the future, after a lot of improvement. The model is somewhat inaccurate because of the lack of data, but once a suitable dataset is fed into it, it will be able to estimate the probability more accurately. Also, more parameters, such as heart rate, need to be considered to increase the accuracy of the model.

