**Open Access**
**Authors:** Poornima K.M, T Jayakumari
**Paper ID:** IJERTCONV3IS27119
**Volume & Issue:** NCRTS – 2015 (Volume 3 – Issue 27)
**Published (First Online):** 30-07-2018
**ISSN (Online):** 2278-0181
**Publisher Name:** IJERT
**License:** This work is licensed under a Creative Commons Attribution 4.0 International License

#### Neural Network based Technique for Parkinson’s Disease Classification using ANOVA as Feature Selection Model

Poornima K.M1

2nd year M.Tech Student, Dept of Computer Science and Engineering, BTL Institute of Technology, Bangalore, India

Smt. T Jayakumari2

Assistant Professor, Dept of Computer Science and Engineering, BTL Institute of Technology, Bangalore, India

Abstract: Parkinson's disease (PD) is caused by the loss of dopamine in the brain, which results in involuntary or oscillatory movement of the body. This study presents an analysis of two training algorithms, Levenberg-Marquardt (LM) and Scaled Conjugate Gradient (SCG), for Multi-Layer Perceptrons (MLPs) with back-propagation learning. Both are applied to classify PD for effective diagnosis, with Analysis of Variance (ANOVA) as the feature selection method.

Keywords: Parkinson's disease (PD), Multilayer Perceptron (MLP) Neural Network, Levenberg-Marquardt (LM), Scaled Conjugate Gradient (SCG).

INTRODUCTION

Parkinson's disease (PD) is the second most common neurodegenerative affliction after Alzheimer's disease (AD) [1]. PD is a progressive neurological disorder characterized by tremor, rigidity, and slowness of movement. It is associated with progressive neuronal loss in the substantia nigra and other brain structures [2]. Non-motor features, such as dementia and dysautonomia, occur frequently, particularly in advanced stages of the disease. Diagnosis depends on the presence of two or more cardinal motor features such as rest tremor, bradykinesia, or rigidity [3]. With so many factors to examine when diagnosing PD, specialists generally make decisions by evaluating the current test results of their patients. They also draw on previous decisions made for other patients with a similar condition.

These are complex procedures, especially when the number of factors that the specialist has to evaluate is high (a large quantity and variety of data). For these reasons, PD diagnosis needs experienced and highly skilled specialists. Classification systems can help to increase the accuracy and relevance of diagnoses and to minimize potential errors, as well as making diagnosis more time-efficient [4].

This paper presents an experimental analysis of an MLP neural network based on two training algorithms, Levenberg-Marquardt (LM) and Scaled Conjugate Gradient (SCG), for diagnosing PD. The experiment comprises two parts. In the first part, the attributes act as inputs to the MLP without ANOVA as feature selection; in the second part, the significance of ANOVA as feature selection is measured. The approach presented in this paper can also be applied to other diseases, such as breast cancer and heart disease. It can reduce cost considerably, because the analysis is entirely software-based and uses a real database of the disease.

RELATED WORK

Several techniques and analyses have been used to diagnose Parkinson's disease (PD), for example an artificial neural network classifier for the diagnosis of Parkinson's disease using [99mTc]TRODAT-1 and SPECT. Other diseases have also been detected using artificial neural networks. Brief descriptions of disease detection using such methods are given below:

Artificial Neural Network Classifier for the Diagnosis of Parkinson's Disease using [99mTc]TRODAT-1 and SPECT

According to Acton PD and Newberg A in their research [5], imaging the dopaminergic neurotransmitter system with positron emission tomography (PET) or single photon emission computed tomography (SPECT) is a powerful tool for the diagnosis of Parkinson's disease. It has been hypothesized that an artificial neural network (ANN), which can mimic the pattern recognition skills of human observers, may provide similar results. A set of patients with PD and normal healthy control subjects were studied using the dopamine transporter tracer [99mTc]TRODAT-1 and SPECT. The sample comprised 81 patients (mean age ± SD: 63.4 ± 10.4 years; age range: 39.0-84.2 years) and 94 healthy controls (mean age ± SD: 61.8 ± 11.0 years; age range: 40.9-83.3 years). The images were processed to extract the striatum, and the striatal pixel values were used as inputs to a three-layer ANN. However, with this technique it was difficult to interpret precisely which triggers in the images were being detected by the network.

DATA SET

The dataset was created by Max Little of the University of Oxford, in collaboration with the National Centre for Voice and Speech, Denver, Colorado, who recorded the speech signals. The original study published the feature extraction methods for general voice disorders [6]. The attributes are tabulated as in TABLE I.

Table I : Feature information for Parkinson dataset

The data set comprises a range of biomedical voice measurements, with 195 samples and 16 attributes; 147 of the samples were diagnosed with PD. The main objective of the study is to distinguish healthy people from those with PD using an MLP, according to the "status" column, which is set to 0 for healthy and 1 for PD.
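As a quick check on the class balance implied by these counts, the following sketch derives the number of healthy samples and the accuracy that a trivial classifier would reach by always predicting the majority class. It is an illustration only and not part of the original study: the variable names are hypothetical, and only the counts 195 and 147 and the 0/1 encoding of the status column come from the text.

```python
# Counts taken from the dataset description; everything else is illustrative.
n_samples = 195
n_pd = 147

# status column: 1 = PD, 0 = healthy
status = [1] * n_pd + [0] * (n_samples - n_pd)

n_healthy = status.count(0)

# Accuracy of a classifier that always predicts the majority class (PD).
majority_baseline = max(n_pd, n_healthy) / n_samples

print(f"healthy samples: {n_healthy}")
print(f"majority-class baseline accuracy: {majority_baseline:.2%}")
```

Because roughly three quarters of the samples carry the PD label, classification accuracies are easier to interpret when read against this baseline.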

METHODOLOGY

This section describes the methodology, shown in Fig. 1, for classifying PD using an MLP neural network with two training algorithms, the LM algorithm and the SCG algorithm. ANOVA is applied as the feature selection method for the PD data set.

Multilayer Perceptrons Neural Network

Multilayer perceptrons are feed-forward neural networks trained with the standard back-propagation algorithm. They are supervised networks, so they need a desired response in order to be trained [7].

Fig. 1. Overall Block diagram

They learn how to transform input data into a desired response, so they are widely applied for pattern classification. With one or two hidden layers, they can approximate almost any input-output map. They have been shown to approximate the performance of optimal statistical classifiers in difficult problems. The multilayer perceptron is the most popular static network [8]. It is trained with error-correction learning, which is appropriate here because the desired response is the recorded diagnosis and as such known. Error-correction learning works in the following way: from the system response at neuron $j$ at iteration $t$, $y_j(t)$, and the desired response $d_j(t)$ for a given input pattern, an instantaneous error $e_j(t)$ is defined by

$$e_j(t) = d_j(t) - y_j(t) \qquad (1)$$

Using the theory of gradient descent learning, each weight in the network can be adapted by correcting the present value of the weight with a term that is proportional to the present input and error at the weight, i.e.

$$w_{ij}(t+1) = w_{ij}(t) + \eta\,\delta_j(t)\,x_i(t) \qquad (2)$$

Here $\eta$ is the learning-rate parameter and $x_i(t)$ is the input received from neuron $i$ at iteration $t$. The local error $\delta_j(t)$ can be computed directly from $e_j(t)$ at an output neuron, or as a weighted sum of errors at the internal neurons.
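The error-correction update can be sketched for a single linear output neuron as follows. This is a minimal illustration under stated assumptions, not the toolbox implementation used in the study: the function name `delta_rule_step`, the fixed learning rate and the toy inputs are hypothetical, and for a linear output neuron the local error is taken to equal the instantaneous error.

```python
def delta_rule_step(weights, inputs, desired, learning_rate):
    """Apply one error-correction update to a single linear neuron.

    Returns the updated weights and the instantaneous error e = d - y.
    """
    output = sum(w * x for w, x in zip(weights, inputs))  # y(t)
    error = desired - output                              # e(t) = d(t) - y(t)
    # Assumption: linear output neuron, so the local error equals e(t).
    new_weights = [w + learning_rate * error * x
                   for w, x in zip(weights, inputs)]
    return new_weights, error


# Toy example (hypothetical values): two inputs, target response 1.0.
weights = [0.0, 0.0]
inputs = [1.0, 2.0]
weights, first_error = delta_rule_step(weights, inputs, desired=1.0, learning_rate=0.1)
weights, second_error = delta_rule_step(weights, inputs, desired=1.0, learning_rate=0.1)
print(weights, first_error, second_error)
```

In this toy run the error shrinks from one step to the next, which is the behaviour the weight correction is designed to produce when the learning rate is small enough.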

ANOVA (Analysis of Variance)

Analysis of Variance is a statistical test used to determine whether the means of three or more data sets are statistically significantly different. The selection of the proper ANOVA depends on the data to be examined. The first criterion for selecting an ANOVA test relates to the distribution of the data sets. Because the data analyzed do not follow a normal distribution, a nonparametric version of the classical one-way ANOVA should be used. Two suitable tests are implemented in MATLAB: the Kruskal-Wallis test and Friedman's test. Both tests examine the ranks of the data rather than their original numeric values. Ranks are obtained by ordering the data from smallest to largest across all groups and taking the numeric index of this ordering [9].

The Kruskal-Wallis test is a nonparametric test that compares three or more unpaired groups of data. Moreover, the MATLAB implementation of the test allows analysis of data in which the number of recordings is not equal across the individual groups to be examined. The Kruskal-Wallis test evaluates the hypothesis that all samples come from populations that have the same median, against the alternative that the medians are not all the same. As a result, a p-value for the null hypothesis that all samples are drawn from the same population is obtained [10].
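The rank-based idea behind the Kruskal-Wallis test can be sketched in plain Python. This is an illustrative sketch, not the MATLAB routine referred to in the text: the helper names `average_ranks` and `kruskal_wallis_h` and the toy groups are hypothetical, tied values receive the average of their ranks, and no tie correction is applied to the statistic.

```python
def average_ranks(values):
    """Rank values from smallest to largest (1-based); ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (without tie correction) for a list of groups."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)  # ranks are taken across all groups together
    n_total = len(pooled)
    weighted_sum = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted_sum += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n_total * (n_total + 1)) * weighted_sum - 3.0 * (n_total + 1)


# Toy groups of unequal size (hypothetical values).
toy_groups = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0], [8.0, 9.0]]
h_statistic = kruskal_wallis_h(toy_groups)
print(h_statistic)
```

The p-value is then usually obtained by comparing the statistic with a chi-square distribution with K - 1 degrees of freedom, where K is the number of groups.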

ANOVA can be used as a feature selection technique [11]. The steps involved in ANOVA are given below.

Step 1: Calculation of Total Sum of Squares (SST)

There are three possible sums of squares: the between-group sum of squares (SSB), the within-group or error sum of squares (SSW), and the total sum of squares (SST). The total sum of squares can be partitioned into the between sum of squares and the within sum of squares, representing the variation due to the independent variable and the variation due to individual differences in the scores, respectively:

$$SST = SSB + SSW$$

The sum of squares between groups examines the differences among the group means by calculating the variation of each group mean ($\bar{x}_j$) around the grand mean ($\bar{x}$):

$$SSB = \sum_{j=1}^{K} n_j (\bar{x}_j - \bar{x})^2$$

where $n_j$ is the number of observations in group $j$.

The sum of squares within groups examines error variation, that is, the variation of individual scores ($x_{ij}$) around each group mean. This is variation in the scores that is not due to the independent variable.

$$SSW = \sum_{j=1}^{K} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2$$

The total sum of squares can be computed by adding the SSB and the SSW.

Step 2: Calculation of Degrees of Freedom (DF)

Degrees of freedom (DF) represent the number of independent values in a calculation, minus the number of estimated parameters.

$$DF_{within} = N - K$$

$$DF_{between} = K - 1$$

$$DF_{total} = DF_{within} + DF_{between} = N - 1$$

where $N$ is the number of samples and $K$ is the number of groups.

Step 3: Calculation of Mean Squares Total (MST)

Mean squares (MS) are estimates of variance across groups [12]. Mean squares are used in analysis of variance and are calculated as a sum of squares divided by its appropriate degrees of freedom. Let $N$ equal the total number of samples in a survey and $K$ the number of groups. Then:

Mean Square Total (MST) is an estimate of the total variance against the grand mean (the mean of all samples):

$$MST = \frac{SST}{N - 1}$$

Mean Square Between groups (MSB) compares the means of the groups to the grand mean:

$$MSB = \frac{SSB}{K - 1}$$

Mean Square Within groups (MSW) calculates the variance within each individual group:

$$MSW = \frac{SSW}{N - K}$$

Step 4: Calculation of the F statistic, also known as the F ratio

The F ratio is a value used to determine whether the difference between groups is statistically significant. It divides one variance by another, both of which are results of the analysis of variance procedure. Mean Square Between (MSB) and Mean Square Within (MSW) are used to calculate the F ratio:

$$F = \frac{MSB}{MSW}$$

Step 5: Calculation of the P-value

To find the P-value from an F distribution, you must know the numerator (MSB) and denominator (MSW) degrees of freedom, along with the significance level. The F distribution has df1 and df2 degrees of freedom, where df1 is the numerator degrees of freedom, equal to $K - 1$, and df2 is the denominator degrees of freedom, equal to $N - K$.

Step 6: Decision Rule

Reject the null hypothesis if the observed F value exceeds the critical F value at the chosen significance level, or equivalently if the P-value is smaller than the significance level. This means that there is a significant difference between the groups. Failing to reject the null hypothesis means that no significant difference between the groups was found.
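The sums of squares, degrees of freedom, mean squares and F ratio of a one-way ANOVA can be sketched in plain Python. This is a minimal illustration on toy data, not the MATLAB analysis used in the study: the function name `one_way_anova`, the dictionary keys and the example groups are hypothetical.

```python
def one_way_anova(groups):
    """Compute one-way ANOVA quantities for a list of groups of numbers."""
    pooled = [x for g in groups for x in g]
    n_total = len(pooled)                    # N: total number of samples
    k = len(groups)                          # K: number of groups
    grand_mean = sum(pooled) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Step 1: sums of squares (SST = SSB + SSW)
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ssw = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    sst = ssb + ssw

    # Step 2: degrees of freedom
    df_between = k - 1
    df_within = n_total - k

    # Step 3: mean squares
    msb = ssb / df_between
    msw = ssw / df_within

    # Step 4: F ratio
    return {
        "SSB": ssb, "SSW": ssw, "SST": sst,
        "df_between": df_between, "df_within": df_within,
        "MSB": msb, "MSW": msw, "F": msb / msw,
    }


# Toy data (hypothetical): three groups of three observations each.
toy_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
result = one_way_anova(toy_groups)
print(result)
```

The sketch stops at the F ratio. Obtaining the P-value additionally needs the cumulative distribution function of the F distribution; if SciPy is available, `scipy.stats.f.sf(F, df_between, df_within)` is one way to compute it.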

RESULTS AND DISCUSSION

As proposed, ANOVA was implemented as the feature selection step in MATLAB R2013b. Fig. 2 shows the resulting ANOVA table.

Fig. 2. ANOVA Table analysis

After applying ANOVA, the attributes were reduced from 16 features to 4, with an F statistic of 1068.65, as shown in Fig. 2 and Fig. 3. The selected features are MDVP:Fo(Hz), MDVP:Fhi(Hz), HNR and MDVP:Flo(Hz). The high value of F indicates that the ANOVA analysis is effective.

Fig. 3. ANOVA Analysis

Fig. 4 shows the multiple comparison of the means. The Group 6 mean is highlighted and its comparison interval is shown in blue. Because the comparison intervals for the other 4 groups do not intersect the interval for the Group 6 mean, they are highlighted in red. This lack of intersection indicates that their means are different from the Group 6 mean. Other group means can be selected to confirm that all group means are significantly different from each other.

Fig. 4. Multi-comparison analysis (groups 1 to 16; plot annotation: "4 groups have means significantly different from Group 6")

LM and SCG were developed using the MATLAB Neural Network Toolbox. This section discusses the experimental results obtained with the LM and SCG algorithms in terms of average training accuracy, average testing accuracy, average iterations and average MSE. Prior to training, the dataset was rescaled to the range -1 to 1 and then divided in the ratio 50:20:30 for training, validation and testing.

Fig. 5. Average Training Accuracy (%) versus No of Hidden Units for LM and SCG algorithms before ANOVA (horizontal axis: No of Hidden Units, 5 to 25; vertical axis: Average Training (%); series: LM, SCG; bar labels as extracted: 87.8866, 88.6598, 83.2474, 91.7526, 89.4330, 81.7010, 80.6701, 83.2474, 80.6701, 84.0206)

Fig. 5 and Fig. 6 show the average training accuracy for LM and SCG in both analyses, without and with ANOVA. The average training accuracy of the LM algorithm is higher than that of the SCG algorithm at almost all numbers of hidden units in both analyses. For LM, the highest accuracy is 91.75% at 20 hidden units without ANOVA and 90.20% at 10 hidden units with ANOVA as feature selection, whilst SCG achieved 84.02% at 25 hidden units without ANOVA. With ANOVA, however, SCG increased its accuracy to 88.14% at 10 hidden units.
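The preprocessing used before training in these experiments (rescaling each attribute to the range -1 to 1, then a 50:20:30 split into training, validation and testing sets) can be sketched as follows. This is an illustration under stated assumptions, not the toolbox code used in the study: the function names, the min-max form of the rescaling, the random shuffling with a fixed seed and the way fractional sample counts are rounded are all hypothetical choices.

```python
import random


def rescale_column(values):
    """Linearly map a column of numbers onto the range [-1, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to the midpoint.
        return [0.0 for _ in values]
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


def split_indices(n_samples, seed=0):
    """Shuffle sample indices and split them 50:20:30 (train:validation:test)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = n_samples * 50 // 100
    n_val = n_samples * 20 // 100
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test


# Toy column (hypothetical values) and the 195 samples of the PD data set.
scaled = rescale_column([100.0, 150.0, 200.0])
train_idx, val_idx, test_idx = split_indices(195)
print(scaled)
print(len(train_idx), len(val_idx), len(test_idx))
```

With 195 samples the ratio does not divide evenly, so this sketch rounds the training and validation sizes down and assigns the remainder to the test set.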

Fig. 6. Average Training Accuracy (%) versus No of Hidden Units for LM and SCG algorithms after ANOVA (horizontal axis: No of Hidden Units, 5 to 25; vertical axis: Average Training (%); series: LM, SCG; bar labels as extracted: 89.1753, 90.2062, 86.5979, 88.9175, 87.8866, 87.6289, 88.1443, 85.5670, 87.1134, 88.1443)

Fig. 7. Average Testing Accuracy (%) versus No of Hidden Units for LM and SCG algorithms before ANOVA (horizontal axis: No of Hidden Units, 5 to 25; vertical axis: Average Testing (%); series: LM, SCG; bar labels as extracted: 81.3559, 84.7458, 86.0169, 85.1695, 85.1695, 87.7119, 84.3220, 83.0508, 84.3220, 78.3890)

Fig. 8. Average Testing Accuracy (%) versus No of Hidden Units for LM and SCG algorithms after ANOVA (horizontal axis: No of Hidden Units, 5 to 25; vertical axis: Average Testing (%); series: LM, SCG; bar labels as extracted: 86.8644, 86.8644, 83.8983, 85.1695, 86.0169, 86.0169, 81.3559, 83.4746, 82.6271, 84.3220)

Fig. 7 and Fig. 8 show the average testing accuracy for both LM and SCG. Without ANOVA, the testing accuracy of the LM algorithm across the hidden units ranged from 78.38% to 84.74%, while that of SCG ranged from 84.32% to 87.71%. With ANOVA as feature selection, the testing accuracy of the LM algorithm ranged from 81.35% to 86.86%, while that of the SCG algorithm ranged from 83.89% to 86.86%. These results were taken as confirmation that, for both the training and testing phases, LM performed better than SCG.

CONCLUSION

In conclusion, MLPs can be used to classify Parkinson's disease. The experiment shows that feature selection helps to increase computational efficiency whilst improving classification accuracy. Further, of the two training algorithms measured, the LM algorithm achieved a higher PD classification rate than the SCG algorithm. This is confirmed by the accuracy rates achieved as well as by the lower MSE obtained. Further, with ANOVA as feature selection, the PD dataset is reduced from 16 features to 4, a 75% reduction in the number of features, with an accuracy above 90% achieved for LM while SCG achieved above 80%.

References

[1] Norlinah Mohamed Ibrahim, Misconceptions about Parkinson's Disease, Neurology Unit, Pusat Perubatan Universiti Kebangsaan Malaysia, November 2009.

[2] A. J. Hughes, S. E. Daniel, L. Kilford, and A. J. Lees, Accuracy of clinical diagnosis of idiopathic Parkinson's disease: a clinico-pathological study of 100 cases, Journal of Neurology, Neurosurgery & Psychiatry, 55(3), 181-184, 1992.

[3] Deep-Brain Stimulator and Control of Parkinson's Disease, Proc. SPIE 5389, Smart Structures and Materials 2004.

[4] Anchana Khemphila, Veera Boonjing, Parkinson's Disease Classification using Neural Network and Feature Selection, World Academy of Science & Tech, 64, 2012.

[5] J. Tebelskis, "Speech recognition using neural networks," PhD thesis, Carnegie Mellon University, Pittsburgh, Pennsylvania, 1995.

[6] UCI Machine Learning Repository, Center for Machine Learning and Intelligent Systems, http://archive.ics.uci.edu.

[7] Data Mining Techniques for Marketing, Sales, and Customer Support. John Wiley & Sons, Inc.

[8] Active Media Innovation Sdn. Bhd, Applying Neural Network with MATLAB.

[9] Lukas Zoubek, Introduction to Educational Data Mining Using MATLAB, Department of Information and Communication Technologies, Pedagogical Faculty, University of Ostrava.

[10] Statistics Toolbox User's Guide (September 2009). Available: http://www.mathworks.com/access/helpdesk/help/pdf_doc/stats/stats

[11] http://www.upa.pdx.edu/IOA/newsom/da1/ho_ANOVA

[12] http://www.chegg.com/homeworkhelp/definitions/mean-squares-31