 Open Access
 Authors : Poornima K.M, T Jayakumari
 Paper ID : IJERTCONV3IS27119
Volume & Issue : NCRTS – 2015 (Volume 3 – Issue 27)
Published (First Online): 30-07-2018
ISSN (Online) : 2278-0181
 Publisher Name : IJERT
 License: This work is licensed under a Creative Commons Attribution 4.0 International License
Neural Network based Technique for Parkinson’s Disease Classification using ANOVA as Feature Selection Model
Poornima K.M1
2nd year M.Tech Student, Dept of Computer Science and Engineering, BTL Institute of Technology, Bangalore, India
Smt. T Jayakumari2
Assistant Professor, Dept of Computer Science and Engineering, BTL Institute of Technology, Bangalore, India
Abstract: Parkinson's disease (PD) is caused by the loss of dopamine in the brain's thalamic region, which results in involuntary or oscillatory movements of the body. This study presents an analysis based on two training algorithms, Levenberg-Marquardt (LM) and Scaled Conjugate Gradient (SCG), for Multilayer Perceptrons (MLPs) with the Back Propagation learning algorithm, applied to classification for effective diagnosis of PD with Analysis of Variance (ANOVA) as the feature selection method.
Keywords – Parkinson's disease (PD), Multilayer Perceptron (MLP) Neural Network, Levenberg-Marquardt (LM), Scaled Conjugate Gradient (SCG).

INTRODUCTION
Parkinson's disease (PD) is the second most common neurodegenerative disorder after Alzheimer's disease (AD) [1]. PD is a progressive neurological disorder characterized by tremor, rigidity, and slowness of movement. It is associated with progressive neuronal loss in the substantia nigra and other brain structures [2]. Non-motor features, such as dementia and dysautonomia, occur frequently, particularly in advanced stages of the disease. Diagnosis depends on the presence of two or more cardinal motor features such as rest tremor, bradykinesia, or rigidity [3]. With so many factors to examine when diagnosing PD, specialists generally make decisions by assessing the current test results of their patients. Furthermore, they also consider previous outcomes observed in other patients with a similar condition.
These are complex procedures, especially when the number of factors the specialist has to assess is high (given the quantity and variety of these data). For these reasons, PD diagnosis requires experience and highly skilled specialists. Classification systems can help increase the accuracy and reliability of diagnoses and minimize potential errors, as well as make diagnoses more time-efficient [4].
This paper presents an experimental analysis of the MLP neural network based on two training algorithms, Levenberg-Marquardt (LM) and Scaled Conjugate Gradient (SCG), for diagnosing PD. The experiment comprises two parts: in the first part, the attributes act as inputs to the MLP without ANOVA feature selection; in the second part, the significance of ANOVA as feature selection is measured. The results obtained in this paper can also be applied to other diseases such as breast cancer, heart disease, and so on. Considerable cost can be saved by applying this technique because the analysis process is a fully software-based approach utilizing a real database of the disease.

RELATED WORK
Several techniques and analyses have been applied to diagnose Parkinson's disease (PD), for example an artificial neural network classifier for the diagnosis of Parkinson's disease using [99mTc]TRODAT-1 and SPECT. Besides this, other diseases have also been detected using artificial neural networks. A brief description of disease detection using this method is given below:
Artificial Neural Network Classifier for the Diagnosis of Parkinson's Disease using [99mTc]TRODAT-1 and SPECT: According to Acton PD and Newberg A in their research [5], imaging the dopaminergic neurotransmitter system with positron emission tomography (PET) or single photon emission computed tomography (SPECT) is a powerful tool for the diagnosis of Parkinson's disease. It has been hypothesized that an artificial neural network (ANN), which can mimic the pattern recognition skills of human observers, may provide similar results. A set of patients with PD, and normal healthy control subjects, were studied using the dopamine transporter tracer [99mTc]TRODAT-1 and SPECT. The sample comprised 81 patients (mean age ± SD: 63.4 ± 10.4 years; age range: 39.0-84.2 years) and 94 healthy controls (mean age ± SD: 61.8 ± 11.0 years; age range: 40.9-83.3 years). The images were processed to extract the striatum, and the striatal pixel values were used as inputs to a three-layer ANN. However, with this technique it was difficult to interpret precisely which triggers in the images were being detected by the network.

DATA SET
The dataset was created by Max Little of the University of Oxford, in collaboration with the National Centre for Voice and Speech, Denver, Colorado, who recorded the speech signals. The original study published the feature extraction methods for general voice disorders [6]. The attributes are tabulated as in TABLE I.
Table I : Feature information for Parkinson dataset
The data set comprises a range of biomedical voice measurements, with 195 samples and 16 attributes, of which 147 samples were diagnosed with PD. The main objective of the study is to distinguish healthy people from those with PD using the MLP, according to the "status" column, which is set to 0 for healthy and 1 for PD.
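As an illustration of the dataset layout, the sketch below separates the voice-measurement features from the "status" label. The rows and values shown are illustrative stand-ins, not actual UCI records:

```python
# Minimal sketch of the feature/label split for the Parkinson dataset.
# Column names follow the dataset's naming style; the values are made up.
rows = [
    {"MDVP:Fo(Hz)": 119.99, "MDVP:Fhi(Hz)": 157.30, "MDVP:Flo(Hz)": 74.99, "HNR": 21.03, "status": 1},
    {"MDVP:Fo(Hz)": 197.08, "MDVP:Fhi(Hz)": 206.90, "MDVP:Flo(Hz)": 192.06, "HNR": 26.78, "status": 0},
]

feature_names = [k for k in rows[0] if k != "status"]
X = [[r[k] for k in feature_names] for r in rows]  # feature matrix
y = [r["status"] for r in rows]                    # class labels: 1 = PD, 0 = healthy
```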

METHODOLOGY
This section describes the methodology, shown in Fig. 1, for classifying PD using the MLP neural network based on two training algorithms, the LM algorithm and the SCG algorithm. ANOVA is applied as the feature selection method for the PD data set.

Multilayer Perceptrons Neural Network
The multilayer perceptron is a feedforward neural network trained with the standard backpropagation algorithm. It is a supervised network, so it needs a desired response to be trained [7].
Fig. 1. Overall Block diagram
It learns how to transform input data into a desirable response, so such networks are widely applied for pattern classification. With one or two hidden layers, they can approximate virtually any input-output map, and they have been shown to approach the performance of optimal statistical classifiers in difficult problems. The multilayer perceptron is the most popular static network [8]. It is trained with error-correction learning, which is appropriate here because the desired network response (the diagnostic label) is known. Error-correction learning works as follows: from the network response at neuron j at iteration t, y_j(t), and the desired response d_j(t) for a given input pattern, an instantaneous error is defined by

e_j(t) = d_j(t) - y_j(t)    (1)

Using the theory of gradient descent learning, each weight in the network can be adapted by correcting the present value of the weight with a term that is proportional to the present input and error at the weight, i.e.

w_ij(t + 1) = w_ij(t) + η e_j(t) x_i(t)    (2)

Here η is the learning-rate parameter and x_i(t) is the input to neuron j at iteration t. For hidden neurons, the local error can be computed as a weighted sum of the errors at the internal neurons.
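The error-correction update of Eqs. (1) and (2) can be sketched for a single neuron as follows (a minimal illustration; the learning rate and input values are arbitrary):

```python
def delta_rule_step(w, x, d, y, eta):
    """One error-correction step: e(t) = d(t) - y(t)  (Eq. 1),
    then w_i(t+1) = w_i(t) + eta * e(t) * x_i(t)  (Eq. 2)."""
    e = d - y                                               # instantaneous error
    w_new = [wi + eta * e * xi for wi, xi in zip(w, x)]     # weight update
    return w_new, e

w, e = delta_rule_step(w=[0.5, -0.2], x=[1.0, 2.0], d=1.0, y=0.4, eta=0.1)
# e = 0.6; updated weights ≈ [0.56, -0.08]
```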

ANOVA (Analysis of Variance)
Analysis of Variance is a statistical test used to determine whether three or more data sets (means) are statistically significantly different. Selection of the proper ANOVA depends on the data to be examined. The first criterion for selecting an ANOVA test relates to the distribution of the data sets. Because the data analyzed here do not follow a normal distribution, a nonparametric version of the classical one-way ANOVA should be used. There are two suitable tests implemented in MATLAB: the Kruskal-Wallis test and the Friedman test. Both tests examine the ranks of the data rather than their original numeric values. Ranks are obtained by ordering the data from smallest to largest across all groups and taking the numeric index of this ordering [9].
The Kruskal-Wallis test is a nonparametric test for comparing three or more unpaired groups of data. Moreover, the MATLAB implementation of the test allows analysis of data where the recordings are not equally distributed among the groups to be examined. The Kruskal-Wallis test tests the hypothesis that all samples come from populations with the same median, against the alternative that the medians are not all the same. As a result, a p-value for the null hypothesis that all samples are drawn from the same population is obtained [10].
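As a rough illustration of the ranking idea described above, the Kruskal-Wallis H statistic can be sketched in pure Python. This sketch ignores tie correction, which MATLAB's implementation handles:

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction): rank all
    observations across groups, then compare per-group rank sums."""
    # Pool all values, remembering which group each came from
    pooled = sorted((v, gi) for gi, g in enumerate(groups) for v in g)
    n_total = len(pooled)
    rank_sums = [0.0] * len(groups)
    for rank, (_, gi) in enumerate(pooled, start=1):
        rank_sums[gi] += rank
    # H = 12 / (N(N+1)) * sum(R_i^2 / n_i) - 3(N+1)
    return 12.0 / (n_total * (n_total + 1)) * sum(
        rs * rs / len(g) for rs, g in zip(rank_sums, groups)
    ) - 3 * (n_total + 1)

h = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])  # H = 7.2
```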
ANOVA can be used as a feature selection technique [11]. The steps involved in ANOVA are given below:

Step 1: Calculation of the Sums of Squares

There are three sums of squares: the between-group sum of squares (SSB), the within-group or error sum of squares (SSW), and the total sum of squares (SST). The total sum of squares can be partitioned into the between sum of squares and the within sum of squares, representing the variation due to the independent variable and the variation due to individual differences in the scores respectively:

SST = SSB + SSW

The sum of squares between groups examines the differences among the group means by calculating the variation of each group mean around the grand mean, where n_j is the number of observations in group j, m_j is the mean of group j, and m is the grand mean:

SSB = Σ_j n_j (m_j − m)²

The sum of squares within groups examines the error variation, i.e. the variation of individual scores around each group mean. This is variation in the scores that is not due to the independent variable:

SSW = Σ_j Σ_i (x_ij − m_j)²

Step 2: Calculation of the Degrees of Freedom (DF)

The degrees of freedom represent the number of independent values in a calculation, minus the number of estimated parameters. Where N is the number of samples and K is the number of groups:

DF within the groups = N − K
DF between the groups = K − 1
DF total = N − 1

Step 3: Calculation of the Mean Squares

Mean squares (MS) are estimates of variance across groups [12]. They are used in analysis of variance and are calculated as a sum of squares divided by its appropriate degrees of freedom. The Mean Square Total (MST) is an estimate of the total variance against the grand mean (the mean of all samples):

MST = SST / (N − 1)

The Mean Square Between (MSB) groups compares the group means to the grand mean:

MSB = SSB / (K − 1)

The Mean Square Within (MSW) groups calculates the variance within each individual group:

MSW = SSW / (N − K)

Step 4: Calculation of the F statistic (F ratio)

The F ratio is a value used to determine whether the difference between groups is statistically significant. The larger variance estimate is divided by the smaller, both being results of the analysis of variance procedure. MSB and MSW are used to calculate the F ratio:

F = MSB / MSW

Step 5: Calculation of the p-value

To find the p-value from an F distribution, the numerator and denominator degrees of freedom must be known, along with the significance level. The p-value has df1 and df2 degrees of freedom, where df1 is the numerator degrees of freedom, equal to K − 1, and df2 is the denominator degrees of freedom, equal to N − K.

Step 6: Decision Rule

Reject the null hypothesis if the observed F value exceeds the critical F value (equivalently, if the p-value is smaller than the significance level); this means there is a significant difference between the groups. Failing to reject the null hypothesis means there is no significant difference between the groups.
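The F computation of Steps 1-4 can be sketched in Python as follows (a minimal illustration of the formulas above, not the MATLAB code used in the paper):

```python
def one_way_anova_f(groups):
    """F ratio per Steps 1-4: partition SST into SSB and SSW,
    compute degrees of freedom and mean squares, then F = MSB / MSW."""
    all_vals = [v for g in groups for v in g]
    n, k = len(all_vals), len(groups)
    grand_mean = sum(all_vals) / n
    # Step 1: between- and within-group sums of squares
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ssw = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    # Step 2: degrees of freedom
    df_between, df_within = k - 1, n - k
    # Step 3: mean squares
    msb, msw = ssb / df_between, ssw / df_within
    # Step 4: F ratio
    return msb / msw

f = one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])  # F = 3.0
```

For feature selection, this F value would be computed per attribute between the healthy and PD groups, keeping the attributes with the largest F.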


RESULTS AND DISCUSSION
As proposed, ANOVA feature selection was implemented in MATLAB R2013b. Fig. 2 shows the resulting ANOVA table.

Fig. 2. ANOVA Table analysis

After applying ANOVA, the attribute set is reduced from 16 features to 4, with an F statistic of 1068.65, as shown in Fig. 2 and Fig. 3. The selected features are MDVP:Fo(Hz), MDVP:Fhi(Hz), HNR, and MDVP:Flo(Hz). The high F value indicates that the ANOVA analysis is effective.

Fig. 3. ANOVA Analysis

Fig. 4 shows the multiple comparison of the group means. The Group 6 mean is highlighted and its comparison interval is shown in blue. Because the comparison intervals for four other groups do not intersect the interval for the Group 6 mean, they are highlighted in red; this lack of overlap indicates that those group means are different from the Group 6 mean. Selecting other group means confirms that all group means are significantly different from each other.

Fig. 4. Multi-comparison analysis

LM and SCG were developed using the MATLAB Neural Network Toolbox. This section discusses the experimental results obtained using the LM and SCG algorithms in terms of Average Training Accuracy, Average Testing Accuracy, Average Iterations, and Average MSE. Prior to training, the dataset was rescaled to the range -1 to 1 and then divided in a 50:20:30 ratio for training:validation:testing.

Fig. 5. Average Training Accuracy (%) versus No. of Hidden Units for LM and SCG algorithms before ANOVA

Fig. 5 and Fig. 6 show the average training accuracy for LM and SCG for both analyses, with and without ANOVA. The average training accuracy using the LM algorithm is higher at almost every number of hidden units compared with the SCG algorithm. For the LM algorithm, the highest accuracy is obtained at 20 and 10 hidden units for the analyses without and with ANOVA feature selection, that is 91.75% and 90.20% respectively, whilst the SCG algorithm achieved an accuracy rate of 84.02% at 25 hidden units. However, with ANOVA, SCG increased its accuracy rate to 88.14% at 10 hidden units.
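The preprocessing described above (rescaling to the range -1 to 1, then a 50:20:30 split) can be sketched as follows. The split here is taken in row order for simplicity; whether the actual experiments shuffled the data first is not stated in the paper:

```python
def rescale_minus1_to_1(column):
    """Min-max rescale one feature column to the range [-1, 1]."""
    lo, hi = min(column), max(column)
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in column]

def split_50_20_30(samples):
    """Split samples into training:validation:testing = 50:20:30."""
    n = len(samples)
    n_train, n_val = int(0.5 * n), int(0.2 * n)
    return (samples[:n_train],
            samples[n_train:n_train + n_val],
            samples[n_train + n_val:])

col = rescale_minus1_to_1([0.0, 5.0, 10.0])         # -> [-1.0, 0.0, 1.0]
train, val, test = split_50_20_30(list(range(10)))  # sizes 5, 2, 3
```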
Fig. 6. Average Training Accuracy (%) versus No. of Hidden Units for LM and SCG algorithms after ANOVA

Fig. 7. Average Testing Accuracy (%) versus No. of Hidden Units for LM and SCG algorithms before ANOVA

Fig. 8. Average Testing Accuracy (%) versus No. of Hidden Units for LM and SCG algorithms after ANOVA

Fig. 7 and Fig. 8 show the average testing accuracy for both LM and SCG. Without ANOVA, the testing accuracy rates achieved by the LM algorithm across the hidden units ranged from 78.38% to 84.74%, while those for SCG ranged from 84.32% to 87.71%. With ANOVA as feature selection, the LM algorithm achieved testing accuracy rates ranging from 81.35% to 86.86%, while the SCG algorithm ranged from 83.89% to 86.86%. This confirmed that for both the training and testing phases, LM performed better than SCG.

CONCLUSION

In conclusion, MLPs can be used to classify Parkinson's disease. The experiment shows that feature selection helps to increase computational efficiency whilst improving classification accuracy. Further, based on both training algorithms measured, classification of PD using the LM training algorithm for MLPs achieved a higher classification rate than the SCG algorithm. This is confirmed by the accuracy rates achieved as well as the lower MSE obtained. Further, with ANOVA as feature selection, the PD dataset is reduced from 16 features to 4, more than a 75% reduction in the dataset, with an accuracy rate above 90% achieved for LM while SCG achieved above 80%.

REFERENCES

[1] Norlinah Mohamed Ibrahim, "Misconceptions about Parkinson's Disease," Neurology Unit, Pusat Perubatan Universiti Kebangsaan Malaysia, November 2009.
[2] A. J. Hughes, S. E. Daniel, L. Kilford, and A. J. Lees, "Accuracy of clinical diagnosis of idiopathic Parkinson's disease: a clinico-pathological study of 100 cases," British Medical Journal, 55(3), 181-184, 1992.
[3] "Deep-Brain Stimulator and Control of Parkinson's Disease," Proc. SPIE 5389, Smart Structures and Materials, 2004.
[4] Anchana Khemphila and Veera Boonjing, "Parkinson's Disease Classification using Neural Network and Feature Selection," World Academy of Science & Technology, 64, 2012.
[5] J. Tebelskis, "Speech Recognition using Neural Networks," PhD thesis, Carnegie Mellon University, Pittsburgh, Pennsylvania, 1995.
[6] UCI Machine Learning Repository, Center for Machine Learning and Intelligent Systems, http://archive.ics.uci.edu.
[7] Data Mining Techniques for Marketing, Sales, and Customer Support, John Wiley & Sons, Inc.
[8] Active Media Innovation Sdn. Bhd., Applying Neural Network with MATLAB.
[9] Lukas Zoubek, "Introduction to Educational Data Mining Using MATLAB," Department of Information and Communication Technologies, Pedagogical Faculty, University of Ostrava.
[10] Statistics Toolbox User's Guide, September 2009. Available: http://www.mathworks.com/access/helpdesk/help/pdf_doc/stats/stats
[11] http://www.upa.pdx.edu/IOA/newsom/da1/ho_ANOVA
[12] http://www.chegg.com/homeworkhelp/definitions/mean-squares-31