A Comparative Study of Legendre Neural Network and Chebyshev Functional Link Artificial Neural Network for Diabetes Data Classification

Data mining plays an important role in data classification. As diabetes is an active area of research in medical science, analyzing diabetes data has become increasingly important, and a better and faster analysis method is needed to obtain more accurate results. Our proposed work addresses the effective classification of diabetes data in current medical science research. We worked on two advanced neural networks, namely the Legendre Neural Network (LeNN) and the Chebyshev Functional Link Artificial Neural Network (CHFLANN), and compared their performance in terms of accuracy and F-measure on a diabetes data set collected from the UCI repository. Simulations performed in the MATLAB environment show that the Legendre Neural Network based architecture provides better performance than the Chebyshev based method.
Keywords: Diabetes data, LeNN, CHFLANN, FLANN, Artificial neural network (ANN)


I. INTRODUCTION
A current area of interest and research in the health care sector is the prevention of various diabetes-related diseases. Numerous data mining methods have been proposed, and their performance has been analyzed on several data sets to identify the major causes of diabetes. Data mining techniques have been applied to the analysis of existing diabetic records for decades. Data mining refers to extracting patterns from data and analyzing and summarizing them into useful information that can be used to predict future data or experiments. Data mining techniques built on various predictive algorithms predict outcomes and estimate errors in blood sugar data, and the results can also be used in patient safety research. Our aim is to create different testing environments, using the Legendre neural network and the Chebyshev functional link artificial neural network, for predicting the class value of the data set. In [1], data mining was applied to an analytical problem in the health care system of the New Orleans area covering 30,386 diabetic patients. In [2], simulated treatment data were used to predict errors of omission in clinical patient data, showing potential for wide use in identifying decision strategies that lead to encounter-specific treatment errors in chronic disease care. In [3], wireless channel equalization techniques for communication systems based on LeNN, MLP and FLANN were analyzed and compared on a set of data, and LeNN was shown to give better performance. In [4], price fluctuations were investigated and forecast with an improved LeNN algorithm; in such predictive modeling, investors decide their investing positions by analyzing historical stock market data.
In [5], a Functional Link Artificial Neural Network (FLANN) based approach was applied to the task of classification, and an extensive simulation study demonstrated the effectiveness of the classifier. In [6], Decision Tree, Multi-Layer Perceptron (MLP) and Chebyshev functional link artificial neural network (CFLANN) classifiers were compared in terms of classification accuracy and elapsed time for credit card fraud detection. With reference to the above works, and considering a new problem statement for diabetes data analysis, we compare the results of LeNN with those of CHFLANN.
In the performance analysis we evaluate the specificity, sensitivity, recall, precision, accuracy and F-measure for the PIMA Indian Diabetes dataset. We also compute the mean square error curve for both methods. The details of the proposed technique are given below along with the appropriate mathematical equations.

II. DATA MINING TECHNIQUES USED IN THE STUDY
In recent years, data mining techniques have found a number of applications in diabetes data analysis. The Legendre Neural Network (LeNN) and the Chebyshev Functional Link Artificial Neural Network (CHFLANN) are two techniques commonly used in this field. The working principles of these techniques are described below.

A. LEGENDRE NEURAL NETWORK
The structure of the Legendre Neural Network (LeNN) is similar to that of the Functional Link Artificial Neural Network (FLANN). FLANN uses trigonometric functions in its functional expansion, whereas LeNN uses Legendre orthogonal functions [8]. Evaluating Legendre polynomials involves less computation than evaluating trigonometric functions; therefore, LeNN offers faster training than FLANN. The Legendre polynomials are denoted by $L_n(x)$, where $n$ is the order and $-1 < x < 1$ is the argument of the polynomial.

International Journal of Engineering Research & Technology (IJERT)
ISSN: 2278-0181 http://www.ijert.org

The zero and first order Legendre polynomials are, respectively, given by

$L_0(x) = 1$ and $L_1(x) = x$ (1)

The higher order polynomials are given by

$L_2(x) = \frac{1}{2}(3x^2 - 1)$, $L_3(x) = \frac{1}{2}(5x^3 - 3x)$

The recursive formula to generate higher order Legendre polynomials is expressed as

$L_{n+1}(x) = \frac{1}{n+1}\left[(2n+1)\,x\,L_n(x) - n\,L_{n-1}(x)\right]$

The error obtained by comparing the network output with the desired output is used to update the weights of the network by a weight updating algorithm. The Back Propagation algorithm used to train the network becomes very simple because the network has no hidden layer.
In every iteration the gradient of the cost function $E_k$ with respect to the weights is determined, and the weights are incremented by a fraction of the negative gradient:

$w_{j,k+1} = w_{j,k} + \alpha\, e_k\, (1 - y_k^2)\, L_j(X)$ (6)

where $E_k$ is the error rate, $d_k$ is the desired value, $y_k$ is the output of the network, $e_k = d_k - y_k$ is the error at the kth step, $w_{j,k+1}$ is the updated coefficient, $w_{j,k}$ is the old coefficient, $\alpha$ is the learning rate, and $L_j(X)$ is the jth expanded unit.

B. CHEBYSHEV FUNCTIONAL LINK ARTIFICIAL NEURAL NETWORK
The zero and first order Chebyshev polynomials are $CH_0(x) = 1$ and $CH_1(x) = x$. The higher order polynomials are

$CH_2(x) = 2x^2 - 1$, $CH_3(x) = 4x^3 - 3x$

The recursive formula for generating higher order Chebyshev polynomials is given as

$CH_{n+1}(x) = 2x\,CH_n(x) - CH_{n-1}(x)$

After the final Chebyshev expansion is obtained, the weighted sum of the components of the enhanced input pattern is computed as

$\text{Weighted Sum} = \sum_j w_j\, CH_j(X)$

The error obtained by comparing the network output with the desired output is used to update the weights of the network by a weight updating algorithm. As in LeNN, the Back Propagation algorithm used to train the network becomes very simple because the network has no hidden layer.
In every iteration the gradient of the cost function $E_k$ with respect to the weights is determined, and the weights are incremented by a fraction of the negative gradient:

$w_{j,k+1} = w_{j,k} + \alpha\, e_k\, (1 - y_k^2)\, CH_j(X)$

where $E_k$ is the error rate, $d_k$ is the desired value, $y_k$ is the output of the network, $e_k = d_k - y_k$ is the error at the kth step, $w_{j,k+1}$ is the updated coefficient, $w_{j,k}$ is the old coefficient, $\alpha$ is the learning rate, and $CH_j(X)$ is the jth expanded unit.
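
The two functional expansions and the shared weight update can be illustrated with a short Python sketch. This is one possible reading of the method, not the MATLAB code used in this study. It assumes a tanh output unit (which is what produces the derivative factor $1 - y_k^2$), targets coded as -1 and +1, inputs already scaled to the interval (-1, 1), and an added bias unit. The function names, the initial weight range and the epoch count are hypothetical.

```python
import math
import random

def legendre_terms(x, order):
    """Return [L_1(x), ..., L_order(x)] via the three-term recurrence."""
    terms = [1.0, x]  # L_0(x), L_1(x)
    for n in range(1, order):
        terms.append(((2 * n + 1) * x * terms[n] - n * terms[n - 1]) / (n + 1))
    return terms[1:order + 1]

def chebyshev_terms(x, order):
    """Return [CH_1(x), ..., CH_order(x)] via CH_{n+1} = 2x CH_n - CH_{n-1}."""
    terms = [1.0, x]  # CH_0(x), CH_1(x)
    for n in range(1, order):
        terms.append(2 * x * terms[n] - terms[n - 1])
    return terms[1:order + 1]

def expand(pattern, order, basis):
    """Enhanced input pattern: a bias unit (assumed) plus the expansion of each feature."""
    phi = [1.0]
    for x in pattern:
        phi.extend(basis(x, order))
    return phi

def output(w, phi):
    """Network output: tanh of the weighted sum (the tanh activation is an assumption)."""
    return math.tanh(sum(wj * pj for wj, pj in zip(w, phi)))

def train(patterns, targets, order, basis, alpha=0.04, epochs=200, seed=0):
    """Single-layer training with w <- w + alpha * e_k * (1 - y_k^2) * phi."""
    rng = random.Random(seed)
    expanded = [expand(p, order, basis) for p in patterns]
    w = [rng.uniform(-0.1, 0.1) for _ in expanded[0]]  # assumed initialization
    for _ in range(epochs):
        for phi, d_k in zip(expanded, targets):
            y_k = output(w, phi)
            e_k = d_k - y_k
            step = alpha * e_k * (1.0 - y_k ** 2)
            w = [wj + step * pj for wj, pj in zip(w, phi)]
    return w
```

Passing `legendre_terms` or `chebyshev_terms` as `basis` switches between the LeNN and CHFLANN expansions while the training loop stays the same, which mirrors the fact that the two networks differ only in the choice of polynomial.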

A. Input Dataset
Diabetes data classification plays a major role in disease classification and analysis and in forwarding patient information. In this paper the data sets are based on the PIMA Indian Diabetes database of the UCI repository. The data set description is given in Table I.

B. Performance Metrics
To analyse the performance of LeNN and CHFLANN, the F-measure and accuracy have been considered for the training and testing of the input dataset. In the context of diabetes data classification, precision is the fraction of retrieved instances that are relevant, and recall is the fraction of relevant instances that are retrieved. We have calculated the F-measure, which is the harmonic mean of precision and recall, and the accuracy, which is the fraction of all instances that are classified correctly.
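
The metrics named in this paper can all be derived from the four confusion-matrix counts. The sketch below is a minimal illustration in Python, taking accuracy in its usual sense as the fraction of correctly classified instances. The function name and the 0/1 label coding are assumptions made for the example.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute precision, recall (sensitivity), specificity, accuracy and F-measure."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def ratio(num, den):
        # Return 0.0 when a denominator is empty instead of raising an error.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, len(pairs)),
        # F-measure: harmonic mean of precision and recall.
        "f_measure": ratio(2 * precision * recall, precision + recall),
    }
```

For example, with 3 true positives, 2 false positives, 1 false negative and 2 true negatives, precision is 3/5, recall is 3/4, accuracy is 5/8 and the F-measure is 2/3.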

A. Description
To analyze the performance of LeNN and CHFLANN, the PIMA Indian Diabetes database is divided into a training set of 512 instances and a testing set of 256 instances. Each training and testing set was passed to LeNN and CHFLANN 10 times, using a 3rd order polynomial expansion with different learning rate (alpha) and threshold values. The best accuracy was then identified for a particular threshold value and learning rate. Finally, the performance achieved with the best network size and threshold value was compared between the two networks.
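
The experimental procedure described above can be sketched as a small search over learning rates and thresholds. The Python outline below is illustrative only. It assumes that the first 512 rows form the training set, that labels are coded 0/1, that an instance is assigned to the positive class when its network output reaches the threshold, and that accuracy is averaged over the 10 repetitions; the paper does not state these details. The `run` callable, which stands for training a network with a given learning rate and returning its outputs, is hypothetical.

```python
def split_train_test(rows, n_train=512):
    """Split rows into a training part and a testing part (ordering is an assumption)."""
    return rows[:n_train], rows[n_train:]

def accuracy_at(scores, labels, threshold):
    """Accuracy when outputs at or above the threshold are labelled 1, otherwise 0."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    return sum(p == t for p, t in zip(predicted, labels)) / len(labels)

def select_settings(run, labels, alphas, thresholds, repeats=10):
    """Return (best mean accuracy, learning rate, threshold) over the grid.

    run(alpha, seed) is a hypothetical callable that trains a network with
    learning rate alpha and returns one output score per labelled instance.
    """
    best = (-1.0, None, None)
    for alpha in alphas:
        runs = [run(alpha, seed) for seed in range(repeats)]
        for threshold in thresholds:
            mean_acc = sum(accuracy_at(s, labels, threshold) for s in runs) / repeats
            if mean_acc > best[0]:
                best = (mean_acc, alpha, threshold)
    return best
```

With 768 rows, `split_train_test` yields the 512 and 256 instance sets used here; `select_settings` would then be called once per network so that the best settings of each can be compared.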

B. Experiment with LeNN
In the LeNN based diabetes data classification, the best accuracy was observed for a threshold value of 0.5 and a learning rate of 0.04. The detailed results are given in the table below.
The accuracy for the training and testing datasets was found to be 0.8086 and 0.7695, respectively, which is a satisfactory result. When the effect of the polynomial order is analyzed for both LeNN and CHFLANN, the 2nd order polynomial expansion shows a significant difference in results compared to the 3rd order, and LeNN shows better performance with the 2nd order expansion. However, in the LeNN based method the F-measure is better for the 3rd order expansion than for the 2nd order.