Weight Optimization through Differential Evolution Algorithm in Neural Network based Ensemble Approach

In recent years, with the advent of computational intelligence, machine learning has remained an extremely popular area of research. Many machine learning models are already available for prediction, so rather than developing a new machine learning method, ensembles of existing models can be used to obtain better results. Ensemble learning is among the most innovative strategies in the field and generally provides more accurate predictions than a single machine learning model. Conventionally, only one learner is used for training and prediction, whereas ensemble learning methods build a number of learners and then aggregate their results to obtain a more precise prediction. In this work, a neural network ensemble (NN-ensemble) is used, and the weights for the predictions are optimized using the Differential Evolution (DE) algorithm to obtain a more precise final prediction. The NN-ensemble is compared with other ensemble approaches on several evaluation measures and outperforms them, with an accuracy of 91.82%.

Keywords: Machine learning, Neural Network, Differential Evolution (DE) Algorithm, Ensemble.

INTRODUCTION
Machine learning is an application of Artificial Intelligence (AI) that gives systems the ability to learn and improve automatically from experience without being explicitly programmed. An ensemble is a group of items viewed as a whole rather than individually. Ensemble learning trains multiple learners and aggregates their outcomes, treating them as a "committee of decision-makers". The key idea is that the decision of the committee, with the individual predictions suitably aggregated, should on average have better overall accuracy than any single committee member. The ensemble members may predict real values, class labels, posterior probabilities, clusterings, rankings, or any other quantity, so their predictions can be combined with different techniques, including voting, averaging, and probabilistic methods. Predictions from machine learning algorithms play a vital role in modern life. As users of technology, we routinely perform tasks that require making a decision or classifying something, for example in the medical field, in research and development, in day-to-day life, and in any data analysis task. There are several machine learning algorithms, such as random forest, K-nearest neighbor, decision tree, and support vector machine. Pantola et al. [1] presented an ensemble technique that uses a weighted average strategy to combine machine learning models and obtain a more precise prediction. In the weighted average strategy, the weights are assigned manually. This is a limitation of traditional ensemble approaches: manually assigned weights increase the risk that the ensemble model fails to achieve improved accuracy.
Instead of assigning weights manually, a method or algorithm can be used to find them. The literature contains several methods for finding ensemble weights: iDHS-EL [2] uses grid search to find the fusion weights, and iDNA-KACC-EL does the same [3]. Liu, Zang et al. proposed a method that measures the distance of incorrectly classified data values and uses the affinity propagation algorithm for clustering [4]. Several fuzzy-logic-based methods have also been used in the literature to aggregate classifiers, such as multiple kernel learning for better prediction [5,6,7]. Samples with high fuzziness are misclassified more often; to address this, a divide-and-conquer method has been proposed [8].
To get better performance from machine learning models, their ensemble approaches should be refined. In bioinformatics, for protein structure prediction, Rana et al. showed that random forest gives more precise predictions on classification and regression data [9,10]. Rayal and Rana then proposed a neural network ensemble approach [11] on a regression dataset and obtained more promising results. They compared classical ensemble approaches with the neural network based ensemble approach and found that N-semble [12] outperforms them. However, there was no scheme to optimize the weights to obtain better predictions. The proposed work therefore also uses a neural network based ensemble approach for protein structure prediction on a regression dataset, with weight optimization performed by a Nature Inspired Algorithm (NIA), namely the Differential Evolution (DE) algorithm.
The paper is structured as follows. Material and Models describes the dataset and the models. Methodology and Implementation discusses the Differential Evolution algorithm and the proposed work. Experimental results and K-fold cross-validation of the proposed work are then presented, and the work is summarized in the conclusion.

A. Data with its Description
Proteins are a core component of life. They provide the foundation of structures such as skin, hair, and tendon, and they are primarily responsible for catalyzing and synchronizing biochemical reactions and for transporting molecules [13]. Regression data [14] is used with the proposed approach for protein structure prediction. The physicochemical properties of a protein structure are used to determine its quality, and are therefore used to identify native or native-like structures among the predicted structures. In this work, machine learning regression models are applied to six physicochemical properties to predict the root mean square deviation (RMSD) of a protein structure. The prediction works in the absence of the true native state, and each protein structure lies in the 0 Å to 5 Å RMSD range. The physicochemical properties explored in the paper are described, attribute by attribute, in TABLE I. The predicted RMSD value is compared with that of the known protein structure [9]. TABLE II shows a sample of five instances from the dataset. For regression purposes, the RMSD value is treated as continuous.

B. Machine Learning Models
This section describes the machine learning models used; their configurations and specifications are listed in TABLE III.

A. Differential Evolution (DE) Algorithm
The Differential Evolution (DE) algorithm is a stochastic optimization technique. It is an Evolutionary Algorithm (EA) with a fast and simple evaluation process, and it was introduced by Storn and Price [17]. The steps involved in DE are shown as a flowchart in Fig 1, and the procedure is given in Algorithm 1. In DE, a solution is sought by a population of potential solutions (individuals). The search space is D-dimensional and the individuals are indexed i = 1, 2, ..., SN, where SN is the population size (number of individuals). The algorithm employs three operators: mutation, crossover, and selection. First, the mutation operator is applied to each individual of the current population to produce a trial vector. Then the crossover operator combines the parent vector and the trial vector to generate an offspring, which is compared with the parent. The selection operator has two functions: first, it selects the individual on which the mutation operation generates a trial vector; second, it selects, for the next generation, whichever of the parent and the offspring has the better fitness value. At the end, the individual with the best fitness is returned as the solution [18]. In this work, DE is used to find optimized weights so that the complete ensemble model achieves improved results. For this reason, the objective function is taken as:

$$Y_i = \sum W_i \, X_i \qquad (1)$$

where Y_i represents the actual RMSD, X_i represents the predicted value of each of the top three models, and W_i is the optimized weight for each model's prediction among the top three models.
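The mutation, crossover, and selection loop described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a reproduction of Algorithm 1: it assumes the common DE/rand/1/bin variant, and it assumes that eq. (1) is turned into a cost by taking the sum of squared differences between the actual RMSD and the weighted sum of the three predictions. The names `differential_evolution` and `weighted_sse`, the control parameters (`f`, `cr`, `pop_size`, `generations`), the weight bounds [0, 1], and the toy data are all illustrative and do not come from this work.

```python
import random

def differential_evolution(objective, dim, pop_size=20, f=0.8, cr=0.9,
                           generations=500, bounds=(0.0, 1.0), seed=0):
    """Minimize `objective` over a box with a basic DE/rand/1/bin loop."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    cost = [objective(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: build a trial vector from three other individuals.
            a, b, c = rng.sample([k for k in range(pop_size) if k != i], 3)
            trial = [pop[a][d] + f * (pop[b][d] - pop[c][d]) for d in range(dim)]
            trial = [min(hi, max(lo, v)) for v in trial]  # keep inside the bounds
            # Crossover: mix parent and trial vector into an offspring.
            forced = rng.randrange(dim)  # at least one gene comes from the trial
            offspring = [trial[d] if (d == forced or rng.random() < cr) else pop[i][d]
                         for d in range(dim)]
            # Selection: the better of parent and offspring survives.
            offspring_cost = objective(offspring)
            if offspring_cost <= cost[i]:
                pop[i], cost[i] = offspring, offspring_cost
    best = min(range(pop_size), key=lambda k: cost[k])
    return pop[best], cost[best]

def weighted_sse(weights, preds, actual):
    """Assumed cost: squared error between actual RMSD and weighted predictions."""
    return sum((y - sum(w * x for w, x in zip(weights, row))) ** 2
               for row, y in zip(preds, actual))

# Toy data (illustrative only): each row holds three models' predictions.
preds = [[1.0, 2.0, 3.0], [2.0, 1.0, 0.5], [0.5, 0.5, 2.0],
         [3.0, 1.0, 1.0], [1.5, 2.5, 0.5]]
true_w = [0.5, 0.3, 0.2]
actual = [sum(w * x for w, x in zip(true_w, row)) for row in preds]

weights, cost = differential_evolution(
    lambda w: weighted_sse(w, preds, actual), dim=3)
```

Here "best fitness" is taken to mean lowest cost. On this toy problem the recovered `weights` should lie close to `true_w`, since the targets were generated from those weights.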

B. Methodology
The proposed approach is shown as a step-by-step flow diagram in Fig 2 (Steps for Proposed Approach). It is organized in four parts: (1) data splitting, (2) training, testing, and selection of base models, (3) weight learning through the DE algorithm, and (4) training and testing of the neural network. In Phase-I, the dataset is loaded and 75% of it is used as training data (Set-1), while the rest (Set-2) is used to test the NN-ensemble model.
In Phase-II, the data from Phase-I (Set-1) is further partitioned in a 2:1 ratio. This particular split allows a more effective ensemble model to be fitted without compromising the test data. The base models are thus trained on 50% of the whole dataset and tested on 25% of it. Once the prediction results are obtained, the top three base models are selected on the basis of accuracy.
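The partitioning in Phase-I and Phase-II, and the selection of the top three models, can be illustrated with a short Python sketch. It assumes a simple shuffled index split; the function names `split_indices` and `top_three`, the dataset size of 1200, and the accuracy values are hypothetical and are used for illustration only.

```python
import random

def split_indices(n, seed=0):
    """Return (base_train, base_test, set2) index lists: 50% / 25% / 25% of n."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_set1 = (3 * n) // 4             # Phase-I: Set-1 is 75% of all instances
    set1, set2 = idx[:n_set1], idx[n_set1:]
    n_train = (2 * n_set1) // 3       # Phase-II: 2:1 split of Set-1
    return set1[:n_train], set1[n_train:], set2

def top_three(scores):
    """Names of the three models with the highest accuracy."""
    return sorted(scores, key=scores.get, reverse=True)[:3]

base_train, base_test, set2 = split_indices(1200)

# Hypothetical accuracies, for illustration only (not results from this work).
example_scores = {"model_a": 0.71, "model_b": 0.64, "model_c": 0.80, "model_d": 0.58}
selected = top_three(example_scores)
```

Keeping Set-2 out of both base-model training and model selection is what lets it serve as an untouched test set for the final ensemble.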
In Phase-III, the predictions of the top three models and the corresponding target RMSD values are used to form a dataset, and weight learning is performed with the DE algorithm (Algorithm 1). DE tunes the parameters and determines the optimized weights, which are used to form new training and testing datasets for the complete ensemble model.
In Phase-IV, the neural network is trained by establishing the relationship between the actual and predicted values of the top three models, combined with the optimized weights. This combined data is used to improve the performance of the final ensemble model. The Set-2 dataset is processed in the same way as the training data to test the final ensemble model.

International Journal of Engineering Research & Technology (IJERT)
ISSN: 2278-0181 http://www.ijert.org

C. Model Evaluation Parameters
The prediction results are compared, and the models are ranked by their respective accuracy.
• Accepted Error: The accepted error e lies in the range [0, 1]. It defines the acceptable difference between the predicted and the actual value.
• Accuracy: Accuracy for the regression data model is defined in eq (2), where y is the predicted target, ŷ is the actual target, and n is the total number of instances.
• Correlation (r): The correlation between actual and predicted values is measured as the Pearson correlation, which is defined as:

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^{2} - \left(\sum x\right)^{2}\right]\left[n\sum y^{2} - \left(\sum y\right)^{2}\right]}} \qquad (4)$$

where x is the actual value, y is the predicted value, and n is the total number of instances. The value of r ranges from -1 to 1.
• R²: R² represents the proportion of the variance of the dependent variable that is explained by the regression model.
• RMSE: Root Mean Square Error (RMSE) measures the error of the regression data model. It is computed as:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(a_i - p_i\right)^{2}} \qquad (6)$$

where a is the actual value of the target, p is the predicted value, and n is the total number of instances.
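The evaluation parameters listed above could be computed along the following lines. This is a sketch under two assumptions that should be checked against the definitions used in this work: accuracy is taken to be the percentage of instances whose absolute error is within the accepted error e, and R² is taken to be the usual coefficient of determination, 1 - SS_res / SS_tot. The function names, the default e = 0.5, and the sample values are illustrative only.

```python
import math

def accuracy(actual, predicted, e=0.5):
    """Assumed definition: percentage of predictions within e of the actual value."""
    hits = sum(1 for a, p in zip(actual, predicted) if abs(a - p) <= e)
    return 100.0 * hits / len(actual)

def pearson_r(x, y):
    """Pearson correlation between actual values x and predicted values y."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

def r_squared(actual, predicted):
    """Assumed definition: 1 - residual sum of squares / total sum of squares."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

def rmse(actual, predicted):
    """Root mean square error between actual and predicted values."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Illustrative values only (not data from this work).
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.2, 1.9, 3.4, 3.0]
metrics = {
    "accuracy": accuracy(actual, predicted, e=0.5),
    "r": pearson_r(actual, predicted),
    "R2": r_squared(actual, predicted),
    "RMSE": rmse(actual, predicted),
}
```

Higher accuracy, r, and R² and lower RMSE indicate a better model, which is the ordering used when ranking the base models.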

D. Implementation
The proposed procedure is shown in Algorithm 2. It consists of four phases that process the data and evaluate the results of the complete ensemble model.

A. Results and Discussions
This section presents, in tabular form, the experimental results obtained from the base models and the ensemble models. TABLE IV shows the prediction results of the base models in terms of the evaluation parameters, i.e., accuracy, r, R², and RMSE, on the test dataset (Set-1 (II)). The top three models are selected on the basis of accuracy for further processing with the ensemble model, as expressed in the algorithm. These top models, highlighted in the table, are more accurate than the other models, with the lowest RMSE and the highest accuracy, correlation, and R². According to the Pearson correlation formula, the correlation is high when the attributes are strongly correlated with each other. Here, the correlation is computed between the actual and predicted values of these models, and accuracy is measured accordingly.

Experimental results show that the neural network ensemble outperforms all the other ensembles. This is because NN-ensemble training establishes a more accurate relationship between actual and predicted values. Because of the weight optimization, the performance of the neural network improves and exceeds that of the other ensembles.

B. Validation
Validation, i.e., establishing the validity or accuracy of a result, is used in machine learning to measure the robustness of the final model. For this reason, K-fold cross-validation [8] was performed on the final ensemble model with K = 10, i.e., the models are trained and tested ten times, each time with random data samples of the same size. The test found that the ensemble performs uniformly across the folds. These results support the finding that the NN-ensemble outperforms the other approaches on the regression dataset.
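As an illustration of how the ten folds might be formed, the sketch below partitions shuffled instance indices into K near-equal folds and uses each fold once as the test set. It assumes a plain shuffled partition; the function name `k_fold_indices`, the seed, and the dataset size of 105 are hypothetical.

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k shuffled, near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # fold sizes differ by at most one
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test

splits = list(k_fold_indices(105, k=10))
```

Each instance appears in exactly one test fold, so every prediction used in the validation comes from a model that did not see that instance during training.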
The validation results for all ensemble models over the ten folds are presented in TABLE VII. Fig 3 shows that the neural network ensemble performs better than the classical ensemble approaches. This indicates that, owing to the optimization of weights through the DE algorithm, the neural network based ensemble approach predicts more accurately.

C. Conclusion
In this work, the Differential Evolution (DE) algorithm is applied to optimize the predictions from the top three base models in order to improve the results of the neural network based ensemble approach. Initially, ten models are trained and tested, and the top three are selected on the basis of accuracy. Next, these predictions are optimized through DE to improve the performance of the final ensemble model. The data is partitioned in a specific way to obtain an efficient model. A neural network is used to ensemble the models because of its generic nature. The experimental results have been reported in tables and verified through K-fold cross-validation. They show that ensemble model results can be improved by optimizing parameters. The models are evaluated with r, R², RMSE, and accuracy and compared with other ensemble approaches.
Using more physical and chemical properties is expected to improve accuracy and may decrease computing time. Other nature-inspired algorithms can also be used to enhance the performance of ensemble models. This idea represents an efficient approach for protein structure identification. Additionally, this work can be extended to the prediction of the template-modeling score (TM-score) and the global distance test score (GDT-TS). In conclusion, instead of building new machine learning models, ensemble results can be improved through an effective optimization scheme.