 Open Access
 Authors : Antima Modi , Amit Swami
 Paper ID : IJERTV9IS010161
 Volume & Issue : Volume 09, Issue 01 (January 2020)
 Published (First Online): 29-01-2020
 ISSN (Online): 2278-0181
 Publisher Name : IJERT
 License: This work is licensed under a Creative Commons Attribution 4.0 International License
Weight Optimization through Differential Evolution Algorithm in Neural Network based Ensemble Approach
Antima Modi
Department of Computer Science and Engineering, UCE, Rajasthan Technical University,
Kota, India
Amit Swami
Department of Computer Science and Engineering UCE, Rajasthan Technical University,
Kota, India
Abstract: In recent years, with the advent of computational intelligence, machine learning has remained an extremely popular area of research. Many machine learning models are already available for prediction, so rather than implementing yet another machine learning method, we can build ensembles of these models to obtain better results. Ensemble learning represents one of the most innovative strategies in the field of machine learning: it provides better predictions and better accuracy than ordinary machine learning approaches. Generally, in machine learning, only one learner is used for training and prediction, whereas ensemble learning methods build a number of learners and then aggregate their results to obtain a more precise prediction. In this work, a neural network ensemble is used and the weights for the predictions are optimized using the Differential Evolution (DE) Algorithm to obtain a more precise final prediction. The NN-ensemble is compared with other ensemble approaches on various evaluation measures and outperforms them with an accuracy of 91.82%.
Keywords: Machine learning, Neural Network, Differential Evolution (DE) Algorithm, Ensemble.

INTRODUCTION
Machine learning is an application of Artificial Intelligence (AI) that gives systems the ability to automatically learn and improve from experience without being explicitly programmed. An ensemble is a group of items viewed as a whole rather than individually. Ensemble learning trains multiple learners and aggregates their outcomes, treating them as a "committee of decision-makers". The key idea is that the decision of the committee, with the single predictions aggregated suitably, ought to have better overall accuracy, on average, than that of any individual committee member. The ensemble members may predict real values, class labels, posterior probabilities, clusterings, rankings, or any other quantity, so their predictions can be integrated with distinct techniques, including voting, averaging, and probabilistic methods. Predictions from machine learning algorithms play a vital role in modern life. As users of technology, we complete tasks that require making a decision or classifying something, for example in the medical field, in research and development, in day-to-day life, and in any data analysis task. There are several machine learning algorithms, such as random forest, K-nearest neighbor, decision tree, and support vector machine. Pantola et al. [1] presented an ensemble technique with the weighted average strategy to combine machine learning models and obtain a more precise prediction. In the weighted average strategy, weights are assigned manually to achieve precise results from the ensemble model. This is a limitation of traditional ensemble approaches: manual assignment of weights increases the probability of not improving the accuracy of the ensemble model.
Instead of assigning weights manually, a method or algorithm can be used to find them. In the literature, there are several methods for finding weights for an ensemble model: iDHS-EL [2] uses grid search to find fusion weights, and iDNA-KACC-EL is used for the same purpose [3]. Liu et al. proposed a method that measures the distance of incorrectly classified data values and uses the affinity propagation algorithm for clustering [4]. Many more methods based on fuzzy logic are used in the literature to aggregate classifiers, such as multiple kernel learning, for better prediction [5, 6, 7]. Samples with high fuzziness lead to more incorrect classifications; to overcome this, a divide-and-conquer method was proposed [8].
To obtain better performance from machine learning models, ensemble approaches should be refined. In bioinformatics, for protein structure prediction, Rana et al. showed that random forest gives more precise predictions for classification and regression data [9, 10]. A neural network ensemble [11] approach was then proposed by Rayal and Rana for regression datasets, and more promising results were gained. They compared classical ensemble approaches with the neural network based ensemble approach and found that Nsemble [12] outperforms them. However, there was no scheme to optimize the weights for better predictions. So, in our proposed work, we also employ a neural network based ensemble approach for protein structure prediction on a regression dataset. Here, weight optimization is done through a Nature-Inspired Algorithm (NIA), namely the Differential Evolution (DE) Algorithm. The paper is structured as follows: in Material and Models, the dataset and models are described; in Methodology and Implementation, the Differential Evolution Algorithm and the proposed work are discussed; then the experimental results and K-fold cross-validation of the proposed work are explained, and the complete work is summarized in the Conclusion.

MATERIAL AND MODELS

Data with its Description
Proteins are the core component of our life. Proteins provide the foundation of structures such as skin, hair, and tendon, and they are primarily responsible for catalyzing and synchronizing biochemical reactions and transporting molecules [13]. Regression data [14] is used with the proposed approach for protein structure prediction. The physicochemical properties of a protein structure are used to determine its quality; therefore, these properties are utilized to identify the native or native-like structure among various predicted structures. In this work, the machine learning regression models are experimented with six physicochemical properties to predict the root mean square deviation (RMSD) of the protein structure. The prediction works in the absence of the true native state, and each protein structure ranges over the 0 Å to 5 Å RMSD space. The physicochemical properties explored in the paper are described, attribute by attribute, in TABLE I. The RMSD value is predicted and compared with the known protein structure [9]. In TABLE II, a sample of five instances from the specified dataset is shown. For regression purposes, the value of RMSD is treated as continuous.
TABLE I. DESCRIPTION OF ATTRIBUTES

| S. No. | Attribute | Description                  |
| 1      | RMSD      | Root Mean Square Deviation   |
| 2      | Area      | Total Surface Area           |
| 3      | ED        | Euclidean Distance           |
| 4      | Energy    | Total Empirical Energy       |
| 5      | SS        | Secondary Structure Penalty  |
| 6      | SL        | Sequence Length              |
| 7      | PN        | Pair Number                  |
TABLE II. A SMALL DATA SAMPLE

| RMSD | Area    | ED      | Energy  | SS  | SL  | PN   |
| 3    | 12462.1 | 60594.3 | 3668.59 | 32  | 247 | 2754 |
| 0    | 15337.3 | 121624  | 5856    | 35  | 299 | 4918 |
| 5    | 9418.94 | 17629.7 | 593.842 | 195 | 133 | 890  |
| 4    | 9331.73 | 11717.2 | 1059.27 | 135 | 156 | 654  |
| 1    | 3769.08 | 644.369 | 615.49  | 24  | 50  | 42   |

Machine Learning Models
This section presents the complete description of each machine learning model, with its configuration and specifications, as depicted in TABLE III.


METHODOLOGY AND IMPLEMENTATION

Differential Evolution (DE) Algorithm
The Differential Evolution (DE) algorithm is an optimization technique of stochastic nature. It is an Evolutionary Algorithm (EA) with a faster and simpler evaluation process. It was invented by Storn and Price [17]. The steps involved in DE are shown as a flowchart in Fig 1.
Fig 1: DE : Flow Chart
The procedural illustration is provided in Algorithm 1. In DE, a solution is found through a population of potential solutions (individuals). The search space is D-dimensional and the population is indexed as i = 1, 2, …, SN, where SN is the population size (number of individuals). The algorithm employs three operators, namely Mutation, Crossover, and Selection. Initially, the mutation operator is utilized to find a trial vector for each individual of the present population. Then, using the crossover operator, an offspring is generated from the parent vector and the trial vector, and the generated offspring is compared with the parent. The selection operator has two functions: first, the mutation operation on a selected individual is used to generate a trial vector; second, the individual with the better fitness between parent and offspring is selected for the next generation. At the end, the individual with the best fitness is returned as the solution [18]. In our work, we use DE to find optimized weights so that we can achieve improved results from the complete ensemble model. For this reason, we have considered our objective function as follows:
f(W) = Σi ( Yi − Σj Wj · Xij )²   .. (1)

Where Yi represents the actual RMSD of instance i, Xij represents the predicted value from the j-th of the top three models for instance i, and Wj is the optimized weight for each model's prediction; the optimization seeks the weights W that minimize f.
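As a concrete illustration, the objective of eq. (1) can be written as a small function; the sample predictions and weights below are illustrative values only, not taken from the experiments.

```python
# Objective of eq. (1): total squared error between actual RMSD values
# and the weighted sum of the top-three models' predictions.
def objective(W, X, Y):
    # W: one weight per base model; X: per-instance predictions of the
    # three models; Y: actual RMSD values.
    total = 0.0
    for xi, yi in zip(X, Y):
        combined = sum(w * x for w, x in zip(W, xi))
        total += (yi - combined) ** 2
    return total

# Illustrative values only (two instances, three base models):
X = [[3.1, 2.9, 3.0], [0.2, 0.1, 0.0]]
Y = [3.0, 0.0]
equal_weights = [1 / 3, 1 / 3, 1 / 3]
error = objective(equal_weights, X, Y)  # close to 0.01 for these values
```

DE then searches the weight space for the W that minimizes this function.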
TABLE III. DESCRIPTION OF MODELS USED

| S. No. | Model          | Description                                                  | Required Package | Parameters                |
| 1      | Random Forest  | Random decision forest of trees                              | randomForest     | sampling=adaBoost, mtry   |
| 2      | Neural Network | Neural network with back propagation                         | neuralnet        | algorithm=rprop+, size=15 |
| 3      | CART           | Classification and regression tree                           | rpart            | maxdepth=15               |
| 4      | Ksvm           | SVM with kernel function                                     | kernlab          | kernel function           |
| 5      | SVMPoly        | SVM with polynomial kernel                                   | kernlab          | svmPoly                   |
| 6      | Decision Tree  | Tree-based model for classification                          | rpart            | method=rpart2             |
| 7      | Xgboost        | Extreme gradient boosting for faster and scalable output     | xgboost          | nrounds=2000              |
| 8      | SVM            | Discriminative classifier defined by a separating hyperplane | e1071            | method, kernel            |
| 9      | LM             | Multinomial classifier                                       | caret            | multinom                  |

Methodology
The proposed approach is shown as a step-down flow diagram in Fig 2. It is organized in four parts: (1) data splitting, (2) training, testing, and selection of base models, (3) weight learning through the DE algorithm, and (4) training and testing of the neural network. The dataset is loaded, 75% of it forms the training data (Set1), and the rest of the dataset (Set2) is used to
Fig 2: Steps for Proposed Approach
Algorithm 1 : Differential Evolution Algorithm [18]
Initialization
    Initialize control parameters CR (Crossover Rate) and F (Scale Factor)
    Initialize and create population vector P(0) of SN individuals
Evaluation with Operators
    while stopping condition(s) not met do
        for each individual xi(G) in P(G) do
            Evaluate fitness f(xi(G))
            Mutation : generate trial vector ui(G)
            Crossover : generate an offspring x'i(G)
            if f(x'i(G)) is better than f(xi(G)) then
                Add x'i(G) to P(G+1)
            else
                Add xi(G) to P(G+1)
            end if
        end for
    end while
    Return individual with best fitness as solution
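A minimal, self-contained sketch of Algorithm 1 is given below. The control-parameter values (SN, F, CR, generation count) and the toy fitness function are illustrative assumptions, not the settings used in the experiments.

```python
import random

def differential_evolution(f, dim, bounds=(0.0, 1.0), SN=20,
                           F=0.5, CR=0.9, generations=200, seed=1):
    """DE/rand/1/bin loop: mutation, crossover, selection (Algorithm 1)."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Initialization: population P(0) of SN individuals
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(SN)]
    for _ in range(generations):
        new_pop = []
        for i, x in enumerate(pop):
            # Mutation: trial vector from three distinct other individuals
            a, b, c = rng.sample([p for j, p in enumerate(pop) if j != i], 3)
            trial = [min(hi, max(lo, a[k] + F * (b[k] - c[k])))
                     for k in range(dim)]
            # Crossover: binomial recombination of parent and trial vector
            jrand = rng.randrange(dim)
            off = [trial[k] if (rng.random() < CR or k == jrand) else x[k]
                   for k in range(dim)]
            # Selection: keep the better of parent and offspring
            new_pop.append(off if f(off) <= f(x) else x)
        pop = new_pop
    return min(pop, key=f)

# Toy fitness: squared distance from a known weight vector.
target = [0.5, 0.3, 0.2]
fitness = lambda w: sum((wi - ti) ** 2 for wi, ti in zip(w, target))
best = differential_evolution(fitness, dim=3)
```

In the proposed work the fitness would instead be the objective of eq. (1) over the top three models' predictions.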
test the NN-ensemble model. So, the training data is expressed as Set1 and the testing data as Set2.
In Phase II, after getting the data from Phase I, the dataset (Set1) is further partitioned in a ratio of 2:1. This specific data splitting is used to fit a more efficient ensemble model without compromising the testing data. The base models are trained on 50% of the whole dataset and tested on 25% of it. After obtaining the prediction results, the top three base models are selected based on the accuracy parameter.
In Phase III, the top three models' predictions and the corresponding target RMSD are used to form a dataset, and weight learning is done through the DE algorithm (Algorithm 1). It tunes the parameters and determines the optimized weights used to form the new training and testing datasets for the complete ensemble model.
In Phase IV, the neural network is trained by establishing the relationship between the top three models' actual and predicted values with the optimized weights. This special combined data is used to improve the performance of the final ensemble model.
The Set2 dataset is used in the same way as the training data to perform testing of the final ensemble model.
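The combination step of Phases III and IV can be sketched as follows; the weight vector is a hypothetical stand-in for the DE output, and the prediction rows are taken from TABLE V for illustration.

```python
# Combine DE-optimized weights with the top-three models' predictions to
# form the training rows for the final NN-ensemble (Phases III and IV).
W = [0.5, 0.3, 0.2]  # hypothetical DE-optimized weights, one per model

# (CART, Random Forest, Xgboost) predictions with the actual RMSD target:
top3_preds = [[5, 5, 5], [0, 0, 0], [5, 4, 5], [4, 4, 2]]
actual_rmsd = [5, 0, 5, 4]

ensemble_rows = [
    ([w * p for w, p in zip(W, preds)], y)  # weighted features, target
    for preds, y in zip(top3_preds, actual_rmsd)
]
# Each row pairs the weight-scaled base predictions (the NN-ensemble's
# input features) with the actual RMSD it must learn to reproduce.
```

The final neural network is then trained on these rows, and the same transformation is applied to Set2 at test time.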
TABLE IV. RESULTS EVALUATION FROM PHASE II

| S. No. | Model          | r    | R2   | RMSE | Acc%  |
| 1      | Random Forest  | 0.84 | 0.71 | 0.59 | 87.04 |
| 2      | Neural Network | 0.29 | 0.08 | 1.42 | 52.75 |
| 3      | CART           | 0.76 | 0.58 | 0.69 | 84.4  |
| 4      | Ksvm           | 0.71 | 0.5  | 0.8  | 81.35 |
| 5      | SVMPoly        | 0.01 | 0    | 1.55 | 52.85 |
| 6      | Decision Tree  | 0.01 | 0    | 1.48 | 37.91 |
| 7      | Xgboost        | 0.78 | 0.61 | 0.62 | 85.85 |
| 8      | SVM            | 0.47 | 0.22 | 1.24 | 63.99 |
| 9      | LM             | 0.8  | 0.64 | 1.88 | 47.21 |


Model Evaluation Parameters
The prediction results are compared, and priority is given to the models by considering their respective accuracies.

Accepted Error: The acceptable error e lies in the range [0, 1]. It defines the acceptable error margin between the predicted and the actual value.

Accuracy: Accuracy is defined in eq (2) for the regression data model:

Acc% = (100 / n) · Σi δ(ai, pi)   .. (2)

Where a is used for the actual value of the target, p is used for the predicted value, n represents the total number of instances, and δ is the indicator defined in eq (3).


Implementation
The proposed procedure is followed as shown in Algorithm 2. The procedure is followed by four phases to process data and evaluates results from complete ensemble model.
Algorithm 2 : Complete NN-ensemble Algorithm
while d ≠ ∅ do
    Phase I : DATA SPLITTING
        d1 = random(d, frac = 0.75); d2 = d − d1
        d1 = Training [Set 1]; d2 = Testing [Set 2]
    Phase II : TRAINING AND TESTING BASE MODELS
        dBaseTrain = random(d1, frac = 0.66); dBaseTest = d1 − dBaseTrain
        Train base models on dBaseTrain; test base models on dBaseTest
        Evaluate Acc%, r, R2, RMSE
        BaseResult : NN(x)
    Phase III : WEIGHT LEARNING THROUGH DE
        Input NN(x) to DE Algorithm
        for i = 1 to 3 do measure Wi
        Combine weights with the top three models' predictions
    Phase IV : NN-ENSEMBLE AND TESTING PHASE
        Train NN-ensemble with the training dataset after DE
        Test NN-ensemble on the Set2 dataset
        Evaluate Acc%, r, R2, RMSE
end while
δ(ai, pi) = 1 if |ai − pi| ≤ e, and 0 otherwise   .. (3)

Where p is the predicted target, a is the actual target, and the total number of instances is denoted by n.
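The accuracy measure of eqs. (2) and (3) can be sketched directly; the accepted error value e = 0.5 below is an illustrative choice within the stated [0, 1] range, and the sample values are made up.

```python
# Accuracy per eqs. (2)-(3): percentage of instances whose absolute
# error |a_i - p_i| lies within the accepted error range e.
def accuracy(actual, predicted, e=0.5):
    hits = sum(1 for a, p in zip(actual, predicted) if abs(a - p) <= e)
    return 100.0 * hits / len(actual)

# Two of the four illustrative instances fall within e = 0.5:
acc = accuracy([3.0, 0.0, 5.0, 4.0], [3.2, 1.1, 4.8, 2.9])
```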

Correlation (r): The correlation between the actual and predicted values is measured as the Pearson correlation, which is defined as:

r = Σi (xi − x̄)(yi − ȳ) / sqrt( Σi (xi − x̄)² · Σi (yi − ȳ)² )   .. (4)
EXPERIMENTAL RESULTS

Results and Discussions
This section presents the experimental results obtained from the
Where x is the actual value, y is the predicted value, and the total number of instances is denoted by n. The value of r ranges from −1 to 1.

R2: R2 represents the variance proportion of the dependent variable explained by the regression model.
R2 = r*r .. (5)

RMSE: Root Mean Square Error (RMSE) is used to measure the error rate of a regression data model. It is measured as:

RMSE = sqrt( (1/n) · Σi (ai − pi)² )   .. (6)
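Under the definitions of eqs. (4) to (6), r, R2, and RMSE can be computed as below; the numbers in the usage lines are illustrative only.

```python
import math

def pearson_r(x, y):
    """Pearson correlation of eq. (4) between actual x and predicted y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x)
                    * sum((b - my) ** 2 for b in y))
    return num / den

def rmse(actual, predicted):
    """Root Mean Square Error of eq. (6)."""
    return math.sqrt(sum((a - p) ** 2
                         for a, p in zip(actual, predicted)) / len(actual))

r = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])  # perfectly correlated
r2 = r * r                                        # eq. (5)
err = rmse([3.0, 0.0], [3.0, 4.0])                # sqrt(16 / 2)
```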
base models and ensemble models, presented in tabular format. TABLE IV shows the base models' prediction results in terms of the evaluation parameters, i.e., accuracy, r, R2, and RMSE, on the test dataset (Set1 (II)). The top three models are selected on the basis of accuracy for further processing with the ensemble model, as expressed in the algorithm. These top models, more accurate than the others, have the least RMSE and the highest accuracy, correlation, and R2. According to the Pearson correlation formula, if the attributes are significantly correlated with each other, the correlation is high. Here, the correlation between the actual and predicted values of these models is compared, and accuracy is measured accordingly.
TABLE V. PREDICTIONS FROM TOP MODELS

| Actual RMSD | CART | Random Forest | Xgboost |
| 5           | 5    | 5             | 5       |
| 0           | 0    | 0             | 0       |
| 5           | 5    | 4             | 5       |
| 4           | 4    | 4             | 2       |
| 5           | 4    | 3             | 3       |
| 4           | 4    | 4             | 4       |
| 0           | 0    | 0             | 0       |
| 3           | 3    | 3             | 3       |
| 1           | 1    | 1             | 1       |
According to the results, the top three selected models (random forest, CART, and Xgboost) are used for finding the optimized weights according to the objective function defined for the DE algorithm. A sample of the top three models' predictions is depicted in TABLE V. The DE algorithm uses the actual and predicted values of the top three models as input and returns, as output, the most appropriate combination of weights, i.e., the optimized weights, for better results. After that, these predictions and the weight combination are applied to the ensemble model as training data; the testing dataset Set2 also uses these weight combinations, and on this dataset the complete neural network ensemble model is tested. The complete ensemble is evaluated on actual and predicted values. The results achieved from the neural network ensemble are also compared with classical ensemble models.
TABLE VI. PERFORMANCE COMPARISON

| S. No. | Model                       | r    | R2   | RMSE | Acc%  |
| 1      | Random Forest               | 0.8  | 0.64 | 0.65 | 87.99 |
| 2      | CART                        | 0.79 | 0.62 | 0.67 | 86.35 |
| 3      | Xgboost                     | 0.77 | 0.58 | 0.74 | 86.32 |
| 4      | NN-ensemble                 | 0.83 | 0.69 | 0.63 | 90.48 |
| 5      | Classical ensemble approach | 0.78 | 0.45 | 0.82 | 83.52 |
TABLE VI shows the prediction results in terms of accuracy obtained from the selected models, their classical ensemble technique, and the neural network ensemble technique. Experimental results show that the neural network ensemble outperforms all these ensembles. This is because NN-ensemble training establishes a more accurate relationship between the actual and predicted values. Because of weight optimization, the neural network's performance increases, providing more efficient performance compared with the other ensembles.


Validation
Validation, i.e., proving the validity or accuracy of a model, is a technique commonly used in machine learning to measure the robustness of the final model. For this reason, we have performed K-fold cross-validation [8] with the final ensemble model. Here, we have used K = 10, i.e., the models are trained and tested each time with random data samples of the same size. This test found that in each fold the ensemble performs in a uniform way, so these results testify that the NN-ensemble outperforms for the regression dataset.
The validation results for all ensemble models are presented in TABLE VII over ten folds. Fig 3 shows that the ensemble of neural networks performs more efficiently than the classical ensemble approach. It demonstrates that, due to the optimization of weights through the DE algorithm, the neural network based ensemble approach performs in a more accurate way.
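The 10-fold scheme can be sketched without external libraries. Indices are assigned round-robin here for simplicity, whereas the paper draws random samples, so a shuffle step would be added in practice.

```python
# K-fold index generator: each fold serves exactly once as the test set.
def kfold_indices(n, k=10):
    folds = [list(range(i, n, k)) for i in range(k)]
    for held_out in range(k):
        test_idx = folds[held_out]
        train_idx = [i for j, fold in enumerate(folds)
                     if j != held_out for i in fold]
        yield train_idx, test_idx

splits = list(kfold_indices(100))  # 10 train/test partitions
# Every instance appears in exactly one test fold across the 10 splits;
# the model is retrained on each train partition and scored on its fold.
```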
TABLE VII. 10-FOLD CROSS-VALIDATION FOR ACCURACY

| Fold | NN-ensemble | Cubist | Random Forest | Xgboost | Classical Ensemble |
| 1    | 91.82       | 84.58  | 87.99         | 86.32   | 84.67              |
| 2    | 90.85       | 83.46  | 87.62         | 85.51   | 84.16              |
| 3    | 89.86       | 85.43  | 88.02         | 85.87   | 84.66              |
| 4    | 91.58       | 82.09  | 86.86         | 87.25   | 83.95              |
| 5    | 89.53       | 83.88  | 87.22         | 88.18   | 83.45              |
| 6    | 91.33       | 84.48  | 87.83         | 85.61   | 84.33              |
| 7    | 91.79       | 83.54  | 88.46         | 87.63   | 85.36              |
| 8    | 90.08       | 84.7   | 87.52         | 88.08   | 85.09              |
| 9    | 92.01       | 81.98  | 87.94         | 85.99   | 85.61              |
| 10   | 90.01       | 84.41  | 87.09         | 88.66   | 84.05              |
Fig 3: K-Fold Cross-Validation of NN-Ensemble and Classical Ensemble for Accuracy

Conclusion
In this work, the Differential Evolution (DE) Algorithm is applied to optimize the weights of the predictions from the top three base models, improving the results of a neural network based ensemble approach. Initially, the base models are trained and tested and the top three are selected based on accuracy. Next, their predictions are weighted through DE to improve the performance of the final ensemble model. Data partitioning is done in a specific way to achieve an extremely efficient model. A neural network is used to ensemble the models because of its generic nature. The experimental results have been evaluated through data tables and verified through the K-fold cross-validation method. The obtained results clearly show that ensemble model results can be improved by optimizing parameters. Model evaluation is done through evaluation parameters such as r, R2, RMSE, and accuracy, and the model is compared with other ensemble approaches.
It is expected that using more physicochemical properties may result in improved accuracy and decreased computing time. Other nature-inspired algorithms can also be used to enhance the performance of ensemble models. This idea represents an efficient approach for protein structure identification. Additionally, this work can be extended to the prediction of the template-modeling score (TM-score) and the global distance test score (GDT TS-score). So, in the end, we can say that instead of making new machine learning models, we can improve ensemble results through an effective optimization scheme.
REFERENCES

[1] Paritosh Pantola, Anju Bala, and Prashant Singh Rana, Consensus based ensemble model for spam detection, in Advances in Computing, Communications and Informatics (ICACCI), 2015 International Conference on, pages 1724-1727. IEEE, 2015.
[2] Bin Liu, Ren Long, and Kuo-Chen Chou, iDHS-EL: identifying DNase I hypersensitive sites by fusing three different modes of pseudo nucleotide composition into an ensemble learning framework, Bioinformatics, 32(16):2411-2418, 2016.
[3] Bin Liu, Shanyi Wang, Qiwen Dong, Shumin Li, and Xuan Liu, Identification of DNA-binding proteins by combining auto-cross covariance transformation and ensemble learning, IEEE Transactions on NanoBioscience, 15(4):328-334, 2016.
[4] Bin Liu, Shanyi Wang, Ren Long, and Kuo-Chen Chou, iRSpot-EL: identify recombination spots with an ensemble learning approach, Bioinformatics, 33(1):35-41, 2016.
[5] Bin Liu, Deyuan Zhang, Ruifeng Xu, Jinghao Xu, Xiaolong Wang, Qingcai Chen, Qiwen Dong, and Kuo-Chen Chou, Combining evolutionary information extracted from frequency profiles with sequence-based kernels for protein remote homology detection, Bioinformatics, 30(4):472-479, 2013.
[6] Corinna Cortes, Mehryar Mohri, and Afshin Rostamizadeh, Two-stage learning kernel algorithms, in ICML, pages 239-246. Citeseer, 2010.
[7] Manik Varma and Bodla Rakesh Babu, More generality in efficient multiple kernel learning, in Proceedings of the 26th Annual International Conference on Machine Learning, pages 1065-1072. ACM, 2009.
[8] Xi-Zhao Wang, Rana Aamir Raza Ashfaq, and Ai-Min Fu, Fuzziness based sample categorization for classifier performance improvement, Journal of Intelligent & Fuzzy Systems, 29(3):1185-1196, 2015.
[9] Sonal Mishra, Yadunath Pathak, and Anamika Ahirwar, Classification of protein structure (RMSD ≤ 6 Å) using physicochemical properties, International Journal of Bio-Science and Bio-Technology, 7(6):141-150, 2015.
[10] Prashant Singh Rana, Harish Sharma, Mahua Bhattacharya, and Anupam Shukla, Quality assessment of modeled protein structure using physicochemical properties, Journal of Bioinformatics and Computational Biology, 13(02):1550005, 2015.
[11] Lars Kai Hansen and Peter Salamon, Neural network ensembles, IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(10):993-1001, 1990.
[12] Rishith Rayal, Divya Khanna, Jasminder Kaur Sandhu, Nishtha Hooda, and Prashant Singh Rana, Nsemble: neural network based ensemble approach, International Journal of Machine Learning and Cybernetics, pages 1-9, 2017.
[13] Er Amanpreet Kaur and Baljit Singh Khehra, Approaches to prediction of protein structure: A review, International Research Journal of Engineering and Technology, 2017.
[14] SourceForge: http://bit.ly/RFPCPDataSets
[15] CART website: CRAN.R-Project. http://goo.gl/ulWSI3
[16] XgBoost website: CRAN.R-Project. http://goo.gl/ulWSI3
[17] Kenneth V Price, Differential evolution: a fast and simple numerical optimizer, in Fuzzy Information Processing Society, 1996. NAFIPS, 1996 Biennial Conference of the North American, pages 524-527. IEEE, 1996.
[18] Harish Sharma and Kavita Sharma, Nature Inspired Algorithms: An Introduction, Soft Computing Research Society, 2009.