- **Open Access**
- **Authors:** Antima Modi, Amit Swami
- **Paper ID:** IJERTV9IS010161
- **Volume & Issue:** Volume 09, Issue 01 (January 2020)
- **Published (First Online):** 29-01-2020
- **ISSN (Online):** 2278-0181
- **Publisher Name:** IJERT
- **License:** This work is licensed under a Creative Commons Attribution 4.0 International License

#### Weight Optimization through Differential Evolution Algorithm in Neural Network based Ensemble Approach

Antima Modi

Department of Computer Science and Engineering, UCE, Rajasthan Technical University,

Kota, India

Amit Swami

Department of Computer Science and Engineering, UCE, Rajasthan Technical University,

Kota, India

Abstract: In recent years, with the advent of computational intelligence, machine learning has remained an extremely popular area of research. Many machine learning models are already available for prediction, so rather than implementing a new machine learning method, ensembles of these models can be used to obtain better results. Ensemble learning is among the most innovative strategies in the field of machine learning and provides more accurate predictions than ordinary machine learning approaches. Generally, in machine learning, only one learner is used for training and prediction, whereas ensemble learning methods build a number of learners and then aggregate their results to obtain a more precise prediction. In this work, a neural network ensemble is used, and the weights for the predictions are optimized using the Differential Evolution (DE) algorithm to obtain a more precise final prediction. The NN-ensemble is compared with other ensemble approaches on various evaluation measures, and it outperforms them with an accuracy of 91.82%.

Keywords: Machine learning, Neural Network, Differential Evolution (DE) Algorithm, Ensemble.

INTRODUCTION

Machine learning is an application of Artificial Intelligence (AI) that gives systems the capacity to learn automatically and improve from experience without being explicitly programmed. An ensemble is a group of items viewed as a whole rather than individually. Ensemble learning trains multiple learners and aggregates their outcomes, treating them as a "committee of decision-makers". The key idea is that the decision of the committee, with the individual predictions aggregated suitably, should on average have better overall accuracy than any single committee member. The ensemble members may predict real values, class labels, posterior probabilities, clusterings, rankings, or any other quantity, so their predictions can be combined with different techniques, including voting, averaging, and probabilistic methods. Predictions from machine learning algorithms play a vital role in modern life. As users of technology, we complete tasks that require making a decision or classifying something, for example in the medical field, in research and development, in day-to-day life, and in any data analysis task. There are several machine learning algorithms, such as random forest, K-nearest neighbor, decision tree, and support vector machine. Pantola et al. [1] presented an ensemble technique with a weighted average strategy to combine machine learning models to obtain a more precise prediction. In the weighted average strategy, weights are assigned manually to achieve precise results from the ensemble model. This is a limitation of traditional ensemble approaches: manual assignment of weights increases the probability of not achieving the improved accuracy of the ensemble model.
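The weighted average strategy with manually assigned weights can be sketched as follows. This is a minimal illustration in Python, not the implementation of [1]; the function name, the example predictions, and the weights are all hypothetical.

```python
def weighted_average(predictions, weights):
    """Combine per-model predictions for one instance using fixed weights."""
    if len(predictions) != len(weights):
        raise ValueError("one weight per model is required")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * p for w, p in zip(weights, predictions)) / total


# Hypothetical predictions from three base models for a single instance.
model_predictions = [2.0, 3.0, 4.0]
# Weights chosen by hand; this manual choice is the limitation discussed above.
manual_weights = [0.5, 0.3, 0.2]

combined = weighted_average(model_predictions, manual_weights)
```

A different manual choice of weights gives a different combined prediction, which is why the quality of the ensemble depends on how the weights are found.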

Instead of assigning weights manually, a method or algorithm can be used to find them. The literature contains many methods for finding weights for an ensemble model: iDHS-EL [2] uses grid search to find the weights for fusion, and iDNA-KACC-EL is used for the same purpose [3]. Liu, Zang et al. proposed a method that measures the distance between incorrectly classified data values and uses the affinity propagation algorithm for clustering [4]. Many more methods based on fuzzy logic are used in the literature to aggregate classifiers, such as multiple kernel learning for better prediction [5, 6, 7]. Samples with high fuzziness have more incorrectly classified cases; to overcome this, a divide and conquer method is proposed [8].

To obtain better performance from machine learning models, their ensemble approaches should be refined. In bioinformatics, for protein structure prediction, Rana et al. showed that random forest gives a more precise prediction for classification and regression data [9, 10]. The neural network ensemble [11] approach was then applied by Rayal and Rana to a regression dataset, and more promising results were obtained. They compared classical ensemble approaches with the neural network based ensemble approach and found that N-semble [12] outperforms them. However, there was no scheme to optimize the weights to obtain better predictions. In the proposed work, we therefore also use a neural network based ensemble approach for protein structure prediction on a regression dataset. Here, weight optimization is done through a Nature Inspired Algorithm (NIA), namely the Differential Evolution (DE) algorithm. The paper is structured as follows: Material and Models describes the dataset and the models. Methodology and Implementation discusses the Differential Evolution algorithm and the proposed work. Experimental Results then presents the results and the K-fold cross-validation of the proposed work, and the Conclusion summarizes the complete work.

MATERIAL AND MODELS

Data with its Description

Proteins are a core component of life. They provide the foundation of structures such as skin, hair, and tendon, and they are primarily responsible for catalyzing and synchronizing biochemical reactions and for transporting molecules [13]. Regression data [14] is used with the proposed approach for protein structure prediction. The physicochemical properties of a protein structure are used to determine its quality, so these properties are used to identify the native or native-like structure among various predicted structures. In this work, machine learning regression models are applied to six physicochemical properties to predict the root mean square deviation (RMSD) of the protein structure. This prediction works in the absence of the true native state, and each protein structure lies in the 0 Å to 5 Å RMSD space. The physicochemical properties explored in the paper are described, attribute by attribute, in TABLE I. The predicted RMSD value is compared with that of the known protein structure [9]. TABLE II shows a sample of five instances from the dataset. For regression purposes, the value of RMSD is treated as continuous.

TABLE I. DESCRIPTION OF ATTRIBUTES

| S. No. | Attribute | Description |
|---|---|---|
| 1 | RMSD | Root Mean Square Deviation |
| 2 | Area | Total Surface Area |
| 3 | ED | Euclidean Distance |
| 4 | Energy | Total Empirical Energy |
| 5 | SS | Secondary Structure penalty |
| 6 | SL | Sequence Length |
| 7 | PN | Pair Number |

TABLE II. A SMALL DATA SAMPLE

| RMSD | Area | ED | Energy | SS | SL | PN |
|---|---|---|---|---|---|---|
| 3 | 12462.1 | 60594.3 | -3668.59 | 32 | 247 | 2754 |
| 0 | 15337.3 | 121624 | -5856 | 35 | 299 | 4918 |
| 5 | 9418.94 | 17629.7 | -593.842 | 195 | 133 | 890 |
| 4 | 9331.73 | 11717.2 | -1059.27 | 135 | 156 | 654 |
| 1 | 3769.08 | 644.369 | -615.49 | 24 | 50 | 42 |

Machine Learning Models

This section describes the machine learning models, with their configuration and specifications, as listed in TABLE III.

METHODOLOGY AND IMPLEMENTATION

Differential Evolution (DE) Algorithm

The Differential Evolution (DE) algorithm is a stochastic optimization technique. It is an Evolutionary Algorithm (EA) with a faster and simpler evaluation process. It was invented by Storn and Price [17]. The steps involved in DE are shown as a flowchart in Fig 1.

Fig 1: DE : Flow Chart

The procedure is given in Algorithm 1. In DE, the solution is searched for by a population of potential solutions (individuals) x_i, i = 1, 2, …, SN, in a D-dimensional search space, where SN is the population size (number of individuals). The algorithm employs three operators: mutation, crossover, and selection. First, the mutation operator is used to generate a trial vector for each individual of the current population. Then the crossover operator combines the parent vector and the trial vector to generate an offspring, which is compared with its parent. The selection operator has two functions: first, it selects the individuals used by the mutation operation to generate a trial vector; second, it selects, for the next generation, whichever of the parent and the offspring has the better fitness value. At the end, the individual with the best fitness is returned as the solution [18]. In our work, we use DE to find optimized weights so that improved results can be achieved from the complete ensemble model. For this reason, our objective function is defined as follows:

= .. (1)

where Yi represents the actual RMSD, Xi represents the predicted value from each of the top three models, and Wi is the optimized weight for the prediction of each of the top three models.
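The weight search can be sketched with a small DE loop in Python. This is an illustration under stated assumptions, not the authors' implementation: the objective is assumed here to be the mean squared error between the actual RMSD Y and the weighted combination of the three predictions X (this specific form is an assumption and is not taken from eq (1)), the DE variant is the common rand/1/bin scheme, the weights are bounded to [0, 1], and all parameter values and function names are hypothetical. The data are the first five rows of the sample in TABLE V.

```python
import random


def objective(weights, predictions, actual):
    """Assumed objective: mean squared error of the weighted combination."""
    total = 0.0
    for row, y in zip(predictions, actual):
        combined = sum(w * x for w, x in zip(weights, row))
        total += (y - combined) ** 2
    return total / len(actual)


def differential_evolution(fitness, dim, pop_size=20, f=0.5, cr=0.9,
                           generations=200, bounds=(0.0, 1.0), seed=0):
    """Minimal DE/rand/1/bin minimizer; every default value is illustrative."""
    rng = random.Random(seed)
    low, high = bounds
    population = [[rng.uniform(low, high) for _ in range(dim)]
                  for _ in range(pop_size)]
    scores = [fitness(ind) for ind in population]

    for _ in range(generations):
        next_population, next_scores = [], []
        for i in range(pop_size):
            # Mutation: combine three distinct individuals other than i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            mutant = [population[a][k] + f * (population[b][k] - population[c][k])
                      for k in range(dim)]
            # Crossover: mix parent and mutant; j_rand forces one mutant gene.
            j_rand = rng.randrange(dim)
            trial = [mutant[k] if (rng.random() < cr or k == j_rand)
                     else population[i][k] for k in range(dim)]
            trial = [min(max(v, low), high) for v in trial]  # keep within bounds
            # Selection: the offspring replaces the parent if it is not worse.
            trial_score = fitness(trial)
            if trial_score <= scores[i]:
                next_population.append(trial)
                next_scores.append(trial_score)
            else:
                next_population.append(population[i])
                next_scores.append(scores[i])
        population, scores = next_population, next_scores

    best = min(range(pop_size), key=lambda i: scores[i])
    return population[best], scores[best]


# First five rows of TABLE V: columns are CART, Random Forest, Xgboost.
X = [[5.0, 5.0, 5.0], [0.0, 0.0, 0.0], [5.0, 4.0, 5.0],
     [4.0, 4.0, 2.0], [4.0, 3.0, 3.0]]
Y = [5.0, 0.0, 5.0, 4.0, 5.0]  # actual RMSD for the same rows

best_weights, best_error = differential_evolution(
    lambda w: objective(w, X, Y), dim=3)
equal_error = objective([1 / 3, 1 / 3, 1 / 3], X, Y)
```

Comparing `best_error` with `equal_error` shows how much the searched weights improve on a plain unweighted average under the assumed objective.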

TABLE III. DESCRIPTION OF CLASSIFICATION MODEL USED

| S. No. | Models | Description | Required Packages | Parameters |
|---|---|---|---|---|
| 1 | Random Forest | Random decision forest of trees | randomForest | sampling=adaBoost, mtry |
| 2 | Neural Network | Neural network with back propagation | neuralnet | algorithm=rprop+, size=15 |
| 3 | CART | Classification And Regression Tree for classification | rpart | maxdepth=15 |
| 4 | Ksvm | SVM with kernel function | kernlab | kernel function |
| 5 | SVMPoly | SVM with polynomial method | kernlab | svmPoly |
| 6 | Decision Tree | Tree based model for classification | rpart | method=rpart2 |
| 7 | Xgboost | Extreme Gradient Boosting for faster and scalable output | xgboost | nrounds=2000 |
| 8 | SVM | Discriminative classifier formally defined by a separating hyperplane | e1071 | method, kernel |
| 9 | LM | Multinomial classifier | caret | Multinom |

Methodology

The proposed approach is shown as a step-down flow diagram in Fig 2. It is organized in four parts: (1) data splitting, (2) training, testing, and selection of base models, (3) weight learning through the DE algorithm, and (4) training and testing of the neural network. In Phase-I, the dataset is loaded; 75% of it forms the training data (Set-1), and the rest of the dataset (Set-2) is used to test the NN-ensemble model. The training data is therefore referred to as Set-1 and the testing data as Set-2.

Fig 2: Steps for Proposed Approach

Algorithm 1: Differential Evolution Algorithm [18]

Initialization

Initialize control parameters CR (Crossover Rate) and F (Scale Factor)

Initialize and create population vector P(0) of SN individuals

Evaluation with Operators

while (stopping condition(s) not true) do

for each individual xi(G) ∈ P(G) do

Evaluate fitness, f(xi(G));

Mutation: Generate trial vector ui(G)

Crossover: Generate an offspring x'i(G)

if f(x'i(G)) is better than f(xi(G)) then

Add x'i(G) to P(G+1);

else

Add xi(G) to P(G+1)

end if

end for

end while

Return individual with best fitness as solution;

In Phase-II, the Set-1 data obtained from Phase-I is further partitioned in a ratio of 2:1. This specific data splitting is used to fit a more efficient ensemble model without compromising the testing data. The base models are thus trained on 50% of the whole dataset and tested on 25% of it. After the prediction results are obtained, the top three base models are selected based on accuracy.
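The two-stage split of Phase-I and Phase-II can be sketched as follows. This is a minimal Python illustration under assumptions: the dataset is replaced by a hypothetical list of 1200 row indices, the helper name `split` and the seed are made up, and an exact 2/3 fraction is used for the 2:1 partition.

```python
import random


def split(rows, fraction, rng):
    """Randomly split rows into two lists; the first holds `fraction` of them."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


rng = random.Random(42)        # fixed seed so the split is reproducible
dataset = list(range(1200))    # hypothetical stand-in for 1200 instances

# Phase I: 75% of the data becomes Set-1, the remaining 25% becomes Set-2.
set_1, set_2 = split(dataset, 0.75, rng)

# Phase II: Set-1 is divided 2:1, i.e. 50% and 25% of the whole dataset.
base_train, base_test = split(set_1, 2 / 3, rng)
```

Because Set-2 is set aside before any base model is trained, it stays untouched until the final ensemble is tested.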

In Phase-III, the predictions of the top three models and the corresponding target RMSD are used to form a dataset, and weight learning is done through the DE algorithm (Algorithm 1). It tunes the parameters and determines the optimized weights, which are used to form the new training and testing datasets for the complete ensemble model.

In Phase-IV, the neural network is trained by establishing the relationship between the actual values and the predicted values of the top three models, combined with the optimized weights. This specially combined data is used to improve the performance of the final ensemble model.

The Set-2 dataset is used in the same way as the training data to test the final ensemble model.

TABLE IV. RESULTS EVALUATION FROM PHASE-II

| S. No. | Model | r | R2 | RMSE | Acc% |
|---|---|---|---|---|---|
| 1 | Random Forest | 0.84 | 0.71 | 0.59 | 87.04 |
| 2 | Neural Network | 0.29 | 0.08 | 142 | 52.75 |
| 3 | CART | 0.76 | 0.58 | 0.69 | 84.4 |
| 4 | Ksvm | 0.71 | 0.5 | 0.8 | 81.35 |
| 5 | SVMPoly | -0.01 | 0 | 1.55 | 52.85 |
| 6 | Decision Tree | 0.01 | 0 | 1.48 | 37.91 |
| 7 | Xgboost | 0.78 | 0.61 | 0.62 | 85.85 |
| 8 | SVM | 0.47 | 0.22 | 1.24 | 63.99 |
| 9 | LM | 0.8 | 0.64 | 1.88 | 47.21 |
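The selection of the top three base models by accuracy can be illustrated with the accuracy values reported in TABLE IV. The sketch below is in Python for illustration only; the variable names are hypothetical.

```python
# Accuracy (%) of each base model as reported in TABLE IV.
accuracy = {
    "Random Forest": 87.04,
    "Neural Network": 52.75,
    "CART": 84.4,
    "Ksvm": 81.35,
    "SVMPoly": 52.85,
    "Decision Tree": 37.91,
    "Xgboost": 85.85,
    "SVM": 63.99,
    "LM": 47.21,
}

# Rank the models from highest to lowest accuracy and keep the first three.
top_three = sorted(accuracy, key=accuracy.get, reverse=True)[:3]
```

With these values the ranking keeps Random Forest, Xgboost, and CART, in that order.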

Model Evaluation Parameters

The prediction results are compared, and the models are prioritized according to their respective accuracy.

Accepted Error: The accepted error e lies in the range [0, 1]. It defines the acceptable difference between the predicted and the actual value.

Accuracy: For the regression data model, accuracy is defined in eq (2) and eq (3):

$$\text{Acc\%} = \frac{100}{n} \sum_{i=1}^{n} d_i \tag{2}$$

$$d_i = \begin{cases} 1 & \text{if } \lvert y_i - \hat{y}_i \rvert \le e \\ 0 & \text{otherwise} \end{cases} \tag{3}$$

where y is the predicted target, $\hat{y}$ is the actual target, d_i indicates whether instance i is predicted within the accepted error, and n denotes the total number of instances.

Correlation (r): The correlation between the actual and predicted values is measured as the Pearson correlation, which is defined as:

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}} \tag{4}$$

where x is the actual value, y is the predicted value, and $\bar{x}$ and $\bar{y}$ are their means. The total number of instances is denoted by n. The value of r ranges from -1 to 1.

R2: R2 represents the proportion of the variance of the dependent variable that is explained by the regression model.

$$R^2 = r \times r \tag{5}$$

RMSE: Root Mean Square Error (RMSE) is used to measure the error rate of the regression data model. It is measured as:

$$\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (a_i - p_i)^2} \tag{6}$$

where a is the actual and p is the predicted value of the target, and n represents the total number of instances.

Implementation

The proposed procedure is shown in Algorithm 2. It processes the data in four phases and evaluates the results from the complete ensemble model.

Algorithm 2: Complete NN-ensemble Algorithm

while d ≠ 0

Phase I: DATA SPLITTING

d1 = random(d, frac = 0.75)

d2 = d - d1

d1 = Training [Set-1]

d2 = Testing [Set-2]

Phase II: TRAINING AND TESTING BASE MODELS

dBaseTrain = random(d1, frac = 0.66)

dBaseTest = d1 - dBaseTrain

Train base models on dBaseTrain

Test base models on dBaseTest

Evaluate Acc%, r, R2, RMSE

BaseResult: NN(x)

Phase III: WEIGHT LEARNING THROUGH DE

Input NN(x) to DE Algorithm

for i = 1 to 3

measure Wi

Combine weights with the top three models' predictions

Phase IV: NN-ENSEMBLE AND TESTING PHASE

Train NN-ensemble with training dataset after DE

Test NN-ensemble on Set-2 dataset

Evaluate Acc%, r, R2, RMSE

end

EXPERIMENTAL RESULTS

Results and Discussions

This section presents, in tabular format, the experimental results obtained from the base models and the ensemble models. TABLE IV shows the prediction results of the base models in terms of the evaluation parameters, i.e., accuracy, r, R2, and RMSE, on the test dataset (the Phase-II test part of Set-1). The top three models are selected on the basis of accuracy for further processing with an ensemble model, as expressed in the algorithm. These top models are more accurate than the other models, having the lowest RMSE and the highest accuracy, correlation, and R2. According to the Pearson correlation formula, if two attributes are strongly correlated with each other, the correlation will be high. Here, the correlation is computed between the actual and predicted values of these models, and accuracy is measured accordingly.
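The four evaluation measures reported in TABLE IV (accuracy within an accepted error, Pearson r, R2, and RMSE) can be sketched as follows. This is a minimal Python illustration, not the authors' code: the function names are hypothetical, the accepted error of 0.5 is an assumed value inside the stated [0, 1] range, and the data are the "Actual RMSD" and "Random Forest" columns of the sample in TABLE V.

```python
import math


def accuracy_percent(actual, predicted, accepted_error):
    """Percentage of instances whose absolute error is within the accepted error."""
    hits = sum(1 for a, p in zip(actual, predicted)
               if abs(a - p) <= accepted_error)
    return 100.0 * hits / len(actual)


def pearson_r(x, y):
    """Pearson correlation between two equally long sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rmse(actual, predicted):
    """Root mean square error between actual and predicted values."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


actual = [5, 0, 5, 4, 5, 4, 0, 3, 1]      # "Actual RMSD" column of TABLE V
predicted = [5, 0, 4, 4, 3, 4, 0, 3, 1]   # "Random Forest" column of TABLE V

acc = accuracy_percent(actual, predicted, accepted_error=0.5)  # 0.5 is assumed
r = pearson_r(actual, predicted)
r_squared = r * r
error = rmse(actual, predicted)
```

On this nine-row sample, seven predictions fall within the assumed accepted error, so the values differ from the full-dataset figures in the tables.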

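TABLE V. PREDICTIONS FROM TOP MODELS

| Actual RMSD | CART | Random Forest | Xgboost |
|---|---|---|---|
| 5 | 5 | 5 | 5 |
| 0 | 0 | 0 | 0 |
| 5 | 5 | 4 | 5 |
| 4 | 4 | 4 | 2 |
| 5 | 4 | 3 | 3 |
| 4 | 4 | 4 | 4 |
| 0 | 0 | 0 | 0 |
| 3 | 3 | 3 | 3 |
| 1 | 1 | 1 | 1 |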

According to the results, the top three selected models are random forest, Cubist, and xgboost; they are used to find the optimized weights with the DE algorithm, according to the objective function definition. A sample of the predictions of the top three models is shown in TABLE V. The DE algorithm takes the actual values and the predicted values of the top three models as input and returns the most appropriate combination of weights, i.e., the optimized weights, for better results. These predictions and the weight combination are then supplied to the ensemble model as training data. The testing dataset, Set-2, uses the same weight combination, and the complete neural network ensemble model is tested on it. The complete ensemble is evaluated on the actual and predicted values. The results achieved by the neural network ensemble are also compared with classical ensemble models.

TABLE VI. PERFORMANCE COMPARISON

| S. No. | Model | r | R2 | RMSE | Acc% |
|---|---|---|---|---|---|
| 1 | Random Forest | 0.8 | 0.64 | 0.65 | 87.99 |
| 2 | CART | 0.79 | 0.62 | 0.67 | 86.35 |
| 3 | Xgboost | 0.77 | 0.58 | 0.74 | 86.32 |
| 4 | NN-ensemble | 0.83 | 0.69 | 0.63 | 90.48 |
| 5 | Classical ensemble approach | 0.78 | 0.45 | 0.82 | 83.52 |

TABLE VI shows the prediction results obtained from the selected models, their classical ensemble technique, and the neural network ensemble technique. The experimental results show that the neural network ensemble outperforms all the other models and ensembles. This is because NN-ensemble training establishes a more accurate relationship between the actual and predicted classes. Because of weight optimization, the performance of the neural network increases, and it performs more efficiently than the other ensembles.

Validation

Validation, i.e., establishing the validity or accuracy of a result, is a technique used in machine learning to measure the robustness of the final model. For this reason, we performed K-fold cross-validation [8] on the final ensemble model. Here, K is set to 10, i.e., the models are trained and tested ten times, each time with random data samples of the same size. This test found that the ensemble performs uniformly in each fold. These results therefore confirm that the NN-ensemble outperforms the other models on the regression dataset.

The validation results for all ensemble models are presented in TABLE VII for the ten folds. Fig 3 shows that the neural network ensemble performs more efficiently than the classical ensemble approach. It demonstrates that, owing to the optimization of weights through the DE algorithm, the neural network based ensemble approach predicts more accurately.
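The fold construction behind 10-fold cross-validation can be sketched as follows. This is a generic Python illustration of K-fold splitting, not the authors' procedure; the function name, the sample count of 100, and the seed are hypothetical.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k random folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Every k-th shuffled index goes to the same fold.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# Ten folds over a hypothetical dataset of 100 instances.
splits = list(k_fold_indices(n_samples=100, k=10))
```

Each instance appears in exactly one test fold, so a model trained and scored once per split yields ten accuracy values of the kind listed per fold in TABLE VII.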

TABLE VII. 10-FOLD CROSS-VALIDATION FOR ACCURACY

| Fold | NN-ensemble | Cubist | Random Forest | Xgboost | Classical Ensemble |
|---|---|---|---|---|---|
| 1 | 91.82 | 84.58 | 87.99 | 86.32 | 84.67 |
| 2 | 90.85 | 83.46 | 87.62 | 85.51 | 84.16 |
| 3 | 89.86 | 85.43 | 88.02 | 85.87 | 84.66 |
| 4 | 91.58 | 82.09 | 86.86 | 87.25 | 83.95 |
| 5 | 89.53 | 83.88 | 87.22 | 88.18 | 83.45 |
| 6 | 91.33 | 84.48 | 87.83 | 85.61 | 84.33 |
| 7 | 91.79 | 83.54 | 88.46 | 87.63 | 85.36 |
| 8 | 90.08 | 84.7 | 87.52 | 88.08 | 85.09 |
| 9 | 92.01 | 81.98 | 87.94 | 85.99 | 85.61 |
| 10 | 90.01 | 84.41 | 87.09 | 88.66 | 84.05 |

Fig 3: K-Fold Cross Validation of NN-Ensemble and Classical Ensemble for Accuracy

Conclusion

In this work, the Differential Evolution (DE) algorithm is applied to optimize the predictions from the top three base models in order to improve the results of the neural network based ensemble approach. Initially, ten models are trained and tested, and the top three are selected based on accuracy. Next, these predictions are optimized through DE to improve the performance of the final ensemble model. Data partitioning is done in a specific way to obtain an efficient model. The neural network is used to ensemble the models because of its generic nature. The experimental results have been presented in data tables and verified through the K-fold cross-validation method. The obtained results show that ensemble model results can be improved by optimizing parameters. Model evaluation is done through evaluation parameters such as r, R2, RMSE, and accuracy, and the approach is compared with other ensemble approaches.

It is expected that using more physical and chemical properties may result in improved accuracy and may decrease computing time. Other nature-inspired algorithms can also be used to enhance the performance of ensemble models. This idea would represent an efficient approach for protein structure identification. Additionally, this work can be extended to the prediction of the template-modeling score (TM-score) and the global distance test score (GDT-TS). Finally, we can say that instead of building new machine learning models, ensemble results can be improved through an effective optimization scheme.

REFERENCES

[1] Paritosh Pantola, Anju Bala, and Prashant Singh Rana, Consensus based ensemble model for spam detection, In Advances in Computing, Communications and Informatics (ICACCI), 2015 International Conference on, pages 1724-1727. IEEE, 2015.

[2] Bin Liu, Ren Long, and Kuo-Chen Chou, iDHS-EL: identifying DNase I hypersensitive sites by fusing three different modes of pseudo nucleotide composition into an ensemble learning framework, Bioinformatics, 32(16):2411-2418, 2016.

[3] Bin Liu, Shanyi Wang, Qiwen Dong, Shumin Li, and Xuan Liu, Identification of DNA-binding proteins by combining auto-cross covariance transformation and ensemble learning, IEEE Transactions on Nanobioscience, 15(4):328-334, 2016.

[4] Bin Liu, Shanyi Wang, Ren Long, and Kuo-Chen Chou, iRSpot-EL: identify recombination spots with an ensemble learning approach, Bioinformatics, 33(1):35-41, 2016.

[5] Bin Liu, Deyuan Zhang, Ruifeng Xu, Jinghao Xu, Xiaolong Wang, Qingcai Chen, Qiwen Dong, and Kuo-Chen Chou, Combining evolutionary information extracted from frequency profiles with sequence-based kernels for protein remote homology detection, Bioinformatics, 30(4):472-479, 2013.

[6] Corinna Cortes, Mehryar Mohri, and Afshin Rostamizadeh, Two-stage learning kernel algorithms, In ICML, pages 239-246. Citeseer, 2010.

[7] Manik Varma and Bodla Rakesh Babu, More generality in efficient multiple kernel learning, In Proceedings of the 26th Annual International Conference on Machine Learning, pages 1065-1072. ACM, 2009.

[8] Xi-Zhao Wang, Rana Aamir Raza Ashfaq, and Ai-Min Fu, Fuzziness based sample categorization for classifier performance improvement, Journal of Intelligent & Fuzzy Systems, 29(3):1185-1196, 2015.

[9] Sonal Mishra, Yadunath Pathak, and Anamika Ahirwar, Classification of protein structure (RMSD ≤ 6 Å) using physicochemical properties, International Journal of Bio-Science and Bio-Technology, 7(6):141-150, 2015.

[10] Prashant Singh Rana, Harish Sharma, Mahua Bhattacharya, and Anupam Shukla, Quality assessment of modeled protein structure using physicochemical properties, Journal of Bioinformatics and Computational Biology, 13(02):1550005, 2015.

[11] Lars Kai Hansen and Peter Salamon, Neural network ensemble, IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(10):993-1001, 1990.

[12] Rishith Rayal, Divya Khanna, Jasminder Kaur Sandhu, Nishtha Hooda, and Prashant Singh Rana, N-semble: neural network based ensemble approach, International Journal of Machine Learning and Cybernetics, pages 1-9, 2017.

[13] Er Amanpreet Kaur and Baljit Singh Khehra, Approaches to prediction of protein structure: A review, International Research Journal of Engineering and Technology, 2017.

[14] Sourceforge: http://bit.ly/RF-PCP-DataSets

[15] CART website: CRAN.R-Project. http://goo.gl/ulWSI3

[16] XgBoost website: CRAN.R-Project. http://goo.gl/ulWSI3

[17] Kenneth V Price, Differential evolution: a fast and simple numerical optimizer, In Fuzzy Information Processing Society, 1996. NAFIPS, 1996 Biennial Conference of the North American, pages 524-527. IEEE, 1996.

[18] Harish Sharma and Kavita Sharma, Nature Inspired Algorithms: An Introduction, Soft Computing Research Society, 2009.