 Open Access
 Total Downloads : 294
 Authors : Sibarama Panigrahi, Sreetoma Pandey, Ria Singh
 Paper ID : IJERTV2IS90958
 Volume & Issue : Volume 02, Issue 09 (September 2013)
 Published (First Online): 25092013
 ISSN (Online) : 22780181
 Publisher Name : IJERT
 License: This work is licensed under a Creative Commons Attribution 4.0 International License
A Novel Evolutionary Higher Order Neural Network for pattern Classification
Sibarama Panigrahi 1, Sreetoma Pandey 2, Ria Singh 3
Department of Computer Science and Engineering, MITS, Rayagada, Odisha, India
Abstract
Over the last few decades pattern classification, pattern matching and mathematical function approximation were predominately performed using various neural networks (NNs). Compared to traditional NNs, higher order neural networks (HONNs) have several unique characteristics, including: 1) stronger approximation property; 2) faster convergence; 3) greater storage capacity; and 4) higher fault tolerance capability. In this paper, a novel evolutionary HONN especially the PiSigma network (PSN) is used for pattern classification. The PSN is trained using a strictly greedy chemical reaction optimization algorithm. In contrast to basic CRO algorithms, in this CRO algorithm the reactant size (population size) is remained fixed throughout all the iteration, which makes it easier to implement. The performance of proposed methodology for pattern classification is evaluated through three wellknown real world classification problems from UCI machine learning data library. The results obtained from the proposed method for classification is compared with results obtained by applying the two most popular variants of differential evolution algorithm (DE/rand/1/bin and DE/best/1/bin). It is observed that the proposed method provides better classification accuracy than that of other methods.

Introduction
Classification is the process of assigning an object into a predefined group or class of objects based on the number of attributes of that object. It is one of the most frequently encountered decision making tasks of human activity. Traditionally, statistical procedures such as discriminant analysis were widely used to perform pattern classification. However, the effectiveness of these methods depends to a large extent on the various assumptions or conditions under which the models are developed and thus, these procedures work well only when the underlying assumptions are satisfied. Users must have a good knowledge of both data properties and model capabilities before the models can be successfully applied. Considering the above pitfalls several classifiers using various data mining and computational intelligence methods have been developed.
Out of various types of classifiers neural network based classifiers are most popular and predominantly found in the literature [1]. Despite of advantageous features of HONN models over traditional NN models, only few papers were found in the literature for pattern classification using HONN models [24]. Therefore, in this paper the class of HONNs and in particular Pi Sigma Networks (PSNs) has been studied. The PSNs were introduced by Shin and Ghosh [4]. The PSNs have addressed several difficult tasks such as zeroing polynomials [5] and polynomial factorization [6] more effectively than traditional feedforward neural networks (FFNNs). Moreover, PSN employ less number of weights than other HONNs, but still manage to incorporate the capability of first order HONN indirectly. Therefore, in this paper PiSigma Network is considered for pattern classification.
The rest of this paper is organized as follows. Section2 briefly describes the background related to architecture and mathematical model of PSN; chemical reaction optimization; and differential evolution. The method used for classification using an evolutionary PSN is explained in Section3. Experimental results are presented in section4. And finally conclusion are described in Section5.

Preliminaries

PiSigma Network
PiSigma Network (PSN) is a special type of feed forward higher order neural network that calculates the product of sum of the input components and passes it to a nonlinear function. The network architecture of PSN is shown in Figure 1. It consists of a single hidden layer with summing units and an output layer of product units. The weights connecting the input and hidden layer are adapted during the training process, while those connecting the neurons of the hidden layer to the output layer are fixed to one and they are not trainable. Such a network topology with only one layer of trainable weights drastically reduces the training time [2, 7]. Moreover, the product units of PSN gives higher order capabilities which increase its computational power. This is because; the product units enable to expand the input space into higher dimensional space, thus easily separates nonlinearly separable classes to linear separable. Thus, PSN provides nonlinear
decision boundaries offering a better classification capability than the linear neuron (Guler and Sahin, 1994). In addition, Shin and Ghosh (1991) also
suggested that PSNs not only offers better classification
Algorithm 1 (CROHONNT)
Set the iterationcounter i=0
/*Randomly Initialize the ReacNum of Reactants from a uniform distribution [U;L]: Pi={R i, R i ,
over a broad class of problems but also requires less
R i., R
i}, with R i ={ W
1
i,.,W
2
i} for
k
k
3 ReacNum
j j,1
j,D
memory and need at least two orders of magnitude less number of computations as compared to MLP for similar performance level.
Consider a PSN with NOIN (number of input neurons), NOHN (number of hidden neurons) and one output neuron. The number of hidden neurons in the hidden layer defines the order of a PSN. For a NOHNth order PSN the number of trainable weights is NOIN Ã— NOHN considering each summing unit is associated with NOIN weights. The output of the PSN is computed by making product of the output of NOHN hidden units and passing it to a nonlinear function, which is defined as follows:
NOHN
Y ( hj )
j1
Where is a nonlinear transfer function and hj is the output of jth hidden unit which is computed by making sum of the products of each input (xi) with the
corresponding weight (wij) between ith input and jth
hidden unit. The output of hidden unit is computed as follows:
NOIN
j=1,2,3….. ReacNum, D=length of each Reactant (NOINÃ—NOHN), Wj, i=kth atom of jth reactant in ith iteration representing a weight of PSN.
for j=1 to ReacNum
Calculate the enthalpy e(Rj)
end of for
While (termination criteria is not satisfied) do
begin
for j=1 to ReacNum
// perform all reaction over the reactants of Pi Get rand1 randomly in an interval [0, 1]
if rand1 0.5
Get rand2 randomly in an interval [0, 1]
if rand2 0.5 Decomposition (Rj);
else
Redox1(Rj)
end of if else
Get rand3 randomly in an interval [0, 1]
if rand3 0.33
Select the best reactant Rk(Rk Rj) Synthesis (Rj, Rk)
hj
(wij xi )
i1
else if rand3 0.66
Select another reactant Rk (Rk Rj) randomly Displacement(Rj, Rk);
else
Select another reactant Rk(Rk Rj) randomly Redox2(Rj, Rk)
end of if end of if
Apply strictly greedy Reversible Reaction for increased enthalpy to update reactants
end of for
Figure 1: Architecture of a Typical PiSigma Network


Proposed Method
The efficiency of any supervised neural network depends on the algorithm used for its training. The objective of any supervised PiSigma network training is to minimize the error between the approximation by the HONN and the target output. For this the optimal weight set o a HONN must be obtained. The optimal weight set of a HONN can be obtained by using either gradient or evolutionary learning algorithms. The objective function of HONN training is going to be a multimodal search problem, since it depends on number of parameters.
Set the iteration counter i=i+1
end of while
Use the reactant having best enthalpy as the optimal weight set of PSN and perform classification.
Therefore, the gradient based training algorithms often suffer from several shortcomings, including: 1) easily getting trapped to local minima; 2) have slow convergence properties; 3) training performance is sensitive to initial values of its parameters. Due to these disadvantages, research on different optimization techniques that are dedicated to HONN training is still required. There are many optimization techniques such as differential evolution (DE) [8], genetic algorithm (GA) [9], particle swarm optimization (PSO) [10], ant
colony optimization (ACO) [11], a bee colony optimization (BCO) [12], an evolutionary strategy (ES) [13], quantum inspired algorithms (QEA) [14], chemical reaction optimization (CRO) [1517] etc. can be used for HONN training. Chemical reaction optimization (CRO) is a new optimization technique, inspired by the nature of chemical reactions. CRO has demonstrated excellent performance in solving many engineering problems such as the mining classification rules [18], quadratic assignment problem [15] etc. Therefore, in this paper a novel strictly greedy CRO algorithm has been used to train PSN and further used for classification. The algorithm is described as follows.
The proposed training algorithm operates in three phases: initialization phase, iteration phase and final phase. The initial phase assigns the value to initial parameters like termination criterion, length of reactants/molecules, ReacNum and generates initial reactants. The iteration phase simulates the reaction processes. The reactions may be monomolecular or bimolecular. For monomolecular reaction, Decomposition and Redox1 reactions are considered; and for bimolecular reactions three types of reactions such as: Synthesis, Displacement and Redox2 are considered. The reaction types are chosen considering both intensification and diversification. Moreover, a strictly greedy reversible reaction is used to update the reactants. All the reactions have been elaborated in the following subsequent subsections. In final phase the reactant having best enthalpy is used as the optimal solution (i.e. optimal weight set of a PSN). The pseudo code of the proposed method is explained in Algorithm 1.

Reactant Encoding
A set of real numbers are used to represent one reactant, with each reactant corresponding to a weight set of the PSN. The length of a reactant depends on the number of input and hidden neurons of the PSN (i.e. NOINÃ—NOHN).

Enthalpy of A Reactant
Each reactant is associated with some enthalpy. As each reactant represents a weight set of the PSN, the mean square error (MSE) on the train set is considered as enthalpy. The lower the value of enthalpy the better the reactant is. The MSE is defined as follows:
NOP (Y T )2

Chemical Reactions

Monomolecular reactions
In monomolecular reactions only one reactant takes part in the reaction and one product is produced by modifying one atom of the reactant. These reactions assist in intensification of the solution by making local search. In our algorithm monomolecular reactions are performed with a probability of 50%, there by glorifying the chances to obtain a better solution around the current solution. Two monomolecular reactions are considered such as: Decomposition and Redox1.

Decomposition Reaction
In this reaction a randomly selected atom of the reactant takes a value from the range [U, L]. Consider a
reactant Rj={Wj,1,Wj,2.,Wj,D} with Wj,x (x[1,n])
be an atom of the reactantj. The pseudocode of the decomposition reaction is described in Algorithm2.
Algorithm 2 (Decomposition(Rj))
Input: A reactant Rj Duplicate Rj to produce R1
Select an atom x (x [1, n]) randomly.
W1,x=L+ rand() Ã— (UL)
Where rand() is a random number generated randomly from a range [0, 1].
Output: A new reactant R1

Redox1 Reaction
It is similar to decomposition reaction except that the rate of reaction()used in this algorithm is obtained in a self adaptive manner. The pseudocode is described in Algorithm3.
Algorithm 3 (Redox1(Rj))
Input: A reactant Rj Duplicate Rj to produce R1
Select a point x (x [1:n]) randomly
W1,x =L+ t Ã— (UL)
Where t+1= 4 t (1 t) With 0[0,1]{0, 0.25, 0.5,
0.75, 1.0}
Output: A new reactant R1


Bimolecular reactions
Here two reactants Rj={Wj,1,.,Wj,D} and Rk={Wk,1,Wk,2 .,Wk,D} will take part in the reaction. These reactions help in diversification of the solution by generating a new solution that is significantly different from the current solution. These
MSE
i1
i i
NOP
reactions occur with a probability of 50%. Following types of bimolecular reactions are used.
Where Yi and Ti are the output of PSN and target for ith
train pattern.

Synthesis Reaction
In this reaction one reactant is produced due to reaction between a reactant and the best reactant of the iteration.
Algorithm 4 (Synthesis (Rj,Rk))
Input: Two reactants Rj, Rk R1=RJ+ Ã— (RkRJ)
Where is a random number between [0.25;1.25]
Output: A new reactant R1

Displacement Reaction
Two solutions R1 and R2 are obtained from reaction between two reactants Rj and Rk. Algorithm 5 (Displacement (Rj, Rk))
Input: Two reactants Rj, Rk R1=tÃ—Rj+t Ã— (1 Rk) R2=tÃ—Rk+t Ã— (1Rj)
Where t is initialized to a random number [0,1]and is updated in the following manner every time this reaction reoccurs (t=number of time the reaction
Algorithm 7 (Strictly greedy Reversible Reaction ())
For Monomolecular Reactions
Let Rj under goes monomolecular reaction to produce R1
If enthalpy(R1)<enthalpy(worst(R)))
Replace Rworst by R1
end of if
For Bimolecular Reactions
If Rj and Rk under goes reaction to produce R1 and R2
If enthalpy (R1) < enthalpy(Rworst)
Replace Rworst by R1
end of if
end of if




Experimental Setup and Simulation Results
The simulations in this paper were carried out on a system with Intel Â® core(TM) 2Duo E7500 CPU, 2.93 GHz with 2GB RAM and implemented using
occurs).
t+1=2.3(t)
2sin( t)
SCILAB5.4.1. All ANNs are trained using proposed CRO, DE/rand/1/bin and DE/best/1/bin with population
Output: Two reactants R1 and R2

Redox2 Reaction
Algorithm 6 (Redox2 (Rj,Rk))
Input: Two reactants Rj, Rk R1=Rj+ Ã— (Rk Rj)
Where the rate of reaction =rand(0,1) random number between [0;1].
Output: A new reactant R1
size (reactant size) 50 and initial value of each chromosome (representing a ANN weightset) is initialized to uniform distributed random values drawn from a range [1, 1].
4.1. Performance Measure
Correct classification percentage is used as performance measure, which is computed as follows: Correct classification
NOP C
This reaction is similar to that of synthesis reaction, but here, the rate of reaction is obtained from a range of [0
Correct Classification(%)
i1 i
NOP
1].

Reactant update
Every monomolecular or bimolecular reaction is
Where NOP is number of test patterns (NOP/2); Ci the
coefficient representing the correctness of the classification of the ith testing pattern which is determined as follows:
followed by a strictly greed reversible reaction to update the reactants. In the strictly greedy reversible reaction, for a monomolecular reaction the product produced replaces the worst reactant of the reactant set
for better enthalpy; and for a bimolecular reaction if
1,
C
C
i 1,
0,
when Yi 1and Ti 1 when Yi 1and Ti 1 Otherwise
two products are produced, best product replace the worst reactant for better enthalpy. Thus the number of reactants of the population remains same throughout the reaction process. The pseudocode of the strictly greedy reversible reaction is elaborated in algorithm 7.
Where Yi and Ti are the output of PSN and target for ith test pattern.
4.1. Data Sets and Simulation Results
For experimental analysis three binary classification problems (Sonar, Breast Cancer Wisconsin, Habermans Survival) are considered from the well known UCI machine learning data library.
For the Sonar problem the task is to train a PSN to distinguish between sonar signals bounced off a metal cylinder (mine) and those bounced off a roughly
cylindrical rock. In this experiment the dataset contains 208 samples obtained by bouncing sonar signals off a metal cylinder and a rock at various angles and under various conditions. There exist 111 samples obtained from mines and 97 samples obtained from rocks. Each pattern consists of 60 real numbers in the range [0.0, 1.0]. Each number represents the energy within a particular frequency band, integrated over a certain period of time. The trained PSNs have one unit in the middle layer with 6011 architecture. For comparative performance analysis, the results obtained from 100 independent simulations using the proposed method (using proposed strictly greedy CRO algorithm), DE/rand/1/bin and DE/best/1/bin methods are shown in Table 1 where mean, St.Dev, Min and Max represents the mean, standard deviation, minimum and maximum classification accuracy of the 100 independent simulations.
Table 1 Classification Accuracy (%) for Sonar problem.
Method
Mean
St.Dev.
Min
Max
Proposed Method
77.88
3.64
56.67
84.13
DE/rand
73.81
4.24
52.88
81.73
DE/best
73.35
4.34
53.36
80.77
For the breast cancer wisconsin problem the task is to train a PSN to discriminate between benign and malignant based on ten attributes. In the original dataset 2 represents benign (367 patterns) and 4 represents malignant (332 patterns) which are converted to 1 and 1 respectively for our simulation. 100 independent simulations were carried out and the mean, standard deviation, min and max values of the obtained results are shown in Table 2. Note that PSNs with 1011 architectures are trained with 50% of total samples and tested with other 50%.
Table 2 Classification Accuracy (%) for breast cancer wisconsin problem.
Method
Mean
St.Dev.
Min
Max
Proposed Method
75.15
6.75
52.44
85.67
DE/rand
71.46
8.06
52.15
83.38
DE/best
73.59
7.94
49.86
84.53
Havermans survival dataset contains cases from study conducted on the survival of patients who had undergone surgery for breast cancer. For the Habermans survival problem the task is to train a PSN to predict whether the patient will survive more than 5 years or not based on three attributes. The dataset contains 306 patterns with three attributes each and a
class level 1 for less than 5 years and 1 for more than five years of survival. 100 independent simulations were carried out and the mean, standard deviation, min and max values of the obtained results are shown in Table 3. Note that PSNs with 311 architectures are trained with 50% of total samples and tested with other 50%.
Table 3 Classification Accuracy (%) for Havermans survival problem.
Method
Mean
St.Dev.
Min
Max
Proposed Method
70.66
4.51
43.14
73.20
DE/rand
68.99
9.71
30.72
72.55
DE/best
68.25
9.36
28.76
73.20
It can be observed from the results that the proposed methodology provides better classification accuracy for the three problems considered.


Conclusion
In this paper, we have studied evolutionary HONN models especially; the PiSigma network for pattern classification. A novel strictly greedy chemical reaction optimization algorithm is developed for its training. For comparative performance analysis of the proposed method three real world binary classification problems are considered. It is found that the strictly greedy CRO algorithm along with PSN provides better classification accuracy than the two most popular variants of differential evolution algorithm i.e. DE/rand/1/bin and DE/best/1/bin. In future more efficient training algorithm for PSN can be developed to improve the classification accuracy.

References

G. P. Zhang, Neural Networks for Classification: A Survey, IEEE Transaction on Systems, Man, and Cybernetics Part C: Applications and Reviews, vol. 30, no. 3, pp. 451462, 2000.

Y. Shin and J. Ghosh, Efficient higherorder neural networks for classification and function approximation, In: International Journal on Neural Systems, vol. 3, pp.323350, 1992.

M. G. Epitropakis, V. P. Plagianakos, M. N. Vrahatis, Hardwarefriendly HigherOrder Neural Network Training using Distributed Evolutionary Algorithms, Applied Soft Computing, vol. 10, pp. 398408, 2010.

Y. Shin, J. Ghosh, The pisigma network: An efficient higherorder neural network for pattern classification and function approximation, International Joint Conference on Neural Networks, 1991.

D. S. Huang, H. H. S. Ip, K. C. K. Law and Z. Chi, Zeroing polynomials using modified constrained neural network approach, IEEE Transactions on Neural Networks, vol. 16, no. 3, pp. 721732, 2005.

S. Perantonis, N. Ampazis, S. Varoufakis and G. Antoniou, Constrained learning in neural networks: Application to stable factorization of 2d polynomials, Neural Processing Letter, vol.7, no. 1, pp. 514, 1998.

Y. Shin and J. Ghosh, Realization of Boolean functions using binary pisigma networks, in: C. H. Dagli, S. R. T. Kumara, Y. C. Shin (Eds.), Intelligent Engineering Systems through Artificial Neural Networks, ASME Press, pp. 205 210, 1991.

R. Storn and K.Price, Differential evolution A simple and efficient heuristic for global optimization over continuous spaces, Journal of Global Optimization, vol. 11, no.4, pp. 341359, 1997.

D. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning. Reading, MA:Addison Wesley, 1989.

J. Kennedy, R. C.Eberhart and Y.Shi, Swarm intelligence, San Francisco, CA:Morgan Kaufmann, 2001.

K. Socha and M. Doringo, Ant colony optimization for continuous domains, Europian Journal of Operation Research, vol. 185, no. 3, pp. 11551173, 2008.

D. T. Pham, A. Ghanbarzadeh, E. Koc, S. Otri, S. Rahim and M. Zaidi The bees algorithm A novel tool for complex optimization problems, in IPROMS Oxford, U.K.: Elsevier, 2006.

H.G. Beyer and H.P. Schwefel, Evolutionary Strategies: A Comprehensive introduction, Nat. Comput., vol. 1, no. 1, pp. 352, 2002.

K. H. Han and J.H. Kim, Quantuminspired evolutionary algorithm for a class of combinatorial optimization, IEEE Transactions on Evolutionary Computation, vol. 6, pp. 580593,2002.

A. Y. S. Lam and V. O. K. Li, ChemicalReaction inspired metaheuristic for optimization, IEEE Transactionson on Evolutionary Computation, vol. 14, no.3, pp. 381399, 2010.

A.Y.S. Lam, RealCoded Chemical Reaction Optimization, IEEE Transaction on Evolutionary Computation, vol. 16, no. 3,pp. 339353, 2012.

B. Alatas, ACROA: Articial Chemical Reaction Optimization Algorithm for global optimization, Expert Systems with Applications, vol. 38, pp. 1317013180, 2011.

B. Altas, A novel chemistry based metaheuristic optimization method for mining of classification rules, Expert Systems with Applications, vol. 39, pp. 1108011088, 2012.

T. K. Truong, K. Li and Y. Xu, Chemical reaction optimization with greedy strategy for the 01 knapsack problem, Applied Soft Computing, vol. 13, pp. 17741880, 2013.

J.J.Q. Yu, A.Y.S. Lam and V.O.K. Li, Evolutionary Artificial Neural Network based on chemical reaction optimization, in: IEEE Congress on Evolutionary Computation (CEC), pp. 20832090, 2011.