Hybridized Metaheuristic Algorithmfor Feature Selection Guided with Naive Bayes Classifier

Download Full-Text PDF Cite this Publication

Text Only Version

Hybridized Metaheuristic Algorithmfor Feature Selection Guided with Naive Bayes Classifier

Pallavi1, Smt. T Jayakumari2

1M.Tech Dept of Computer Science and Engineering, BTL Institute of Technology, Bangalore, India

2Assistant professor Dept of Computer Science and Engineering, BTL Institute of Technology, Bangalore, India

Abstract-Feature selection is a problem of discovering efficient features among all the features in which the final feature set can improve the accuracy and reduce complexity. This is very essential due to rapid escalation in amount of data and information for every 20 months. In some cases, too many redundant or irrelevant features may defeat main features for classification. Feature selection can remedy this problem and therefore improve the prediction accuracy and reduce the computational overhead of classification algorithms. This feature selection methods can be used in many fields like pattern recognition, machine learning, signal processing. Irrelevant features do not contribute to the predictive accuracy, and redundant features do not contribute to getting a better predictor for that they provide most information which is already present in other feature(s). Many feature selection methods have been proposed and studied for machine learning applications. The proposed hybridised metaheuristic algorithm for feature selection guided with Naïve Bayes classifier will select minimum number of relevant features in order to maintain the classification accuracy. This feature selection method is compared against other two algorithms such as Exhaustive Search and Genetic Search. This work focuses on three perspectives: Number of features, classification accuracy and generalization with multiclass dataset. Results shows that BANB outperforms against other two feature selection algorithms in selecting lower number of features by removing irrelevant, redundant, or noisy features to maintain highest classification accuracy.

Keywords: No. of Bats, Fitness function, attribute evaluator, local search, feature selection.

GA-Genetic Algorithm, BANB- Bat Algorithm with Naïve Bayes


    The importance of data mining technique is to mine the information from massive data set and make over it into a reasonable form for supplementary purpose. Classification is a significant task in data analysis and data mining applications. The enthusiasms to perform feature selection in classification experiments are two- fold. First is to enrich the classifier performance by selecting only useful features and removing irrelevant, redundant, or noisy features. Second is to reduce the number of features when classification algorithm could not scale up to the size of feature set either in space or time. In general, there are two essential steps in feature selection: (i) searching for desired subset using some search strategies. (ii) evaluating the subset produced.

    Search strategy may be exhaustive or approximate. Exhaustive search strategy evaluates all probabilities of the feature subset, while approximate search strategy generates only high quality solutions with no guarantee of finding a global optimal solution [1].

    Branch and bound method [2] is one of the most prominent algorithm in exhaustive search. Exhaustive search guarantees optimal solution but this method is not practical for even a medium-sized dataset in finding the optimal subset of features, which is an NP-hard problem [3]. For N number of features, the number of possible solutions will be exponential to 2n. Hence research and efforts have been shifted to metaheuristic algorithms, which are considered as a subclass of approximate methods. The fiction has shown that the metaheuristic algorithms are proficient in handling large size problem instances with acceptable solutions within a judicious time [47]. Hybridized metaheuristic is used for feature selection and Naïve Bayes classifier is used for classification accuracy with multi class in this work.

    The major challenge in feature selection is to select the minimal set of features with modicum or no loss of classification accuracy. While literature has shown abundant developments in achieving this [8 – 10]. The basis of comparison is rather limited in terms of number of features, classification accuracy, and generalization in isolation. The importance of generalization is to investigate their effect on the performance of different classifiers.

    The Objectives of this work is as follows: first is to design a Hybridized Metaheuristic Algorithm (BA)for feature selection which exploits Naïve Bayes classifier to escort algorithm with highest classification accuracy, second to evaluate the performance of the proposed hybrid algorithm against other well-known feature selection algorithms in terms of feature selection and third in terms of generalization with multi class dataset, compare the accuracy of all three algorithms with Naïve Bayes classifiers using Weka Tool.


    The applications of metaheuristic algorithms has shown high effectiveness as well as efficiency to solve complex and large problems in feature selection.

    Generally, there are two categories of metaheuristic algorithms: (i) single solution-based metaheuristic

    algorithm (SBM) which manipulate and transform single solution during the search.(ii) population-based metaheuristic algorithm (PBM) in which whole population of solution is evolved.

    Hill Climbing (HC) algorithm [1,11] is one of the simplest and oldest SBM method used in feature selection. In order to improve the quality of solution This algorithm starts with a random initial solution and swaps the current solution with a neighbouring solution in the following iteration. When all the neighbouring candidate subsets are poorer than the current solution then searching will stop, which means that the algorithm will be most probably trapped in local optimum. Simulated Annealing (SA) is proposed [10]In order to overcome this problem. SA accepts the worse moves that commensurate to the parameter determined at the initial stage, called the temperature, which is inversely proportional to the change of the fitness function. Modified SA algorithm called the Great Deluge Algorithm (GDA) is proposed [12] in more recent work to provide a deterministic acceptance function of the neighbouring solutions. Tabu Search (TS) also accepts non improving solutions to escape from local optima. TS stores in formation related to the search process, which is a list of all previous solutions or moves in what is termed as Tabu list [13, 14]. Nonetheless, Hill Climbing and Simulated Annealing suffer from two major disadvantages. First, they often converge towards local optima and second they can be very sensitive to the initial solution [1].

    Different from SBM, PBM signifies an iterative improvement in a population of solutions that works as follows. Firstly, the population is initialized then a new population of solution is generated. Next, the new population is integrated into the existing one by using some selection procedures. When certain criterion is satisfied the search process will terminate. Genetic Algorithm (GA) [5, 15, 16] is one of the most prominent and oldest population-based solution used in features selection. Crossover and mutation operations are the major roles in GA which used to couple solutions and to arbitrarily adjust the individual content, to boost diversity aiming to decrease the risk of sticking in local optima.

    Ant Colony Optimization (ACO) is another PBM algorithm, which takes form as a multi agent system, building unit of this system represents virtua ants as inspired by the behaviour of real ants. In nature, a chemical trace called pheromone is left on the ground and is used to guide a group of ants heading for the target point since ants are distinguished to see very well [6, 17, 18]. Particle Swarm Optimization (PSO) is another nature-inspired algorithm which simulates the social behaviour of natural creatures such as bird flock ing and fish schooling to discover a place with adequate food [7,19]. Scatter Search (SS) is another PBM method that recombines solution selected from a reference set to generate other solutions by building an initial population satisfactory to the criteria of quality and diversity [20].

    Once feature subset have been searched, each candidate from the resulting subset generated needs to be evaluated based on some predetermined assessment criteria.

    Embedded, Wrapper, Filter, and Hybrid methods are four methods of feature subset evaluations depending on how the searching strategy is being associated with the classification model. Embedded methods incorporate feature selection as a part of the training process and are usually specific to given learning algorithms, and therefore may be more efficient than the other three categories. Wrapper methods use a predictive model to score feature subsets. Each new subset is used to train a model, which is tested on a hold-out set. Filter methods use a proxy measure instead of the error rate to score a feature subset. This measure is chosen to be fast to compute, whilst still capturing the usefulness of the feature set. Filters are usually less computationally intensive than wrappers, but they produce a feature set which is not tuned to a specific type of predictive model. Hybrid methods are a combination of filter and wrapper methods by using a filter method to reduce search space that will be considered by the subsequent wrapper. Thus, we will focus on the Hybrid method in this paper.


    Naïve Bayes Classifier (NB): Naïve Bayes classifier is an attribute evaluator used to classify the features based on the conditional probability. NB is considered as simple classifier based on Bayes Theorem. The Bayesian algorithm is trademarked as Naive because it is founded on Bayes Rule, which has a strong hypothesis that the features are conditionally independent from each other with respect to the class [21]. This classifier belongs to wrapper method. NB has proven its effectiveness in various domains such as text classification, improving search engine quality, image processing, fault prediction, and medical diagnoses.

    Naive Bayes classifier works as follows: let U be a vector of random variables denoting the observed attribute values in the training set U = [u1,u2,u3 . . .,un ] to certain class label C in the training set. The probability of each class given the vector of observed values for the predictive attributes can be computed using the following formula 1:

    P(U) (U | V)

    P(V| U) = —————————-, j=1, (1)

    Where P(U) is the prior probability of class Uj, P(U/Vj) is the posterior probability of U conditioned on Vj,P(Vi) is the prior probability of class Vi, P(U/Vi) is the posterior probability of U conditioned on Viand P(Vj|U) is the class conditional probability density function. Basically put, the conditional independence assumption assumes that each variable in the dataset is conditionally independent of the other. This is simple to compute for test cases and to estimate from training data as follows in formula (2):


    P(U/Vj)= P(Ui/ Vj ) j=1,2,3C (2) i=1

    where Ui is the value of the ith attribute in U and n is the number of attributes. Vj is the value ofjth attribute in V. The probability distribution over the set of features is calculated using the following formula (3):


    P(u) = P(Ci) P(U/Ci) (3)


    where k be the number of classes, and Ci is the ith class. Following are the several characteristics of Naïve Bayes classifier: (i) Computational efficiency is high compared to other wrapper methods because it is inexpensive since it is considered linear time complexity classifier. (ii)Noise, missing values can be handled in the dataset with high capability. (iii) Low variance with less searching.

    Bat Algorithm (BA): Bat algorithm is based on the echolocation behavior of microbats with varying pulse rates of emission and loudness.BA was first presented in

    [22] and it was found to outperform Particle Swarm Optimization (PSO) and Genetic Algorithm (GA) in evaluation using benchmark functions. BA has also been successfully applied to tough optimization problems such as motor wheel optimization problems, clustering problems, global engineering optimizations, and constrained optimization tasks.

    The idealization behind echolocation of microbats can be summarized as follows: Each virtual bat flies randomly with a velocity Viat position (solution) xiwith a varying frequency [fmin, fmax] or wavelength [min, max]and loudness Ai, pulse rate riLoudness can be different in many ways, the loudness differs from a large (positive) A0 to a minimum constant value Amin.As it searches and finds its prey. Once prey is found it changes frequency, loudness and pulse emission ratewhere ri [0, 1]. Search is intensified by a local random walk. Selection of the best bat continues until certain stopping conditions are met. This essentially uses a frequency-tuning technique to control the dynamic behavior of a swarm of bats, and the balance between exploration and exploitation can be controlled by tuning algorithm-dependent parameters in bat algorithm.

    where Vt is the velocity of ith bat,Vt-1 is the previous velocity of iht bat, (x* -xt ) is the difference between the length of global best bat and length of the ithbat where length is the number of selected features, fi is the frequency. In (x* -xt ), if the difference is positive the global best bat has more features than those of ith bat, with reference to formula (5) when the result is summed with previous velocity, it will accelerate the ith bat towards the global best bat. If the difference is negative, then the ith bat has more features than that of the global best bat. Therefore, with reference to formula (5) when the result is summed with previous velocity, it will decrease the velocity of ith bat and help to attract towards the global best bat. In proposed algorithm maximum velocity (Vmax) will set to (1/3) * N, where N is the number of features.





    i i

    i i

    Bat Position (x):













    Each bat position is encoded as binary string with length N, where N is the total number of features. Each bit encodes a single feature. As shown in Fig (I), bit 1 implies the corresponding feature is selected and 0 implies corresponding feature is not selected.

    Feature selected Feature not selected

    Fig (I): Bat position encoding as a binary bit string

    The positions are categorized into two groups according to bit difference between ith bat and global best bat in order to align exploration and exploitation during searching. From formula (6), the bats position is adjusted depending on one of the following condition:

    1. Velocity of ith bat is lower or equal to the number of different bits, then ith bat will copy some of the features from global best bat, thus moving towards global best bat, while still exploring new sarch space.

    2. Velocity of the ith bat is higher than the velocity of global best bat, then ith bat will import all the features from global best bat to be same as the global best bat with a few different bits to facilitate further exploitation.

    A. Parameter Description:

    In this work, the balance between exploration and

    xt = xt-1 + vt

    i i i

    i i i

    t ht


    t-1 ht

    exploitation can be controlled by tuning algorithm- dependent parameters. Following are the parameters used in Hybrid Bat Algorithm:

    Frequency (f):

    Frequency is represented as real number, defined in formula (4). The maximum and minimum frequency can be chosen based on the application domain, [0 , 1] is a random number.

    fi = fmin + (fmax fmin) (4)

    Velocity (V):

    Velocity suggests the number of bat attributes that should change at a certain amount of time defined in formula (5) and represented by positive integer number. The bats communicate with each other through the global best solution and move towards the global best solution.

    where x i is the i bat position, x i is the previous i bat position, vt is the velocity of the ith bat, t is the time.



    Loudness (Ai):

    Loudness is the change in number of features at certain time during local search around the global best bat, as well as local search around the ith bat. The formula for the loudness is shown in formula (7).

    xnew = xold + At (7)

    where xnew is the position of new bat, xold is the position of old bat (previous local search position of same bat), At is the average loudness of all the bats at certain iteration, [1,1] is a random number. The value for sound loudness (A) ranges between the maximum and minimum loudness.

    When bat starts approaching the best solution the loudness value will decrease. The following formula (8)

    Vt = Vt-1 + (x – xt )f (5)



    i i * i i

    shows that the amount of decrease is determined by : At+1i = At (8)

    where At+1iis the loudness of new bat solution, is A. Flow Diagram:

    Input the Dataset with Multi Class

    Input the Dataset with Multi Class



    constant value, At is the loudness of old bat solution. The

    Initialize parameters: A, Amin, Amax, r, fmin, fmax Pmax, Imax, , , , Vmax,Vmin,,,,

    Initialize parameters: A, Amin, Amax, r, fmin, fmax Pmax, Imax, , , , Vmax,Vmin,,,,

    Generate Pmax number of bats

    Generate Pmax number of bats

    maximum and minimum loudness value is determined based on domain of application and size of the dataset. In proposed algorithm, the maximum loudness has been determined empirically as (1/5) * N, where N is the number of features in certain dataset.

    Pulse Emission (ri):

    Evaluate fitness function for all the bats and generate the current best bat (x*)

    Evaluate fitness function for all the bats and generate the current best bat (x*)

    The role of ri is to decide whether local search around the global best bat solution should be skipped or otherwise. If pulse rate is high then the amount of conducting local search will be reduced. Therefore when the bat approaches best solution, pulse rate value will increase

    and subsequently reduce the chances to conduct a local No

    search around the global best. The following formula (9)

    shows that the amount of increase is determined by .

    rt+1i = r0

    rt+1i = r0

    i [1 exp(t)] (9)



    where rt+1i is the pulse rate of new bat solution , r0 is the pulse rate of previous bat solution, t is the time, is the

    constants value. No

    Fitness Function (Fi):

    Compute frequency, new velocity, and new position of bat

    Compute frequency, new velocity, and new position of bat

    Random local walk around global best bat by calculating loudness in searching local solution

    Random local walk around global best bat by calculating loudness in searching local solution

    Fitness function as shown in formula (10) can be used by each candidate solution, where P (Vj /U)is the classification accuracy from Naïve Bayes classifier (1), TF is the total number of all features and SF is the selected features from the dataset. [0, 1] and = 1-. Classification accuracy is having more importance because it gives more weight than size of the subset.

    Fi = . P (Vj /U) + . (10)


    While stop conYditeiosn not met


    If (Population size Pmax)



    If (Rand > ri)


    The irrelevant features, along with redundant features, severely affect the accuracy of the learning machines. Thus, feature subset selection should be able to identify and remove as much of the irrelevant and redundant information as possible. Keeping these in mind, in future this work develops a Hybrid Bat algorithm with Naïve Bayes classifier to deal with both irrelevant and redundant features in the dataset, and obtain a good feature subset with more classification accuracy compared to other feature selection algorithms as well as with other classifiers.


    Calculate the loudness of all the bats

    Calculate the fitness function with new Fi,new solution

    If (Rand < Ai) No

    && No

    Accept new solution, update ri and Ai

    Accept new solution, update ri and Ai


    Find the current best solution (x*)

    Output with best solution

    Fig (II): Hybrid Metaheuristic algorithm with Naïve Bayes


    The results of proposed BANB is compared against other two feature selection algorithms such as Exhaustive

    Search and Genetic Search in terms of number of relevant features selected from the original dataset and classification accuracy obtained from Naïve Bayes classifier using Weka Tool.

    1. Dataset Description:

      This work explores an experimental study on Process Glass dataset with Multi Class (11 attributes, 214 instances) [23], taken from UCI repository for feature selection.

    2. Performance Evaluation:

    The proposed Hybridised Metaheuristic algorithm (BA)is implemented in Net beans IDE. With reference to the base paper[24] and Flow Diagram Fig (II), the experiment was employed with the population size (Pmax)of5, where decrease loudness and increase pulse rate were set to 0.6. The initial value of pulse rate set to 0.2. The value of set to 0.9, value set to 0.1. The fmax and fmin were set to 3 kHz,10 kHz respectively. The Amax and Amin value were set to 3, 1 respectively.

    Following Table I provides the comparison results.

    Table I

    Feature Selection Techniques

    Number of Attributes Selected

    Naïve Bayes Classification Accuracy




    Exhaustive Search



    Genetic Search



    From Table I, BANB selects minimal number of relevant features compared to other two algorithms with highest classification accuracy.


In this paper, a Hybridized Metaheuristic algorithm for Feature selection was presented with multi class dataset. The Hybrid Bat Algorithm employed with Naïve Bayes Classifier to intelligently select most efficient and relevant features that could maximize the classification accuracy while ignoring the irrelevant features and noisy features. From the experiments, this work concluded that the proposed Metaheuristic Bat Algorithm with Naive Bayes Classifier outperformed other two algorithms with selection of minimal number of relevant features and highest classification accuracy. For the future work, we plan to explore with very high-dimensional datasets and study some foral properties of feature space.


  1. E.G.Talbi,Metaheuristics:FromDesigntoImplementation,John WileyandSons,2009.

  2. B. Yu and B. Yuan, A more efficient branch and boundalgorithmforfeatureselection,PatternRecognition,vol.26,no. 6,pp.883889,1993.

  3. E.AmaldiandV.Kann,Ontheapproximabilityofminimizing nonzero variables or unsatisfied relations in linear systems, Theoretical Computer Science, vol. 209, no. 1-2, pp. 237260, 1998.

  4. R. Jensen and Q. Shen, Semantics-preserving dimensionality reduction: rough and fuzzy-rough-based approaches, IEEE

    Transactions on Knowledge and Data Engineering, vol. 16, no. 12,pp.14571471,2004.

  5. J.YangandV.Honavar,Featuresubsetselectionusinggeneticalgorithm

    ,IEEEIntelligentSystemsandTheirApplications,vol. 13,no.2,pp.44 48,1998.

  6. L.Ke,Z.Feng,andZ.Ren,Anefficientantcolonyoptimization approach to attribute reduction in rough set theory,PatternRecognitionLetters,vol.29,no.9,pp.13511357,2008.

  7. X.Wang,J.Yang,X.Teng,W.Xia,andR.Jensen,Featureselection based on rough sets and particle swarm optimization,PatternRecognitionLetters,vol.28,no.4,pp.459 471,2007.

  8. M. Anbarasi, E. Anupriya, and N. Iyengar, Enhancedpredictionofheartdiseasewithfeaturesubsetselectionusingg enetic algorithm, International Journal of Engineering Science and Technology,vol.2,pp.53705376,2010.

  9. H. Vafaie and I. F. Imam, Feature selection methods: genetic algorithms vs greedy-like search, in Proceedings of the International Conference on Fuzzy and Intelligent Control Systems Louisville,1994.

  10. J.C.W.DebuseandV.J.RaywardSmith,Featuresubsetselectionwithina simulatedannealingdataminingalgorithm,JournalofIntelligentInform ationSystems,vol.9,no.1,pp.5781,1997.

  11. R.CaruanaandD.Freitag,Greedyattributeselection,inProceedings of the 11th International Conference on Machine Learning,pp.2836.

  12. N.S.JaddiandS.Abdullah,Nonlineargreatdelugealgorithmforroughse tattributereduction,JournalofInformationScience andEngineering,vol.29,pp.4962,2013.

  13. M. A. Tahir, A. Bouridane, and F. Kurugollu, Simultaneous feature selection and feature weighting using Hybrid Tabu Search/K-nearest neighbor classifier, Pattern Recognition Letters,vol.28,no.4,pp.438446,2007.

  14. A.R.Hedar,J.Wang,andM.Fukushima,Tabusearchforattribute reduction in rough set theory, Soft Computing, vol. 12, no.9,pp.909918,2008.

  15. A.JainandD.Zongker,Featureselection:evaluation,application,ands mallsampleperformance,IEEETransactionsonPatternAnalysisandM achineIntelligence,vol.19,no.2,pp.153158, 1997.

  16. R. Leardi, Application of a genetic algorithm to feature selection under full validation conditions and to outlier detection, JournalofChemometrics,vol.8,pp.6579,1994.

  17. R. Jensen and Q. Shen, Finding rough set reducts with Ant colony optimization, in Proceedings of the UK Workshop on ComputationalIntelligence,pp.1522,2003.

  18. R.K.SivagaminathanandS.Ramakrishnan,Ahybridapproachforfeatur esubsetselectionusingneuralnetworksandantcolonyoptimization,Ex pertSystemswithApplications,vol.33,no. 1,pp.4960,2007.

  19. E.G.Talbi,L.Jourdan,J.GarcaNieto,andE.Alba,Comparison of population based metaheuristics for feature selection: application to microarray data classification, inProceedingsofthe6thIEEE/ACSInternationalConferenceonComput erSystemsandApplications(AICCSA08),pp.4552,April2008.

  20. J.Wang,A.R.Hedar,G.Zheng,andS.Wang,Scattersearchforroughseta ttributereduction,inProceedingsoftheInternational Joint Conference on Computational Sciences and Optimization (CSO09),pp.531 535,chn,April2009.

  21. G.I.Webb,Na¨vebayes,inEncyclopediaofMachineLearning, C. Sammut and G. I. Webb, Eds., pp. 713714, Springer, New York,NY,USA,2010.

  22. X. S. Yang, A new metaheuristic bat-inspired algorithm, in Nature Inspired Cooperative Strategies For Optimization (NICSO10),J.Gonz alez,D.Pelta,C.Cruz,G.Terrazas,andN.rasnogor,Eds.,pp.65 74,Springer,Berlin,Germany,2010.

  23. . UCI Repository https:archive.ics.uci.edu/ml/.

  24. X. S. Yang, Naïve Bayes Guided Bat algorithm for featureselection.TheScientificWorldJournalVolume2013,ArticleID 325973,http://dx.doi.org/10.1155/2013/325973.

Leave a Reply

Your email address will not be published. Required fields are marked *