A Binary Grasshopper Optimization Algorithm for Feature Selection

Download Full-Text PDF Cite this Publication

Text Only Version

 

A Binary Grasshopper Optimization Algorithm for Feature Selection

Reyhaneh Yaghobzadeh1, Seyyed Reza Kamel *1, Mojtaba Asgari1, Hassan Saadatmand2
1. Department of Computer Engineering, Mashhad Branch, Islamic Azad University, Mashhad, Iran.
2. Departmentof Electric, Mashhad Branch, Mashhad University, Mashhad, Iran.

Abstract:

Introduction: The aim of the feature selection is to find the most important information from a set of given features. It has been used widely in recent years. The set of features primarily has associated features that influence classification performance and increase training time. So, the feature selection is necessary to remove inappropriate features and improve classification performance. In this paper, a nature-inspired feature selection method is proposed based on the behavior of grasshoppers. Grasshopper optimization algorithm is expressed for continuous problems as standard.

Methods: In this paper, continuous grasshopper optimization algorithm is converted into binary by using a nonlinear mapping function. Also, the binary grasshopper optimization algorithm will be converted into a powerful algorithm by combining and retaining exploration and exploitation.

Result: the dataset taken from the UCI repository is used. By comparing the proposed method with GA, Binary Bat Algorithm (BBA), we indicate that it has had better performance than previous methods in terms of accuracy, reduction rate, precision, and computational time.

Conclusion: In this paper, the feature selection problem is presented by the binary grasshopper optimization algorithm. The results are compared at different algorithms. It was concluded that the binary grasshopper optimization algorithm is the best method for feature selection

Keywords: Feature Selection, Binary Grasshopper Optimization Algorithm, Classification.

  1. INTRODUCTIONThe main purpose of feature selection is to understand the basic process that creates the data to select a subset of the relevant features of the available items[1]. High-dimensional data offer serious challenges for existing learning practices [2]. To reduce the dimension, feature selection uses a wide technique. It selects a small subset of the relevant features from the real set, which usually improves the educational performance of the model. Using the feature selection technique, the cost of learning computing could also be reduced [2]. To date, many feature selection techniques have used meta heuristic algorithms to search for optimal subset such as BBA [3], binary dragonfly algorithm [4], and so on. Meta heuristic algorithms have two concepts of exploration and exploitation, by preserving and combining these two concepts, the algorithm can be converted into a powerful one. Unfortunately, in other algorithms there is no proper interaction between these two concepts. For example, the genetic algorithm combines these two concepts but failed to establish a proper interaction between the two. Given the features such as easy implementation and quick synchronization, grasshopper optimization algorithm has attracted more attention. In this paper, the feature selection is discussed by combining these two concepts and using the binary grasshopper optimization algorithm. The mentioned algorithm is also able to solve real problems with unknown space [19]. In the second section, we will discuss the literature review, in the third section we will present the proposed method and in the fourth section we will analyze and evaluate the results.
  2. LITERATURE REVIEWIn this section, we will review the literature on the metaheuristic and binary algorithms for feature selection.

    In [3], a binary version of the bat algorithm is proposed. A comparative study with two dimensions of PSO and GA with more than 22 benchmark function has been conducted to achieve the result. In addition, the Wilcoxon non-parametric statistical test was significant at a significant level of 5% to determine whether the results of the proposed algorithm differ from other algorithms. The results suggest that the proposed binary algorithm BBA can significantly outperform others in most benchmark functions.

    In [5], a version of the PSO algorithm is presented that operates on discrete binary variables. In the binary version, trajectories are variations in the probability of coordination occurrence in a value of zero or one. The proposed algorithm is able to solve many different problems very quickly. The new version of the algorithm enhances the capabilities of the continuous-mode algorithm and can improve each continuous or discrete function.

    In [6], a nature-inspired feature selection technique has been proposed based on the behavior of bats. This view combines exploration power with OPF classification rate to find a set of features that maximize precision in a valid set. The algorithm is implemented on 5 datasets and compared with the PSO, FFA and GSA algorithms. It has been more successful in three datasets than other algorithms, and in the other two, this algorithm has been the second successful algorithm.

    In [7], a new feature selection method is proposed based on cuckoo’s behavior, which is the binary version of the cuckoo search algorithm. This algorithm has been implemented on 2 datasets based on the concept of the thief exploration in power

    distribution systems and its superiority over other nature-inspired optimization techniques has been proven. This algorithm is also perfect for feature selection and quickest approach to industrial datasets.

    In [7], a new binary version of the synthetic Binary Coordinate Ascent algorithm based on genetic operators (GA-ABC) is proposed to solve binary optimization problems. In this method, the global-localized search capability of the primary Artificial Bee Colony Algorithm (ABC) in a binary domain has been improved. The efficiency of this algorithm has been tested on two well-known optimization problems: dynamic image clustering and 0-1 Knapsack problems. The results show that GB-ABC algorithm is the best algorithm for binary optimization compared to other binary optimization algorithms.

    In [8], a wrapper feature selection algorithm is proposed based on the binary dragonfly algorithm. Dragonfly algorithm is a new particle swarm algorithm that emulates the behavior of dragonflies. The results of this algorithm are compared with PSO and GA algorithms and prove that this algorithm has a high performance than the other two ones.

    In [9], both the binary genetic and the Floating Point (FP) algorithms are examined using a dynamic control problem. The results show that the FP presentation of the genetic algorithm is faster and more consistent than its binary mode and is more accurate than it. Also, FP mode can achieve a higher performance than binary mode by certain operators.

    In [10], a new feature selection algorithm is proposed based on Ant Colony Optimization Algorithm (ACO). In this algorithm, the efficiency of the classifier and the selected vector length are taken as the ACO heuristic information. This algorithm selects the subset of the optimal feature, without using the previous information of the features, and its implementation is easy and its computational complexity is very low due to the use of a simple classifier. The results show that this algorithm has a better performance than genetic-based algorithms with fewer features and low implementation time.

    In [11], a new binary version of the Gray Wolf Optimization Algorithm (GWO) is presented which selects the subset of the optimal feature subset for the purposes of classification. This algorithm has been compared with PSO and GA algorithms and its ability to search for feature space has been proven to find optimal features. Also, the efficiency of the features selected by this algorithm is higher than that of two other algorithms.

    In [12], binary optimizer types of Ant Lion Optimizer (ALO) algorithm are proposed, which selects a subset of optimal features for the purpose of the classification. This algorithm imitates the process of ant lions hunting. This algorithm can search optimally for the feature space and achieve a better solution than PSO, GA, ALO, and BBA.

    In [13], a new spam detection method is proposed that focuses on reducing the incorrect labeling error of non-spam tags as spam. In this technique, the wrapper feature selection method is used. The results indicate that this method is effective and reduces the incorrect labeling error.

    In [14], the rate of the harmony search algorithm and Optimal Path Forest (OPT) classifier has been used to arrive at a quick and accurate new perspective for feature selection. Compared to other pattern recognition and feature selection algorithms, this proposed hybrid algorithm performs better.

    In [15], a new hybrid approach is proposed for simultaneous feature selection and weighting features based on the K-NN law on the TS heuristic algorithm. The results show a great improvement in the efficiency and classification accuracy. Experiments also show that the TS heuristic algorithm is more powerful than the simple TS algorithm and sequential search algorithms.

    In [16], a new perspective is proposed based on the Gravitation Search Algorithm (GSA) for feature selection. This algorithm combines the GSA optimization behavior with the rate of the OPF classifier to provide a precise and fast framework for features selection. The results indicate that this algorithm is as powerful as other similar algorithms.

    In this section, the feature selection meta heuristic algorithms were investigated, and it was shown that each of these algorithms has some disadvantages for feature selection. In the next section, we will present the proposed method.

  3. METHODS

In the past, the research literature and their problems were reviewed. This section provides a brief explanation of grasshopper optimization algorithm and the proposed method.

    1. Standard grasshopper optimization algorithmGrasshoppers are from the insect family. They are known as pests that cause damage to agricultural products. Grasshopper life cycle is shown in Figure (1)[17].

      Fig. 1. Grasshopper life cycle

      Equation (1) has been used to model the behavior of grasshoppers [18].

      = + +(1)

      Where is the location of grasshoppers, is the social interaction between the grasshoppers, is the gravitational force of the grasshoppers and A defines the direction of the wind direction. These show the location of a grasshopper. To create random behavior, we can use the equation (2), in which r can randomly vary between 0 and 1:

      = 1 + 2 + 3(2)

      Si is the social interaction that is calculated according to equation (3), where donates the distance between i-th grasshopper and j-th grasshopper.

      N

      Si = s (dij)d

      ij

      j=1 j i

      (3)

      In equation (3), denotes the distance between i-th grasshopper and j-th grasshopper, and is calculated as = | |.

      S function is a mapping for the distance between the grasshoppers obtained according to equation (4):

      r

      s (r) = fe l er

      (4)

      In which f represents the intensity of gravity, the general formula of which is in accordance with equation (5):

      N

      xj xi

      Xi = s (|xj xi|) d geg + uew

      j=1 ij

      j i

      (5)
      = (| |) +

      2

      =1

      ( )

      (6)
      = (| |) +

      2

      =1

      ( )

      (6)

       

      In this equation, is the number of grasshoppers, but since grasshoppers move on the ground, their location should not be more than a threshold, and thus the modified equation (6) is used:

      =(7)
      =(7)

       

      This equation of the location of a grasshopper is defined based on its current location, target location, and location of all other grasshoppers. In this equation, C is one of the important parameters of the grasshopper optimization algorithm, a reduction factor, by which the safe, and repulsion and gravity regions are affected. The update for this parameter is obtained from equation (7).

      The steps of the standard grasshopper optimization algorithm are shown in Figure (2).

      Fig. 2. Grasshopper optimization algorithm function

    2. Binary grasshopper optimization algorithmIn the standard grasshopper optimization algorithm, solutions are updated to continuous values. For feature selection in this algorithm, the search space is modeled as a Boolean network. A solution to the problem of selecting or not selecting a feature is using the binary vector. This algorithm consists of a vector containing zero and one, in which, one indicates that the feature is selected and zero indicates that the feature is not selected. We construct this binary vector according to Figure 3. As shown in Figure 3, this algorithm is transformed into a binary grasshopper optimization algorithm in three main steps :
      1. update the parameter c,
      2. nonlinear mapping of v,
      3. updating new location of grasshoppers.Fig. 3. Binary grasshopper optimization algorithm steps

        To construct this algorithm, the modified equations (8), (9) and (10) are used. In Equation (8), instead of using + , a T label is used. T is the best grasshopper that has ever existed. Equation (9) shows the social interaction of grasshoppers behavior in which:

        = +(8)
        N c

        S1 = 2 |XiXj|dj

        i#j

        (9)
        2 10

        = | tan1( )|

        2

        (10)
        2 10

        = | tan1( )|

        2

        (10)

         

        Since the decision variable in binary grasshopper optimization algorithm is only between 0 and 1, we use a nonlinear function called V that maps the social behavior of grasshoppers to another function or space and is calculated in accordance with equation (10).

        Finally, we need to update the location of grasshoppers. Equation (6) is obtained from this change and modification and is calculated according to Equation (11).

        = { >

        <

        (11)

        The proposed mathematical formulation is able to perform exploration and exploitation in search space. However, there should be a mechanism to adjust the level of exploration and exploitation of search factors.

        The reason for the simultaneous use of exploration and exploitation in this algorithm is to have an overview of the search space initially and to consider all the possible solutions and a partial look at the desired space at the end of the algorthm to find the optimal solution. In nature, locusts move first and seek food locally, because they do not have wings at the larva stage. Then they move freely in the air and explore a region with a larger scale. In the case of using an exploration algorithm without exploitation, only a random search is performed that is not appropriate for final implementations and obtaining the final optimal solution. Also, the use of an exploitation algorithm without considering an exploratory algorithm leads to finding a part of the optimal local solutions. However, in randomized optimization algorithms, optimal exploitation is done first due to the need to find more promising areas of search space. After finding more promising areas, exploitation requires search factors to search locally to find a good approximation of the global optimum. In equation (11), T contributes to exploitation and convergence, and

        helps preserving the diversity and exploration.

    3. Binary grasshopper optimization algorithm with the parameter c update

Updating the parameter plays an important role in the interaction between the two concepts of exploration and exploitation.

Updating this parameter is obtained in accordance with equation (12):

= + ((1 )(1)) (1 )(12)

In the above equation, cinf is a value close to zero (approximately 0.0004), which shows the final value of c that ends it. It also has the number of algorithm repetition and the values of w vary between 0 and 1. The switch between exploration and exploitation in this algorithm is performed by changing the value of the parameter w between 0 to 1. That is, if w is greater than 0.5, exploration is more important, but if w is less than 0.5, exploitation is more important. How to update the parameter is shown in Figure 4.

Fig. 4. Updating the parameter

As shown in Figure 4, changing the parameter value gives different charts. For example, with a value of 0.5 immediately with 60 iteration it has the value of 0.2, and actually ignores the exploitation. If it is equal to 1, then an interaction is established linearly between the two concepts. If we want to approach it to the concept of exploitation, we need to set the value at more than one, for example, equal to 2, which is here 0.9 at 60 iteration.

In the next section, we analyze and evaluate the datasets used, the settings and benchmarks for comparing, analyzing and evaluating the overall results.

  1. RESULTIn this section, we analyze the data used, as well as analyze the results and benchmarks evaluated.
      1. DatasetsIn this study, we use 7 datasets according to Table 1.

        Table 1. Dataset

        Number of featuresNumber of recordsDatasetsRow
        9788diabet1
        10699bcw2
        411055biodeg3
        19155hepatitis4
        41208sonar5
        574061spambase6
        56303Z_Alizadeh_sani7
      2. Settings and comparison benchmarks

    To perform simulation operations, the datasets are first entered the simulation environment and classified after normalization and validation. In the end, the results and benchmarks are compared with all the classified data. In order to analyze the performance of the proposed system, four factors of accuracy, precision, computational time and reduction rate are used as follows:

    +

    =

    + T + +

    (13)
    +

    =

    + T + +

    (13)

     

    • Accuracy: The most important benchmark to determine the performance of a classification algorithm. It shows how many percent of the total set of test records is properly categorized. The accuracy benchmark is calculated according to equation (13).
      =

      +

      (14)
      =

      +

      (14)

       

    • Precision: Indicates the algorithm ability to recognize the positive category. Here, the positive category is the patient one.
    • Computational time: indicates the time required to run the application. The less computational time the algorithm performs better.
      ns

      Reduction rate = 1

      nf

      (15)
      ns

      Reduction rate = 1

      nf

      (15)

       

    • Reduction rate: shows the reduction percentage in the features and is obtained from equation (15).

    According to equation (15), is the number of features selected by algorithm and is the total number of features.

  2. DISCUSSIONIn this section, we analyze and evaluate the results obtained from the four methods of feature selection with three different classifiers on different datasets.
      1. Accuracy benchmarkIn this section, we examine the accuracy benchmark in the different datasets in four methods of feature selection. This comparison is shown in Table 2. As seen in the table in the diabetes dataset [18], the use of the feature selection technique of BBA along with the Tree classifier provides better accuracy than other algorithms. Among the advantages of this algorithm in features selection on a diabetes dataset is having a high convergence rate compared to other meta heuristic algorithms. It also uses frequency regulation to solve problems and provides the ability to work with high-dimensional data.

        Table 2. Comparison of the accuracy benchmark

        The datasets used AccuracyFeature Selection AlgorithmsRow
        AlizadehSpambaseSonarHepatitBiodegBcwDiabet
        73.7790.2183.3361.2984.8397.1497The proposed method1
        83.5088.5885.7161.2981.0494.2895GA2
        78.6889.7871.4961.2975.7294.2898.75BBA3
        83.6089.3459.5267.7475.2597.8597. 5The proposed method with the parameter c update4

        In the dataset for cancer [19], the BGA with the update of the parameter c has shown a remarkable accuracy, which is due to maintain two benchmarks of exploration and exploitation more than three other algorithms. Hepatitis data are more accurate by using the feature selection method of BGOA_C and the update of the parameter along with the Tree classifier compared using the other three methods. Here the superior feature of the binomial grasshopper optimization algorithm can be mentioned that is combining and maintaining two benchmarksof exploration and exploitation. In the Sonar dataset [20], the genetic algorithm provides optimal accuracy due to its easy implementation as well as the optimum runtime. In the two spam base and Alizadeh datasets [21], the proposed method offers an optimal accuracy too, by retaining the two concepts of combination and exploitation. In general, it can be concluded that the accuracy benchmark in different datasets has shown a better performance using the proposed method, along with the updating the parameter .

      2. The reduction rate and precision benchmarks in the datasetsThe reduction rate and precision benchmarks for different datasets and different methods of features selection and the use of Decision TREE classifiers are shown in Table 3. The two reduction rate and precision benchmarks have been placed side by side. In fact, precision is a benchmark that requires calibration, and this calibration is done by the reduction rate, which is the same as the concept of feature selection. In the feature selection, unnecessary data are deleted and the necessary data are identified, so the precision of a model depends on the reduction rate by the feature selection.

        Table 3. Comparison of the precision and reduction rate in the feature selection algorithms

        The dataset used Percision/ReductionFeature Selection AlgorithmsRow
        AlizadehSpambaseSonarHepatitisBiodegBcwDiabetic
        P=88.63P=92.53P=73.33P=81.81P=78.94P=97.77P=96.92The proposed method1
        R=62%R=46%R=45%R=58%R=61%R=60%R=54%
        P=84.09P=89.43P=93.33P=81.81P=76.31P=86.15P=86.15GA2
        R=53%R=62%R=64%R=79%R=59%R=64%R=64%
        P=84.09P=91.83P=60P=81.81P=60.52P=97.77P=98.46BBA3
        R=57%R=53%R=60%R=79%R=49%R=60%R=44
        P=84.09P=93.80P=46.66P=90.90P=77.63P=98.89P=95.38The proposed method with the parameter c update4
        R=42%R=50%R=60%R=70%R=49%R=20%R=54%

        In the same condition, the algorithm shows a higher performance if the precision rate is higher and also the reduction rate is lower. As shown in Table 3, the BBA performs better than the other ones for the diabetes dataset, which contains 788 records and 9 features. The reason for this can be attributed to the advantage of the BBA, which uses parameter control compared to most of the meta heuristic algorithms that use constant parameters. Also, this algorithm performs better in the dataset that contains large amounts of noise, such as the diabetes dataset. In the cancer dataset, the proposed method, the binary grasshopper optimization algorithm, with the parameter c update, with a precision of 98.89 and a significant reduction rate of 20, showed a much better performance than other algorithms. Also in Biodegg [22], Hepatitis [23], Spam base, and Alizadeh datasets, the proposed method, the binary grasshopper optimization algorithm, as well as the BBA with the update of the parameter , have shown a significant precision and reduction rate. In the sonar dataset, the GA with a precision of 93.33 and a reduction rate of 64 had a better performance. In general, it can be said that the binary grasshopper optimization algorithm has been able to show better performance than other algorithms with updating the parameter , due to maintaining the two benchmarks of exploration and exploitation with the significant precision and reduction rate.

      3. The computational time benchmark in the datasetThe time needed to run the algorithms in different data with different methods of feature selection and using the TREE classifier is shown in Table 4.

        Table 4. Comparison of the computational time benchmark

        The datasets used AccuracyFeature Selection AlgorithmsRow
        AlizadehSpambaseSonarHepatitBiodegBcwDiabet
        58.7185.7359.5758.01464.8631.8665.05The proposed method1
        65.5393.9063.8562.4272.5633.5668.92GA2
        59.2885.8058.2337.4667.0238.2165.90BBA3
        59.7185.4458.1058.7966.6532.2465.85The proposed method with the parameter c update4

        The lower the computational time, the algorithm performs better. As can be seen in the table, the binary grasshopper optimization algorithm, with the update of parameter C in different datasets has shown a more efficient and cost-effective performance in terms of runtime than the other algorithms.

      4. Average accuracy

    Figure 5 shows the average accuracy for 4 feature selection methods. As seen in the chart, the average accuracy is GA: 84.2, BBA: 81.4, GoA:83.93, GOA_c 92. As can be seen, grasshopper optimization algorithm with a parameter update has had the highest average accuracy, and then the GA has had a good average accuracy.

    94

    92

    92

    90

    88

    86

    83.93 84.02

    84

    82 81.04

    80

    78

    76

    74

    GOA GA BBA GOA_C

    Average accuracy 83.93 84.02 81.04 92

    Fig. 5. Average accuracy

  3. CONCLUSION

In this paper, the feature selection problem is presented by the binary grasshopper optimization algorithm. The proposed method is performed on 7 datasets taken from the UCI repository and the benchmarks of accuracy, computational time, precision, and reduction rate are investigated. The results are compared at different algorithms. It was concluded that the binary grasshopper optimization algorithm is the best method for feature selection.

REFERENCES

  1. Zawbaa, H.M., E. Emary, and B. Parv. Feature selection based on antlion optimization algorithm. in Complex Systems (WCCS), 2015 Third World Conference on. 2015. IEEE.
  2. Pandya, R. and J.J.I.J.o.C.A. Pandya, C5. 0 algorithm to improved decision tree with feature selection and reduced error pruning. 2015. 117(16): p. 18- 21.
  3. Mirjalili, S., et al., Binary bat algorithm. 2014. 25(3-4): p. 663-681.
  4. Rodrigues, D., et al. BCS: A binary cuckoo search algorithm for feature selection. in Circuits and Systems (ISCAS), 213 IEEE International Symposium on. 2013. IEEE.
  5. Kennedy, J. and R.C. Eberhart. A discrete binary version of the particle swarm algorithm. in Systems, Man, and Cybernetics, 1997. Computational Cybernetics and Simulation., 1997 IEEE International Conference on. 1997. IEEE.
  6. Nakamura, R.Y., et al. BBA: a binary bat algorithm for feature selection. in 2012 25th SIBGRAPI conference on graphics, Patterns and Images. 2012. IEEE.
  7. Ozturk, C., E. Hancer, and D.J.I.S. Karaboga, A novel binary artificial bee colony algorithm based on genetic operators. 2015. 297: p. 154-170.
  8. Mafarja, M.M., et al. Binary dragonfly algorithm for feature selection. in New Trends in Computing Sciences (ICTCS), 2017 International Conference on. 2017. IEEE.
  9. Mirjalili, S., A.J.S. Lewis, and E. Computation, S-shaped versus V-shaped transfer functions for binary particle swarm optimization. 2013. 9: p. 1-14.
  10. Kanan, H.R., K. Faez, and S.M. Taheri. Feature selection using ant colony optimization (ACO): a new method and comparative study in the application of face recognition system. in Industrial Conference on Data Mining. 2007. Springer.
  11. Emary, E., H.M. Zawbaa, and A.E.J.N. Hassanien, Binary grey wolf optimization approaches for feature selection. 2016. 172: p. 371-381.
  12. Emary, E., H.M. Zawbaa, and A.E.J.N. Hassanien, Binary ant lion approaches for feature selection. 2016. 213: p. 54-65.
  13. Zhang, Y., et al., Binary PSO with mutation operator for feature selection using decision tree applied to spam detection. 2014. 64: p. 22-31.
  14. Ramos, C.C., et al., A novel algorithm for feature selection using harmony search and its application for non-technical losses detection. 2011. 37(6): p. 886-894.
  15. Tahir, M.A., A. Bouridane, and F.J.P.R.L. Kurugollu, Simultaneous feature selection and feature weighting using Hybrid Tabu Search/K-nearest neighbor classifier. 2007. 28(4): p. 438-446.
  16. Papa, J.P., et al. Feature selection through gravitational search algorithm. in Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on. 2011. IEEE.
  17. Saremi, S., S. Mirjalili, and A.J.A.i.E.S. Lewis, Grasshopper optimisation algorithm: theory and application. 2017. 105: p. 30-47.
  18. htpp://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes
  19. https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/
  20. https://archive.ics.uci.edu/ml/support/Connectionist+Bench+(Sonar,+Mines+vs.+Rocks)
  21. https://archive.ics.uci.edu/ml/datasets/Z-Alizadeh+Sani
  22. https://archive.ics.uci.edu/ml/machine-learning-databases/00254/

Leave a Reply

Your email address will not be published. Required fields are marked *