A Binary Grasshopper Optimization Algorithm for Feature Selection

Introduction: The aim of feature selection is to extract the most important information from a set of given features, and it has been used widely in recent years. A feature set typically contains redundant features that degrade classification performance and increase training time, so feature selection is needed to remove irrelevant features and improve classification performance. In this paper, a nature-inspired feature selection method is proposed based on the behavior of grasshoppers. The grasshopper optimization algorithm is, in its standard form, formulated for continuous problems. Methods: In this paper, the continuous grasshopper optimization algorithm is converted into a binary one by using a nonlinear mapping function. The binary grasshopper optimization algorithm is also strengthened by combining and preserving exploration and exploitation. Result: Datasets taken from the UCI repository are used. Comparing the proposed method with the GA and the Binary Bat Algorithm (BBA) shows that it outperforms these methods in terms of accuracy, reduction rate, precision, and computational time. Conclusion: In this paper, the feature selection problem is addressed with the binary grasshopper optimization algorithm. The results are compared across different algorithms, and it is concluded that the binary grasshopper optimization algorithm is the best of the compared methods for feature selection.


INTRODUCTION
The main purpose of feature selection is to understand the underlying process that generates the data and to select a subset of relevant features from the available ones [1]. High-dimensional data pose serious challenges for existing learning methods [2]. Feature selection is a widely used technique for dimensionality reduction: it selects a small subset of relevant features from the original set, which usually improves the learning performance of the model and also reduces the computational cost of learning [2]. To date, many feature selection techniques have used metaheuristic algorithms to search for an optimal subset, such as the BBA [3] and the binary dragonfly algorithm [4]. Metaheuristic algorithms rest on the two concepts of exploration and exploitation; by preserving and combining these two concepts, an algorithm can be made powerful. Unfortunately, many algorithms do not establish a proper interaction between the two. For example, the genetic algorithm combines both concepts but fails to balance them properly. Given features such as easy implementation and fast convergence, the grasshopper optimization algorithm has attracted increasing attention. In this paper, feature selection is addressed by combining these two concepts in a binary grasshopper optimization algorithm. The algorithm is also able to solve real problems with unknown search spaces [19]. The second section reviews the literature, the third section presents the proposed method, and the fourth section analyzes and evaluates the results.

LITERATURE REVIEW
In this section, we review the literature on metaheuristic and binary algorithms for feature selection. In [3], a binary version of the bat algorithm is proposed. A comparative study with binary versions of PSO and GA on 22 benchmark functions was conducted. In addition, the Wilcoxon non-parametric statistical test was performed at the 5% significance level to determine whether the results of the proposed algorithm differ significantly from those of the other algorithms. The results suggest that the proposed binary algorithm BBA significantly outperforms the others on most benchmark functions.
In [5], a version of the PSO algorithm is presented that operates on discrete binary variables. In the binary version, trajectories are changes in the probability that a coordinate takes a value of zero or one. The proposed algorithm is able to solve many different problems very quickly. The new version extends the capabilities of the continuous algorithm and can optimize both continuous and discrete functions.
In [6], a nature-inspired feature selection technique has been proposed based on the behavior of bats. This approach combines the exploration power of the bats with the OPF classification rate to find the set of features that maximizes accuracy on a validation set. The algorithm was implemented on 5 datasets and compared with the PSO, FFA, and GSA algorithms. It was more successful than the other algorithms on three datasets, and on the other two it was the second most successful algorithm.
In [7], a new feature selection method is proposed based on cuckoo behavior, a binary version of the cuckoo search algorithm. This algorithm was implemented on 2 datasets concerning theft detection in power distribution systems, and its superiority over other nature-inspired optimization techniques was demonstrated. The algorithm is also well suited to feature selection as a fast approach for industrial datasets.
In [7], a new binary version of the Artificial Bee Colony algorithm based on genetic operators (GB-ABC) is proposed to solve binary optimization problems. In this method, the global-local search capability of the original Artificial Bee Colony algorithm (ABC) is extended to the binary domain. The efficiency of this algorithm was tested on two well-known optimization problems: dynamic image clustering and the 0-1 knapsack problem. The results show that the GB-ABC algorithm outperforms other binary optimization algorithms.
In [8], a wrapper feature selection algorithm is proposed based on the binary dragonfly algorithm. The dragonfly algorithm is a new swarm intelligence algorithm that emulates the behavior of dragonflies. The results of this algorithm are compared with the PSO and GA algorithms and show that it achieves higher performance than the other two.
In [9], both the binary genetic algorithm and its Floating Point (FP) representation are examined on a dynamic control problem. The results show that the FP representation of the genetic algorithm is faster, more consistent, and more accurate than its binary counterpart. Moreover, with certain operators, the FP mode can achieve even higher performance than the binary mode.
In [10], a new feature selection algorithm is proposed based on Ant Colony Optimization Algorithm (ACO). In this algorithm, the efficiency of the classifier and the selected vector length are taken as the ACO heuristic information. This algorithm selects the subset of the optimal feature, without using the previous information of the features, and its implementation is easy and its computational complexity is very low due to the use of a simple classifier. The results show that this algorithm has a better performance than genetic-based algorithms with fewer features and low implementation time.
In [11], a new binary version of the Gray Wolf Optimization Algorithm (GWO) is presented which selects the subset of the optimal feature subset for the purposes of classification. This algorithm has been compared with PSO and GA algorithms and its ability to search for feature space has been proven to find optimal features. Also, the efficiency of the features selected by this algorithm is higher than that of two other algorithms.
In [12], binary variants of the Ant Lion Optimizer (ALO) algorithm are proposed that select a subset of optimal features for classification. The algorithm imitates the hunting process of ant lions. It can search the feature space effectively and achieve better solutions than PSO, GA, ALO, and BBA.
In [13], a new spam detection method is proposed that focuses on reducing the incorrect labeling error of non-spam tags as spam. In this technique, the wrapper feature selection method is used. The results indicate that this method is effective and reduces the incorrect labeling error.
In [14], the harmony search algorithm and the recognition rate of the Optimum-Path Forest (OPF) classifier have been combined to arrive at a fast and accurate new approach to feature selection. Compared to other pattern recognition and feature selection algorithms, this hybrid algorithm performs better.
In [15], a new hybrid approach is proposed for simultaneous feature selection and feature weighting, based on the K-NN rule and the TS heuristic algorithm. The results show a great improvement in efficiency and classification accuracy. Experiments also show that the TS heuristic algorithm is more powerful than the simple TS algorithm and sequential search algorithms.
In [16], a new perspective is proposed based on the Gravitation Search Algorithm (GSA) for feature selection. This algorithm combines the GSA optimization behavior with the rate of the OPF classifier to provide a precise and fast framework for features selection. The results indicate that this algorithm is as powerful as other similar algorithms.
In this section, metaheuristic feature selection algorithms were reviewed, and it was shown that each of them has some disadvantages for feature selection. In the next section, we present the proposed method.

METHODS
The previous section reviewed the research literature and its problems. This section briefly explains the grasshopper optimization algorithm and then the proposed method.

3.1. Standard grasshopper optimization algorithm
Grasshoppers belong to the insect family and are known as pests that damage agricultural products. The grasshopper life cycle is shown in Figure 1. Equation (1) has been used to model the behavior of grasshoppers [18]:

X_i = S_i + G_i + A_i    (1)

where X_i is the location of the i-th grasshopper, S_i is the social interaction between the grasshoppers, G_i is the gravitational force on the grasshopper, and A_i represents the wind advection. To create random behavior, equation (2) can be used, in which r1, r2, and r3 vary randomly between 0 and 1:

X_i = r1 S_i + r2 G_i + r3 A_i    (2)

S_i, the social interaction, is calculated according to equation (3):

S_i = Σ_{j=1, j≠i}^{N} s(d_ij) d̂_ij    (3)

where d_ij denotes the distance between the i-th and j-th grasshoppers, calculated as d_ij = |x_j − x_i|, and d̂_ij = (x_j − x_i)/d_ij is the unit vector between them. The function s maps the distance between grasshoppers to a social force, according to equation (4):

s(r) = f e^{−r/l} − e^{−r}    (4)

in which f represents the intensity of attraction and l the attractive length scale. The gravity component is given by equation (5):

G_i = −g ê_g    (5)

where g is the gravitational constant and ê_g is a unit vector toward the center of the earth. In this model, N is the number of grasshoppers; since their locations must remain within the bounds of the search space, the modified equation (6) is used:

X_i^d = c ( Σ_{j=1, j≠i}^{N} c (ub_d − lb_d)/2 · s(|x_j^d − x_i^d|) · (x_j − x_i)/d_ij ) + T̂_d    (6)

This equation defines the location of a grasshopper based on its current location, the target location, and the locations of all other grasshoppers. Here, T̂_d is the value of the d-th dimension of the target (the best solution found so far), and ub_d and lb_d are the upper and lower bounds of the d-th dimension. The parameter c is one of the most important parameters of the grasshopper optimization algorithm: a decreasing coefficient that shrinks the comfort, repulsion, and attraction zones. It is updated according to equation (7):

c = c_max − t (c_max − c_min)/t_max    (7)

where c_max and c_min are the maximum and minimum values of c, t is the current iteration, and t_max is the maximum number of iterations.
The steps of the standard grasshopper optimization algorithm are shown in Figure (2).
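As a concrete illustration, the position update of equation (6) can be sketched in Python as follows. This is a minimal sketch, not the paper's implementation: the function names, and the defaults f = 0.5 and l = 1.5 (values commonly used in the GOA literature), are assumptions.

```python
import numpy as np

def s_func(r, f=0.5, l=1.5):
    # Eq. (4): social force; f is the intensity of attraction,
    # l the attractive length scale.
    return f * np.exp(-r / l) - np.exp(-r)

def goa_step(X, target, c, lb, ub):
    # One position update of the standard GOA, following Eq. (6):
    # each grasshopper moves under the c-scaled social forces plus the target T.
    N, dim = X.shape
    X_new = np.empty_like(X)
    for i in range(N):
        social = np.zeros(dim)
        for j in range(N):
            if j == i:
                continue
            dist = np.linalg.norm(X[j] - X[i])
            unit = (X[j] - X[i]) / (dist + 1e-12)  # unit vector d_hat_ij
            social += c * (ub - lb) / 2.0 * s_func(dist) * unit
        # the outer c shrinks the step as iterations progress; clip to bounds
        X_new[i] = np.clip(c * social + target, lb, ub)
    return X_new
```

In a full run, this step would be repeated for t_max iterations while c decays according to equation (7) and `target` is refreshed with the best solution found so far.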

3.2. Binary grasshopper optimization algorithm
In the standard grasshopper optimization algorithm, solutions are updated with continuous values. For feature selection, the search space is instead modeled as a Boolean lattice, and a solution to the problem of selecting or not selecting features is a binary vector: a one indicates that the corresponding feature is selected, and a zero that it is not. This binary vector is constructed according to Figure 3. As shown there, the algorithm is transformed into a binary grasshopper optimization algorithm in three main steps: (1) updating the parameter c, (2) nonlinear mapping through the function V, and (3) updating the new locations of the grasshoppers. To construct this algorithm, the modified equations (8), (9), and (10) are used. In equation (8), the target term is replaced by T, the best grasshopper found so far. Equation (9) expresses the social interaction between the grasshoppers. Since the decision variables in the binary grasshopper optimization algorithm can only take the values 0 and 1, a nonlinear function V is used that maps the social behavior of the grasshoppers to a probability, calculated according to equation (10).
Finally, the locations of the grasshoppers must be updated. Applying these changes and modifications to equation (6) yields the update rule of equation (11).
The proposed mathematical formulation is able to perform exploration and exploitation in search space. However, there should be a mechanism to adjust the level of exploration and exploitation of search factors.
The reason for using exploration and exploitation together in this algorithm is to take a broad view of the search space at the beginning, considering all possible solutions, and a focused look at the promising region at the end, to find the optimal solution. In nature, grasshoppers first move and seek food locally, because they have no wings at the larval stage; later they move freely in the air and explore regions on a much larger scale. Using exploration without exploitation yields only a random search, which is not suitable for obtaining a final optimal solution. Conversely, using exploitation without exploration leads to getting trapped in local optima. In stochastic optimization algorithms, therefore, exploration is performed first, to find the more promising areas of the search space; once those areas are found, exploitation requires the search agents to search locally to obtain a good approximation of the global optimum. In equation (11), T contributes to exploitation and convergence, while the social interaction term helps preserve diversity and exploration.
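The binary update described above can be sketched as follows. This is a hedged sketch, not the paper's exact equations: the V-shaped transfer function |tanh(x)| stands in for equation (10), and the simplified social step stands in for equations (8), (9), and (11).

```python
import numpy as np

def v_transfer(x):
    # Assumed V-shaped transfer function standing in for Eq. (10);
    # the paper's exact mapping may differ.
    return np.abs(np.tanh(x))

def binary_goa_step(X_bin, best, c, rng):
    # One binary update (sketch of Eqs. (8)-(11)): a continuous social step
    # is computed, mapped to a flip probability by V, and each bit is
    # flipped with that probability.  `best` is T, the best grasshopper
    # found so far; it pulls solutions toward exploitation.
    N, dim = X_bin.shape
    X_new = X_bin.copy()
    for i in range(N):
        # social pull among grasshoppers plus attraction toward T
        step = c * (X_bin - X_bin[i]).sum(axis=0) / N + (best - X_bin[i])
        prob = v_transfer(step)            # map the step to [0, 1)
        flip = rng.random(dim) < prob      # flip bits with that probability
        X_new[i] = np.where(flip, 1 - X_bin[i], X_bin[i])
    return X_new
```

Each row of `X_bin` is one candidate feature subset; after the update, a wrapper evaluation (e.g. classifier accuracy on the selected columns) would score the new subsets and refresh `best`.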

Binary grasshopper optimization algorithm with the parameter c update
Updating the parameter c plays an important role in the interaction between the two concepts of exploration and exploitation. The parameter is updated according to equation (12). In that equation, cinf is a value close to zero (approximately 0.0004) that gives the final value of c at the end of the run; the equation also involves the current and maximum iteration numbers and an exponent w. The switch between exploration and exploitation is performed by changing the value of w: if w is greater than 0.5, exploration carries more weight, and if w is less than 0.5, exploitation carries more weight. How the parameter is updated is shown in Figure 4: changing the value of w gives different decay curves. For example, with w = 0.5 the parameter drops to about 0.2 by iteration 60, effectively abandoning exploration early; with w = 1, a linear interaction is established between the two concepts; and to lean toward exploration, w can be set above one, for example w = 2, for which c is still about 0.9 at iteration 60.
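Since the exact form of equation (12) is not reproduced here, the following sketch shows one plausible schedule consistent with the qualitative description (w = 1 linear; larger w keeps c high longer). The placement of the exponent is an assumption, not the paper's formula.

```python
def c_schedule(l, L, w, c_max=1.0, c_inf=0.0004):
    # Plausible form of Eq. (12): decays from c_max to c_inf over L
    # iterations.  w = 1 gives a linear decay; w > 1 keeps c large longer
    # (more exploration); w < 1 makes c drop quickly (more exploitation).
    # The exact form in the paper may differ; this only reproduces the
    # behavior described in the text.
    return c_inf + (c_max - c_inf) * (1.0 - l / L) ** (1.0 / w)
```

For instance, at the midpoint of a 100-iteration run, w = 1 gives c ≈ 0.5, while w = 2 keeps c above it and w = 0.5 pushes it below.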
In the next section, we analyze and evaluate the datasets used, the settings and benchmarks for comparing, analyzing and evaluating the overall results.
4. RESULT
In this section, we analyze the datasets used, the evaluation benchmarks, and the results.

Datasets
In this study, we use 7 datasets, listed in Table 1.

Settings and comparison benchmarks
To perform the simulation, the datasets are first loaded into the simulation environment and, after normalization and validation, classified. Finally, the results and benchmarks are compared across all the classified data. To analyze the performance of the proposed system, four measures are used: accuracy, precision, computational time, and reduction rate.

- Accuracy: the most important benchmark for the performance of a classification algorithm. It shows what percentage of the full set of test records is correctly classified and is calculated according to equation (13):

Accuracy = (TP + TN) / (TP + TN + FP + FN)    (13)

- Precision: indicates the ability of the algorithm to recognize the positive category (here, the patient class). It is calculated according to equation (14):

Precision = TP / (TP + FP)    (14)

In equations (13) and (14), TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively.

- Computational time: the time required to run the algorithm. The lower the computational time, the better the algorithm performs.

- Reduction rate: the percentage reduction in the number of features, obtained from equation (15):

Reduction rate = (T − S) / T    (15)

where S is the number of features selected by the algorithm and T is the total number of features.
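The three formula-based measures can be computed together as below; the function name and the counts-based interface are illustrative, not part of the paper.

```python
def evaluation_metrics(tp, tn, fp, fn, n_selected, n_total):
    # Eq. (13): fraction of test records classified correctly
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Eq. (14): ability to recognize the positive (patient) class
    precision = tp / (tp + fp)
    # Eq. (15): share of features removed by the selection algorithm
    reduction_rate = (n_total - n_selected) / n_total
    return accuracy, precision, reduction_rate
```

For example, a run with 40 true positives, 50 true negatives, 5 false positives, and 5 false negatives that keeps 3 of 10 features gives an accuracy of 0.90 and a reduction rate of 0.70.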

DISCUSSION
In this section, we analyze and evaluate the results obtained from the four methods of feature selection with three different classifiers on different datasets.

Accuracy benchmark
In this section, we examine the accuracy benchmark on the different datasets for the four feature selection methods. The comparison is shown in Table 2. As seen in the table, on the diabetes dataset [18] the BBA feature selection technique with the Tree classifier provides better accuracy than the other algorithms. Among its advantages on this dataset are a high convergence rate compared to other metaheuristic algorithms, the use of frequency tuning to solve problems, and the ability to work with high-dimensional data. On the cancer dataset [19], the binary grasshopper optimization algorithm with the parameter c update shows remarkable accuracy, owing to maintaining the two benchmarks of exploration and exploitation better than the other three algorithms. The hepatitis data are classified more accurately using the BGOA_C feature selection method (with the parameter c update) and the Tree classifier than with the other three methods; here, too, the superior feature of the binary grasshopper optimization algorithm is its combination and preservation of exploration and exploitation. On the Sonar dataset [20], the genetic algorithm provides the best accuracy, thanks to its easy implementation and good runtime. On the Spambase and Alizadeh datasets [21], the proposed method again offers the best accuracy by retaining the two concepts of exploration and exploitation. In general, the accuracy benchmark across the different datasets shows the best performance for the proposed method with the parameter c update.

The reduction rate and precision benchmarks in the datasets
The reduction rate and precision benchmarks for the different datasets and feature selection methods, using the decision tree classifier, are shown in Table 3. The two benchmarks are placed side by side because precision requires calibration, and this calibration is done by the reduction rate, which embodies the concept of feature selection: unnecessary features are deleted and the necessary ones are identified, so the precision of a model depends on the reduction rate achieved by feature selection. Under the same conditions, an algorithm performs better if its precision is higher and its reduction rate is lower. As shown in Table 3, the BBA performs better than the others on the diabetes dataset, which contains 788 records and 9 features. This can be attributed to the BBA's use of parameter control, whereas most metaheuristic algorithms use constant parameters; the BBA also performs better on datasets containing large amounts of noise, such as the diabetes dataset. On the cancer dataset, the proposed method, the binary grasshopper optimization algorithm with the parameter c update, with a precision of 98.89 and a reduction rate of 20, shows a much better performance than the other algorithms. On the Biodeg [22], Hepatitis [23], Spambase, and Alizadeh datasets as well, the proposed binary grasshopper optimization algorithm with the parameter c update, together with the BBA, shows significant precision and reduction rates. On the Sonar dataset, the GA, with a precision of 93.33 and a reduction rate of 64, performs best.
In general, the binary grasshopper optimization algorithm with the parameter c update shows better performance than the other algorithms, with significant precision and reduction rates, owing to its preservation of the two benchmarks of exploration and exploitation.

The computational time benchmark in the dataset
The time needed to run the algorithms on the different datasets, with the different feature selection methods and the TREE classifier, is shown in Table 4. The lower the computational time, the better the algorithm performs. As can be seen in the table, the binary grasshopper optimization algorithm with the parameter c update shows a more efficient and cost-effective runtime on the different datasets than the other algorithms. Figure 5 shows the average accuracy of the four feature selection methods: GA 84.2, BBA 81.4, GOA 83.93, and BGOA_C 92. The grasshopper optimization algorithm with the parameter update thus has the highest average accuracy, followed by the GA.

6. CONCLUSION
In this paper, the feature selection problem is addressed with the binary grasshopper optimization algorithm. The proposed method is applied to 7 datasets taken from the UCI repository, and the benchmarks of accuracy, computational time, precision, and reduction rate are investigated. The results are compared across the different algorithms, and it is concluded that the binary grasshopper optimization algorithm is the best of the compared methods for feature selection.