A Comprehensive Review of Literature in the Field of Feature Selection and Genetic Algorithms for Classification

DOI : 10.17577/IJERTV13IS040176



International Journal of Engineering Research & Technology (IJERT)

ISSN: 2278-0181

Vol. 13 Issue 4, April 2024

Amrita Priyam, Kahkashan Kouser

Birla Institute of Technology, Mesra, Ranchi

Abstract- In machine learning, the process of selecting features is an important task. Feature selection is the method of selecting a subset of relevant or significant variables and features. It is applicable in multiple areas where high-dimensional data are generated, such as diabetes prediction, anomaly detection, bioinformatics, and image processing. This paper gives a literature review of different feature selection methods. Further, genetic algorithms have been applied in various domains for classification purposes; this paper also presents an extensive review of such papers.

    Keywords: Features, Feature Extraction, Feature Selection, Genetic Algorithm, Optimization

    1. INTRODUCTION

Feature selection is a vast field of research, and many different types of studies have been carried out on it. Feature selection/extraction is an important step in machine learning tasks and has proven to be an effective and efficient way to prepare high-dimensional data for data mining and machine learning. The ordinary genetic algorithm is an optimization procedure working in binary search spaces, i.e., search spaces consisting of binary strings, but after suitable coding it can also be applied to continuous search spaces. Unlike classical hill-climbers, it does not evaluate and improve a single solution; instead, it analyzes and modifies a population (that is, a set) of solutions at the same time. The power of this intrinsic parallelism of genetic search is amplified by the mechanics of population modification, allowing genetic algorithms to successfully attack NP-hard problems. The papers relevant to these domains are presented in the following sections.
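To make these mechanics concrete, the following is a minimal sketch of the canonical binary genetic algorithm described above (tournament selection, one-point crossover, bit-flip mutation). All parameter values and the toy OneMax fitness are illustrative choices, not taken from any of the reviewed papers.

```python
# A minimal sketch of a binary genetic algorithm; settings are illustrative.
import random

def genetic_algorithm(fitness, n_bits, pop_size=50, generations=100,
                      crossover_rate=0.9, mutation_rate=0.01):
    # Initial population: random binary strings.
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        # Tournament selection (size 3).
        parents = [max(random.sample(pop, 3), key=fitness) for _ in range(pop_size)]
        children = []
        for p1, p2 in zip(parents[::2], parents[1::2]):
            # One-point crossover.
            if random.random() < crossover_rate:
                cut = random.randint(1, n_bits - 1)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            else:
                c1, c2 = p1[:], p2[:]
            # Bit-flip mutation.
            for c in (c1, c2):
                for i in range(n_bits):
                    if random.random() < mutation_rate:
                        c[i] = 1 - c[i]
                children.append(c)
        pop = children
        best = max(pop + [best], key=fitness)  # track the best-so-far (elitism)
    return best

# Toy usage: maximise the number of ones ("OneMax").
print(genetic_algorithm(sum, n_bits=20))
```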

2. FEATURE SELECTION FOR DISEASE PREDICTION

In [1], Ritesh Jha et al. used transfer learning with feature extraction module networks for classifying medical images. The authors take feature representations from a General Feature Extraction Module (GFEM) and a Specific Feature Extraction Module (SFEM) as input to a projection head and the classification module to learn the target data. The aim is to extract representations at different levels of the hierarchy and use them for the final representation learning.

The authors in [2] identified various feature sets for diabetes prediction. Further, the paper studies the k-nearest neighbours, decision tree, and SVM algorithms in detail.

The paper authored by Vandana Bhattacharjee et al. [3] surveys several studies that have explored feature selection and missing-value imputation techniques for predicting diabetes. Notable findings from these studies include the effectiveness of feature selection and comparisons of different feature selection techniques.

In the paper "A Feature Selection Algorithm Integrating Maximum Classification Information and Minimum Interaction Feature Dependency Information" [4], Li Zhang describes the two main categories of feature selection algorithms: wrappers, which use the learning algorithm itself to evaluate the usefulness of features, and filters, which evaluate features according to heuristics based on general characteristics of the data.

By adding a minimization constraint on the nuclear norm of the self-representation coefficient matrix, Wenyuan Li and Lai Wei [5] proposed a new unsupervised feature selection method named low-rank regularized self-representation (LRRSR), which can effectively discover the overall structure of the data.

Correlation-based feature subset selection (CFS) [6] applies two correlation measures for subset-based selection: (a) feature-class correlation and (b) feature-feature correlation. With the help of heuristic best-first search, any number of features can be combined, and the two correlation measures are used to evaluate candidate subsets.
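As an illustration of how the two correlations combine, here is a small sketch of a CFS-style merit score. Using Pearson correlation is a simplification (Hall's thesis [6] uses symmetrical uncertainty), and all names and the toy data are illustrative.

```python
# Sketch of the CFS merit: high feature-class correlation, low feature-feature
# correlation. Pearson correlation stands in for symmetrical uncertainty.
import numpy as np

def cfs_merit(X, y, subset):
    k = len(subset)
    # Mean feature-class correlation of the subset.
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    if k == 1:
        return r_cf
    # Mean pairwise feature-feature correlation within the subset.
    r_ff = np.mean([abs(np.corrcoef(X[:, a], X[:, b])[0, 1])
                    for i, a in enumerate(subset) for b in subset[i + 1:]])
    return (k * r_cf) / np.sqrt(k + k * (k - 1) * r_ff)

# Toy usage: features 0 and 1 determine y, features 3 and 4 are noise.
rng = np.random.default_rng(0)
X = rng.random((200, 5))
y = (X[:, 0] + X[:, 1] > 1).astype(float)
print(cfs_merit(X, y, [0, 1]), "vs", cfs_merit(X, y, [3, 4]))
```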

The work by Dash and Huan Liu [7] focuses on the inconsistency measure, according to which a feature subset is inconsistent if there exist at least two instances with the same feature values but different class labels. They compared the inconsistency measure with other measures and studied different search strategies (exhaustive, complete, heuristic, and random search) that can be applied with this measure. They also conducted an empirical study examining the pros and cons of these search methods.
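A minimal sketch of the inconsistency rate for a candidate subset, assuming discrete feature values; the grouping-by-pattern logic follows the definition above.

```python
# Instances that agree on the selected features but disagree on the class
# label count as inconsistencies; the rate is their fraction of the data.
from collections import Counter, defaultdict

def inconsistency_rate(X, y, subset):
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[tuple(row[j] for j in subset)].append(label)
    # For each matching pattern, all but the majority class are inconsistent.
    inconsistent = sum(len(labels) - max(Counter(labels).values())
                       for labels in groups.values())
    return inconsistent / len(y)

# Toy usage: the first two rows collide on feature 0 -> rate 1/3.
X = [[0, 1], [0, 1], [1, 0]]
y = ["a", "b", "a"]
print(inconsistency_rate(X, y, [0]))
```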

In the paper "Feature Selection for High-Dimensional and Imbalanced Biomedical Data Based on Robust Correlation Based Redundancy and Binary Grasshopper Optimization Algorithm" by Sharifai and Zainol [8], the challenges of high-dimensional and imbalanced data sets are considered, and feature selection is used to overcome them. The authors proposed a new method, Robust Correlation Based Redundancy and Binary Grasshopper Optimization Algorithm (rCBR-BGOA), which employs an ensemble of multiple filters coupled with the correlation-based redundancy method to select optimal feature subsets.

In the paper by Gang Kou et al. [9], a multiple-criteria decision-making (MCDM) based approach is used to evaluate feature selection methods for text classification. For their empirical study, the authors chose 10 feature selection methods, 9 binary classification measures, 7 multi-class classification measures, and 5 MCDM methods to validate each evaluation approach.

Simulated annealing search has been used by some researchers for creating feature subsets. Lin et al. [10] applied simulated annealing to generate feature subsets, using a supervised back-propagation network (BPN) as the evaluation measure for choosing the better subset.

For a marketing application, Meiri & Zahavi [11] applied simulated annealing during the feature selection process. Tabu search for feature selection was proposed by Zhang & Sun [48]. Some feature selection algorithms use a genetic algorithm to generate feature subsets and a supervised machine learning algorithm to assess them. For mining a medical dataset, Welikala et al. [12] proposed feature selection with the aid of a genetic algorithm and a support vector machine (SVM). For EEG (electroencephalogram) signal classification, Erguzel et al. [13] applied an artificial neural network and a genetic algorithm.

For credit risk assessment, Oreski & Oreski [14] gave a feature selection method based on neural networks and a genetic algorithm.

Li et al. [15] emphasized the differences and similarities of most existing feature selection algorithms for generic data, categorizing them into four groups: similarity-based, information-theoretical, sparse-learning-based, and statistical methods.

For a handwritten digit recognition application, Das et al. [16] proposed feature selection using a genetic algorithm with a support vector machine.

Pudjihartono et al. [17] hold the following view: if the dataset contains a relatively low number of features (e.g., tens to hundreds), applying wrapper methods likely results in the best predictive performance. In this case, model selection algorithms can be applied to identify which wrapper algorithm is best. By contrast, for the typical SNP genotype dataset with up to a million features, computational limitations mean that directly applying wrapper or embedded methods might not be practical, even though they model feature dependencies and tend to produce better classifier accuracy than filter methods.

Yu and Liu [18] proposed a method that selects good features for classification based on a novel concept, predominant correlation, and presented a fast algorithm with less than quadratic time complexity.
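A simplified sketch of the idea, assuming discrete features: features are ranked by symmetrical uncertainty (SU) with the class, and a feature is dropped when an already-kept, stronger feature correlates with it at least as strongly as the class does. This compresses the full FCBF algorithm of [18] considerably.

```python
# Simplified predominant-correlation filter using symmetrical uncertainty.
import numpy as np
from collections import Counter

def entropy(v):
    p = np.array(list(Counter(v).values())) / len(v)
    return -np.sum(p * np.log2(p))

def sym_uncertainty(a, b):
    mi = entropy(a) + entropy(b) - entropy(list(zip(a, b)))
    return 2 * mi / (entropy(a) + entropy(b))

def fcbf(X, y, delta=0.1):
    # Rank features by SU with the class; keep those above the threshold.
    order = sorted(range(X.shape[1]), key=lambda j: -sym_uncertainty(X[:, j], y))
    selected = [j for j in order if sym_uncertainty(X[:, j], y) >= delta]
    kept = []
    for j in selected:
        # Drop j if a stronger kept feature predominates over its class SU.
        if all(sym_uncertainty(X[:, i], X[:, j]) < sym_uncertainty(X[:, j], y)
               for i in kept):
            kept.append(j)
    return kept

# Toy usage: feature 1 is redundant with 0, feature 2 is irrelevant -> [0].
X = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 1], [1, 0, 0]])
y = np.array([0, 1, 0, 1])
print(fcbf(X, y))
```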

Xue et al. [19] proposed a new particle swarm optimization (PSO) based feature selection approach for selecting a smaller number of features while achieving similar or even better classification performance than using all features or traditional feature selection methods. To achieve this goal, the authors proposed three new initialisation strategies, motivated by forward and backward selection, and three new personal- and global-best updating mechanisms, which consider both the number of features and the classification performance to overcome the limitations of the traditional updating mechanism.

Zhang et al. [20] proposed PSO-search-based feature selection for a sleep disorder diagnosis system.

It is a general observation that not having the right set of particles in the swarm may result in sub-optimal solutions, affecting classifier accuracy. To address this issue, Mallenahalli et al. [21] proposed a novel tunable-swarm-size approach that reconfigures the particles of a standard PSO in real time, based on the data set. The proposed algorithm is named the Tunable Particle Swarm Size Optimization Algorithm (TPSO); it is a wrapper-based approach wherein an Alternating Decision Tree (ADT) serves as the underlying classifier.

Holte [22] presented OneR, a rule-based feature selection method that generates one rule for every feature and chooses the rule with the minimum error.
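OneR is simple enough to sketch in a few lines. This version assumes discrete feature values and returns the winning feature together with its value-to-class rule; the data are illustrative.

```python
# OneR: one rule per feature (majority class per feature value); keep the
# feature whose rule has the lowest training error.
from collections import Counter, defaultdict

def one_r(X, y):
    best_feature, best_error, best_rule = None, float("inf"), None
    for j in range(len(X[0])):
        # Rule: map each value of feature j to its majority class.
        by_value = defaultdict(list)
        for row, label in zip(X, y):
            by_value[row[j]].append(label)
        rule = {v: Counter(labels).most_common(1)[0][0]
                for v, labels in by_value.items()}
        errors = sum(rule[row[j]] != label for row, label in zip(X, y))
        if errors < best_error:
            best_feature, best_error, best_rule = j, errors, rule
    return best_feature, best_rule

# Toy usage: feature 0 ("outlook") yields a zero-error rule.
X = [["sunny", "hot"], ["rainy", "cool"], ["sunny", "cool"], ["rainy", "hot"]]
y = ["no", "yes", "no", "yes"]
print(one_r(X, y))
```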

The article by Bennasar et al. [23] introduces two new nonlinear feature selection methods, namely Joint Mutual Information Maximisation (JMIM) and Normalised Joint Mutual Information Maximisation (NJMIM). Both methods use mutual information and the maximum-of-the-minimum criterion, which alleviates the problem of overestimating feature significance, as demonstrated both theoretically and experimentally. The proposed methods were compared with five competing methods on eleven publicly available datasets. To identify relevant features, JMIM considers the joint mutual information between a candidate feature (paired with each already-selected feature) and the target class.

In the article by Shijin Li et al. [24], a hybrid feature selection strategy based on a genetic algorithm and a support vector machine (GASVM) is proposed, forming a wrapper that searches for the band combination with the highest classification accuracy. In addition, band grouping based on conditional mutual information between adjacent bands was used to account for the high correlation between bands and to further reduce the computational cost of the genetic algorithm. During the post-processing phase, the branch and bound algorithm was employed to filter out irrelevant band groups.

For identifying redundant features, Danasingh et al. [25] proposed that clustering techniques can be employed to overcome the problems in the redundancy analysis phase of the feature selection process. Clustering techniques in feature selection group similar features in order to remove redundant features from the dataset and improve classifier accuracy.

A mutual-information-based mRMR (max-relevance min-redundancy) feature selection method was proposed by Peng et al. [26]. Relevance is measured as the mutual information between an individual feature and the target class, while the mutual information among the selected features is used to identify redundant ones.
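A sketch of the greedy mRMR loop for discrete data: at each step, the candidate maximising relevance I(f; y) minus mean redundancy with the already-selected features is added. The entropy-based MI estimator and the toy data are illustrative, not the estimator of [26].

```python
# Greedy mRMR selection over discrete features.
import numpy as np
from collections import Counter

def entropy(v):
    p = np.array(list(Counter(v).values())) / len(v)
    return -np.sum(p * np.log2(p))

def mutual_info(a, b):
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))

def mrmr(X, y, k):
    # Start with the single most relevant feature.
    selected = [max(range(X.shape[1]), key=lambda j: mutual_info(X[:, j], y))]
    while len(selected) < k:
        rest = [j for j in range(X.shape[1]) if j not in selected]
        # Relevance minus mean redundancy with the selected set.
        selected.append(max(rest, key=lambda j:
            mutual_info(X[:, j], y)
            - np.mean([mutual_info(X[:, j], X[:, s]) for s in selected])))
    return selected

# Toy usage: feature 1 duplicates feature 0, so feature 2 is chosen -> [0, 2].
X = np.array([[0, 0, 0], [1, 1, 0], [0, 0, 1], [1, 1, 1],
              [0, 0, 0], [1, 1, 0], [0, 0, 1], [1, 1, 1]])
y = np.array([0, 1, 0, 1, 0, 1, 0, 0])
print(mrmr(X, y, 2))
```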

In [27], a feature relevant-redundant weight (RRW) is constructed to extract the important relevant and redundant information. A novel feature relevance measure is defined based on this weight, which captures more comprehensive information from the dynamically changing feature set. Additionally, a feature evaluation criterion is presented that maximizes feature relevance while minimizing feature redundancy.

Fleuret [28] proposed a very fast feature selection technique based on conditional mutual information. By picking features that maximize their mutual information with the class, conditional on every feature already picked, the method ensures the selection of features that are both individually informative and pairwise weakly dependent. The author shows that this feature selection method outperforms other classical algorithms, and that a naive Bayes classifier built with features selected this way achieves error rates similar to those of state-of-the-art methods such as boosting or SVMs.
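The CMIM criterion can be sketched in the same style: the next feature maximises the worst-case conditional mutual information I(f; y | s) over the features s already picked, which is exactly the "individually informative and pairwise weakly dependent" behaviour described above. A minimal discrete-data sketch, with an illustrative entropy-based estimator:

```python
# Greedy CMIM selection: maximise min over selected s of I(f; y | s).
import numpy as np
from collections import Counter

def entropy(*cols):
    v = list(zip(*cols)) if len(cols) > 1 else list(cols[0])
    p = np.array(list(Counter(v).values())) / len(v)
    return -np.sum(p * np.log2(p))

def cond_mi(f, y, s):
    # I(f; y | s) = H(f, s) + H(y, s) - H(f, y, s) - H(s)
    return entropy(f, s) + entropy(y, s) - entropy(f, y, s) - entropy(s)

def cmim(X, y, k):
    mi = lambda f: entropy(f) + entropy(y) - entropy(f, y)
    selected = [max(range(X.shape[1]), key=lambda j: mi(X[:, j]))]
    while len(selected) < k:
        rest = [j for j in range(X.shape[1]) if j not in selected]
        selected.append(max(rest, key=lambda j: min(
            cond_mi(X[:, j], y, X[:, s]) for s in selected)))
    return selected
```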

3. REVIEW OF LITERATURE ON GENETIC ALGORITHMS

The paper by Kabir et al. [29] presents a new hybrid genetic algorithm (HGA) for feature selection (FS), called HGAFS. The vital aspect of this algorithm is the selection of a salient feature subset of reduced size. HGAFS incorporates a new local search operation, devised and embedded in the HGA, to fine-tune the search in the FS process.

In the paper by Varzaneh et al. [30], a novel feature selection model is proposed. In the first step, the features are scored with the minimum-redundancy maximum-relevance (mRMR) feature selection approach, and those with higher scores are selected. In the second step, a wrapper feature selection approach is used to extract the best features, based on the Improved Equilibrium Optimization (IMEO) algorithm.

The paper by Hermo et al. [31] proposes a lossless federated version of the classic minimum-redundancy maximum-relevance (mRMR) feature selection algorithm, called federated mRMR (fed-mRMR). Without losing any effectiveness of the original mRMR method, it is applicable to federated learning approaches and capable of dealing with data that are not independent and identically distributed (non-IID data).

Among the various categories of feature selection algorithms, genetic algorithms are widely used to solve real-world optimization problems.

Turning to the concept of genetic algorithms (GA): although research in genetic algorithms already had a twenty-year history at the time, Goldberg and Holland [32] observed that only recently had theoretical advances and several spectacular successes in practical applications attracted more attention to the field and caused its rapid growth.

In [33], three datasets (Datasets 1, 2, and 3) were used to verify the performance of GA on transfer CNN tasks. The genetic operations show a significant improvement in average accuracy on all the given datasets. The accuracy in the first generation is barely better than random choice, while after convergence the best individual achieved an accuracy of 97%. Around the 14th generation, the system converged and gave average recognition accuracies of 93%, 90%, and 87% on the three datasets, respectively; the average recognition accuracy improved from 76% to 88% over the generations.

Sharma et al. [34] defined three goals in their study. The first is to design and build a GA-ANN hybrid to forecast stock data: the system is divided into a GA module and an ANN module, the weights of the ANN are optimized using the GA, and the resulting ANN is used to make predictions. The second is to validate the model on actual stock market data; validation is performed using stock data during COVID-19, from March 1, 2020, to October 8, 2020. The third is to use the GA-ANN to predict the next day's closing prices of the Dow30 and NASDAQ100 indices. The GA-based classifier performs almost as powerfully as the Adam optimizer, with a negligible difference.

In the papers [35] [36] [37], a genetic algorithm-based weighted average method is implemented for combining the predictions of multiple models. A comparison was made between particle swarm optimization (PSO), differential evolution (DE), and the genetic algorithm (GA); the GA-based weighted average method outperformed the others. A further comparison between the classical ensemble method and the GA-based weighted average method showed that the GA-based method again performed best.
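A sketch of the general idea, assuming regression-style predictions: a real-valued GA evolves the blending weights so that the weighted average of the member models' outputs minimises validation error. All operators, settings, and the toy data are illustrative, not taken from [35-37].

```python
# GA-evolved weights for a weighted-average ensemble (illustrative).
import numpy as np

rng = np.random.default_rng(0)

def ga_ensemble_weights(preds, y_true, pop_size=40, generations=200):
    n_models = preds.shape[0]

    def fitness(w):
        w = np.abs(w) / (np.abs(w).sum() + 1e-12)   # normalise to a convex blend
        return -np.mean((w @ preds - y_true) ** 2)  # negative MSE, to maximise

    pop = rng.random((pop_size, n_models))
    for _ in range(generations):
        scores = np.array([fitness(w) for w in pop])
        # Keep the better half; refill with arithmetic crossover + Gaussian mutation.
        parents = pop[np.argsort(scores)[-pop_size // 2:]]
        a = parents[rng.integers(len(parents), size=pop_size // 2)]
        b = parents[rng.integers(len(parents), size=pop_size // 2)]
        children = (a + b) / 2 + rng.normal(0, 0.1, a.shape)
        pop = np.vstack([parents, children])
    best = max(pop, key=fitness)
    return np.abs(best) / np.abs(best).sum()

# Toy usage: three models' predictions for five samples.
preds = np.array([[1.0, 2.0, 3.0, 4.0, 5.0],
                  [1.2, 2.1, 2.9, 4.2, 4.8],
                  [0.5, 1.5, 2.5, 3.5, 4.5]])
y_true = np.array([1.1, 2.0, 3.0, 4.1, 5.0])
print(ga_ensemble_weights(preds, y_true))
```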

In the research paper by M. Gupta et al. [38], a genetic algorithm is used for hyper-parameter adjustment of a Decision Tree (DT) machine learning model and an Artificial Neural Network (ANN) model.

Krzysztof Drachal and Michał Pawłowski [39] have shown the relevance of genetic algorithms in the field of forecasting commodity prices. In this article, three groups of commodities are discussed: energy commodities, metals, and agricultural products. The advantages and disadvantages of genetic algorithms and their hybrids are presented, and conclusions concerning possible improvements and other future applications are discussed.

Hussein et al. [40] presented a comprehensive survey of the use of genetic algorithms (GA) for feature selection in pattern recognition applications, with a special focus on character recognition. According to the authors, many search algorithms have been used for feature selection; among them, GAs have proven to be an effective computational method, especially in situations where the search space is mathematically uncharacterized, not fully understood, or highly dimensional.

Krishna and Murty [41] proposed a novel hybrid genetic algorithm (GA) that finds a globally optimal partition of given data into a specified number of clusters. GAs used earlier in clustering employ either an expensive crossover operator to generate valid child chromosomes from parent chromosomes, or a costly fitness function, or both. To circumvent these expensive operations, they hybridized the GA with a classical gradient descent algorithm used in clustering, viz. the K-means algorithm; hence the name genetic K-means algorithm (GKA).
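A compact sketch of the genetic K-means idea, under simplifying assumptions: chromosomes are cluster assignments, and a single K-means step plays the role of the costly crossover as the local-search operator. The mutation and selection schemes of [41] are simplified here, and all settings are illustrative.

```python
# Genetic K-means sketch: assignments evolve; one K-means step = local search.
import numpy as np

rng = np.random.default_rng(1)

def kmeans_step(X, labels, k):
    # Recompute centroids, then reassign each point to its nearest centroid.
    centroids = np.array([X[labels == c].mean(axis=0) if np.any(labels == c)
                          else X[rng.integers(len(X))] for c in range(k)])
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def fitness(X, labels, k):
    # Negative total within-cluster squared error.
    return -sum(((X[labels == c] - X[labels == c].mean(axis=0)) ** 2).sum()
                for c in range(k) if np.any(labels == c))

def genetic_kmeans(X, k, pop_size=20, generations=30, mutation_rate=0.05):
    pop = [rng.integers(k, size=len(X)) for _ in range(pop_size)]
    for _ in range(generations):
        # Mutation: randomly reassign a few points, then apply the K-means step.
        pop = [np.where(rng.random(len(X)) < mutation_rate,
                        rng.integers(k, size=len(X)), p) for p in pop]
        pop = [kmeans_step(X, p, k) for p in pop]
        pop.sort(key=lambda p: fitness(X, p, k), reverse=True)
        pop = pop[:pop_size // 2] * 2   # truncation selection with duplication
    return pop[0]

# Toy usage: two well-separated Gaussian blobs.
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])
print(genetic_kmeans(X, k=2))
```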

Alzubaidi et al. [42] proposed a hybrid feature selection approach to breast cancer diagnosis which combines a genetic algorithm (GA) with mutual information (MI) for selecting the combination of cancer predictors with maximal discriminative capability. The selected features are then input into a classifier to predict whether a patient has breast cancer. Using a publicly available breast cancer dataset, experiments were performed to evaluate the performance of the GA-MI approach with two different machine learning classifiers, the k-nearest neighbour (k-NN) and the support vector machine (SVM), each tuned using different distance measures and kernel functions, respectively. The results revealed that the proposed hybrid approach is highly accurate for predicting breast cancer and is very promising for predicting other cancers from clinical data.

Rostami and Moradi [43] presented a clustering-based genetic algorithm for feature selection (CGAFS). Their proposed algorithm works in three steps. In the first step, the subset size is determined. In the second step, features are divided into clusters using the k-means clustering algorithm. Finally, in the third step, features are selected using a genetic algorithm with a new clustering-based repair operation. The performance of the proposed method was assessed on five benchmark classification problems and compared with four well-known existing feature selection algorithms. The results show that CGAFS produces consistently better classification accuracies.

Desale et al. [44] applied the two main categories of feature fusion in network intrusion detection systems (NIDS): filters and wrappers. Filters are applied through statistical methods, information-theory-based methods, or searching techniques such as Principal Component Analysis (PCA), Latent Dirichlet Allocation (LDA), the Independent Component Correlation Algorithm (ICA), and Correlation-Based Feature Selection (CFS). A wrapper uses a machine learning algorithm to evaluate and fuse features to identify the best subset representing the original dataset, and is based on two parts: a feature search and an evaluation algorithm. The wrapper approach is generally considered to generate better feature subsets, but costs more computing and storage resources than the filter approach.

In the paper by Oreski [45], an advanced novel heuristic algorithm is presented, the hybrid genetic algorithm with neural networks (HGA-NN), which is used to identify an optimal feature subset and to increase classification accuracy and scalability in credit risk assessment.

Fung et al. [46] describe a formal fuzzy genetic algorithm that overcomes traditional problems in feature classification and selection and provides fuzzy templates for identifying the smallest subset of features.

Anusha [47] explained that the problem of identifying points in a data set that exhibit non-standard behaviour is referred to as outlier detection. The paper presents a reference-point-based outlier detection algorithm using a multi-objective evolutionary clustering technique (MOODA). The algorithm assigns a deviation degree to each data point, using the sum of distances to referential points to detect distant subspaces where outliers may exist.

Identification of local regions from which optimal discriminating features can be extracted is one of the major tasks in the area of pattern recognition. Das et al. [48] proposed a methodology in which local regions of varying heights and widths are created dynamically; a genetic algorithm (GA) is then applied to these local regions to sample the optimal set of regions from which a feature set with the best discriminating features can be extracted.

In [49], Zhao et al. described the asymptotic behaviors of support vector machines. SVMs are fused with a genetic algorithm (GA) by generating feature chromosomes, which direct the search of the genetic algorithm toward the straight line of optimal generalization error in the hyperparameter space. On this basis, a new approach, termed GA with feature chromosomes, is proposed to simultaneously optimize the feature subset and the parameters for the SVM.
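The joint encoding can be sketched as a chromosome holding a binary feature mask plus log-scaled SVM hyper-parameters, with cross-validated accuracy as fitness. The dataset, parameter ranges, and the single ranking step shown are illustrative, not the procedure of [49]; a full run would wrap this in the usual generational loop (requires scikit-learn).

```python
# Chromosome = [feature bits ..., log2(C), log2(gamma)]; fitness = CV accuracy.
import random
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
n_features = X.shape[1]

def random_chromosome():
    return [random.randint(0, 1) for _ in range(n_features)] + \
           [random.uniform(-5, 15), random.uniform(-15, 3)]

def fitness(ch):
    mask = np.array(ch[:n_features], dtype=bool)
    if not mask.any():
        return 0.0  # an empty feature mask is unusable
    clf = SVC(C=2.0 ** ch[-2], gamma=2.0 ** ch[-1])
    return cross_val_score(clf, X[:, mask], y, cv=3).mean()

# One evaluation/ranking step of a generational loop.
pop = [random_chromosome() for _ in range(20)]
pop.sort(key=fitness, reverse=True)
print("best CV accuracy:", fitness(pop[0]))
```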

The aim of the paper by Chen et al. [50] is to present an efficient and effective method of constructing SVM classifiers, so that SVMs can be applied to a wider range of practical uses with promising results. In this paper, a coarse-grained parallel genetic algorithm (CGPGA) is used to jointly select the feature subset and optimize the parameters for SVMs. The key idea of CGPGA is to divide the whole GA population into several separate subpopulations, each of which searches the solution space in parallel. After every certain number of generations, the best individual of each subpopulation migrates to other subpopulations. The distributed topology and migration policy significantly accelerate feature selection and parameter optimization, increasing the classification accuracy of the SVM.
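A minimal sketch of the coarse-grained (island) model behind this scheme: independent subpopulations evolve separately and periodically exchange their best individuals around a ring. The toy OneMax fitness stands in for SVM accuracy, and the mutation-only evolution step and all settings are simplifications, not the operators of [50].

```python
# Island-model GA: isolated subpopulations with periodic ring migration.
import random

def evolve(pop, fitness, mutation_rate=0.02):
    # One generation: tournament selection + bit-flip mutation.
    nxt = [max(random.sample(pop, 3), key=fitness)[:] for _ in pop]
    for ind in nxt:
        for i in range(len(ind)):
            if random.random() < mutation_rate:
                ind[i] = 1 - ind[i]
    return nxt

def island_ga(fitness, n_bits=30, n_islands=4, island_size=25,
              generations=60, migrate_every=10):
    islands = [[[random.randint(0, 1) for _ in range(n_bits)]
                for _ in range(island_size)] for _ in range(n_islands)]
    for g in range(1, generations + 1):
        islands = [evolve(pop, fitness) for pop in islands]
        if g % migrate_every == 0:
            # Ring migration: each island's best replaces the next island's worst.
            bests = [max(pop, key=fitness) for pop in islands]
            for i, pop in enumerate(islands):
                pop[pop.index(min(pop, key=fitness))] = bests[i - 1][:]
    return max((max(pop, key=fitness) for pop in islands), key=fitness)

# Toy usage: OneMax stands in for the real (e.g., SVM accuracy) fitness.
print(sum(island_ga(sum)))
```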

Ebrahimi et al. [51] introduced a population-based meta-heuristic method based on the lifestyle of the lion and the structure of genetic algorithms; accordingly, they call it the New Genetic Algorithm (NGA). The initial population of solutions is distributed among several groups (prides), and the best solution within each group is called the commander. Each child in each group (obtained by mutation or crossover) challenges all commanders and may replace them. After a finite number of iterations, the best commander is the final solution.

4. CONCLUSION

In this paper we have presented a literature review of research papers in the field of feature selection and genetic algorithms, applied to the classification task in various domains. This will be useful for researchers working in areas such as health analytics, intrusion detection, disease prediction in crops, and community detection, among many others.

REFERENCES:

1. Ritesh Jha, Vandana Bhattacharjee, and Abhijit Mustafi, Transfer Learning with Feature Extraction Modules for Improved Classifier Performance on Medical Image Data, Scientific Programming, Hindawi, Volume 2022, Article ID 4983174. https://doi.org/10.1155/2022/4983174

2. Rashmik Manchiraju and Vandana Bhattacharjee, A Machine Learning Approach to Diabetes Prediction with Feature Selection, International Journal of Engineering Research & Technology (IJERT), Volume 11, Issue 7, July 2022.

3. Vandana Bhattacharjee, Ankita Priya and Umesh Prasad (2023), Evaluating the Performance of Machine Learning Models for Diabetes Prediction with Feature Selection and Missing Values Handling, International Journal of Microsystems and IoT, Vol. 1, Issue 1, 26 June 2023.

4. Li Zhang, "A Feature Selection Algorithm Integrating Maximum Classification Information and Minimum Interaction Feature Dependency Information", Computational Intelligence and Neuroscience, vol. 2021, Article ID 3569632, 10 pages, 2021. https://doi.org/10.1155/2021/3569632

5. Wenyuan Li and Lai Wei, "Unsupervised Feature Selection Based on Low-Rank Regularized Self-Representation", Open Access Library Journal, Vol. 7, No. 4, 2020.

6. Hall, M.A. (1999), Correlation-Based Feature Selection for Machine Learning, PhD Thesis, University of Waikato, Hamilton.

7. Manoranjan Dash and Huan Liu, Consistency-based search in feature selection, Artificial Intelligence, Volume 151, Issues 1-2, 2003, Pages 155-176, ISSN 0004-3702. https://doi.org/10.1016/S0004-3702(03)00079-1

8. Garba Abdulrauf Sharifai and Zurinahni Zainol, Feature Selection for High-Dimensional and Imbalanced Biomedical Data Based on Robust Correlation Based Redundancy and Binary Grasshopper Optimization Algorithm, Genes 2020, 11(7), 717. https://doi.org/10.3390/genes11070717

9. Gang Kou, Pei Yang, et al., Evaluation of feature selection methods for text classification with small datasets using multiple criteria decision-making methods, Applied Soft Computing, Volume 86, 2020, 105836, ISSN 1568-4946. https://doi.org/10.1016/j.asoc.2019.105836

10. Lin et al., A simulated-annealing-based approach for simultaneous parameter optimization and feature selection of back-propagation networks, Expert Systems with Applications 34(2):1491-1499, 2008. DOI: 10.1016/j.eswa.2007.01.014

11. Ronen Meiri and Jacob Zahavi, Using simulated annealing to optimize the feature selection problem in marketing applications, European Journal of Operational Research, Volume 171, Issue 3, 2006, Pages 842-858, ISSN 0377-2217. https://doi.org/10.1016/j.ejor.2004.09.010

12. Welikala, R.A., Fraz, M.M., Dehmeshki, J., Hoppe, A., Tah, V., Mann, S., Williamson, T.H. & Barman, S.A., Genetic algorithm based feature selection combined with dual classification for the automated detection of proliferative diabetic retinopathy, Computerized Medical Imaging and Graphics, vol. 43, pp. 64-77, 2015.

13. Erguzel, T.T., Ozekes, S., Tan, O. & Gultekin, S. 2015, Feature Selection and Classification of Electroencephalographic Signals: An Artificial Neural Network and Genetic Algorithm Based Approach, Clinical EEG and Neuroscience, vol. 46, no. 4, pp. 321-326.

14. Oreski, S. & Oreski, G. 2014, Genetic algorithm-based heuristic for feature selection in credit risk assessment, Expert Systems with Applications, vol. 41, no. 4, pp. 2052-2064.

15. Li et al., 2016, Feature Selection: A Data Perspective, ACM Computing Surveys 50(6). DOI: 10.1145/3136625

16. Das, N., Sarkar, R., Basu, S., Kundu, M., Nasipuri, M. & Basu, D.K. 2012, A genetic algorithm based region sampling for selection of local features in handwritten digit recognition application, Applied Soft Computing, vol. 12, no. 5, pp. 1592-1606.

17. Pudjihartono et al., A Review of Feature Selection Methods for Machine Learning-Based Disease Risk Prediction, Frontiers in Bioinformatics, Volume 2, 2022. https://doi.org/10.3389/fbinf.2022.927312

18. Yu and Liu, Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution, Proceedings of the Twentieth International Conference on Machine Learning (ICML 2003), August 21-24, 2003.

19. Bing Xue, Mengjie Zhang and Will N. Browne, Particle swarm optimisation for feature selection in classification: Novel initialisation and updating mechanisms, Applied Soft Computing, Volume 18, 2014, Pages 261-276, ISSN 1568-4946. https://doi.org/10.1016/j.asoc.2013.09.018

20. Zhang et al., Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology, Volume 31, Issue 6, 2016, pp. 2807-2812. https://doi.org/10.3233/JIFS-169162

21. Mallenahalli et al., A Tunable Particle Swarm Size Optimization Algorithm, 2018 IEEE Congress on Evolutionary Computation (CEC), July 2018. https://doi.org/10.1109/CEC.2018.8477694

22. Holte, R.C., Very Simple Classification Rules Perform Well on Most Commonly Used Datasets, Machine Learning 11, 63-90 (1993). https://doi.org/10.1023/A:1022631118932

23. Bennasar et al., Feature selection using Joint Mutual Information Maximisation, Expert Systems with Applications, Volume 42, Issue 22, 2015, Pages 8520-8532, ISSN 0957-4174. https://doi.org/10.1016/j.eswa.2015.07.007

24. Shijin Li et al., 2011, An effective feature selection method for hyperspectral image classification based on genetic algorithm and support vector machine, Knowledge-Based Systems 24(1):40-48. DOI: 10.1016/j.knosys.2010.07.003

25. Danasingh, A.A.G.S., Subramanian, A. & Epiphany, J., Identifying redundant features using unsupervised learning for high-dimensional data, SN Appl. Sci. 2, 1367 (2020). https://doi.org/10.1007/s42452-020-3157-6

26. Peng, H., Long, F. and Ding, C., 2005, Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy, IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(8), pp. 1226-1238.

27. Shijie Zhao et al., 2022, A feature selection method via relevant-redundant weight, Expert Systems with Applications, Volume 207, 117923, ISSN 0957-4174. https://doi.org/10.1016/j.eswa.2022.117923

28. Fleuret, 2004, Fast Binary Feature Selection with Conditional Mutual Information, Journal of Machine Learning Research, Volume 5, pp. 1531-1555.

29. Kabir et al., 2011, A new local search based hybrid genetic algorithm for feature selection, Neurocomputing, Volume 74, Issue 17, Pages 2914-2928, ISSN 0925-2312. https://doi.org/10.1016/j.neucom.2011.03.034

30. Varzaneh et al., 2022, A new hybrid feature selection based on Improved Equilibrium Optimization, Chemometrics and Intelligent Laboratory Systems, Volume 228, 104618, ISSN 0169-7439. https://doi.org/10.1016/j.chemolab.2022.104618

31. Hermo et al., 2024, Fed-mRMR: A lossless federated feature selection method, Information Sciences, Volume 669. https://doi.org/10.1016/j.ins.2024.120609

32. Goldberg, David E. and John H. Holland, 1988, Genetic Algorithms and Machine Learning, Machine Learning 3, 95-99.

33. Li et al., 2021, Genetic Algorithm Based Hyper-Parameters Optimization for Transfer Convolutional Neural Network, International Conference on Advanced Algorithms and Neural Networks (AANN 2022), Vol. 12285, SPIE, Bellingham. https://doi.org/10.1117/12.2637170

34. Sharma et al., Integration of genetic algorithm with artificial neural network for stock market forecasting, Int J Syst Assur Eng Manag 13 (Suppl 2), 828-841 (2022). https://doi.org/10.1007/s13198-021-01209-5

35. Andoyo, F.A. and Arifudin, R. (2021), Optimization of Classification Accuracy Using K-Means and Genetic Algorithm by Integrating C4.5 Algorithm for Diagnosis Breast Cancer Disease, Journal of Advances in Information Systems and Technology, 3, 1-8. https://doi.org/10.15294/jaist.v3i1.49011

36. Alalayah, K.M.A., Almasani, S.A.M. and Qaid, W.A.A. (2018), Breast Cancer Diagnosis based on Genetic Algorithms and Neural Networks, International Journal of Computer Applications, 180, 42-44. https://doi.org/10.5120/ijca2018916605

37. Chauhan, P. and Swami, A. (2018), Breast Cancer Prediction Using Genetic Algorithm Based Ensemble Approach, 2018 9th International Conference on Computing, Communication and Networking Technologies (ICCCNT), Bengaluru. https://doi.org/10.1109/ICCCNT.2018.8493927

38. Gupta, M., Rajnish, K. & Bhattacharjee, V., Software fault prediction with imbalanced datasets using SMOTE-Tomek sampling technique and Genetic Algorithm models, Multimed Tools Appl (2023). https://doi.org/10.1007/s11042-023-16788-7

39. Krzysztof Drachal and Michał Pawłowski, A Review of the Applications of Genetic Algorithms to Forecasting Prices of Commodities, Economies 9(1):6, January 2021. DOI: 10.3390/economies9010006

40. Hussein et al., Genetic algorithms for feature selection and weighting, a review and study, February 2001. DOI: 10.1109/ICDAR.2001.953980

41. Krishna and Murty, Genetic K-Means Algorithm, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 29(3):433-439, February 1999. DOI: 10.1109/3477.764879

42. Alzubaidi, Abeer et al., Breast Cancer Diagnosis Using a Hybrid Genetic Algorithm for Feature Selection Based on Mutual Information, 2016 International Conference on Interactive Technologies and Games (ITAG) (2016): 70-76.

43. Rostami and Moradi, A clustering based genetic algorithm for feature selection, 2014 6th Conference on Information and Knowledge Technology (IKT). DOI: 10.1109/IKT.2014.7030343

44. K. S. Desale and R. Ade, "Genetic algorithm based feature selection approach for effective intrusion detection system", 2015 International Conference on Computer Communication and Informatics, IEEE, 2015.

45. Oreski, S. & Oreski, G. 2014, Genetic algorithm-based heuristic for feature selection in credit risk assessment, Expert Systems with Applications, vol. 41, no. 4, pp. 2052-2064.

46. Fung et al., "Fuzzy genetic algorithm approach to feature selection problem," Proceedings of the 6th International Fuzzy Systems Conference, Barcelona, Spain, 1997, pp. 441-446, vol. 1. DOI: 10.1109/FUZZY.1997.616408

47. M. Anusha, Multi-objective Optimization to Detect Outliers with Referential Point using Evolutionary Clustering Techniques, International Journal of Computer Sciences and Engineering, Vol. 7, Issue 4, pp. 731-735, 2019.

48. Das et al., A genetic algorithm based region sampling for selection of local features in handwritten digit recognition application, Applied Soft Computing, Volume 12, Issue 5, May 2012, pp. 1592-1606. https://doi.org/10.1016/j.asoc.2011.11.030

49. Mingyuan Zhao et al., Feature selection and parameter optimization for support vector machines: A new approach based on genetic algorithm with feature chromosomes, Expert Systems with Applications 38(5):5197-5204, May 2011. DOI: 10.1016/j.eswa.2010.10.041

50. Chen et al., A Parallel Genetic Algorithm Based Feature Selection and Parameter Optimization for Support Vector Machine, Scientific Programming, 2016. DOI: 10.1155/2016/2739621

51. Ebrahimi et al., Solving NP-hard problems using a new genetic algorithm, Int. J. Nonlinear Anal. Appl. 14 (2023) 1, 275-285, ISSN: 2008-6822 (electronic).