Feature selection via computational intelligence techniques


Algin R., ALKAYA A. F., AĞAOĞLU M.

JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, vol.39, no.5, pp.6205-6216, 2020 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 39 Issue: 5
  • Publication Date: 2020
  • DOI: 10.3233/jifs-189090
  • Journal Name: JOURNAL OF INTELLIGENT & FUZZY SYSTEMS
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, Aerospace Database, Applied Science & Technology Source, Business Source Elite, Business Source Premier, Communication Abstracts, Compendex, Computer & Applied Sciences, INSPEC, Metadex, zbMATH, Civil Engineering Abstracts
  • Pages: pp.6205-6216
  • Keywords: Feature selection, computational intelligence, dimensionality reduction, meta-heuristics, classification algorithms, subset evaluators, PARTICLE SWARM OPTIMIZATION, FEATURE SUBSET-SELECTION, DIFFERENTIAL EVOLUTION, ROUGH SETS, SEARCH, ALGORITHMS
  • Marmara University Affiliated: Yes

Abstract

Feature selection (FS) has become an essential task in tackling high-dimensional and complex machine learning problems. FS reduces the size of a dataset by removing unnecessary and irrelevant features from it. This improves the performance of classification algorithms and shortens evaluation time, because classification can run on smaller datasets that contain only useful features. FS aims to find a minimal feature subset in a problem domain while retaining the accuracy of the original data. In this study, four computational intelligence techniques, namely migrating birds optimization (MBO), simulated annealing (SA), differential evolution (DE) and particle swarm optimization (PSO), are implemented as search algorithms for the FS problem and compared on 17 well-known datasets taken from the UCI Machine Learning Repository, where the dimension of the datasets varies from 4 to 500. This is the first time that MBO has been applied to the FS problem. To judge the quality of the subsets generated by the search algorithms, two subset evaluation methods are implemented: probabilistic consistency-based FS (PCFS) and correlation-based FS (CFS). The algorithms are compared using three well-known classifiers: k-nearest neighbor, naive Bayes and decision tree (C4.5). The accuracy values obtained by the classifiers on the datasets with all features serve as the benchmark. The experimental results show that our MBO-based filter approach outperforms the other three approaches in terms of accuracy. The experiments also show that, as a subset evaluator, CFS outperforms PCFS and that, as a classifier, C4.5 achieves better results than k-nearest neighbor and naive Bayes.
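
To make the filter setup described in the abstract concrete, the following is a minimal sketch of a search algorithm paired with a subset evaluator. It is not the authors' implementation. It assumes the commonly cited CFS merit formula, k·r_cf / sqrt(k + k(k−1)·r_ff), uses absolute Pearson correlation as a stand-in correlation measure, and drives the search with a plain simulated annealing loop over feature bit masks. The function names (`pearson`, `cfs_merit`, `anneal`) and all parameter values (`steps`, `t0`, `cooling`, `seed`) are hypothetical choices made for illustration.

```python
import math
import random


def pearson(x, y):
    """Absolute Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no correlation signal
    return abs(cov / math.sqrt(vx * vy))


def cfs_merit(columns, target, subset):
    """CFS-style merit of the feature columns whose indices are in `subset`."""
    k = len(subset)
    if k == 0:
        return 0.0
    # Mean feature-class correlation.
    r_cf = sum(pearson(columns[i], target) for i in subset) / k
    # Mean feature-feature correlation over all distinct pairs.
    pairs = [(i, j) for a, i in enumerate(subset) for j in subset[a + 1:]]
    r_ff = (sum(pearson(columns[i], columns[j]) for i, j in pairs) / len(pairs)
            if pairs else 0.0)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def anneal(columns, target, steps=500, t0=1.0, cooling=0.99, seed=0):
    """Simulated annealing over feature bit masks (illustrative parameters)."""
    rng = random.Random(seed)
    n = len(columns)

    def score(mask):
        return cfs_merit(columns, target, [i for i, on in enumerate(mask) if on])

    mask = [rng.random() < 0.5 for _ in range(n)]
    cur = score(mask)
    best_mask, best = mask[:], cur
    t = t0
    for _ in range(steps):
        cand = mask[:]
        i = rng.randrange(n)
        cand[i] = not cand[i]  # neighbor: flip one feature in or out
        s = score(cand)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if s >= cur or rng.random() < math.exp((s - cur) / t):
            mask, cur = cand, s
            if cur > best:
                best_mask, best = mask[:], cur
        t *= cooling
    return [i for i, on in enumerate(best_mask) if on], best
```

Calling `anneal(columns, target)` with `columns` as a list of per-feature value lists returns the selected feature indices and their merit. In a full pipeline of the kind the abstract describes, the selected columns would then be passed to a classifier, and its accuracy compared with the accuracy obtained using all features.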