LAFS: A Fast, Differentiable Approach to Feature Selection Using Learnable Attention


Topçuoğlu H., Evren A. A., Tuna E., Ustaoğlu E.

Entropy, vol. 28, no. 1, 2026 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 28 Issue: 1
  • Publication Date: 2026
  • DOI: 10.3390/e28010020
  • Journal Name: Entropy
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, INSPEC, zbMATH, Directory of Open Access Journals
  • Keywords: attention mechanism, deep learning, feature selection, information theory, tabular data
  • Affiliated with Marmara University: Yes

Abstract

Feature selection is a critical preprocessing step for mitigating the curse of dimensionality in machine learning. Existing methods present a difficult trade-off: filter methods are fast but often suboptimal because they evaluate features in isolation, while wrapper methods are powerful but computationally prohibitive due to their iterative nature. In this paper, we propose LAFS (Learnable Attention for Feature Selection), a novel, end-to-end differentiable framework that achieves the performance of wrapper methods at the speed of simpler models. LAFS employs a neural attention mechanism to learn a context-aware importance score for all features simultaneously in a single forward pass. To encourage the selection of a sparse, non-redundant feature subset, we introduce a hybrid loss function that combines the standard classification objective with an information-theoretic entropic regularizer on the attention weights. We validate our approach on real-world high-dimensional benchmark datasets. Our experiments demonstrate that LAFS successfully identifies complex feature interactions and handles multicollinearity. In overall comparison, LAFS achieves accuracy on par with state-of-the-art RFE-LGBM and embedded FSA methods. Our work establishes a new point on the accuracy-efficiency frontier, demonstrating that attention-based architectures provide a competitive solution to the feature selection problem.
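
To make the mechanism described in the abstract concrete, the sketch below shows one plausible PyTorch realization: a scoring network produces softmax attention weights over the feature columns in a single forward pass, and training minimizes cross-entropy plus an entropy penalty on those weights, which pushes the attention distribution toward a sparse, peaked shape. This is an illustrative reconstruction from the abstract only, not the authors' released code; the names AttentionFeatureSelector, hybrid_loss, and lambda_ent are hypothetical, and the architecture and regularizer in the actual paper may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionFeatureSelector(nn.Module):
    """Hypothetical sketch: learn per-feature attention weights in one forward pass."""

    def __init__(self, n_features: int, n_classes: int, hidden: int = 64):
        super().__init__()
        # Scoring network: maps the input sample to one logit per feature,
        # so the importance scores are context-aware (they depend on x).
        self.scorer = nn.Sequential(
            nn.Linear(n_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_features),
        )
        # Downstream classifier that consumes the attention-reweighted features.
        self.classifier = nn.Sequential(
            nn.Linear(n_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, x: torch.Tensor):
        # Softmax normalizes the scores into an attention distribution
        # over features for each sample: shape (batch, n_features).
        alpha = F.softmax(self.scorer(x), dim=-1)
        logits = self.classifier(x * alpha)  # reweight features, then classify
        return logits, alpha

def hybrid_loss(logits, y, alpha, lambda_ent: float = 0.1, eps: float = 1e-12):
    """Cross-entropy plus a Shannon-entropy penalty on the attention weights.

    Minimizing the entropy term drives alpha toward a sparse (low-entropy)
    distribution, concentrating mass on a small feature subset.
    """
    ce = F.cross_entropy(logits, y)
    entropy = -(alpha * (alpha + eps).log()).sum(dim=-1).mean()
    return ce + lambda_ent * entropy
```

Under these assumptions, a selected subset could be obtained after training by ranking features by their mean attention weight over the training data and keeping the top k.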