Optimizing k-anonymity with automated generalization trees: a study on classification utility


Saleh T., KORÇAK Ö.

International Journal of Machine Learning and Cybernetics, vol.17, no.2, 2026 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 17 Issue: 2
  • Publication Date: 2026
  • DOI: 10.1007/s13042-025-02910-8
  • Journal Name: International Journal of Machine Learning and Cybernetics
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC
  • Keywords: Data generalization, Data privacy, Information loss, k-anonymity, Machine learning
  • Marmara University Affiliated: Yes

Abstract

Maintaining data privacy is a crucial and growing concern for many organizations and individuals. To address it, many regulations are enforced, which have a direct impact on data-driven services, research, and development. Anonymizing data, particularly through k-anonymization, is a common method to protect privacy. However, the alterations this process requires often add noise to the data, degrading model performance. Our study introduces an automated anonymization framework that upholds the essential k-anonymity property without requiring manual construction of generalization trees for suppression and generalization. Specifically, we propose a method that automatically constructs optimal generalization hierarchies, minimizing information loss while ensuring anonymity. We further examine how enforcing data anonymity affects machine learning model performance. To assess this impact, we use a specialized information loss metric for machine learning. Our results show that our automated generalization strategy can outperform other well-established algorithms that rely on manually generated generalization trees.
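
To illustrate the k-anonymity property the abstract refers to, the following is a minimal Python sketch; it is not the authors' framework. It checks whether every combination of quasi-identifier values occurs at least k times, then searches a fixed list of interval widths for the narrowest age generalization that satisfies the property. The record fields (`age`, `zip`, `label`), the helper names, and the candidate widths are all hypothetical, and the fixed-width search is a simple stand-in for the automatically constructed generalization hierarchies described above.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs at least k times."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(c >= k for c in counts.values())


def generalize_age(records, width):
    """Replace each exact age with the fixed-width interval that contains it."""
    out = []
    for r in records:
        lo = (r["age"] // width) * width
        out.append({**r, "age": f"{lo}-{lo + width - 1}"})
    return out


def smallest_width_for_k(records, k, widths=(1, 5, 10, 20, 40, 80)):
    """Return the narrowest candidate width that yields k-anonymity, or None.

    Narrower intervals keep more detail, so trying widths in ascending
    order is a crude proxy for minimizing information loss.
    """
    for w in widths:
        if is_k_anonymous(generalize_age(records, w), ["age", "zip"], k):
            return w
    return None


# Hypothetical toy data: "age" and "zip" are quasi-identifiers, "label" is the class.
records = [
    {"age": 23, "zip": "34000", "label": 0},
    {"age": 27, "zip": "34000", "label": 1},
    {"age": 31, "zip": "34000", "label": 0},
    {"age": 38, "zip": "34000", "label": 1},
]

print(smallest_width_for_k(records, k=2))
```

A larger k forces wider intervals, which is the privacy-utility trade-off the study measures through its information loss metric and classifier performance.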