Optimizing k-anonymity with automated generalization trees: a study on classification utility

Saleh, Taj; KORÇAK, ÖMER

doi:10.1007/s13042-025-02910-8

Saleh T., KORÇAK Ö.

International Journal of Machine Learning and Cybernetics, vol. 17, no. 2, 2026 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 17 Issue: 2
Publication Date: 2026
DOI Number: 10.1007/s13042-025-02910-8
Journal Name: International Journal of Machine Learning and Cybernetics
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC
Keywords: Data generalization, Data privacy, Information loss, k-anonymity, Machine learning
Marmara University Affiliated: Yes

Abstract

Maintaining data privacy is a crucial and growing concern for many organizations and individuals. To address privacy concerns, many regulations have been enacted, and these directly affect data-driven services, research, and development. Anonymizing data, particularly through k-anonymization, is a common method of protecting privacy. However, the alterations it requires often add noise to the data and degrade model performance. Our study introduces an automated anonymization framework that upholds the essential k-anonymity property without requiring manual construction of generalization trees for suppression and generalization. Specifically, we propose a method that automatically constructs optimal generalization hierarchies, minimizing information loss while ensuring anonymity. We further examine how enforcing data anonymity affects machine learning model performance. To assess this impact, we use a specialized information loss metric for machine learning. Our results show that our automated generalization strategy can outperform other well-established algorithms that rely on manually generated generalization trees.
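
For readers unfamiliar with the k-anonymity property named in the abstract, the following is a minimal illustrative sketch and not the authors' framework. It assumes a toy dataset, a hypothetical fixed-width generalization of an `age` attribute, and the helper names `is_k_anonymous` and `generalize_age`, none of which come from the paper. It shows only that a dataset is k-anonymous when every combination of quasi-identifier values is shared by at least k records, and that generalizing values can establish that property.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    occurs in at least k records."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())


def generalize_age(age, width):
    """Map an exact age to a range label of the given width, e.g. 37 -> '30-39'.
    A fixed-width binning is an assumption made for illustration only."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


# Toy records (illustrative only, not data from the paper).
records = [
    {"age": 31, "zip": "34100", "label": 1},
    {"age": 37, "zip": "34100", "label": 0},
    {"age": 42, "zip": "34200", "label": 1},
    {"age": 48, "zip": "34200", "label": 0},
]

# Replace each exact age with its 10-year range; other fields are unchanged.
generalized = [{**r, "age": generalize_age(r["age"], 10)} for r in records]

raw_ok = is_k_anonymous(records, ["age", "zip"], k=2)
gen_ok = is_k_anonymous(generalized, ["age", "zip"], k=2)
print(raw_ok, gen_ok)
```

In this toy case the raw records are not 2-anonymous because every exact age is unique, while the generalized records are. The coarser the ranges, the more information is lost, which is the trade-off between anonymity and classification utility that the abstract describes.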