International Journal of Machine Learning and Cybernetics, vol. 17, no. 2, 2026 (SCI-Expanded, Scopus)
Maintaining data privacy is a crucial and growing concern for many organizations and individuals. To address this concern, numerous regulations have been enacted, which directly affect data-driven services, research, and development. Anonymizing data, particularly through k-anonymization, is a common method of protecting privacy. However, this process adds noise to the data, and the required alterations degrade model performance. Our study introduces an automated anonymization framework that upholds the essential k-anonymity property without requiring manually constructed generalization trees for suppression and generalization. Specifically, we propose a method that automatically constructs optimal generalization hierarchies, minimizing information loss while ensuring anonymity. We further examine how enforcing data anonymity affects machine learning model performance, assessing this impact with an information loss metric tailored to machine learning. Our results show that our automated generalization strategy can outperform well-established algorithms that rely on manually constructed generalization trees.
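To make the k-anonymity property concrete, the following is a minimal illustrative sketch, not the paper's algorithm: a table is k-anonymous when every combination of quasi-identifier values occurs in at least k records, and generalization (here, a hypothetical age-bucketing step on toy data) coarsens values until that holds.

```python
from collections import Counter

def is_k_anonymous(records, quasi_ids, k):
    """True if every quasi-identifier combination appears in >= k records."""
    groups = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    return all(count >= k for count in groups.values())

def generalize_age(record, width=10):
    """One generalization step: replace an exact age with a range bucket."""
    r = dict(record)
    lo = (r["age"] // width) * width
    r["age"] = f"{lo}-{lo + width - 1}"
    return r

# Toy data: exact ages make every quasi-identifier group unique.
records = [
    {"age": 23, "zip": "34000", "diagnosis": "flu"},
    {"age": 27, "zip": "34000", "diagnosis": "cold"},
    {"age": 31, "zip": "34000", "diagnosis": "flu"},
    {"age": 36, "zip": "34000", "diagnosis": "cold"},
]
qids = ["age", "zip"]

print(is_k_anonymous(records, qids, k=2))   # False: each age is unique
generalized = [generalize_age(r) for r in records]
print(is_k_anonymous(generalized, qids, k=2))  # True: ages fall into 20-29 and 30-39
```

Coarser buckets (wider ranges) raise the achievable k but lose more information, which is the trade-off the paper's automatically constructed hierarchies aim to optimize.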