An Investigation and Performance Evaluation of Aggregation Algorithms in Federated Learning Architecture


Onlu A. O., Akca B., Büyüktanır B., Yıldız K., Baydogmus G. K.

2025 Innovations in Intelligent Systems and Applications Conference (ASYU 2025), Bursa, Türkiye, 10–12 September 2025 (Full Text Paper)

  • Publication Type: Conference Paper / Full Text Paper
  • DOI: 10.1109/asyu67174.2025.11208434
  • Published City: Bursa
  • Published Country: Türkiye
  • Keywords: Data Privacy, Distributed Machine Learning, Federated Aggregation Algorithms, Federated Learning, Model Aggregation
  • Marmara University Affiliated: Yes

Abstract

Centralized data collection, the basis of traditional machine learning (ML), raises significant privacy and security concerns. Federated Learning (FL) addresses these issues by enabling decentralized model training, where data remains on local devices. A central component of FL is the aggregation algorithm, which combines locally trained models to update a global model, aiming to optimize performance, reduce communication costs, and enhance privacy. FL systems face major challenges stemming from statistical (non-IID data) and system-level (resource heterogeneity) variations. This study systematically reviews key aggregation algorithms in FL, analyzing their principles, strengths, limitations, and use cases. It evaluates algorithms including FedAvg, FedProx, FedMA, FedBE, FedPer, and Agnostic Federated Learning (AFL) across five visual datasets with diverse characteristics (Digit Recognizer, CIFAR-100, Butterfly, Animal, Satellite). Experiments were conducted in a four-client federated setting using CNN-based architectures with transfer learning. Performance was assessed using standard metrics: accuracy, precision, recall, and F1 score. Results show that no single algorithm excels universally. FedPer performed best on MNIST (98.57%) and Butterfly (81.09%) owing to its personalization capabilities. FedAvg showed strong results on the CIFAR-100 (73.20%) and Animal (99.16%) datasets, while FedProx provided consistent performance (e.g., 74.44% on CIFAR-100) by addressing heterogeneity. FedBE's performance varied with dataset structure, and FedMA generally underperformed. AFL, though potentially useful in risk-sensitive contexts, showed no clear metric superiority. Overall, this study highlights that selecting an FL aggregation algorithm requires evaluating multiple factors, including dataset traits, client heterogeneity, communication overhead, and personalization needs. The findings aim to support informed algorithm selection and guide future FL research.
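As an illustration of the aggregation step the abstract describes, the sketch below shows the core of FedAvg: the server forms the new global model as a sample-size-weighted average of the clients' local parameters. This is a minimal NumPy sketch for a four-client round, not code from the paper; the function name `fedavg` and the toy layer shapes are assumptions for demonstration only.

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """FedAvg aggregation: weighted average of client model parameters.

    client_weights: one list of np.ndarray layers per client
    client_sizes:   number of local training samples per client
    """
    total = sum(client_sizes)
    num_layers = len(client_weights[0])
    global_weights = []
    for layer in range(num_layers):
        # Each client's contribution is scaled by its share of the data.
        avg = sum(
            (n / total) * weights[layer]
            for weights, n in zip(client_weights, client_sizes)
        )
        global_weights.append(avg)
    return global_weights

# Four simulated clients, each holding a single 2x2 "layer"
# whose entries equal the client index (0, 1, 2, 3).
clients = [[np.full((2, 2), float(i))] for i in range(4)]
sizes = [10, 20, 30, 40]  # unequal local dataset sizes (non-IID setting)

new_global = fedavg(clients, sizes)
# Weighted mean: (0*10 + 1*20 + 2*30 + 3*40) / 100 = 2.0 in every entry
```

Variants such as FedProx keep this same server-side averaging but add a proximal term to each client's local objective, which is what gives the more consistent behavior under heterogeneity reported above.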