Leveraging Unlabeled Data in Federated Learning: A Review

Uludağ, Kübra; Erdem, Cigdem; Korçak, Ömer

doi:10.1109/access.2025.3623444


Uludağ K., Erdem C. E., Korçak Ö.

IEEE Access, vol. 13, pp. 181719-181743, 2025 (SCI-Expanded, Scopus)

Publication Type: Article / Review
Volume: 13
Publication Date: 2025
DOI: 10.1109/access.2025.3623444
Journal: IEEE Access
Journal Indexed In: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC, Directory of Open Access Journals
Pages: 181719-181743
Keywords: Federated learning, privacy protection, pseudo-labeling, self-supervised learning, semi-supervised learning
Affiliated with Marmara University: Yes

Abstract

In centralized machine learning, both the data and the model to be trained reside on a single server, which raises data-privacy concerns because sensitive or personal data must be transferred from the clients to the server. Federated learning addresses this problem by allowing a model to be trained without the data leaving the clients: training takes place between a coordinating server and the clients, which iteratively exchange model parameters instead of data. In real-life applications, the data on some of the clients or on the server may be partially labeled or completely unlabeled, which poses a serious challenge to federated learning. In this paper, we survey recently proposed methods that leverage unlabeled data in a federated learning setting to improve model performance. We also present a novel taxonomy of these methods based on whether or not the unlabeled data is assigned a pseudo-label during training. We summarize the datasets, main data modalities, and application areas of the federated learning methods that use unlabeled data in the literature, and we highlight future research directions. We believe that this survey will be a useful guide for researchers planning to work on federated learning with partially labeled data.
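The abstract combines two ideas: clients train locally and send only model parameters to the server, and unlabeled samples may be assigned pseudo-labels. The sketch below illustrates one possible combination under loudly stated assumptions: a toy logistic-regression model on 2-D points, confidence-threshold pseudo-labeling on each client, and a FedAvg-style weighted average on the server. The helper names (`local_update`, `pseudo_label`, `fed_avg`), the 0.9 threshold, the learning rate, and the toy data are hypothetical and are not taken from the review or from any specific surveyed method; it is only an illustrative instance of the pseudo-labeling branch of the taxonomy.

```python
import math
import random

# Hypothetical toy model: binary logistic regression on 2-D points.
# Parameters are [w1, w2, b]; only these leave a client, never the raw data.

def predict_proba(params, x):
    """Probability of class 1 under the logistic-regression model."""
    w1, w2, b = params
    return 1.0 / (1.0 + math.exp(-(w1 * x[0] + w2 * x[1] + b)))

def pseudo_label(params, unlabeled, threshold=0.9):
    """Assign hard pseudo-labels only to samples the model is confident about."""
    out = []
    for x in unlabeled:
        p = predict_proba(params, x)
        if p >= threshold:
            out.append((x, 1))
        elif p <= 1.0 - threshold:
            out.append((x, 0))
    return out  # low-confidence samples are simply skipped this round

def local_update(global_params, labeled, unlabeled, lr=0.1, epochs=20):
    """One client round: pseudo-label with the received model, then run SGD.

    Returns the new parameters and the number of samples used (labeled plus
    pseudo-labeled); weighting by this count is an assumption of the sketch.
    """
    data = list(labeled) + pseudo_label(global_params, unlabeled)
    params = list(global_params)
    for _ in range(epochs):
        for x, y in data:
            err = predict_proba(params, x) - y  # d(log-loss)/dz
            params[0] -= lr * err * x[0]
            params[1] -= lr * err * x[1]
            params[2] -= lr * err
    return params, len(data)

def fed_avg(updates):
    """Server step: average client parameters weighted by sample count."""
    total = sum(n for _, n in updates)
    if total == 0:  # no client had usable data: fall back to a plain mean
        return [sum(p[i] for p, _ in updates) / len(updates) for i in range(3)]
    return [sum(p[i] * n for p, n in updates) / total for i in range(3)]

def make_points(center, n, rng):
    """Gaussian blob of n points around center (toy data)."""
    return [(center[0] + rng.gauss(0, 0.5), center[1] + rng.gauss(0, 0.5))
            for _ in range(n)]

rng = random.Random(0)
# Client A holds a few labeled points; client B holds only unlabeled points.
client_a = ([(x, 0) for x in make_points((-2, -2), 5, rng)]
            + [(x, 1) for x in make_points((2, 2), 5, rng)], [])
client_b = ([], make_points((-2, -2), 20, rng) + make_points((2, 2), 20, rng))

global_params = [0.0, 0.0, 0.0]
for _ in range(5):  # communication rounds
    updates = [local_update(global_params, lab, unl) for lab, unl in (client_a, client_b)]
    global_params = fed_avg(updates)

print(round(predict_proba(global_params, (2.0, 2.0)), 3),
      round(predict_proba(global_params, (-2.0, -2.0)), 3))
```

In the first round the untrained global model is not confident about any of client B's points, so B contributes nothing and the average equals client A's update; in later rounds B's confidently pseudo-labeled points start contributing to training.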