IEEE Access, vol.13, pp.181719-181743, 2025 (SCI-Expanded, Scopus)
In centralized machine learning, both the data and the model to be trained reside on a single server, which may cause data privacy problems because sensitive or personal data must be transferred from the clients to the server. Federated learning has been proposed to solve this problem by allowing a model to be trained without the data leaving the clients. Training takes place between a coordinating server and the clients, which repeatedly exchange model parameters instead of data. In real-life applications, the data on some of the clients or on the server may be partially labeled or completely unlabeled, which poses a severe challenge to federated learning. In this paper, we present a survey of recently proposed methods that leverage unlabeled data in a federated learning setting to improve model performance. We also present a novel taxonomy of these methods based on whether or not the unlabeled data is assigned a pseudo-label during the process. We summarize the datasets, main data modalities, and application areas of methods for federated learning with unlabeled data in the literature and highlight future research directions. We believe that this survey will be a useful guide for researchers planning to work on federated learning with partially labeled data.