A General Theory for Client Sampling in Federated Learning
Author: | Fraboni, Yann, Vidal, Richard, Kameni, Laetitia, Lorenzi, Marco |
---|---
Contributors: | E-Patient : Images, données & mOdèles pour la médeciNe numériquE (EPIONE), Inria Sophia Antipolis - Méditerranée (CRISAM), Institut National de Recherche en Informatique et en Automatique (Inria), Accenture Labs [Sophia Antipolis], ANR-19-CE45-0006,FED-BIOMED (2019), ANR-19-P3IA-0002,3IA Côte d'Azur (2019) |
Language: | English |
Year of publication: | 2022 |
Subject: |
FOS: Computer and information sciences
Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); [INFO] Computer Science [cs] |
Source: | International Workshop on Trustworthy Federated Learning in Conjunction with IJCAI 2022 (FL-IJCAI'22), Jul 2022, Vienna, Austria |
Description: | International audience; While client sampling is a central operation of current state-of-the-art federated learning (FL) approaches, the impact of this procedure on the convergence and speed of FL remains under-investigated. In this work, we provide a general theoretical framework to quantify the impact of a client sampling scheme and of client heterogeneity on federated optimization. First, we provide a unified theoretical ground for previously reported experimental results on the relationship between FL convergence and the variance of the aggregation weights. Second, we prove for the first time that the quality of FL convergence is also affected by the resulting covariance between aggregation weights. Our theory is general, and is here applied to Multinomial Distribution (MD) and Uniform sampling, the two default unbiased client sampling schemes of FL, and demonstrated through a series of experiments in non-iid and unbalanced scenarios. Our results suggest that MD sampling should be used as the default sampling scheme, due to its resilience to changes in data ratios during the learning process, while Uniform sampling is superior only in the special case where clients hold the same amount of data. |
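The two sampling schemes named in the abstract can be illustrated with a short sketch. This is not the authors' code; it is a minimal NumPy illustration of the standard unbiased constructions from the FL literature, assuming client importances p_i = n_i / n, m clients sampled per round, and the usual rescalings that make each scheme's aggregation weights unbiased (E[w_i] = p_i). The client counts and constants below are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: N clients with unbalanced data amounts.
N, m = 10, 4                      # total clients, clients sampled per round
n_i = rng.integers(10, 200, N)    # data samples held by each client
p = n_i / n_i.sum()               # client importance p_i = n_i / n

def md_weights(rng):
    """Multinomial Distribution (MD) sampling: m i.i.d. draws with
    probability p_i. Each draw contributes aggregation weight 1/m,
    so a client drawn k times gets weight k/m. Unbiased: E[w_i] = p_i."""
    counts = rng.multinomial(m, p)
    return counts / m

def uniform_weights(rng):
    """Uniform sampling: m distinct clients, each equally likely.
    Rescaling the selected clients' importances by N/m keeps the
    scheme unbiased: E[w_i] = (m/N) * p_i * N/m = p_i."""
    chosen = rng.choice(N, size=m, replace=False)
    w = np.zeros(N)
    w[chosen] = p[chosen] * N / m
    return w

# Monte Carlo check that both schemes are unbiased: E[w_i] ~ p_i.
md_mean = np.mean([md_weights(rng) for _ in range(20000)], axis=0)
uni_mean = np.mean([uniform_weights(rng) for _ in range(20000)], axis=0)
```

The paper's point is that the two schemes differ in the variance and covariance of these weights, not in their mean: MD weights always sum to 1 exactly, while Uniform weights only sum to 1 in expectation unless all clients hold the same amount of data.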
Database: | OpenAIRE |
External link: |