SAFA: a Semi-Asynchronous Protocol for Fast Federated Learning with Low Overhead
by Wentai Wu, Ligang He, Weiwei Lin, Rui Mao, Carsten Maple, Stephen Jarvis (2021)
Abstract
Federated learning (FL) has attracted increasing attention as a promising
approach to bringing artificial intelligence to a vast number of end devices.
However, it is challenging to guarantee the efficiency of FL given the
unreliable nature of end devices, while the cost of device-server
communication cannot be neglected. In this paper, we propose SAFA, a
semi-asynchronous FL protocol, to address the problems in federated learning
such as low round efficiency and poor convergence rate in extreme conditions
(e.g., clients dropping offline frequently). We introduce novel designs in the
steps of model distribution, client selection and global aggregation to
mitigate the impacts of stragglers, crashes and model staleness in order to
boost efficiency and improve the quality of the global model. We have conducted
extensive experiments with typical machine learning tasks. The results
demonstrate that the proposed protocol is effective in terms of shortening
federated round duration, reducing local resource wastage, and improving the
accuracy of the global model at an acceptable communication cost.
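The abstract describes SAFA only at a high level (lag-tolerant model distribution, client selection, and aggregation over cached updates). As a rough illustration of the semi-asynchronous idea, the toy Python sketch below simulates a server that tolerates updates built on slightly stale global models while force-syncing clients that lag too far behind. All names and values here (LAG_TOLERANCE, run_round, the crash probability) are illustrative assumptions, not the paper's algorithm or notation.

import random

LAG_TOLERANCE = 2   # assumed staleness bound (plays the role of a lag tolerance)
NUM_CLIENTS = 10
DIM = 4             # toy model dimension (flat float vector)

def fedavg(models):
    # Coordinate-wise average of a non-empty list of flat models.
    return [sum(m[i] for m in models) / len(models) for i in range(DIM)]

def run_round(t, global_model, base_version, cache):
    # 1) Model distribution: clients whose base model is older than the
    #    tolerance are forced to re-sync to the current global model.
    for c in range(NUM_CLIENTS):
        if t - base_version[c] > LAG_TOLERANCE:
            base_version[c] = t

    # 2) Local training (simulated): some clients crash this round; the
    #    rest submit a perturbed model stamped with their base version.
    for c in range(NUM_CLIENTS):
        if random.random() < 0.25:   # crashed / offline this round
            continue
        cache[c] = ([w + random.gauss(0.0, 0.1) for w in global_model],
                    base_version[c])

    # 3) Aggregation: accept cached updates whose base model is not too
    #    stale; tolerably stale updates still count (the semi-async idea).
    accepted = [m for m, v in cache.values() if t - v <= LAG_TOLERANCE]
    new_model = fedavg(accepted) if accepted else global_model

    # Accepted clients leave the cache and start the next round fresh.
    for c in list(cache):
        _, v = cache[c]
        if t - v <= LAG_TOLERANCE:
            del cache[c]
            base_version[c] = t + 1
    return new_model

if __name__ == "__main__":
    model = [0.0] * DIM
    versions = {c: 0 for c in range(NUM_CLIENTS)}
    cache = {}
    for t in range(5):
        model = run_round(t, model, versions, cache)
        print(f"round {t}: global model = {[round(w, 3) for w in model]}")

The point the sketch tries to capture is the trade-off the abstract refers to: updates based on a slightly outdated global model are still aggregated rather than discarded, while clients that lag beyond the tolerance are forced to re-synchronise, so rounds are not blocked by stragglers and crashed clients' work is not entirely wasted.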
arXiv: 1910.01355v4