HADFL: Heterogeneity-aware Decentralized Federated Learning Framework
by
Jing Cao, Zirui Lian, Weihong Liu, Zongwei Zhu, Cheng Ji
2021
Abstract
Federated learning (FL) supports training models on geographically
distributed devices. However, traditional FL systems adopt a centralized
synchronous strategy, which imposes heavy communication pressure on the central
server and poses a model generalization challenge. Existing optimizations of FL
either fail to speed up training on heterogeneous devices or suffer from poor
communication efficiency. In this paper, we propose HADFL, a framework that
supports decentralized asynchronous training on heterogeneous devices. Each
device trains the model on its local data for a heterogeneity-aware number of
local steps. In each aggregation cycle, devices are selected probabilistically
to perform model synchronization and aggregation. Compared with a traditional
FL system, HADFL relieves the central server's communication pressure,
efficiently utilizes heterogeneous computing power, and achieves maximum
speedups of 3.15x over decentralized FedAvg and 4.68x over the PyTorch
distributed training scheme, with almost no loss of convergence accuracy.
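The two mechanisms named in the abstract (heterogeneity-aware local steps and
probability-based selection for aggregation) can be illustrated with a minimal
sketch. Everything below is an assumption made for illustration, not the
authors' implementation: the Device class, its speed and data_size fields,
the toy quadratic objective, selection probabilities proportional to local
data size, and weighted averaging among the selected devices are all
hypothetical stand-ins for the details given in the paper.

    import numpy as np

    rng = np.random.default_rng(0)

    class Device:
        def __init__(self, speed, data_size, dim=10):
            self.speed = speed          # relative compute throughput (assumed known)
            self.data_size = data_size  # number of local samples
            self.params = np.zeros(dim) # local copy of the model parameters

        def local_train(self, base_steps, lr=0.1):
            # Heterogeneity-aware local steps: faster devices run more SGD
            # steps per cycle, so each cycle takes similar wall-clock time.
            steps = max(1, int(base_steps * self.speed))
            for _ in range(steps):
                # Toy quadratic loss 0.5*||w - 1||^2 with gradient noise.
                grad = self.params - 1.0 + rng.normal(0.0, 0.1, self.params.shape)
                self.params -= lr * grad

    devices = [Device(1.0, 500), Device(0.5, 300), Device(2.0, 800)]
    # Hypothetical choice: select devices with probability proportional
    # to local data size.
    probs = np.array([d.data_size for d in devices], dtype=float)
    probs /= probs.sum()

    for cycle in range(20):
        for d in devices:
            d.local_train(base_steps=5)
        # Probability-based selection: a subset of devices synchronizes
        # this cycle; the rest keep their local models.
        idx = rng.choice(len(devices), size=2, replace=False, p=probs)
        chosen = [devices[i] for i in idx]
        # Decentralized aggregation among the selected peers, here a
        # data-size-weighted average (no central server involved).
        weights = np.array([d.data_size for d in chosen], dtype=float)
        weights /= weights.sum()
        avg = sum(w * d.params for w, d in zip(weights, chosen))
        for d in chosen:
            d.params = avg.copy()

    print("distance to optimum per device:",
          [round(float(np.linalg.norm(d.params - 1.0)), 3) for d in devices])

Under these assumptions, the sketch shows why the design helps: scaling local
steps by device speed keeps fast devices busy instead of idle at a
synchronization barrier, and per-cycle probabilistic peer aggregation spreads
communication across devices rather than funneling it through one server.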
Archived Files and Locations: application/pdf, 1.5 MB; arxiv.org (repository); web.archive.org (webarchive); arXiv: 2111.08274v1