Byzantine Machine Learning Made Easy by Resilient Averaging of Momentums
by
Sadegh Farhadkhani, Rachid Guerraoui, Nirupam Gupta, Rafael Pinot, John Stephan
2022
Abstract
Byzantine resilience has emerged as a prominent topic within the distributed
machine learning community. Essentially, the goal is to enhance distributed
optimization algorithms, such as distributed SGD, in a way that guarantees
convergence despite the presence of some misbehaving (a.k.a. Byzantine)
workers. Although a myriad of techniques addressing the problem have been
proposed, the field arguably rests on fragile foundations. These techniques are
hard to prove correct and rely on assumptions that are (a) quite unrealistic,
i.e., often violated in practice, and (b) heterogeneous, making it difficult
to compare approaches.
We present RESAM (RESilient Averaging of Momentums), a unified
framework that makes it simple to establish optimal Byzantine resilience,
relying only on standard machine learning assumptions. Our framework is mainly
composed of two operators: resilient averaging at the server and
distributed momentum at the workers. We prove a general theorem stating
the convergence of distributed SGD under RESAM. Interestingly, convergence
guarantees for many existing techniques, as well as their comparison, follow
as direct corollaries of our theorem, without resorting to stringent
assumptions. We also present an empirical evaluation demonstrating the
practical relevance of RESAM.
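To make the two operators concrete, below is a minimal sketch (not the authors' code) of what one RESAM-style iteration could look like: honest workers maintain a momentum of their stochastic gradients, and the server combines the received vectors with a resilient-averaging rule before taking an SGD step. The coordinate-wise trimmed mean used here is only one example of such a rule, and the toy quadratic loss, worker counts, and all parameter values are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
d, n, f = 5, 7, 2          # dimension, total workers, Byzantine workers (n > 2f)
beta, lr = 0.9, 0.1        # momentum coefficient, learning rate
target = np.ones(d)        # minimizer of the toy loss ||x - target||^2 / 2

def stochastic_grad(x):
    # Noisy gradient of the toy quadratic loss (stand-in for a real model).
    return (x - target) + 0.1 * rng.normal(size=d)

def trimmed_mean(vectors, f):
    # One possible resilient-averaging rule (server side): per coordinate,
    # discard the f largest and f smallest entries, average the remainder.
    V = np.sort(np.stack(vectors), axis=0)
    return V[f:len(vectors) - f].mean(axis=0)

x = np.zeros(d)
momentums = [np.zeros(d) for _ in range(n)]
for step in range(200):
    msgs = []
    for i in range(n):
        if i < f:
            # Byzantine workers may send arbitrary vectors.
            msgs.append(rng.normal(scale=100.0, size=d))
        else:
            # Honest workers send the momentum of their stochastic gradients.
            momentums[i] = beta * momentums[i] + (1 - beta) * stochastic_grad(x)
            msgs.append(momentums[i])
    x -= lr * trimmed_mean(msgs, f)   # resilient SGD step at the server

print(np.linalg.norm(x - target))     # should be small despite the attackers

In this sketch, the worker-side momentum damps the gradient noise that would otherwise let outliers hide among honest messages, while the trimmed mean bounds the influence any f senders can exert on the aggregate.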
Archived Files and Locations
application/pdf, 6.6 MB (arXiv: 2205.12173v1)
arxiv.org (repository) · web.archive.org (webarchive)