A Fast Anderson-Chebyshev Mixing Method for Nonlinear Optimization
by
Zhize Li, Jian Li
2018
Abstract
Anderson mixing (or Anderson acceleration) is an efficient acceleration
method for fixed-point iterations x_{t+1} = G(x_t); e.g., gradient descent can
be viewed as iteratively applying the operator G(x) = x - α∇f(x).
It is known that Anderson mixing is quite efficient in practice and can be
viewed as an extension of Krylov subspace methods for nonlinear problems. In
this paper, we show that Anderson mixing with Chebyshev polynomial parameters
can achieve the optimal convergence rate
O(√κ log(1/ϵ)), which improves the previous result
O(κ log(1/ϵ)) provided by [Toth and Kelley, 2015] for
quadratic functions. Then, we provide a convergence analysis for minimizing
general nonlinear problems. Moreover, if the hyperparameters (e.g., the
Lipschitz smoothness parameter L) are not available, we propose a guessing
algorithm that estimates them dynamically, and we prove a similar convergence
rate for it. Finally, experimental results demonstrate that the proposed
Anderson-Chebyshev mixing method converges significantly faster than other
algorithms such as vanilla gradient descent (GD) and Nesterov's accelerated GD.
Moreover, combining these algorithms with the proposed guessing algorithm
(which estimates the hyperparameters dynamically) yields much better performance.
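
Below is a minimal Python sketch of the general idea, not the authors' exact
algorithm: Anderson mixing applied to the gradient-descent fixed-point map
G(x) = x - α_t ∇f(x), with step sizes α_t derived from the roots of a degree-T
Chebyshev polynomial rescaled to the eigenvalue interval [μ, L]. The function
names, the history window m, and the fixed root ordering are illustrative
assumptions.

    import numpy as np

    def chebyshev_step_sizes(mu, L, T):
        # Assumed schedule: reciprocals of the roots of the degree-T Chebyshev
        # polynomial mapped from [-1, 1] to the spectrum interval [mu, L].
        k = np.arange(T)
        roots = np.cos((2 * k + 1) * np.pi / (2 * T))
        return 1.0 / ((L + mu) / 2 + (L - mu) / 2 * roots)

    def anderson_mixing(grad, x0, mu, L, T=50, m=5):
        # Anderson mixing on G(x) = x - alpha_t * grad(x), keeping the last m
        # iterates/residuals (a common "type-II" variant of Anderson acceleration).
        alphas = chebyshev_step_sizes(mu, L, T)
        x = x0.copy()
        G_hist, R_hist = [], []            # histories of G(x_i) and residuals
        for t in range(T):
            g = x - alphas[t] * grad(x)    # fixed-point map G(x_t)
            r = g - x                      # residual r_t = G(x_t) - x_t
            G_hist.append(g); R_hist.append(r)
            G_hist, R_hist = G_hist[-m:], R_hist[-m:]
            if len(R_hist) > 1:
                # Mixing coefficients: minimize ||sum_i c_i r_i|| subject to
                # sum_i c_i = 1, via an unconstrained least-squares reparameterization.
                dR = np.stack([R_hist[i] - R_hist[-1]
                               for i in range(len(R_hist) - 1)], axis=1)
                gamma, *_ = np.linalg.lstsq(dR, -R_hist[-1], rcond=None)
                coeffs = np.append(gamma, 1.0 - gamma.sum())
                x = sum(c * gi for c, gi in zip(coeffs, G_hist))
            else:
                x = g
        return x

For a quadratic f(x) = (1/2) xᵀAx - bᵀx, one would call
anderson_mixing(lambda x: A @ x - b, x0, mu, L), where μ and L are the smallest
and largest eigenvalues of A. When μ and L are unknown, the paper's guessing
algorithm estimates them dynamically; a simple stand-in would be trying a
geometric grid of candidate values.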
Archived Files and Locations
application/pdf 977.6 kB
arxiv.org (repository) · web.archive.org (webarchive)
arXiv:1809.02341v1