Learning to Accelerate by the Methods of Step-size Planning

by Hengshuai Yao

Released as an article.

2022  

Abstract

Gradient descent is slow to converge for ill-conditioned and non-convex problems. An important technique for acceleration is step-size adaptation. The first part of this paper contains a detailed review of step-size adaptation methods, including the Polyak step-size, L4, LossGrad, Adam, IDBD, and hypergradient descent, and the relation of step-size adaptation to meta-gradient methods. In the second part of the paper, we propose a new class of methods for accelerating gradient descent that are distinct from existing techniques. The new methods, which we call step-size planning, use past update experience to learn an improved way of updating the parameters. The methods organize the experience into updates that are K steps apart from each other to facilitate planning. From this past experience, our planning algorithm, Csawg, learns a step-size model, which is a form of multi-step machine that predicts future updates. We extend Csawg to apply step-size planning over multiple steps, which leads to further speedup. We discuss and highlight the projection power of the diagonal-matrix step-size for future large-scale applications. We show that, for a convex problem, our methods can surpass the convergence rate of Nesterov's accelerated gradient, 1 - √(μ/L), where μ is the strong convexity factor of the loss function F and L is the Lipschitz constant of F'; this rate is the theoretical limit for the convergence rate of first-order methods. On the well-known non-convex Rosenbrock function, our planning methods reach zero error in fewer than 500 gradient evaluations, while gradient descent takes about 10,000 gradient evaluations to reach an accuracy of 10^-3. We discuss the connection of step-size planning to planning in reinforcement learning, in particular to Dyna architectures.
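As context for the step-size adaptation methods reviewed in the abstract, the following minimal sketch runs hypergradient descent (one of the reviewed baselines, due to Baydin et al.) on the 2D Rosenbrock benchmark also mentioned above. It is not the paper's Csawg step-size planning method; the starting point, initial step-size alpha0, hyper-step-size beta, and iteration budget are assumed values chosen only for illustration.

import numpy as np

def rosenbrock(x):
    # 2D Rosenbrock function: f(x, y) = (1 - x)^2 + 100 (y - x^2)^2
    return (1.0 - x[0])**2 + 100.0 * (x[1] - x[0]**2)**2

def rosenbrock_grad(x):
    # Analytic gradient of the 2D Rosenbrock function
    dx = -2.0 * (1.0 - x[0]) - 400.0 * x[0] * (x[1] - x[0]**2)
    dy = 200.0 * (x[1] - x[0]**2)
    return np.array([dx, dy])

def hypergradient_descent(x0, alpha0=1e-3, beta=1e-8, steps=10000):
    # Gradient descent whose scalar step-size is itself adapted by the
    # hypergradient rule: alpha <- alpha + beta * (g_t . g_{t-1}).
    # This is a reviewed baseline, not the Csawg planning method.
    x = np.asarray(x0, dtype=float)
    alpha = alpha0
    g_prev = np.zeros_like(x)
    for _ in range(steps):
        g = rosenbrock_grad(x)
        alpha += beta * np.dot(g, g_prev)   # adapt the step-size
        x -= alpha * g                      # usual gradient step
        g_prev = g
    return x, alpha

if __name__ == "__main__":
    x_final, alpha_final = hypergradient_descent([-1.0, 1.0])
    print("final point:", x_final, "final loss:", rosenbrock(x_final))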

Archived Files and Locations

application/pdf  4.2 MB
arxiv.org (repository)
web.archive.org (webarchive)
Type  article
Stage   submitted
Date   2022-04-15
Version   v3
Language   en
arXiv  2204.01705v3
Work Entity
Access all versions, variants, and formats of this work (e.g., pre-prints).
Catalog Record
Revision: ebf35f6c-1ed6-4770-a6f4-d49255c964fa