Neural Network Training as an Optimal Control Problem: An Augmented Lagrangian Approach
by
Brecht Evens, Puya Latafat, Andreas Themelis, Panagiotis Patrinos
2021
Abstract
Training of neural networks amounts to nonconvex optimization problems that
are typically solved by using backpropagation and (variants of) stochastic
gradient descent. In this work we propose an alternative approach by viewing
the training task as a nonlinear optimal control problem. Under this lens,
backpropagation amounts to the sequential approach (single shooting) to optimal
control, where the state variables have been eliminated. It is well known that
single shooting may lead to ill conditioning, and for this reason the
simultaneous approach (multiple shooting) is typically preferred. Motivated by
this hypothesis, an augmented Lagrangian algorithm is developed that only
requires an approximate solution to the Lagrangian subproblems up to a
user-defined accuracy. By applying this framework to the training of neural
networks, it is shown that the inner Lagrangian subproblems can be
solved using Gauss-Newton iterations. To fully exploit the structure of neural
networks, the resulting linear least squares problems are addressed by
employing an approach based on forward dynamic programming. Finally, the
effectiveness of our method is showcased on regression datasets.
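
The contrast between the sequential and simultaneous formulations can be illustrated with a minimal sketch. It is not the authors' implementation: the tanh activation, the bias-free layers, the squared loss, and the names rollout, residuals and augmented_lagrangian are all assumptions made for illustration. Single shooting eliminates the states by propagating the input forward; the simultaneous approach keeps the states as free variables and penalizes the layer-wise dynamics defects through an augmented Lagrangian.

```python
import numpy as np

def rollout(weights, x0):
    """Single shooting: eliminate the states by propagating x0 forward.
    Assumes bias-free layers with tanh activation (illustrative choice)."""
    states = [x0]
    for W in weights:
        states.append(np.tanh(W @ states[-1]))
    return states

def residuals(weights, states):
    """Simultaneous approach: states are free variables, so return the
    defect of each layer's dynamics constraint z_{k+1} = tanh(W_k z_k)."""
    return [states[k + 1] - np.tanh(W @ states[k])
            for k, W in enumerate(weights)]

def augmented_lagrangian(weights, states, multipliers, beta, y):
    """Squared loss on the last state, plus multiplier and quadratic
    penalty terms on the dynamics defects (assumed form)."""
    loss = 0.5 * np.sum((states[-1] - y) ** 2)
    defects = residuals(weights, states)
    linear = sum(np.vdot(lam, d) for lam, d in zip(multipliers, defects))
    quadratic = 0.5 * beta * sum(np.sum(d ** 2) for d in defects)
    return loss + linear + quadratic

# Hypothetical toy problem: 2 inputs, 3 hidden units, 1 output, 5 samples.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((3, 2)), rng.standard_normal((1, 3))]
x0 = rng.standard_normal((2, 5))
y = rng.standard_normal((1, 5))

states = rollout(weights, x0)                    # feasible states
multipliers = [np.zeros_like(s) for s in states[1:]]
value = augmented_lagrangian(weights, states, multipliers, beta=10.0, y=y)
```

When the states come from a forward rollout, every defect is zero and the augmented Lagrangian reduces to the ordinary training loss. In the simultaneous formulation the states may violate the dynamics during the iterations, and only the converged solution is required to be feasible.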
arXiv:2103.14343v1