Information-Theoretic Generalization Bounds for Stochastic Gradient Descent
by
Gergely Neu, Gintare Karolina Dziugaite, Mahdi Haghifam, Daniel M. Roy
2021
Abstract
We study the generalization properties of the popular stochastic optimization
method known as stochastic gradient descent (SGD) for optimizing general
non-convex loss functions. Our main contribution is providing upper bounds on
the generalization error that depend on local statistics of the stochastic
gradients evaluated along the path of iterates calculated by SGD. The key
factors our bounds depend on are the variance of the gradients (with respect to
the data distribution), the local smoothness of the objective function along
the SGD path, and the sensitivity of the loss function to perturbations of the
final output. Our key technical tool combines the information-theoretic
generalization bounds previously used for analyzing randomized variants of SGD
with a perturbation analysis of the iterates.
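To make the "local statistics along the SGD path" concrete, here is a minimal NumPy sketch, not the paper's method: plain minibatch SGD on a toy non-convex loss, recording the empirical variance of per-sample gradients at each iterate. The loss, data, batch size, and step size are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 2
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)  # toy regression targets

def per_sample_grads(w, Xb, yb):
    # Per-sample loss: loss_i(w) = (tanh(x_i . w) - y_i)^2, non-convex in w.
    z = Xb @ w
    pred = np.tanh(z)
    dpred = 1.0 - pred ** 2                      # derivative of tanh
    # Gradient of each loss_i w.r.t. w, stacked as rows: shape (batch, d).
    return (2.0 * (pred - yb) * dpred)[:, None] * Xb

w = np.zeros(d)
eta = 0.1          # step size (illustrative)
batch = 16
variances = []     # empirical gradient variance along the SGD path
for t in range(100):
    idx = rng.choice(n, size=batch, replace=False)
    g = per_sample_grads(w, X[idx], y[idx])
    # Local gradient variance: mean squared deviation of per-sample
    # gradients from the minibatch mean gradient at the current iterate.
    variances.append(float(np.mean(np.sum((g - g.mean(axis=0)) ** 2, axis=1))))
    w -= eta * g.mean(axis=0)                    # SGD update
```

Quantities like `variances[t]`, tracked pointwise along the trajectory rather than bounded uniformly over the parameter space, are the kind of data-dependent statistics the abstract's bounds are built from.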
Archived Files and Locations
application/pdf (365.2 kB): arxiv.org (repository), web.archive.org (webarchive)
arXiv: 2102.00931v3