Stability and Generalisation of Gradient Descent for Shallow Neural Networks without the Neural Tangent Kernel
release_hucfnweu65d2hm6xraoq6urb6y
by
Dominic Richards, Ilja Kuzborskij
2021
Abstract
We revisit on-average algorithmic stability of Gradient Descent (GD) for
training overparameterised shallow neural networks and prove new generalisation
and excess risk bounds without the Neural Tangent Kernel (NTK) or
Polyak-Łojasiewicz (PL) assumptions. In particular, we show oracle-type
bounds which reveal that the generalisation and excess risk of GD are controlled
by an interpolating network with the shortest GD path from initialisation (in a
sense, an interpolating network with the smallest relative norm). While this
was known for kernelised interpolants, our proof applies directly to networks
trained by GD without intermediate kernelisation. At the same time, by relaxing
the oracle inequalities developed here, we recover existing NTK-based risk bounds
in a straightforward way, which demonstrates that our analysis is tighter.
Finally, unlike most NTK-based analyses, we focus on regression with
label noise and show that GD with early stopping is consistent.
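As a rough illustration of the kind of statement summarised in the abstract (a schematic sketch only, not the paper's exact theorem or notation), an oracle-type excess-risk bound of this flavour typically takes the following shape, where the initialisation, step size, iteration count, and sample size below are assumed symbols introduced here for illustration:

% Schematic shape of an oracle-type bound as described in the abstract.
% All symbols are illustrative assumptions, not the paper's notation:
%   W_0 -- initialisation, W_T -- GD iterate after T steps,
%   eta -- step size, n -- number of training samples,
%   the infimum ranges over networks \tilde{W} that interpolate the data.
\[
  \mathbb{E}\big[ R(W_T) \big] - R^{\star}
  \;\lesssim\;
  \inf_{\tilde{W}\,:\,\text{interpolating}}
    \frac{\| \tilde{W} - W_0 \|_2^{2}}{\eta T}
  \;+\;
  \frac{\eta T}{n}.
\]
% The first term is governed by the interpolant closest to initialisation
% (the "shortest GD path"); choosing T to balance the two terms corresponds
% to the early-stopping consistency discussed in the abstract.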
Archived Files and Locations
application/pdf 704.5 kB
file_2ykee2lo25c7bkpwmuren23lvi
arxiv.org (repository) | web.archive.org (webarchive)
2107.12723v1