Stability & Generalisation of Gradient Descent for Shallow Neural Networks without the Neural Tangent Kernel

by Dominic Richards, Ilja Kuzborskij

Released as an article.

2021  

Abstract

We revisit on-average algorithmic stability of Gradient Descent (GD) for training overparameterised shallow neural networks and prove new generalisation and excess risk bounds without the Neural Tangent Kernel (NTK) or Polyak-Łojasiewicz (PL) assumptions. In particular, we show oracle-type bounds which reveal that the generalisation and excess risk of GD is controlled by an interpolating network with the shortest GD path from initialisation (in a sense, an interpolating network with the smallest relative norm). While this was known for kernelised interpolants, our proof applies directly to networks trained by GD without intermediate kernelisation. At the same time, by relaxing the oracle inequalities developed here we recover existing NTK-based risk bounds in a straightforward way, which demonstrates that our analysis is tighter. Finally, unlike most of the NTK-based analyses, we focus on regression with label noise and show that GD with early stopping is consistent.
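To make the setting concrete, below is a minimal sketch (assumptions mine, not the authors' code): an overparameterised one-hidden-layer ReLU network trained by full-batch gradient descent on a toy noisy regression problem, with early stopping chosen on a held-out set. The data generator, width, step size and patience are hypothetical choices; the sketch also tracks the Frobenius distance of the weights from initialisation, a proxy for the GD path length appearing in the bounds.

    # Minimal sketch of the setting in the abstract (toy data and hyperparameters are
    # my own assumptions): full-batch GD on an overparameterised shallow ReLU network
    # for regression with label noise, with early stopping on a held-out set.
    import numpy as np

    rng = np.random.default_rng(0)
    n, d, m = 200, 5, 2000                      # samples, input dim, hidden width (m >> n)

    w_star = rng.standard_normal(d)             # hypothetical "teacher" for the toy data
    def make_data(n):
        X = rng.standard_normal((n, d))
        y = np.sin(X @ w_star) + 0.1 * rng.standard_normal(n)   # labels with additive noise
        return X, y

    X, y = make_data(n)
    X_val, y_val = make_data(n)                 # held-out set used only for early stopping

    W0 = rng.standard_normal((m, d)) / np.sqrt(d)   # hidden-layer weights at initialisation
    a = rng.choice([-1.0, 1.0], size=m)             # fixed random output layer (not trained)
    W = W0.copy()

    def predict(W, X):
        return np.maximum(X @ W.T, 0.0) @ a / np.sqrt(m)

    def mse(W, X, y):
        return 0.5 * np.mean((predict(W, X) - y) ** 2)

    eta, T, patience = 0.1, 2000, 50
    best_val, best_W, since_best = np.inf, W.copy(), 0
    for t in range(T):
        H = X @ W.T                                          # pre-activations, shape (n, m)
        r = np.maximum(H, 0.0) @ a / np.sqrt(m) - y          # residuals on the training set
        grad = ((r[:, None] * (H > 0)) * (a / np.sqrt(m))).T @ X / n
        W -= eta * grad                                      # full-batch GD step

        val = mse(W, X_val, y_val)
        if val < best_val:
            best_val, best_W, since_best = val, W.copy(), 0
        else:
            since_best += 1
        if since_best >= patience:                           # early stopping
            break

    print(f"stopped at t={t}, held-out MSE={best_val:.4f}, "
          f"||W_t - W_0||_F = {np.linalg.norm(best_W - W0):.2f}")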

Archived Files and Locations

application/pdf  704.5 kB
file_2ykee2lo25c7bkpwmuren23lvi
arxiv.org (repository)
web.archive.org (webarchive)
Type: article
Stage: submitted
Date: 2021-07-27
Version: v1
Language: en
arXiv: 2107.12723v1
Work Entity
access all versions, variants, and formats of this work (e.g., pre-prints)
Catalog Record
Revision: ce01b85f-b422-4b34-8889-90edf226c43b