Kernel and Rich Regimes in Overparametrized Models
by
Blake Woodworth, Suriya Gunasekar, Pedro Savarese, Edward Moroshko,
Itay Golan, Jason Lee, Daniel Soudry, Nathan Srebro
2020
Abstract
A recent line of work studies overparametrized neural networks in the "kernel
regime," i.e. when the network behaves during training as a kernelized linear
predictor, and thus training with gradient descent has the effect of finding
the minimum RKHS norm solution. This stands in contrast to other studies which
demonstrate how gradient descent on overparametrized multilayer networks can
induce rich implicit biases that are not RKHS norms. Building on an observation
by Chizat and Bach, we show how the scale of the initialization controls the
transition between the "kernel" (aka lazy) and "rich" (aka active) regimes and
affects generalization properties in multilayer homogeneous models. We provide
a complete and detailed analysis for a simple two-layer model that already
exhibits an interesting and meaningful transition between the kernel and rich
regimes, and we demonstrate the transition for more complex matrix
factorization models and multilayer non-linear networks.
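To make the transition the abstract describes concrete, here is a minimal sketch (not the authors' code) of gradient descent on a two-layer "diagonal" linear model f(x) = <u*u - v*v, x>, the kind of simple two-layer model the abstract alludes to, run at different initialization scales alpha. The problem sizes, learning rate schedule, and step count below are illustrative assumptions and may need tuning.

# Minimal sketch, assuming a diagonal two-layer reparametrization
# beta = u*u - v*v of a sparse linear regression problem.
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 100                                 # underdetermined: more features than samples
X = rng.standard_normal((n, d))
beta_true = np.zeros(d)
beta_true[:3] = 1.0                            # sparse ground truth
y = X @ beta_true

def train(alpha, steps=100_000):
    lr = 0.01 / (1.0 + alpha**2)               # damp the step size at large (lazy) scales
    u = alpha * np.ones(d)
    v = alpha * np.ones(d)                     # u*u - v*v = 0 at initialization
    for _ in range(steps):
        g = X.T @ (X @ (u*u - v*v) - y) / n    # gradient wrt the linear predictor
        u -= lr * 2*u * g                      # chain rule through u*u
        v += lr * 2*v * g                      # chain rule through -v*v
    return u*u - v*v

for alpha in (0.01, 1.0, 10.0):
    beta = train(alpha)
    print(f"alpha={alpha:5g}  ||beta||_1={np.abs(beta).sum():6.2f}  "
          f"||beta||_2={np.linalg.norm(beta):5.2f}")

If the sketch behaves as the paper predicts, small alpha should recover the sparse planted signal (a low l1-norm, "rich"-regime solution), while large alpha should return a dense, minimum-l2-style interpolant characteristic of the kernel regime.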
arXiv:1906.05827v3