Identity Crisis: Memorization and Generalization under Extreme
Overparameterization
by Chiyuan Zhang, Samy Bengio, Moritz Hardt, Michael C. Mozer, and Yoram Singer
2019
Abstract
We study the interplay between memorization and generalization in
overparameterized networks in the extreme case of a single training example,
where the task is to map an input pattern to itself at the output. We examine
fully-connected and convolutional networks (FCNs and CNNs), both linear and
nonlinear, initialized randomly and then trained to minimize the reconstruction
error. The trained networks stereotypically take one of two forms: the
constant function (memorization) or the identity function (generalization). We
show that different architectures exhibit strikingly different inductive
biases, which are sensitive to many architectural decisions. For example, CNNs
of up to 10 layers are able to generalize from a single example, whereas FCNs
cannot learn the identity function reliably from 60k examples. Deeper CNNs
often fail, but nonetheless do astonishing work to memorize the training
output: because CNN biases are location invariant, the model must progressively
grow an output pattern from the image boundaries via the coordination of many
layers. Our work helps to quantify and visualize inductive biases due to
architectural choices such as depth, kernel width, and number of channels.
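The single-example identity-reconstruction setup the abstract describes can be illustrated with a minimal sketch (not the authors' code; the dimension, learning rate, and linear one-layer model are illustrative assumptions): a randomly initialized linear map is trained by gradient descent to reconstruct one example, after which it reproduces that example perfectly but does not act as the identity on a fresh input.

```python
import numpy as np

# Minimal sketch (hypothetical setup, not the paper's code): train a single
# linear layer W to reconstruct ONE training example x, minimizing ||Wx - x||^2.
rng = np.random.default_rng(0)
d = 16
x = rng.normal(size=d)
x /= np.linalg.norm(x)                 # unit-norm training example
W = 0.01 * rng.normal(size=(d, d))     # small random initialization

lr = 0.1
for _ in range(300):
    grad = 2.0 * np.outer(W @ x - x, x)  # gradient of the squared error in W
    W -= lr * grad

train_err = np.linalg.norm(W @ x - x)  # near zero: the example is memorized

# A fresh random input: the learned map is far from the identity on it,
# i.e. fitting one example does not by itself yield generalization.
z = rng.normal(size=d)
z /= np.linalg.norm(z)
test_err = np.linalg.norm(W @ z - z)
print(train_err, test_err)
```

Gradient descent only updates W in the rank-one direction spanned by x, so everything off that direction stays at its random initialization; the paper's point is that whether a deep architecture escapes this memorizing solution depends strongly on its inductive bias (e.g. the convolutional structure of CNNs).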
Archived Files and Locations
application/pdf, 12.4 MB
arxiv.org (repository); web.archive.org (webarchive)
arXiv version: 1902.04698v3