Positively Scale-Invariant Flatness of ReLU Neural Networks
by
Mingyang Yi, Qi Meng, Wei Chen, Zhi-ming Ma, Tie-Yan Liu
2019
Abstract
It was empirically confirmed by Keskar et al. [SharpMinima] that flatter
minima generalize better. However, for the popular ReLU network, a sharp minimum
can also generalize well [SharpMinimacan]. This observation demonstrates
that the existing definitions of flatness fail to account for the complex
geometry of ReLU neural networks, because they do not capture the Positively
Scale-Invariant (PSI) property of ReLU networks. In this paper, we formalize the
problem that the PSI property causes for existing definitions of flatness and propose a new
description of flatness, PSI-flatness. PSI-flatness is defined on the
values of basis paths [GSGD] instead of on weights. The values of basis paths
have been shown to be PSI-variables that sufficiently represent a ReLU
neural network, which ensures the PSI property of PSI-flatness. We then study
the relation between PSI-flatness and generalization theoretically and empirically.
First, we formulate a generalization bound based on PSI-flatness,
which shows that the generalization error decreases as the ratio between the largest
basis path value and the smallest basis path value approaches one. That is to say, a minimum
with balanced values of basis paths is more likely to be flat and to
generalize better. Finally, we visualize the PSI-flatness of the loss surface
around two learned models, which indicates that the minimum with smaller PSI-flatness
indeed generalizes better.
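
As a quick illustration of the PSI property the abstract refers to, the following minimal numpy sketch (not from the paper; the two-layer network, its sizes, and the max/min path-value ratio print-out are illustrative assumptions) checks that rescaling a hidden unit's incoming weights by c > 0 and its outgoing weights by 1/c leaves both the network output and all path values unchanged, even though the weights themselves change.

import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def two_layer_relu(x, W1, W2):
    # f(x) = W2 ReLU(W1 x): a minimal two-layer ReLU network
    return W2 @ relu(W1 @ x)

rng = np.random.default_rng(0)
x = rng.normal(size=3)
W1 = rng.normal(size=(4, 3))  # hidden-layer weights (hypothetical sizes)
W2 = rng.normal(size=(2, 4))  # output-layer weights

# PSI transformation: scale unit j's incoming weights by c_j > 0
# and its outgoing weights by 1 / c_j.
c = rng.uniform(0.5, 2.0, size=4)
W1_s = c[:, None] * W1
W2_s = W2 / c[None, :]

# The network function is unchanged by the rescaling ...
assert np.allclose(two_layer_relu(x, W1, W2), two_layer_relu(x, W1_s, W2_s))

# ... and so is every path value W1[j, i] * W2[k, j].
paths = np.einsum('ji,kj->kji', W1, W2)
assert np.allclose(paths, np.einsum('ji,kj->kji', W1_s, W2_s))

# The ratio between the largest and smallest |path value| is the kind of
# balance quantity the abstract's generalization bound refers to.
print(np.abs(paths).max() / np.abs(paths).min())

Note that W1 and W2 themselves do change under this rescaling, which is why a flatness measure defined directly on weights cannot be positively scale-invariant, while one defined on path values can.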
Archived Files and Locations
application/pdf 890.5 kB
arxiv.org (repository) · web.archive.org (webarchive)
1903.02237v1