Orthogonal Over-Parameterized Training
by
Weiyang Liu, Rongmei Lin, Zhen Liu, James M. Rehg, Li Xiong, Adrian Weller, Le Song
2020
Abstract
The inductive bias of a neural network is largely determined by the
architecture and the training algorithm. Achieving good generalization therefore
hinges on how effectively the network is trained. We propose a novel
orthogonal over-parameterized training (OPT) framework that can provably
minimize the hyperspherical energy which characterizes the diversity of neurons
on a hypersphere. By maintaining the minimum hyperspherical energy during
training, OPT can greatly improve network generalization. Specifically, OPT
fixes the randomly initialized weights of the neurons and learns an orthogonal
transformation that applies to these neurons. We propose multiple ways to learn
such an orthogonal transformation, including unrolling orthogonalization
algorithms, applying orthogonal parameterization, and designing
orthogonality-preserving gradient descent. Interestingly, OPT reveals that
learning a proper coordinate system for neurons is crucial to generalization
and may be more important than learning specific relative positions among
neurons. We provide some insights on why OPT yields better generalization.
Extensive experiments validate the superiority of OPT.
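The following is a minimal, hypothetical sketch of the core idea for a single
linear layer: the randomly initialized neuron weights V are frozen, and only an
orthogonal matrix R applied to them is learned. Here R is kept orthogonal via a
Cayley-transform parameterization, which is one standard instance of the
"orthogonal parameterization" route the abstract mentions; the paper also
discusses unrolled orthogonalization algorithms and orthogonality-preserving
gradient descent, whose details differ. The class name OPTLinear, the
initialization scale, and the solver choice are illustrative assumptions, not
the authors' reference implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class OPTLinear(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        # Fixed, randomly initialized neuron weights (never updated).
        self.register_buffer(
            "V", torch.randn(out_features, in_features) / in_features ** 0.5
        )
        # Free parameter whose skew-symmetric part generates the orthogonal matrix R.
        self.A_raw = nn.Parameter(torch.zeros(out_features, out_features))
        self.bias = nn.Parameter(torch.zeros(out_features))

    def orthogonal_matrix(self):
        # Cayley transform: R = (I + A)^{-1}(I - A) is orthogonal when A is skew-symmetric.
        A = self.A_raw - self.A_raw.t()
        I = torch.eye(A.size(0), device=A.device, dtype=A.dtype)
        return torch.linalg.solve(I + A, I - A)

    def forward(self, x):
        # Effective weights are an orthogonal transform of the frozen random weights,
        # so pairwise angles between neurons (and hence the hyperspherical energy of
        # the random initialization) are preserved during training.
        W = self.orthogonal_matrix() @ self.V
        return F.linear(x, W, self.bias)

# Usage: only R (via A_raw) and the bias receive gradients; V stays at its random init.
layer = OPTLinear(128, 64)
out = layer(torch.randn(8, 128))  # shape (8, 64)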
Archived Files and Locations
application/pdf 21.3 MB
arxiv.org (repository), web.archive.org (webarchive)
arXiv: 2004.04690v3