Orthogonal Over-Parameterized Training

by Weiyang Liu, Rongmei Lin, Zhen Liu, James M. Rehg, Li Xiong, Adrian Weller, Le Song

Released as an article.

2020  

Abstract

The inductive bias of a neural network is largely determined by the architecture and the training algorithm. To achieve good generalization, how to effectively train a neural network is of great importance. We propose a novel orthogonal over-parameterized training (OPT) framework that can provably minimize the hyperspherical energy which characterizes the diversity of neurons on a hypersphere. By maintaining the minimum hyperspherical energy during training, OPT can greatly improve the network generalization. Specifically, OPT fixes the randomly initialized weights of the neurons and learns an orthogonal transformation that applies to these neurons. We propose multiple ways to learn such an orthogonal transformation, including unrolling orthogonalization algorithms, applying orthogonal parameterization, and designing orthogonality-preserving gradient descent. Interestingly, OPT reveals that learning a proper coordinate system for neurons is crucial to generalization and may be more important than learning specific relative positions among neurons. We provide some insights on why OPT yields better generalization. Extensive experiments validate the superiority of OPT.
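
The core idea described above can be illustrated with a short sketch: freeze the randomly initialized neuron weights and learn only an orthogonal matrix applied to them, so that pairwise angles between neurons (and hence the hyperspherical energy) are preserved. The sketch below assumes a PyTorch setting and uses PyTorch's built-in orthogonal parametrization as one concrete way to realize the "orthogonal parameterization" variant mentioned in the abstract; the OPTLinear class and its details are illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.utils.parametrizations import orthogonal


class OPTLinear(nn.Module):
    """Hypothetical OPT-style linear layer (illustrative sketch only).

    The randomly initialized neuron weights W0 (rows = neurons) are frozen,
    and a single learnable orthogonal matrix R is applied to every neuron,
    giving the effective weight W0 @ R. Because R is orthogonal, the pairwise
    angles between neurons, and therefore the hyperspherical energy of the
    initialization, are preserved throughout training.
    """

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        w0 = torch.randn(out_features, in_features)
        w0 = w0 / w0.norm(dim=1, keepdim=True)   # place neurons on the unit hypersphere
        self.register_buffer("w0", w0)           # fixed; receives no gradient updates
        # Learnable square matrix constrained to be orthogonal via PyTorch's
        # orthogonal parametrization (one possible orthogonal parameterization).
        self.rotation = orthogonal(nn.Linear(in_features, in_features, bias=False))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        effective_weight = self.w0 @ self.rotation.weight  # apply R to every fixed neuron
        return F.linear(x, effective_weight)
```

In this sketch, only the orthogonal matrix R is trained, which corresponds to learning a coordinate system for the fixed neurons rather than their relative positions.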

Archived Files and Locations

application/pdf  21.3 MB
file_bk4q2726ijd6plsz2npvpatwuy
arxiv.org (repository)
web.archive.org (webarchive)
Type  article
Stage   submitted
Date   2020-07-19
Version   v3
Language   en
arXiv  2004.04690v3
Revision: 2ae1ba12-c9de-4ac1-847e-d09bdac27e69