Information Newton's flow: second-order optimization method in
probability space
by
Yifei Wang, Wuchen Li
2020
Abstract
We introduce a framework for Newton's flows in probability space with
information metrics, named information Newton's flows. Two information
metrics are considered: the Fisher-Rao metric and the Wasserstein-2 metric.
Several examples of information Newton's flows for learning objective/loss
functions are provided, including the Kullback-Leibler (KL) divergence,
maximum mean discrepancy (MMD), and cross entropy. Asymptotic convergence
results for the proposed Newton's methods are provided. It is a known fact
that overdamped Langevin dynamics correspond to Wasserstein gradient flows
of the KL divergence. Extending this fact to Wasserstein Newton's flows of
the KL divergence, we derive Newton's Langevin dynamics. We provide examples
of Newton's Langevin dynamics in both one-dimensional spaces and Gaussian
families. For the numerical implementation, we design sampling-efficient
variational methods to approximate Wasserstein Newton's directions. Several
numerical examples in Gaussian families and Bayesian logistic regression
demonstrate the effectiveness of the proposed method.
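The known fact quoted in the abstract is that overdamped Langevin dynamics, dX_t = -∇f(X_t) dt + √2 dW_t, transport the particle density along the Wasserstein gradient flow of the KL divergence to the target π(x) ∝ exp(-f(x)). A minimal sketch of this correspondence, using the simple unadjusted Langevin discretization for a standard Gaussian target (an illustration of the classical fact only, not the paper's Newton's Langevin dynamics; all parameter values are arbitrary choices for the demo):

```python
import numpy as np

def grad_f(x):
    """Gradient of f(x) = x**2 / 2, i.e. the standard Gaussian potential."""
    return x

def langevin_sample(n_steps=5000, n_particles=2000, dt=0.01, seed=0):
    """Unadjusted Langevin: x <- x - dt * grad_f(x) + sqrt(2 dt) * noise.

    As the density of the particles evolves, it follows (up to
    discretization error) the Wasserstein gradient flow of KL(rho || pi).
    """
    rng = np.random.default_rng(seed)
    # Start the particle cloud far from the target to see the flow act.
    x = rng.normal(loc=5.0, scale=1.0, size=n_particles)
    for _ in range(n_steps):
        x = x - dt * grad_f(x) + np.sqrt(2.0 * dt) * rng.normal(size=n_particles)
    return x

samples = langevin_sample()
# The empirical mean and variance should approach those of N(0, 1).
print(samples.mean(), samples.var())
```

The Newton's Langevin dynamics proposed in the paper modify the drift using second-order (Hessian) information in the Wasserstein geometry; this first-order sketch is only the baseline gradient-flow picture it builds on.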
Archived Files and Locations
application/pdf, 3.1 MB — arXiv:2001.04341v2, available via arxiv.org (repository) and web.archive.org (webarchive)