Distributed Submodular Maximization
by Baharan Mirzasoleiman, Amin Karbasi, Rik Sarkar, Andreas Krause (2014)
Abstract
Many large-scale machine learning problems--clustering, non-parametric
learning, kernel machines, etc.--require selecting a small yet representative
subset from a large dataset. Such problems can often be reduced to maximizing a
submodular set function subject to various constraints. Classical approaches to
submodular optimization require centralized access to the full dataset, which
is impractical for truly large-scale problems. In this paper, we consider the
problem of submodular function maximization in a distributed fashion. We
develop a simple, two-stage protocol, GreeDi, which is easily implemented using
MapReduce-style computations. We theoretically analyze our approach and show
that under certain natural conditions, performance close to the centralized
approach can be achieved. We begin with monotone submodular maximization
subject to a cardinality constraint, and then extend this approach to obtain
approximation guarantees for (not necessarily monotone) submodular maximization
subject to more general constraints including matroid or knapsack constraints.
In our extensive experiments, we demonstrate the effectiveness of our approach
on several applications, including sparse Gaussian process inference and
exemplar-based clustering on tens of millions of examples using Hadoop.
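
The two-stage protocol mentioned in the abstract can be sketched in a few lines. The following Python snippet is a hypothetical, sequential illustration of that idea, assuming the standard greedy algorithm is used on each partition and then once more on the union of the partial solutions; all names (greedy, greedi, coverage, num_machines) are illustrative and not taken from the paper's code.

```python
def greedy(candidates, k, f):
    """Standard greedy: repeatedly add the element with the largest marginal gain."""
    selected = []
    remaining = set(candidates)
    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=lambda e: f(selected + [e]) - f(selected))
        selected.append(best)
        remaining.remove(best)
    return selected

def greedi(ground_set, k, f, num_machines):
    """Two-stage distributed greedy sketch, run sequentially here for illustration."""
    # Stage 1 ("map"): split the data and solve each part locally.
    partitions = [ground_set[i::num_machines] for i in range(num_machines)]
    local_solutions = [greedy(part, k, f) for part in partitions]
    # Stage 2 ("reduce"): merge the local solutions and run greedy once more.
    merged = [e for solution in local_solutions for e in solution]
    return greedy(merged, k, f)

# Toy usage with a simple monotone submodular objective: set coverage.
if __name__ == "__main__":
    universe_sets = {i: set(range(i, i + 5)) for i in range(40)}

    def coverage(selection):
        covered = set()
        for i in selection:
            covered |= universe_sets[i]
        return len(covered)

    print(greedi(list(universe_sets), k=5, f=coverage, num_machines=4))
```

In a real deployment the stage-1 calls would run in parallel as map tasks and the stage-2 call as a single reduce task; the abstract's guarantees concern how close this two-round scheme gets to the centralized greedy solution.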
Archived Files and Locations
application/pdf, 992.8 kB (file_sux5t55jibckjgkgxwccp2ggjq)
arxiv.org (repository), web.archive.org (webarchive)
arXiv: 1411.0541v1