Distributed Submodular Maximization release_rjddycqffrcz7j62btwea3c3tu

by Baharan Mirzasoleiman, Amin Karbasi, Rik Sarkar, Andreas Krause

Released as a article .

2014  

Abstract

Many large-scale machine learning problems--clustering, non-parametric learning, kernel machines, etc.--require selecting a small yet representative subset from a large dataset. Such problems can often be reduced to maximizing a submodular set function subject to various constraints. Classical approaches to submodular optimization require centralized access to the full dataset, which is impractical for truly large-scale problems. In this paper, we consider the problem of submodular function maximization in a distributed fashion. We develop a simple, two-stage protocol GreeDi, that is easily implemented using MapReduce style computations. We theoretically analyze our approach, and show that under certain natural conditions, performance close to the centralized approach can be achieved. We begin with monotone submodular maximization subject to a cardinality constraint, and then extend this approach to obtain approximation guarantees for (not necessarily monotone) submodular maximization subject to more general constraints including matroid or knapsack constraints. In our extensive experiments, we demonstrate the effectiveness of our approach on several applications, including sparse Gaussian process inference and exemplar based clustering on tens of millions of examples using Hadoop.
In text/plain format

Archived Files and Locations

application/pdf  992.8 kB
file_sux5t55jibckjgkgxwccp2ggjq
arxiv.org (repository)
web.archive.org (webarchive)
Read Archived PDF
Preserved and Accessible
Type  article
Stage   submitted
Date   2014-11-03
Version   v1
Language   en ?
arXiv  1411.0541v1
Work Entity
access all versions, variants, and formats of this works (eg, pre-prints)
Catalog Record
Revision: 4cc618a2-a03d-4c9a-96fe-a39b926c3b4a
API URL: JSON