Max-Sum Diversification, Monotone Submodular Functions and Dynamic
Updates
by
Allan Borodin, Aadhar Jain, Hyun Chul Lee, Yuli Ye
2012
Abstract
Result diversification is an important aspect in web-based search, document
summarization, facility location, portfolio management and other applications.
Given a set of ranked results for a set of objects (e.g. web documents,
facilities, etc.) with a distance between any pair, the goal is to select a
subset S satisfying the following three criteria: (a) the subset S
satisfies some constraint (e.g. bounded cardinality); (b) the subset contains
results of high "quality"; and (c) the subset contains results that are
"diverse" relative to the distance measure. The goal of result diversification
is to produce a diversified subset while maintaining high quality as much as
possible. We study a broad class of problems where the distances are a metric,
where the constraint is given by independence in a matroid, where quality is
determined by a monotone submodular function, and diversity is defined as the
sum of distances between objects in S. Our problem is a generalization of the
max-sum diversification problem studied in [GoSh09], which in turn is
a generalization of the max-sum p-dispersion problem studied extensively
in location theory. The problem is NP-hard even when the distances satisfy the
triangle inequality. We propose two simple and natural algorithms: a greedy
algorithm for a cardinality constraint and a local search algorithm for an
arbitrary matroid constraint. We prove that both algorithms achieve constant
approximation ratios.
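To make the objective and the greedy approach concrete, here is a minimal Python sketch for the cardinality-constrained case. The abstract gives no formulas, so several details are assumptions made for illustration: the trade-off weight `lam`, the function names `objective` and `greedy_diversify`, the factor 0.5 on the marginal quality in the selection rule, and the toy data. It should be read as one plausible instance of a greedy rule for this problem, not as the authors' exact algorithm.

```python
from itertools import combinations


def objective(S, quality, dist, lam=1.0):
    """Quality of S plus lam times the sum of pairwise distances in S.

    `lam` is an assumed trade-off parameter; the abstract does not name one.
    """
    return quality(S) + lam * sum(dist(u, v) for u, v in combinations(S, 2))


def greedy_diversify(items, p, quality, dist, lam=1.0):
    """Greedily pick at most p items (cardinality constraint).

    Each step adds the item with the largest combined gain: half of its
    marginal quality plus lam times its total distance to the items already
    chosen. The 0.5 factor is an assumption of this sketch.
    """
    S = []
    remaining = list(items)
    while len(S) < p and remaining:
        def gain(u):
            marginal_quality = quality(S + [u]) - quality(S)
            marginal_distance = sum(dist(u, v) for v in S)
            return 0.5 * marginal_quality + lam * marginal_distance

        best = max(remaining, key=gain)
        S.append(best)
        remaining.remove(best)
    return S


# Hypothetical toy instance: points on a line, with additive (modular)
# weights as the quality function. Modular functions are monotone submodular.
positions = {"a": 0.0, "b": 1.0, "c": 10.0}
weights = {"a": 3.0, "b": 2.9, "c": 1.0}


def quality(S):
    return sum(weights[x] for x in S)


def dist(u, v):
    return abs(positions[u] - positions[v])  # a metric on the line


chosen = greedy_diversify(positions, p=2, quality=quality, dist=dist)
print(chosen, objective(chosen, quality, dist))
```

In this toy instance the second-highest-quality item "b" is skipped in favor of the distant item "c", which shows how the distance term trades quality for diversity.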
arXiv: 1203.6397v1