Asymptotically Optimal Algorithms for Budgeted Multiple Play Bandits
release_p5j45bcqxrbgpjwafonvnq5pfq
by
Alexander Luedtke,
Antoine Chambaz
2016
Abstract
We study a generalization of the multi-armed bandit problem with multiple
plays where there is a cost associated with pulling each arm and the agent has
a budget at each time that dictates how much she can expect to spend. We derive
an asymptotic regret lower bound for any uniformly efficient algorithm in our
setting. We then study a variant of Thompson sampling for Bernoulli rewards and
a variant of KL-UCB for both single-parameter exponential families and bounded,
finitely supported rewards. We show these algorithms are asymptotically
optimal, both in rate and leading problem-dependent constants, including in the
thick margin setting where multiple arms fall on the decision boundary.
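As a rough illustration of the setting the abstract describes, the following is a minimal sketch of Thompson sampling for Bernoulli rewards under a per-round spending budget. It is a hypothetical toy implementation, not the authors' algorithm: costs are assumed known and deterministic, and arms are chosen greedily by sampled posterior mean until the budget is exhausted.

```python
import numpy as np

rng = np.random.default_rng(0)

def thompson_budgeted(means, costs, budget, horizon=1000):
    """Toy budgeted multiple-play Thompson sampling (illustrative sketch only).

    means  -- true Bernoulli success probabilities (unknown to the agent)
    costs  -- known deterministic cost of pulling each arm
    budget -- maximum total cost the agent may spend in a single round
    """
    K = len(means)
    alpha = np.ones(K)  # Beta posterior: successes + 1
    beta = np.ones(K)   # Beta posterior: failures + 1
    total_reward = 0.0
    for _ in range(horizon):
        theta = rng.beta(alpha, beta)   # one posterior sample per arm
        order = np.argsort(-theta)      # favor arms with high sampled means
        spent = 0.0
        for k in order:
            if spent + costs[k] <= budget:  # pull only while budget allows
                spent += costs[k]
                r = float(rng.random() < means[k])  # Bernoulli reward
                alpha[k] += r
                beta[k] += 1.0 - r
                total_reward += r
    return total_reward

# Toy run: with unit costs and budget 2, the agent plays two arms per round
means = np.array([0.7, 0.5, 0.2])
costs = np.array([1.0, 1.0, 1.0])
print(thompson_budgeted(means, costs, budget=2.0))
```

In the paper's thick-margin regime, several arms can tie on the selection boundary; a real implementation would need a tie-breaking (or fractional-play) rule there, which this sketch omits.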
Archived Files and Locations
application/pdf 851.5 kB
file_jjdkcax7zfdgleerqrs4ijntwm
arxiv.org (repository), web.archive.org (webarchive)
1606.09388v1