Asymptotically Optimal Algorithms for Budgeted Multiple Play Bandits
by Alexander Luedtke and Antoine Chambaz
2017
Abstract
We study a generalization of the multi-armed bandit problem with multiple
plays in which pulling each arm incurs a cost and the agent has a per-round
budget that bounds how much she can expect to spend. We derive an asymptotic
regret lower bound for any uniformly efficient algorithm in this setting. We
then study a variant of Thompson sampling for Bernoulli rewards and a variant
of KL-UCB for both single-parameter exponential families and bounded, finitely
supported rewards. We show that these algorithms are asymptotically optimal,
both in rate and in the leading problem-dependent constants, including in the
thick-margin setting where multiple arms fall on the decision boundary.
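To illustrate the kind of algorithm the abstract describes, here is a minimal, hypothetical sketch of one round of a Thompson-sampling variant for budgeted Bernoulli bandits. It is not the paper's exact procedure: it simply samples each arm's mean from its Beta posterior, ranks arms by sampled-reward-to-cost ratio, and greedily fills the per-round budget, playing the marginal arm with the probability that exhausts the budget in expectation (so the realized spend can exceed the budget, matching the "expected spend" constraint). All names and the greedy fractional-knapsack rule are assumptions for illustration.

```python
import random

def budgeted_thompson_step(successes, failures, costs, budget, rng=random):
    """One illustrative round of budgeted Thompson sampling (sketch).

    successes[i], failures[i]: observed Bernoulli outcomes for arm i.
    costs[i]: known cost of pulling arm i.
    budget: per-round bound on *expected* spending.
    Returns the list of arm indices played this round.
    """
    k = len(costs)
    # Sample a plausible mean for each arm from its Beta(1, 1)-prior posterior.
    theta = [rng.betavariate(successes[i] + 1, failures[i] + 1) for i in range(k)]
    # Rank arms by sampled reward per unit cost (fractional-knapsack order).
    order = sorted(range(k), key=lambda i: theta[i] / costs[i], reverse=True)
    plays = []
    remaining = budget
    for i in order:
        if costs[i] <= remaining:
            plays.append(i)          # arm fits: play it deterministically
            remaining -= costs[i]
        else:
            # Marginal arm: play it with probability remaining / cost so that
            # the expected spend this round equals the budget exactly.
            if rng.random() < remaining / costs[i]:
                plays.append(i)
            break
    return plays
```

After each round, the caller would observe the Bernoulli rewards of the played arms and increment the corresponding `successes`/`failures` counts before the next call.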
Archived Files and Locations
application/pdf, 710.0 kB, available via arxiv.org (repository) and web.archive.org (webarchive)
arXiv: 1606.09388v2