Bootstrapped Thompson Sampling and Deep Exploration
by Ian Osband and Benjamin Van Roy (2015)
Abstract
This technical note presents a new approach to carrying out the kind of
exploration achieved by Thompson sampling, but without explicitly maintaining
or sampling from posterior distributions. The approach is based on a bootstrap
technique that uses a combination of observed and artificially generated data.
The latter serves to induce a prior distribution which, as we will demonstrate,
is critical to effective exploration. We explain how the approach can be
applied to multi-armed bandit and reinforcement learning problems and how it
relates to Thompson sampling. The approach is particularly well-suited for
contexts in which exploration is coupled with deep learning, since in these
settings, maintaining or generating samples from a posterior distribution
becomes computationally infeasible.
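The idea in the abstract can be illustrated with a minimal sketch for a multi-armed bandit. This is not the authors' implementation; it is a hedged illustration in which each arm's history is seeded with artificially generated rewards (playing the role of a prior), and each round every arm is scored by the mean of a bootstrap resample of its combined history, mimicking a draw from a posterior. All names and parameters (`n_prior`, `prior_scale`, the Bernoulli arms in the demo) are illustrative assumptions.

```python
import random

def bootstrapped_thompson_bandit(arms, n_rounds, n_prior=5, prior_scale=1.0, seed=0):
    """Bootstrapped Thompson-style bandit (illustrative sketch).

    arms: list of zero-argument callables returning a stochastic reward.
    Each arm's history starts with n_prior artificial rewards drawn
    uniformly from [0, prior_scale]; this induced 'prior' data drives
    early exploration, standing in for an explicit prior distribution.
    """
    rng = random.Random(seed)
    histories = [[rng.uniform(0.0, prior_scale) for _ in range(n_prior)]
                 for _ in arms]
    pulls = [0] * len(arms)
    for _ in range(n_rounds):
        # Score each arm by the mean of a with-replacement resample of
        # its history -- the bootstrap analogue of a posterior sample.
        scores = []
        for h in histories:
            resample = [h[rng.randrange(len(h))] for _ in h]
            scores.append(sum(resample) / len(resample))
        a = max(range(len(arms)), key=lambda i: scores[i])
        histories[a].append(arms[a]())  # observe a real reward
        pulls[a] += 1
    return pulls

if __name__ == "__main__":
    # Hypothetical demo: two Bernoulli arms with success rates 0.2 and 0.8.
    random.seed(1)
    arms = [lambda: 1.0 if random.random() < 0.2 else 0.0,
            lambda: 1.0 if random.random() < 0.8 else 0.0]
    print(bootstrapped_thompson_bandit(arms, 500))
```

Because resampled means of a short or mixed history vary from round to round, under-explored arms keep receiving optimistic scores, which is the exploratory behavior the note attributes to the artificial prior data.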
Archived Files and Locations
application/pdf, 418.9 kB
arxiv.org (repository); web.archive.org (webarchive)
arXiv:1507.00300v1