Bootstrapped Thompson Sampling and Deep Exploration

by Ian Osband, Benjamin Van Roy

Released as an article.

2015  

Abstract

This technical note presents a new approach to carrying out the kind of exploration achieved by Thompson sampling, but without explicitly maintaining or sampling from posterior distributions. The approach is based on a bootstrap technique that uses a combination of observed and artificially generated data. The latter serves to induce a prior distribution which, as we will demonstrate, is critical to effective exploration. We explain how the approach can be applied to multi-armed bandit and reinforcement learning problems and how it relates to Thompson sampling. The approach is particularly well-suited for contexts in which exploration is coupled with deep learning, since in these settings, maintaining or generating samples from a posterior distribution becomes computationally infeasible.
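
To make the abstract's idea concrete, below is a minimal sketch of bootstrapped Thompson sampling on a Bernoulli multi-armed bandit. The class name, the Bernoulli reward setting, and the choice of one pseudo-success and one pseudo-failure per arm as the artificially generated prior data are illustrative assumptions, not the paper's exact construction (which covers more general bootstraps and extends to reinforcement learning).

```python
import numpy as np

class BootstrapThompsonBandit:
    """Illustrative bootstrapped Thompson sampling agent for a
    Bernoulli multi-armed bandit. Instead of maintaining a posterior,
    each arm keeps its raw reward history plus artificially generated
    'prior' observations; actions are chosen greedily with respect to
    a bootstrap resample of that combined data."""

    def __init__(self, n_arms, prior_data=(1.0, 0.0), seed=None):
        self.rng = np.random.default_rng(seed)
        # Artificial data (here one pseudo-success and one
        # pseudo-failure per arm) induces the prior; without it,
        # an arm whose first real reward is 0 could be starved.
        self.history = [list(prior_data) for _ in range(n_arms)]

    def select_arm(self):
        # Resample each arm's data with replacement and score the
        # arm by the resampled mean -- a stand-in for drawing from
        # a posterior, as in Thompson sampling.
        scores = [
            self.rng.choice(data, size=len(data), replace=True).mean()
            for data in self.history
        ]
        return int(np.argmax(scores))

    def update(self, arm, reward):
        self.history[arm].append(float(reward))

# Toy run on a 3-armed Bernoulli bandit.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_means = [0.2, 0.5, 0.8]
    agent = BootstrapThompsonBandit(n_arms=3, seed=1)
    pulls = [0, 0, 0]
    for _ in range(2000):
        arm = agent.select_arm()
        agent.update(arm, rng.random() < true_means[arm])
        pulls[arm] += 1
    print(pulls)  # the best arm (index 2) should dominate
```

Note the role of the artificial observations: the bootstrap resample of observed data alone would concentrate too quickly on early luck, so the injected pseudo-data supplies the variability that a prior would, which is the point the abstract identifies as critical to effective exploration.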

Archived Files and Locations

application/pdf  418.9 kB
arxiv.org (repository)
web.archive.org (webarchive)
Type  article
Stage   submitted
Date   2015-07-01
Version   v1
Language   en
arXiv  1507.00300v1