An Optimal Computing Budget Allocation Tree Policy for Monte Carlo Tree Search

by Yunchuan Li, Michael C. Fu, Jie Xu

Released as an article.

2020  

Abstract

We analyze a tree search problem with an underlying Markov decision process, in which the goal is to identify the best action at the root that achieves the highest cumulative reward. We present a new tree policy that optimally allocates a limited computing budget to maximize a lower bound on the probability of correctly selecting the best action at each node. Compared to widely used Upper Confidence Bound (UCB) tree policies, the new tree policy offers a more balanced approach to managing the exploration-exploitation trade-off when the sampling budget is limited. Furthermore, UCB assumes that the support of the reward distribution is known, whereas our algorithm relaxes this assumption. Numerical experiments demonstrate the efficiency of our algorithm in selecting the best action at the root.
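
The abstract contrasts an OCBA-based tree policy with UCB at a single node. As a rough illustration only, the sketch below implements the classic static OCBA allocation ratios from the ranking-and-selection literature (Chen et al.) for identifying the largest of k means, a hypothetical "sample the most under-allocated action" rule built on them, and the standard UCB1 index for comparison. It is not the authors' tree policy, which maximizes a lower bound on the probability of correct selection at each node of the tree; the function names `ocba_ratios`, `ocba_select`, and `ucb1_select`, the `min_gap` floor, and the deficit-based selection rule are all illustrative assumptions.

```python
import math
from typing import List, Sequence


def ocba_ratios(means: Sequence[float], stds: Sequence[float],
                min_gap: float = 1e-9) -> List[float]:
    """Classic OCBA allocation fractions for identifying the largest mean.

    Assumes strictly positive standard deviations; gaps to the current best
    are floored at `min_gap` (hypothetical safeguard) to avoid division by zero.
    """
    k = len(means)
    if k == 1:
        return [1.0]  # a single action receives the whole budget
    best = max(range(k), key=lambda i: means[i])  # current best by sample mean
    weights = [0.0] * k
    for i in range(k):
        if i != best:
            gap = max(means[best] - means[i], min_gap)
            # Non-best actions: N_i proportional to (sigma_i / gap_i)^2
            weights[i] = (stds[i] / gap) ** 2
    # Best action: N_b = sigma_b * sqrt(sum over i != b of N_i^2 / sigma_i^2)
    weights[best] = stds[best] * math.sqrt(
        sum(weights[i] ** 2 / stds[i] ** 2 for i in range(k) if i != best))
    total = sum(weights)
    return [w / total for w in weights]


def ocba_select(means: Sequence[float], stds: Sequence[float],
                counts: Sequence[int]) -> int:
    """Hypothetical sequential rule: sample the action furthest below its OCBA target."""
    target_total = sum(counts) + 1
    ratios = ocba_ratios(means, stds)
    deficits = [r * target_total - n for r, n in zip(ratios, counts)]
    return max(range(len(counts)), key=lambda i: deficits[i])


def ucb1_select(means: Sequence[float], counts: Sequence[int],
                c: float = math.sqrt(2)) -> int:
    """UCB1 index; the bonus scale c = sqrt(2) assumes rewards lie in [0, 1]."""
    for i, n in enumerate(counts):
        if n == 0:
            return i  # try every action once before using the index
    total = sum(counts)
    scores = [m + c * math.sqrt(math.log(total) / n) for m, n in zip(means, counts)]
    return max(range(len(counts)), key=lambda i: scores[i])


if __name__ == "__main__":
    means, stds, counts = [1.0, 0.8, 0.5], [0.3, 0.3, 0.3], [20, 5, 5]
    print(ocba_ratios(means, stds))
    print(ocba_select(means, stds, counts), ucb1_select(means, counts))
```

With these made-up statistics both rules sample action 1 next, but the OCBA ratios assign only about 7% of the budget to the clearly inferior action 2. The OCBA rule uses only sample means and standard deviations, whereas the UCB1 exploration constant is calibrated for rewards on a known bounded support, which is the assumption the abstract says the new policy relaxes.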

Archived Files and Locations

application/pdf  455.2 kB
arxiv.org (repository)
web.archive.org (webarchive)
Type: article
Stage: submitted
Date: 2020-09-25
Version: v1
Language: en
arXiv: 2009.12407v1