Online Markov decision processes with Kullback-Leibler control cost

by Peng Guan, Maxim Raginsky, and Rebecca Willett

Released as an article.

2014  

Abstract

This paper considers an online (real-time) control problem that involves an agent performing a discrete-time random walk over a finite state space. The agent's action at each time step is to specify the probability distribution for the next state given the current state. Following the set-up of Todorov, the state-action cost at each time step is a sum of a state cost and a control cost given by the Kullback-Leibler (KL) divergence between the agent's next-state distribution and that determined by some fixed passive dynamics. The online aspect of the problem is due to the fact that the state cost functions are generated by a dynamic environment, and the agent learns the current state cost only after selecting an action. An explicit construction of a computationally efficient strategy with small regret (i.e., expected difference between its actual total cost and the smallest cost attainable using noncausal knowledge of the state costs) under mild regularity conditions is presented, along with a demonstration of the performance of the proposed strategy on a simulated target tracking problem. A number of new results on Markov decision processes with KL control cost are also obtained.
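The per-step cost described above can be made concrete with a small sketch. It assumes a finite state space indexed 0..n-1, with the agent's next-state distribution u(.|x) and the passive dynamics p(.|x) stored as Python lists. The function names are hypothetical, and this is not the authors' online strategy. The code computes the state cost plus the KL control cost. It also illustrates a standard fact about KL-regularized one-step minimization, which underlies Todorov's linearly solvable formulation: minimizing KL(u || p) + E_u[v] over u gives u*(y) proportional to p(y) exp(-v(y)), with minimum value -log sum_y p(y) exp(-v(y)).

```python
import math


def kl_divergence(u, p):
    """KL(u || p) for two distributions over the same finite set.

    Returns math.inf if u puts mass on a state that p assigns zero probability.
    """
    total = 0.0
    for ui, pi in zip(u, p):
        if ui == 0.0:
            continue  # convention: 0 * log(0 / p) = 0
        if pi == 0.0:
            return math.inf  # u is not absolutely continuous w.r.t. p
        total += ui * math.log(ui / pi)
    return total


def state_action_cost(state_cost, x, u_row, p_row):
    """Per-step cost f(x) + KL(u(.|x) || p(.|x)) (hypothetical helper name)."""
    return state_cost[x] + kl_divergence(u_row, p_row)


def gibbs_minimizer(p_row, v):
    """Minimize KL(u || p) + sum_y u(y) v(y) over distributions u.

    The minimizer is u*(y) = p(y) exp(-v(y)) / Z, and the minimum value is -log Z.
    """
    weights = [pi * math.exp(-vi) for pi, vi in zip(p_row, v)]
    z = sum(weights)
    return [w / z for w in weights], -math.log(z)


# Usage: passive dynamics from state 0 moves to states 0 or 1 with equal odds.
p0 = [0.5, 0.5, 0.0]
f = [1.0, 0.0, 2.0]  # hypothetical state costs
print(state_action_cost(f, 0, [1.0, 0.0, 0.0], p0))  # 1 + log 2
```

Deviating from the passive dynamics is charged through the KL term. In the usage line, forcing the walk to stay in state 0 adds log 2 to the state cost. An action that puts mass where p(.|x) has none is ruled out, because its cost is infinite.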

Archived Files and Locations

application/pdf  893.5 kB
arxiv.org (repository)
web.archive.org (webarchive)
Type  article
Stage   submitted
Date   2014-01-14
Version   v1
Language   en
arXiv  1401.3198v1