Strong Polynomiality of the Value Iteration Algorithm for Computing Nearly Optimal Policies for Discounted Dynamic Programming
by Eugene A. Feinberg, Gaojin He
Released as an article, 2020.
Abstract
This note provides upper bounds on the number of operations required to
compute by value iterations a nearly optimal policy for an infinite-horizon
discounted Markov decision process with a finite number of states and actions.
For a given discount factor, magnitude of the reward function, and desired
closeness to optimality, these upper bounds are strongly polynomial in the
number of state-action pairs, and one of the provided upper bounds has the
property that it is a non-decreasing function of the value of the discount
factor.
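The following is a minimal sketch of the value iteration scheme the note analyzes: Bellman updates on a finite discounted MDP, stopped once the standard sup-norm test guarantees that the greedy policy is eps-optimal. The array layout, the stopping rule, and the toy example are assumptions for illustration, not the authors' operation-count analysis.

```python
import numpy as np

def value_iteration(P, r, gamma, eps):
    """Compute an eps-optimal policy for a finite discounted MDP.

    P[a, s, s'] : probability of moving to s' when action a is taken in state s
    r[s, a]     : one-step reward for taking action a in state s
    gamma       : discount factor in (0, 1)
    eps         : desired closeness to optimality
    """
    n_states, n_actions = r.shape
    v = np.zeros(n_states)
    # Classical stopping rule: if ||v_{k+1} - v_k||_inf <= eps*(1-gamma)/(2*gamma),
    # the policy that is greedy with respect to v_{k+1} is eps-optimal.
    threshold = eps * (1.0 - gamma) / (2.0 * gamma)
    while True:
        # Q[s, a] = r(s, a) + gamma * sum_{s'} P(s' | s, a) v(s')
        q = r + gamma * np.einsum('asx,x->sa', P, v)
        v_new = q.max(axis=1)
        if np.max(np.abs(v_new - v)) <= threshold:
            return q.argmax(axis=1), v_new  # greedy policy and value estimate
        v = v_new

# Tiny 2-state, 2-action example (hypothetical numbers, for illustration only).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.6, 0.4]]])   # P[a, s, s']
r = np.array([[1.0, 0.0], [0.0, 2.0]])     # r[s, a]
policy, values = value_iteration(P, r, gamma=0.9, eps=1e-3)
print(policy, values)
```

Each iteration costs O(number of state-action pairs times number of states) arithmetic operations; the note's contribution is bounding how many such iterations suffice, with bounds that are strongly polynomial in the number of state-action pairs.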
Archived Files and Locations
application/pdf, 219.6 kB
arxiv.org (repository); web.archive.org (webarchive)
arXiv: 2001.10174v1