Strong Polynomiality of the Value Iteration Algorithm for Computing Nearly Optimal Policies for Discounted Dynamic Programming

by Eugene A. Feinberg, Gaojin He

Released as an article.

2020  

Abstract

This note provides upper bounds on the number of operations required to compute, by value iteration, a nearly optimal policy for an infinite-horizon discounted Markov decision process with finitely many states and actions. For a given discount factor, magnitude of the reward function, and desired closeness to optimality, these upper bounds are strongly polynomial in the number of state-action pairs, and one of them is a non-decreasing function of the discount factor.
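As a concrete illustration of the procedure the abstract analyzes, the sketch below implements plain value iteration in Python. The array layout, the function name, and the classical sup-norm stopping rule are assumptions made here for illustration; the paper's contribution is the bound on the number of iterations, not this particular implementation.

```python
import numpy as np

def value_iteration(P, r, beta, eps):
    """Return an eps-optimal policy for a finite discounted MDP.

    P    -- transitions, shape (S, A, S): P[s, a, s'] = Pr(s' | s, a)
    r    -- one-step rewards, shape (S, A)
    beta -- discount factor, 0 <= beta < 1
    eps  -- desired closeness to optimality
    """
    S, A = r.shape
    v = np.zeros(S)
    # Classical stopping rule: once successive iterates differ by less than
    # eps * (1 - beta) / (2 * beta) in the sup-norm, the greedy policy with
    # respect to the current iterate is eps-optimal.
    tol = eps * (1.0 - beta) / (2.0 * beta) if beta > 0 else np.inf
    while True:
        q = r + beta * (P @ v)      # one Bellman backup over all (s, a) pairs
        v_next = q.max(axis=1)
        if np.max(np.abs(v_next - v)) <= tol:
            v = v_next
            break
        v = v_next
    policy = (r + beta * (P @ v)).argmax(axis=1)
    return policy, v

# Tiny two-state, two-action example (illustrative numbers only).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
r = np.array([[1.0, 0.0],
              [0.0, 2.0]])
policy, v = value_iteration(P, r, beta=0.9, eps=1e-3)
```

Each Bellman backup costs O(S^2 A) arithmetic operations over the S * A state-action pairs, so a bound on the number of iterations of this kind, as provided in the note, translates directly into a bound on the total number of operations.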

Archived Files and Locations

application/pdf  219.6 kB
file_eyd4qqezarecbexjcaelhnvdla
arxiv.org (repository)
web.archive.org (webarchive)
Type  article
Stage   submitted
Date   2020-01-28
Version   v1
Language   en
arXiv  2001.10174v1
Work Entity
access all versions, variants, and formats of this work (e.g., pre-prints)
Catalog Record
Revision: 70dfc4a1-fe16-40fd-aaf9-e1a887ac6eaf
API URL: JSON