More Efficient Exploration with Symbolic Priors on Action Sequence Equivalences

by Toby Johnstone, Nathan Grinsztajn, Johan Ferret, Philippe Preux

Released as an article.



Incorporating prior knowledge into reinforcement learning algorithms remains largely an open question. Even when insights about the environment dynamics are available, reinforcement learning is traditionally used in a tabula rasa setting and must explore and learn everything from scratch. In this paper, we consider the problem of exploiting priors about action sequence equivalence: that is, when different sequences of actions produce the same effect. We propose a new local exploration strategy calibrated to minimize collisions and maximize new state visitations. We show that this strategy can be computed at little cost by solving a convex optimization problem. By replacing the usual epsilon-greedy strategy in a DQN, we demonstrate its potential in several environments with various dynamics structures.
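A minimal sketch of the idea behind the abstract's collision-minimizing strategy, under assumptions not taken from the paper: if action sequences are grouped into equivalence classes (sequences in the same class reach the same state), then collision probability over classes, sum_k q_k^2, is a convex function minimized on the simplex by the uniform class distribution. For this toy objective the convex program has a closed-form solution: weight each sequence inversely to its class size. The function name, the class encoding, and the grid-world example are illustrative, not the authors' implementation.

```python
import numpy as np

def collision_minimizing_distribution(classes, n_seq):
    """Return per-sequence sampling probabilities that make the induced
    distribution over equivalence classes uniform.

    classes : list of lists; classes[k] holds the indices of the action
              sequences belonging to equivalence class k.
    n_seq   : total number of action sequences.

    Uniform class probabilities minimize the (convex) collision
    probability sum_k q_k**2 over the probability simplex.
    """
    p = np.zeros(n_seq)
    n_classes = len(classes)
    for members in classes:
        # Split the class's 1/K share evenly among its member sequences.
        for s in members:
            p[s] = 1.0 / (n_classes * len(members))
    return p

# Toy grid world with actions Up/Right: among the four length-2 sequences
# UU(0), UR(1), RU(2), RR(3), the pair UR/RU reaches the same state, so
# there are three equivalence classes.
classes = [[0], [1, 2], [3]]
p = collision_minimizing_distribution(classes, n_seq=4)
# Each class now receives probability 1/3, so UR and RU get 1/6 each.
```

Naive uniform sampling over sequences would visit the UR/RU state with probability 1/2, oversampling it relative to the two unique states; the reweighted distribution equalizes state visitation instead.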

Archived Files and Locations

There are no accessible files associated with this release. You can check other releases of this work for an accessible version.

"Dark" Archived

Type      article
Stage     submitted
Date      2021-11-07
Version   v2
Language  en
arXiv     2110.10632v2

Work Entity
Access all versions, variants, and formats of this work (e.g., pre-prints).
Catalog Record
Revision: 54843558-11bd-4600-8860-04e6cf31bc4e