More Efficient Exploration with Symbolic Priors on Action Sequence Equivalences

by Toby Johnstone, Nathan Grinsztajn, Johan Ferret, Philippe Preux

Released as an article.



Incorporating prior knowledge into reinforcement learning algorithms remains largely an open question. Even when insights about the environment dynamics are available, reinforcement learning is traditionally used in a tabula rasa setting and must explore and learn everything from scratch. In this paper, we consider the problem of exploiting priors about action sequence equivalence: that is, when different sequences of actions produce the same effect. We propose a new local exploration strategy calibrated to minimize collisions and maximize new state visitations. We show that this strategy can be computed at little cost, by solving a convex optimization problem. By replacing the usual epsilon-greedy strategy in a DQN, we demonstrate its potential in several environments with various dynamics structures.
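The abstract describes deriving an exploration distribution that minimizes collisions between equivalent action sequences via convex optimization. The sketch below is a toy illustration of that idea, not the authors' formulation: it assumes an open grid world where sequences with the same net displacement are equivalent, and it minimizes the probability that two independently sampled action sequences land in the same equivalence class, using finite-difference gradients and projected gradient descent on the probability simplex (all choices here are the sketch's own assumptions).

```python
import itertools

# Hypothetical toy setup: grid-world actions plus a no-op; sequences with the
# same net displacement are treated as equivalent (an assumed prior, not the
# paper's exact equivalence relation).
ACTIONS = ["N", "U", "D", "L", "R"]
MOVE = {"N": (0, 0), "U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}
HORIZON = 2  # length of action sequences considered

def equiv_class(seq):
    # Equivalence class of a sequence = its net displacement.
    dx = sum(MOVE[a][0] for a in seq)
    dy = sum(MOVE[a][1] for a in seq)
    return (dx, dy)

def collision_prob(p):
    # Probability that two i.i.d. sequences (actions drawn from p) fall into
    # the same equivalence class: sum over classes c of q_c^2.
    q = {}
    for idx in itertools.product(range(len(ACTIONS)), repeat=HORIZON):
        c = equiv_class([ACTIONS[i] for i in idx])
        w = 1.0
        for i in idx:
            w *= p[i]
        q[c] = q.get(c, 0.0) + w
    return sum(v * v for v in q.values())

def project_simplex(v):
    # Euclidean projection onto the probability simplex (sort-based).
    u = sorted(v, reverse=True)
    css, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        css += ui
        t = (css - 1.0) / i
        if ui - t > 0:
            theta = t
    return [max(x - theta, 0.0) for x in v]

def optimise(steps=100, lr=0.5, eps=1e-6):
    # Projected gradient descent from the uniform distribution, keeping only
    # improving steps so the collision probability never increases.
    n = len(ACTIONS)
    p = [1.0 / n] * n
    for _ in range(steps):
        base = collision_prob(p)
        grad = [(collision_prob(p[:i] + [p[i] + eps] + p[i + 1:]) - base) / eps
                for i in range(n)]
        cand = project_simplex([pi - lr * g for pi, g in zip(p, grad)])
        if collision_prob(cand) < base:
            p = cand
        else:
            lr *= 0.5  # backtrack on non-improving steps
    return p

p = optimise()
uniform = [1.0 / len(ACTIONS)] * len(ACTIONS)
print("optimised:", [round(x, 3) for x in p])
print("collision (uniform -> optimised):",
      collision_prob(uniform), "->", collision_prob(p))
```

Intuitively, the optimizer down-weights actions (like the no-op here) whose sequences pile up in large equivalence classes, which is the collision-minimization effect the abstract attributes to the proposed exploration strategy; the paper instead solves this as a convex program rather than by local descent.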

Archived Files and Locations

application/pdf  809.2 kB
file_olvdnw5yfrfrtglpeg2uxpzdfq
Type  article
Stage   submitted
Date   2021-10-20
Version   v1
Language   en
arXiv  2110.10632v1
Work Entity
access all versions, variants, and formats of this work (e.g., pre-prints)
Catalog Record
Revision: 46723cd7-6fbb-44ef-bc34-4a82cd33d597