Incorporating prior knowledge into reinforcement learning algorithms remains largely an open problem. Even when insights about the environment dynamics are available, reinforcement learning is traditionally used in a tabula rasa setting, where the agent must explore and learn everything from scratch. In this paper, we
consider the problem of exploiting priors about action sequence equivalence, that is, knowing that different sequences of actions produce the same effect. We
propose a new local exploration strategy calibrated to minimize collisions and
maximize new state visitations. We show that this strategy can be computed at little cost by solving a convex optimization problem. Substituting it for the usual epsilon-greedy exploration strategy in a DQN, we demonstrate its potential in several environments with varied dynamics structures.
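To make the convex-optimization claim concrete, below is a minimal sketch of one plausible way such a collision-minimizing distribution could be computed with cvxpy. The membership matrix `C`, the helper `collision_minimizing_distribution`, and the specific objective (minimizing the probability that two i.i.d. sequence draws produce the same effect) are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch, assuming the k-step action sequences have been grouped into
# effect-equivalence classes, encoded by a membership matrix C with
# C[c, s] = 1 iff sequence s belongs to effect class c. Two sequences
# sampled i.i.d. from p collide (same effect) with probability
# sum_c (C @ p)_c^2, which is convex in p, so minimizing it is a small
# quadratic program over the probability simplex.
import numpy as np
import cvxpy as cp

def collision_minimizing_distribution(C: np.ndarray) -> np.ndarray:
    """Return a distribution over action sequences that minimizes the
    probability that two independent draws have the same effect."""
    _, num_sequences = C.shape
    p = cp.Variable(num_sequences, nonneg=True)  # distribution over sequences
    q = C @ p                                    # induced distribution over effects
    problem = cp.Problem(cp.Minimize(cp.sum_squares(q)), [cp.sum(p) == 1])
    problem.solve()
    return p.value

# Toy example: 4 two-step sequences, where sequences 0 and 1 are equivalent
# (e.g. "up, left" and "left, up" reach the same cell in a grid world).
C = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])
p = collision_minimizing_distribution(C)
print(np.round(p, 3))  # each effect class receives mass 1/3; the split
                       # between the two equivalent sequences is arbitrary
```

Under this formulation the optimum spreads probability uniformly over effect classes rather than over raw action sequences, which is how equivalent sequences stop wasting exploration budget on redundant outcomes.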