Strategically Efficient Exploration in Competitive Multi-agent Reinforcement Learning
by
Robert Loftin, Aadirupa Saha, Sam Devlin, Katja Hofmann
2021
Abstract
High sample complexity remains a barrier to the application of reinforcement
learning (RL), particularly in multi-agent systems. A large body of work has
demonstrated that exploration mechanisms based on the principle of optimism
under uncertainty can significantly improve the sample efficiency of RL in
single-agent tasks. This work seeks to understand the role of optimistic
exploration in non-cooperative multi-agent settings. We show that, in
zero-sum games, optimistic exploration can cause the learner to waste time
sampling parts of the state space that are irrelevant to strategic play, as
such states can only be reached if both players cooperate. To address
this issue, we introduce a formal notion of strategically efficient exploration
in Markov games, and use this to develop two strategically efficient learning
algorithms for finite Markov games. We demonstrate that these methods can be
significantly more sample efficient than their optimistic counterparts.
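
To make the exploration principle concrete, the following is a minimal
sketch of optimism-under-uncertainty in a tabular two-player zero-sum
Markov game. It is not the paper's algorithm: all names (Q_up, c_bonus,
optimistic_update), sizes, and constants are illustrative assumptions,
showing only the generic count-based optimism bonus the abstract refers to.

import numpy as np

# Tabular sizes for a hypothetical toy problem.
n_states, n_actions, horizon = 4, 2, 10
c_bonus = 1.0  # exploration-bonus scale; an assumed constant

# Optimistic upper bound on the max-player's value of each joint action,
# initialised to the largest possible return so that every unvisited
# joint action looks maximally promising.
Q_up = np.full((n_states, n_actions, n_actions), float(horizon))
counts = np.zeros((n_states, n_actions, n_actions))

def next_state_value(s_next):
    # Crude pure-strategy maximin over the optimistic bounds; a real
    # solver would compute the mixed-strategy value of the matrix game.
    return Q_up[s_next].min(axis=1).max()

def optimistic_update(s, a, b, reward, s_next):
    # One optimistic backup for joint action (a, b) in state s.
    # The bonus shrinks with the visit count, so rarely tried joint
    # actions keep inflated values; in a zero-sum game this can pull
    # the learner toward states reachable only if BOTH players
    # cooperate, which is the inefficiency the abstract describes.
    counts[s, a, b] += 1
    n = counts[s, a, b]
    bonus = c_bonus * np.sqrt(np.log(n + 1.0) / n)
    target = reward + next_state_value(s_next) + bonus
    # Never raise the estimate above its previous upper bound.
    Q_up[s, a, b] = min(Q_up[s, a, b], target)

# Example: one update for joint action (0, 1) in state 0.
optimistic_update(0, 0, 1, reward=1.0, s_next=2)

Because unvisited joint actions retain inflated values, both players'
optimistic policies can chase states that neither would visit under
rational zero-sum play; avoiding this waste is what the strategically
efficient algorithms are designed for.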
Archived Files and Locations
application/pdf, 1.1 MB — arXiv:2107.14698v1 (arxiv.org; mirrored at web.archive.org)