Strategically Efficient Exploration in Competitive Multi-agent Reinforcement Learning

by Robert Loftin, Aadirupa Saha, Sam Devlin, Katja Hofmann

Released as an article.

2021  

Abstract

High sample complexity remains a barrier to the application of reinforcement learning (RL), particularly in multi-agent systems. A large body of work has demonstrated that exploration mechanisms based on the principle of optimism under uncertainty can significantly improve the sample efficiency of RL in single agent tasks. This work seeks to understand the role of optimistic exploration in non-cooperative multi-agent settings. We will show that, in zero-sum games, optimistic exploration can cause the learner to waste time sampling parts of the state space that are irrelevant to strategic play, as they can only be reached through cooperation between both players. To address this issue, we introduce a formal notion of strategically efficient exploration in Markov games, and use this to develop two strategically efficient learning algorithms for finite Markov games. We demonstrate that these methods can be significantly more sample efficient than their optimistic counterparts.
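As a concrete illustration of the kind of exploration the abstract describes, the sketch below shows how a count-based optimism bonus is typically combined with maximin value updates in a finite two-player zero-sum Markov game. This is a toy, hedged example and not the paper's algorithm: the environment (state and action counts, random transitions and rewards), the bonus constant, and the restriction to pure-strategy maximin are all assumptions made for illustration only.

```python
# Minimal sketch (not the paper's method): optimistic Q-learning with a
# UCB-style count bonus in a toy finite zero-sum Markov game.
import numpy as np

n_states, n_a, n_b = 3, 2, 2                       # toy sizes (assumed)
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_a, n_b))  # transitions
R = rng.uniform(-1, 1, size=(n_states, n_a, n_b))                # max-player reward
gamma, episodes, horizon, bonus_c = 0.9, 200, 20, 1.0

Q = np.zeros((n_states, n_a, n_b))                 # joint-action values
N = np.ones((n_states, n_a, n_b))                  # visit counts (start at 1)

def maximin_value(M):
    """Max player's worst-case value of the stage game, restricted to pure
    strategies for brevity (a full solver would use a matrix-game LP)."""
    return M.min(axis=1).max()

for _ in range(episodes):
    s = 0
    for _ in range(horizon):
        # Optimism under uncertainty: inflate Q by a count-based bonus,
        # then act with respect to the optimistic stage game.
        Q_opt = Q[s] + bonus_c / np.sqrt(N[s])
        a = Q_opt.min(axis=1).argmax()             # max player: optimistic maximin
        b = Q_opt[a].argmin()                      # min player: best response
        s_next = rng.choice(n_states, p=P[s, a, b])
        target = R[s, a, b] + gamma * maximin_value(Q[s_next])
        N[s, a, b] += 1
        Q[s, a, b] += (target - Q[s, a, b]) / N[s, a, b]
        s = s_next

print("Estimated maximin value of state 0:", maximin_value(Q[0]))
```

In this sketch the bonus drives both players toward rarely visited joint actions, including ones a rational opponent would never cooperate in reaching; the paper's notion of strategically efficient exploration is aimed at avoiding exactly that wasted sampling.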

Archived Files and Locations

application/pdf  1.1 MB
file_gages53o2fe6bmjxf3no6d2smu
arxiv.org (repository)
web.archive.org (webarchive)
Type  article
Stage   submitted
Date   2021-07-30
Version   v1
Language   en
arXiv  2107.14698v1
Work Entity
Access all versions, variants, and formats of this work (e.g., pre-prints)
Catalog Record
Revision: 91ac3863-bb57-4cd6-b93b-9edfa1206052
API URL: JSON