Learning Cooperation and Online Planning Through Simulation and Graph Convolutional Network
by
Rafid Ameer Mahmud, Fahim Faisal, Saaduddin Mahmud, Md. Mosaddek Khan
2021
Abstract
The Multi-agent Markov Decision Process (MMDP) has been an effective way of
modelling sequential decision making in multi-agent cooperative
environments. A number of algorithms based on centralized and decentralized
planning have been developed in this domain. However, dynamically changing
environments, coupled with the exponential size of the state and joint action
spaces, make it difficult for these algorithms to provide both efficiency and
scalability. Recently, the centralized planning algorithm FV-MCTS-MP and the
decentralized planning algorithm Alternate maximization with
Behavioural Cloning (ABC) have achieved notable performance in solving MMDPs.
However, they are not capable of adapting to dynamically changing environments
and accounting for the lack of communication among agents, respectively.
Against this background, we introduce a simulation-based online planning
algorithm, which we call SiCLOP, for multi-agent cooperative environments.
Specifically, SiCLOP tailors Monte Carlo Tree Search (MCTS) and uses a
Coordination Graph (CG) and a Graph Convolutional Network (GCN) to learn
cooperation and provide a real-time solution to an MMDP problem. It also
improves scalability through effective pruning of the action space.
Additionally, unlike FV-MCTS-MP
and ABC, SiCLOP supports transfer learning, which enables learned agents to
operate in different environments. We also provide a theoretical discussion of
the convergence property of our algorithm within the context of multi-agent
settings. Finally, our extensive empirical results show that SiCLOP
significantly outperforms the state-of-the-art online planning algorithms.
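
To make the GCN-over-coordination-graph idea in the abstract concrete, here is a minimal sketch of one graph-convolution step in plain Python. It uses the widely known propagation rule ReLU(D^-1/2 (A + I) D^-1/2 X W); whether SiCLOP uses exactly this layer is an assumption, since this record does not describe the architecture. The function name `gcn_layer`, the 3-agent graph, and the feature values are hypothetical and chosen only for illustration.

```python
import math


def gcn_layer(adj, feats, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W).

    adj    : n x n adjacency matrix of the coordination graph (0/1 entries)
    feats  : n x f matrix, one feature row per agent
    weight : f x o weight matrix (learned in practice, fixed here)
    """
    n = len(adj)
    n_feats = len(feats[0])
    n_out = len(weight[0])
    # Add self-loops so each agent keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalisation of the adjacency matrix.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    # Aggregate features from each agent's neighbours in the graph.
    agg = [[sum(norm[i][k] * feats[k][f] for k in range(n))
            for f in range(n_feats)]
           for i in range(n)]
    # Linear transform followed by ReLU.
    return [[max(0.0, sum(agg[i][f] * weight[f][o] for f in range(n_feats)))
             for o in range(n_out)]
            for i in range(n)]


# Hypothetical 3-agent coordination graph: a path 0 - 1 - 2.
adjacency = [[0, 1, 0],
             [1, 0, 1],
             [0, 1, 0]]
features = [[1.0, 0.0],
            [0.0, 1.0],
            [1.0, 1.0]]
identity = [[1.0, 0.0],
            [0.0, 1.0]]  # identity weights, so only the aggregation is visible

embeddings = gcn_layer(adjacency, features, identity)
```

Each row of `embeddings` mixes an agent's own features with those of the agents it shares an edge with, which is the sense in which a coordination graph can carry information between cooperating agents.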
Archived Files and Locations
application/pdf, 744.6 kB: arxiv.org (repository), web.archive.org (webarchive), 2110.08480v1