{"DOI":"10.2478/jaiscr-2020-0013","abstract":"We consider the problem of multiple agents cooperating in a partially-observable environment. Agents must learn to coordinate and share relevant information to solve tasks successfully. This article describes Asynchronous Advantage Actor-Critic with Communication (A3C2), an end-to-end differentiable approach where agents learn policies and communication protocols simultaneously. A3C2 uses a centralized-learning, distributed-execution paradigm, and supports independent agents, dynamic team sizes, partially-observable environments, and noisy communication. We show that A3C2 outperforms other state-of-the-art proposals in multiple environments.","author":[{"family":"Sim\u00f5es","given":"David"},{"family":"Lau","given":"Nuno"},{"family":"Reis","given":"Lu\u00eds Paulo"}],"id":"unknown","issued":{"date-parts":[[2020,7,1]]},"page-first":"189","publisher":"Walter de Gruyter GmbH","title":"Multi Agent Deep Learning with Cooperative Communication","type":"article-journal","volume":"10"}