Tesseract: Tensorised Actors for Multi-Agent Reinforcement Learning
release_7amibqzhqfd5peikwrc5alvhrm

by Anuj Mahajan, Mikayel Samvelyan, Lei Mao, Viktor Makoviychuk, Animesh Garg, Jean Kossaifi, Shimon Whiteson, Yuke Zhu, Animashree Anandkumar

Entity Metadata (schema)

abstracts[] {'sha1': 'fe4d3bd3edcb64c9750378be58bdf1b568d227c2', 'mimetype': 'text/plain', 'lang': 'en'}

Reinforcement Learning in large action spaces is a challenging problem. Cooperative multi-agent reinforcement learning (MARL) exacerbates matters by imposing various constraints on communication and observability. In this work, we consider the fundamental hurdle affecting both value-based and policy-gradient approaches: an exponential blowup of the action space with the number of agents. For value-based methods, it poses challenges in accurately representing the optimal value function. For policy gradient methods, it makes training the critic difficult and exacerbates the problem of the lagging critic. We show that from a learning theory perspective, both problems can be addressed by accurately representing the associated action-value function with a low-complexity hypothesis class. This requires accurately modelling the agent interactions in a sample-efficient way. To this end, we propose a novel tensorised formulation of the Bellman equation. This gives rise to our method Tesseract, which views the Q-function as a tensor whose modes correspond to the action spaces of different agents. Algorithms derived from Tesseract decompose the Q-tensor across agents and utilise low-rank tensor approximations to model agent interactions relevant to the task. We provide PAC analysis for Tesseract-based algorithms and highlight their relevance to the class of rich observation MDPs. Empirical results in different domains confirm Tesseract's gains in sample efficiency predicted by the theory.
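To make the low-rank idea in the abstract concrete, the following is a minimal sketch of a rank-k CP-style factorisation of a joint Q-tensor: each agent contributes a factor matrix over its own actions, and the joint tensor is reconstructed as a sum of rank-1 outer products. This is an illustration of the general technique, not the paper's implementation; all names (n_agents, n_actions, rank, factors) are illustrative assumptions, and state conditioning and learning are omitted.

```python
import numpy as np

# Toy setting: 3 agents, 5 actions each, rank-2 approximation.
# (Illustrative values, not taken from the paper.)
n_agents, n_actions, rank = 3, 5, 2

rng = np.random.default_rng(0)
# One factor matrix per agent; rows index that agent's actions.
factors = [rng.standard_normal((n_actions, rank)) for _ in range(n_agents)]

# Reconstruct the dense joint Q-tensor from the factors:
# Q[a1, ..., an] = sum_r prod_i factors[i][a_i, r]
q = np.zeros((n_actions,) * n_agents)
for r in range(rank):
    component = factors[0][:, r]
    for f in factors[1:]:
        component = np.multiply.outer(component, f[:, r])
    q += component

# Factorised form: n_agents * n_actions * rank parameters (here 30),
# versus n_actions ** n_agents (here 125) entries in the dense tensor.
print(q.shape, sum(f.size for f in factors), n_actions ** n_agents)
```

Even in this toy setting, the factorised form needs only n·|A|·k parameters instead of |A|^n joint entries; this gap grows exponentially with the number of agents, which is the source of the sample-efficiency gains the abstract refers to.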
container
container_id
contribs[] {'index': 0, 'creator_id': None, 'creator': None, 'raw_name': 'Anuj Mahajan', 'given_name': None, 'surname': None, 'role': 'author', 'raw_affiliation': None, 'extra': None}
{'index': 1, 'creator_id': None, 'creator': None, 'raw_name': 'Mikayel Samvelyan', 'given_name': None, 'surname': None, 'role': 'author', 'raw_affiliation': None, 'extra': None}
{'index': 2, 'creator_id': None, 'creator': None, 'raw_name': 'Lei Mao', 'given_name': None, 'surname': None, 'role': 'author', 'raw_affiliation': None, 'extra': None}
{'index': 3, 'creator_id': None, 'creator': None, 'raw_name': 'Viktor Makoviychuk', 'given_name': None, 'surname': None, 'role': 'author', 'raw_affiliation': None, 'extra': None}
{'index': 4, 'creator_id': None, 'creator': None, 'raw_name': 'Animesh Garg', 'given_name': None, 'surname': None, 'role': 'author', 'raw_affiliation': None, 'extra': None}
{'index': 5, 'creator_id': None, 'creator': None, 'raw_name': 'Jean Kossaifi', 'given_name': None, 'surname': None, 'role': 'author', 'raw_affiliation': None, 'extra': None}
{'index': 6, 'creator_id': None, 'creator': None, 'raw_name': 'Shimon Whiteson', 'given_name': None, 'surname': None, 'role': 'author', 'raw_affiliation': None, 'extra': None}
{'index': 7, 'creator_id': None, 'creator': None, 'raw_name': 'Yuke Zhu', 'given_name': None, 'surname': None, 'role': 'author', 'raw_affiliation': None, 'extra': None}
{'index': 8, 'creator_id': None, 'creator': None, 'raw_name': 'Animashree Anandkumar', 'given_name': None, 'surname': None, 'role': 'author', 'raw_affiliation': None, 'extra': None}
ext_ids {'doi': None, 'wikidata_qid': None, 'isbn13': None, 'pmid': None, 'pmcid': None, 'core': None, 'arxiv': '2106.00136v1', 'jstor': None, 'ark': None, 'mag': None, 'doaj': None, 'dblp': None, 'oai': None, 'hdl': None}
files[] {'state': 'active', 'ident': 'cmvbcbbrnrhvjciuksftjaz5fu', 'revision': 'b291458e-f013-461e-ad86-fb5033208185', 'redirect': None, 'extra': None, 'edit_extra': None, 'size': 2971767, 'md5': '19fd6daf552d18c737e1889d5a6d2227', 'sha1': '9c477fc2d1fbe27abad912b3001b3c176bf83667', 'sha256': '462c66c2e43ad48de27f16d5400ec445a058317c13bf17f7dc5769727e3b0c42', 'urls': [{'url': 'https://arxiv.org/pdf/2106.00136v1.pdf', 'rel': 'repository'}, {'url': 'https://web.archive.org/web/20210606073301/https://arxiv.org/pdf/2106.00136v1.pdf', 'rel': 'webarchive'}], 'mimetype': 'application/pdf', 'content_scope': None, 'release_ids': ['7amibqzhqfd5peikwrc5alvhrm'], 'releases': None}
filesets []
issue
language en
license_slug CC-BY
number
original_title
pages
publisher
refs []
release_date 2021-05-31
release_stage submitted
release_type article
release_year 2021
subtitle
title Tesseract: Tensorised Actors for Multi-Agent Reinforcement Learning
version v1
volume
webcaptures []
withdrawn_date
withdrawn_status
withdrawn_year
work_id zera42sj2vaazamjhdz7p44kwy

Extra Metadata (raw JSON)

arxiv.base_id 2106.00136
arxiv.categories ['cs.LG']
arxiv.comments 38th International Conference on Machine Learning, PMLR 139, 2021