VDN and QMIX are two popular value-based algorithms for cooperative multi-agent
reinforcement learning (MARL) that learn a centralized action-value function as
a monotonic mixing of per-agent utilities. While this enables easy
decentralization of the learned policy, the restricted joint action-value
function can prevent them from solving tasks that require significant
coordination between agents at a given timestep. We show
that this problem can be overcome by improving the joint exploration of all
agents during training. Specifically, we propose a novel MARL approach called
Universal Value Exploration (UneVEn) that learns a set of related tasks
simultaneously with a linear decomposition of universal successor features.
With the policies of already solved related tasks, the joint exploration
process of all agents can be improved to help them achieve better coordination.
Empirical results on a set of exploration games, challenging cooperative
predator-prey tasks requiring significant coordination among agents, and
StarCraft II micromanagement benchmarks show that UneVEn can solve tasks where
other state-of-the-art MARL methods fail.
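
To make the representational restriction mentioned above concrete, the following minimal sketch fits a purely additive (VDN-style) decomposition to a one-step, two-agent payoff matrix. The payoff values, the helper `additive_fit`, and the use of a uniformly weighted least-squares fit as a stand-in for training under uniform exploration are all illustrative assumptions, not details taken from the paper.

```python
import numpy as np

# Hypothetical one-step cooperative game: rows index agent 1's action,
# columns index agent 2's action. Only the joint action (0, 0) pays 8;
# deviating from it unilaterally is heavily penalised.
payoff = np.array([[8.0, -12.0, -12.0],
                   [-12.0, 0.0, 0.0],
                   [-12.0, 0.0, 0.0]])


def additive_fit(q):
    """Least-squares fit of q[i, j] ~ u1[i] + u2[j] with uniform weights.

    For a fully observed matrix the fitted value is
    row_mean[i] + col_mean[j] - grand_mean; the grand mean is split
    evenly between the two per-agent utilities (an arbitrary choice that
    does not affect either agent's argmax).
    """
    grand = q.mean()
    u1 = q.mean(axis=1) - grand / 2.0
    u2 = q.mean(axis=0) - grand / 2.0
    return u1, u2


u1, u2 = additive_fit(payoff)

# Decentralised greedy execution: each agent maximises its own utility.
greedy = (int(np.argmax(u1)), int(np.argmax(u2)))

# True optimum of the joint payoff, for comparison.
optimal = tuple(int(i) for i in np.unravel_index(np.argmax(payoff), payoff.shape))

print("greedy joint action:", greedy, "payoff:", payoff[greedy])
print("optimal joint action:", optimal, "payoff:", payoff[optimal])
```

In this sketch the additive fit ranks the coordinated action (0, 0) below the safe zero-payoff actions, so decentralised greedy execution misses the optimum. This is the kind of failure that, according to the abstract, improved joint exploration is meant to overcome.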