Generating Automatic Curricula via Self-Supervised Active Domain
Randomization
release_ghyd7cfgbrf3zenvddldaocs6i
by
Sharath Chandra Raparthy, Bhairav Mehta, Florian Golemo, Liam Paull
2020
Abstract
Goal-directed Reinforcement Learning (RL) traditionally considers an agent
interacting with an environment that assigns the agent a real-valued reward
proportional to the completion of some goal. Goal-directed RL has seen large
gains in sample efficiency, due to the ease of reusing or generating new
experience by proposing goals. In this work, we build on the framework of
self-play, allowing an agent to interact with itself in order to make progress
on some unknown task. We use Active Domain Randomization and self-play to
create a novel, coupled environment-goal curriculum, where agents learn through
progressively more difficult tasks and environment variations. Our method,
Self-Supervised Active Domain Randomization (SS-ADR), generates a growing
curriculum, encouraging the agent to try tasks that are just outside of its
current capabilities, while building a domain-randomization curriculum that
enables state-of-the-art results on various sim2real transfer tasks. Our
results show that a curriculum which co-evolves environment difficulty with
the difficulty of goals set in each environment provides practical
benefits in the goal-directed tasks tested.
Archived Files and Locations
application/pdf, 1.4 MB
file_ttzlhe2hnfeodcwwymkkulbuwe
arxiv.org (repository), web.archive.org (webarchive)
2002.07911v1