Loss is its own Reward: Self-Supervision for Reinforcement Learning

by Evan Shelhamer, Parsa Mahmoudieh, Max Argus, Trevor Darrell

Released as an article.

2017  

Abstract

Reinforcement learning optimizes policies for expected cumulative reward. Need the supervision be so narrow? Reward is delayed and sparse for many tasks, making it a difficult and impoverished signal for end-to-end optimization. To augment reward, we consider a range of self-supervised tasks that incorporate states, actions, and successors to provide auxiliary losses. These losses offer ubiquitous and instantaneous supervision for representation learning even in the absence of reward. While current results show that learning from reward alone is feasible, pure reinforcement learning methods are constrained by computational and data efficiency issues that can be remedied by auxiliary losses. Self-supervised pre-training and joint optimization improve the data efficiency and policy returns of end-to-end reinforcement learning.
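
To make the idea of auxiliary self-supervised losses concrete, the following is a minimal sketch (not the authors' code) of joint optimization: a shared encoder feeds both a policy head trained with a REINFORCE-style policy-gradient loss and an inverse-dynamics head trained to predict the action from a (state, successor) pair. All network sizes, the weighting coefficient aux_weight, and the function names are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedEncoder(nn.Module):
    def __init__(self, obs_dim, feat_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                 nn.Linear(128, feat_dim), nn.ReLU())

    def forward(self, obs):
        return self.net(obs)

class AgentWithAuxiliary(nn.Module):
    def __init__(self, obs_dim, n_actions, feat_dim=64):
        super().__init__()
        self.encoder = SharedEncoder(obs_dim, feat_dim)
        self.policy_head = nn.Linear(feat_dim, n_actions)
        # Inverse dynamics: predict the action taken from features of s_t and s_{t+1}.
        self.inverse_head = nn.Linear(2 * feat_dim, n_actions)

    def policy_logits(self, obs):
        return self.policy_head(self.encoder(obs))

    def inverse_logits(self, obs, next_obs):
        feats = torch.cat([self.encoder(obs), self.encoder(next_obs)], dim=-1)
        return self.inverse_head(feats)

def joint_loss(agent, obs, actions, returns, next_obs, aux_weight=0.1):
    # Policy-gradient term driven by the (possibly sparse, delayed) reward signal.
    logp = F.log_softmax(agent.policy_logits(obs), dim=-1)
    chosen_logp = logp.gather(1, actions.unsqueeze(1)).squeeze(1)
    pg_loss = -(chosen_logp * returns).mean()
    # Self-supervised auxiliary term: classify the taken action from (s, s').
    # It needs no reward, so it supervises the shared encoder on every transition.
    aux_loss = F.cross_entropy(agent.inverse_logits(obs, next_obs), actions)
    return pg_loss + aux_weight * aux_loss

Because the encoder is shared, gradients from the auxiliary term shape the representation on every step, even when the reward term contributes little; the same structure supports self-supervised pre-training by optimizing the auxiliary loss alone before reinforcement learning begins.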

Archived Files and Locations

application/pdf  529.3 kB
file_glm3lzrmo5bexpyookzgizixwa
arxiv.org (repository)
web.archive.org (webarchive)
Type: article
Stage: submitted
Date: 2017-03-09
Version: v2
Language: en
arXiv: 1612.07307v2
Work Entity: access all versions, variants, and formats of this work (e.g., pre-prints)
Catalog Record
Revision: 03b7f734-c913-44c2-85ce-86064cefb0cd