Model-Based Regularization for Deep Reinforcement Learning with
Transcoder Networks
by
Felix Leibfried, Peter Vrancx
2018
Abstract
This paper proposes a new optimization objective for value-based deep
reinforcement learning. We extend conventional Deep Q-Networks (DQNs) by adding
a model-learning component yielding a transcoder network. The prediction errors
for the model are included in the basic DQN loss as additional regularizers.
This augmented objective leads to a richer training signal that provides
feedback at every time step. Moreover, because learning an environment model
shares a common structure with the RL problem, we hypothesize that the
resulting objective improves both sample efficiency and performance. We
empirically confirm our hypothesis on a range of 20 games from the Atari
benchmark, attaining superior results over a vanilla DQN without model-based
regularization.
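
As an illustration of the augmented objective described in the abstract, the sketch below combines a standard DQN TD loss with next-state and reward prediction errors. It is only a rough sketch under assumptions: the shared-encoder architecture, the latent-space prediction target, and the loss weights lambda_s and lambda_r are illustrative choices, not the authors' exact transcoder formulation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TranscoderDQN(nn.Module):
    """Illustrative transcoder-style network (architecture is an assumption):
    a shared encoder feeds a Q-value head plus heads that predict the next
    latent state and the immediate reward for the taken action."""
    def __init__(self, state_dim, num_actions, hidden_dim=256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(state_dim, hidden_dim), nn.ReLU())
        self.q_head = nn.Linear(hidden_dim, num_actions)
        self.next_state_head = nn.Linear(hidden_dim + num_actions, hidden_dim)
        self.reward_head = nn.Linear(hidden_dim + num_actions, 1)

    def forward(self, state, action_onehot):
        z = self.encoder(state)
        za = torch.cat([z, action_onehot], dim=-1)
        return self.q_head(z), self.next_state_head(za), self.reward_head(za).squeeze(-1)

def augmented_loss(net, target_net, batch, gamma=0.99, lambda_s=0.1, lambda_r=0.1):
    """DQN TD loss plus model prediction errors as regularizers.
    lambda_s and lambda_r are assumed weighting coefficients."""
    s, a_onehot, r, s_next, done = batch          # done is a 0/1 float tensor
    q, pred_next_z, pred_r = net(s, a_onehot)
    q_sa = (q * a_onehot).sum(dim=-1)             # Q(s, a) for the taken action
    with torch.no_grad():
        q_next, _, _ = target_net(s_next, torch.zeros_like(a_onehot))
        td_target = r + gamma * (1.0 - done) * q_next.max(dim=-1).values
        next_z = target_net.encoder(s_next)       # target latent for the state model
    dqn_loss = F.smooth_l1_loss(q_sa, td_target)  # standard DQN objective
    state_loss = F.mse_loss(pred_next_z, next_z)  # next-state prediction error
    reward_loss = F.mse_loss(pred_r, r)           # reward prediction error
    return dqn_loss + lambda_s * state_loss + lambda_r * reward_loss

The model prediction errors act here as auxiliary terms that supply a training signal at every time step, in the spirit of the regularization the abstract describes.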
Archived Files and Locations
arXiv: 1809.01906v1 (application/pdf, 715.0 kB), available via arxiv.org and archived at web.archive.org