Model-Based Regularization for Deep Reinforcement Learning with Transcoder Networks release_3a6sczuqhzc75gjajbmrmf7umq

by Felix Leibfried, Peter Vrancx

Released as an article.

2018  

Abstract

This paper proposes a new optimization objective for value-based deep reinforcement learning. We extend conventional Deep Q-Networks (DQNs) by adding a model-learning component, yielding a transcoder network. The prediction errors of the model are included in the basic DQN loss as additional regularizers. This augmented objective leads to a richer training signal that provides feedback at every time step. Moreover, because learning an environment model shares a common structure with the RL problem, we hypothesize that the resulting objective improves both sample efficiency and performance. We empirically confirm our hypothesis on a range of 20 games from the Atari benchmark, attaining superior results over vanilla DQN without model-based regularization.
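The augmented objective described in the abstract can be sketched as the standard DQN temporal-difference loss plus model-prediction errors acting as regularizers. The squared-error forms, the regularization coefficients `alpha` and `beta`, and the function name below are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

def augmented_loss(q_sa, td_target, pred_next_state, next_state,
                   pred_reward, reward, alpha=0.5, beta=0.5):
    """Sketch of a DQN loss with model-based regularizers.

    q_sa            -- Q(s, a) estimate from the network
    td_target       -- bootstrapped TD target r + gamma * max_a' Q(s', a')
    pred_next_state -- model's predicted next-state features
    next_state      -- observed next-state features
    pred_reward     -- model's predicted reward
    reward          -- observed reward
    alpha, beta     -- illustrative regularization weights (assumptions)
    """
    l_dqn = (td_target - q_sa) ** 2                         # standard TD error
    l_state = np.mean((pred_next_state - next_state) ** 2)  # next-state prediction error
    l_reward = (pred_reward - reward) ** 2                  # reward prediction error
    return l_dqn + alpha * l_state + beta * l_reward
```

Because the model-prediction terms are nonzero at every transition, they supply a training signal even when the TD error is small, which is the intuition behind the richer per-step feedback mentioned above.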

Archived Files and Locations

application/pdf  715.0 kB
file_mpqg5r3s25h2ln4ezklxtsfrg4
arxiv.org (repository)
web.archive.org (webarchive)
Type  article
Stage   submitted
Date   2018-09-06
Version   v1
Language   en
arXiv  1809.01906v1
Work Entity
Access all versions, variants, and formats of this work (e.g., pre-prints)
Catalog Record
Revision: 68b93b6c-e3ed-40cb-a1e7-f3f33c58a8d1
API URL: JSON