Deep Reinforcement Learning with Weighted Q-Learning

by Andrea Cini, Carlo D'Eramo, Jan Peters, Cesare Alippi

Released as an article.

2020  

Abstract

Overestimation of the maximum action-value is a well-known problem that hinders Q-Learning performance, leading to suboptimal policies and unstable learning. Among several Q-Learning variants proposed to address this issue, Weighted Q-Learning (WQL) effectively reduces the bias and shows remarkable results in stochastic environments. WQL uses a weighted sum of the estimated action-values, where the weights correspond to the probability of each action-value being the maximum; however, the computation of these probabilities is only practical in tabular settings. In this work, we provide the methodological advances to benefit from the WQL properties in Deep Reinforcement Learning (DRL), by using neural networks with Dropout Variational Inference as an effective approximation of deep Gaussian processes. In particular, we adopt the Concrete Dropout variant to obtain calibrated estimates of epistemic uncertainty in DRL. We show that model uncertainty in DRL can be useful not only for action selection, but also for action evaluation. We analyze how the novel Weighted Deep Q-Learning algorithm reduces the bias w.r.t. relevant baselines and provide empirical evidence of its advantages on several representative benchmarks.
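The abstract describes the core idea of WQL: the bootstrap value is a weighted sum of action-values, with each weight approximating the probability that the corresponding action is the maximizer. The sketch below illustrates one way such a target could be computed from Monte Carlo dropout samples of the next-state action-values; all names, shapes, and the counting-based weight estimate are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch (not the paper's code): estimate the Weighted
# Q-Learning bootstrap target from dropout samples of Q(s', .).
import numpy as np

def weighted_q_target(q_samples: np.ndarray, reward: float, gamma: float) -> float:
    """q_samples: array of shape (n_dropout_samples, n_actions), one row per
    stochastic forward pass with dropout enabled at the next state s'."""
    n_samples, n_actions = q_samples.shape
    # w[a] ~= P(a is the argmax), estimated by how often each action
    # attains the maximum across the dropout samples.
    argmax_counts = np.bincount(q_samples.argmax(axis=1), minlength=n_actions)
    weights = argmax_counts / n_samples
    # Weighted sum of the mean action-value estimates replaces the hard max.
    mean_q = q_samples.mean(axis=0)
    weighted_max = float(np.dot(weights, mean_q))
    return reward + gamma * weighted_max

# Example usage with dummy numbers:
# rng = np.random.default_rng(0)
# samples = rng.normal(size=(50, 4))   # 50 dropout passes, 4 actions
# print(weighted_q_target(samples, reward=1.0, gamma=0.99))
```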

Archived Files and Locations

application/pdf  2.4 MB
file_azq7hlacdrayzhkfsnsmnwfury
arxiv.org (repository)
web.archive.org (webarchive)
Type  article
Stage   submitted
Date   2020-03-30
Version   v2
Language   en
arXiv  2003.09280v2
Work Entity
access all versions, variants, and formats of this work (e.g., pre-prints)
Catalog Record
Revision: 462c6cb8-0984-4014-82e4-6288151bbdc5