Regularization Matters in Policy Optimization
by
Zhuang Liu, Xuanlin Li, Bingyi Kang, Trevor Darrell
2021
Abstract
Deep Reinforcement Learning (Deep RL) has been receiving increasing attention
thanks to its encouraging performance on a variety of control tasks.
Yet, conventional regularization techniques in training neural networks (e.g.,
L_2 regularization, dropout) have been largely ignored in RL methods,
possibly because agents are typically trained and evaluated in the same
environment, and because the deep RL community focuses more on high-level
algorithm designs. In this work, we present the first comprehensive study of
regularization techniques with multiple policy optimization algorithms on
continuous control tasks. Interestingly, we find that applying conventional
regularization techniques to the policy network can often bring large
improvements, especially on harder tasks. Our findings are shown to be robust against training
hyperparameter variations. We also compare these techniques with the more
widely used entropy regularization. In addition, we study regularizing
different components and find that regularizing only the policy network is
typically the best choice. We further analyze why regularization may help
generalization in RL from four perspectives: sample complexity, reward
distribution, weight norm, and noise robustness. We hope our study provides
guidance for future practices in regularizing policy optimization algorithms.
Our code is available at https://github.com/xuanlinli17/iclr2021_rlreg.
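As a brief illustration of the kind of regularization the abstract refers to, the sketch below applies L_2 regularization and dropout to a small Gaussian policy network in a policy-gradient-style update, written in PyTorch. The network sizes, coefficient values, and loss form are illustrative assumptions for this sketch, not the paper's exact setup; the authors' actual implementation is in the repository linked above.

# Minimal sketch (not the authors' code): L2 regularization and dropout
# applied only to the policy network in a vanilla policy-gradient update.
import torch
import torch.nn as nn

class GaussianPolicy(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=64, dropout_p=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden),
            nn.Tanh(),
            nn.Dropout(p=dropout_p),       # dropout on the policy network
            nn.Linear(hidden, hidden),
            nn.Tanh(),
            nn.Dropout(p=dropout_p),
            nn.Linear(hidden, act_dim),
        )
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def forward(self, obs):
        mean = self.net(obs)
        return torch.distributions.Normal(mean, self.log_std.exp())

obs_dim, act_dim = 11, 3                   # illustrative dimensions
policy = GaussianPolicy(obs_dim, act_dim)
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)
l2_coef = 1e-4                             # illustrative L2 coefficient

# Dummy batch standing in for rollout data.
obs = torch.randn(32, obs_dim)
actions = torch.randn(32, act_dim)
advantages = torch.randn(32)

dist = policy(obs)
log_prob = dist.log_prob(actions).sum(-1)
pg_loss = -(log_prob * advantages).mean() # policy-gradient surrogate loss

# L2 penalty computed over the policy network's parameters only.
l2_penalty = sum((p ** 2).sum() for p in policy.parameters())
loss = pg_loss + l2_coef * l2_penalty

optimizer.zero_grad()
loss.backward()
optimizer.step()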
Archived Files and Locations
application/pdf, 16.9 MB
arxiv.org (repository), web.archive.org (webarchive)
arXiv:1910.09191v5