Contextual Latent-Movements Off-Policy Optimization for Robotic Manipulation Skills
release_dtra35awg5epblsesa3rtfoej4

by Samuele Tosatto, Georgia Chalvatzaki, Jan Peters

Released as an article.

2022  

Abstract

Parameterized movement primitives have been extensively used for imitation learning of robotic tasks. However, the high dimensionality of the parameter space hinders the improvement of such primitives in the reinforcement learning (RL) setting, especially for learning with physical robots. In this paper, we propose a novel view on handling the demonstrated trajectories for acquiring low-dimensional, non-linear latent dynamics, using mixtures of probabilistic principal component analyzers (MPPCA) on the movements' parameter space. Moreover, we introduce a new contextual off-policy RL algorithm, named LAtent-Movements Policy Optimization (LAMPO). LAMPO can provide gradient estimates from previous experience using self-normalized importance sampling, hence making full use of samples collected in previous learning iterations. Combined, these advantages provide a complete framework for sample-efficient off-policy optimization of movement primitives for robot learning of high-dimensional manipulation skills. Our experimental results, conducted both in simulation and on a real robot, show that LAMPO yields sample-efficient policies compared with common approaches in the literature.
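The abstract's central algorithmic idea is reusing samples from earlier learning iterations via self-normalized importance sampling (SNIS) when estimating policy gradients over the latent movement parameters. The following is a minimal sketch of that estimator, assuming a diagonal-Gaussian policy over latent parameters; the function names, the toy return, and the plain REINFORCE-style weighting are illustrative assumptions, not the paper's actual LAMPO implementation (which also conditions on context and operates in an MPPCA latent space).

```python
import numpy as np

def snis_policy_gradient(z, returns, mu_new, sigma_new, mu_old, sigma_old):
    """Self-normalized importance-sampling (SNIS) gradient estimate for a
    diagonal-Gaussian policy over latent movement parameters.

    z        : (N, d) latent samples drawn under the old policy
    returns  : (N,)   returns observed for those samples
    mu_*, sigma_* : (d,) mean / std of the new and old diagonal Gaussians

    Hypothetical sketch: LAMPO's actual estimator is contextual and tied
    to the MPPCA latent space, which this example abstracts away.
    """
    def log_prob(x, mu, sigma):
        # Log-density of a diagonal Gaussian, summed over dimensions.
        return -0.5 * np.sum(((x - mu) / sigma) ** 2
                             + np.log(2.0 * np.pi * sigma ** 2), axis=1)

    # Importance weights between new and old policies, computed in log space.
    log_w = log_prob(z, mu_new, sigma_new) - log_prob(z, mu_old, sigma_old)
    w = np.exp(log_w - log_w.max())   # subtract max for numerical stability
    w_bar = w / w.sum()               # self-normalization step

    # Score function of the diagonal Gaussian w.r.t. its mean: (z - mu) / sigma^2.
    score_mu = (z - mu_new) / sigma_new ** 2

    # Weighted REINFORCE-style gradient of the expected return w.r.t. mu_new.
    grad_mu = (w_bar * returns)[:, None] * score_mu
    return grad_mu.sum(axis=0)

# Toy usage: one ascent step that reuses off-policy samples.
rng = np.random.default_rng(0)
d, N = 3, 256
mu_old, sigma_old = np.zeros(d), np.ones(d)
z = rng.normal(mu_old, sigma_old, size=(N, d))
returns = -np.sum((z - 1.0) ** 2, axis=1)     # hypothetical return, peaked at z = 1
grad = snis_policy_gradient(z, returns, mu_old, sigma_old, mu_old, sigma_old)
mu_new = mu_old + 0.1 * grad
```

Because the weights are self-normalized, samples drawn under stale policies still contribute a consistent (if slightly biased) gradient estimate, which is what makes reuse across learning iterations sample-efficient.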

Archived Files and Locations

application/pdf  2.4 MB
file_fhbjs76edngb5jj5kx6wglm54y
arxiv.org (repository)
web.archive.org (webarchive)
Type  article
Stage   submitted
Date   2022-02-11
Version   v3
Language   en
arXiv  2010.13766v3
Work Entity
access all versions, variants, and formats of this work (e.g., pre-prints)
Catalog Record
Revision: 81144d0e-3b53-47ca-8c81-d47e79f40200
API URL: JSON