Efficient Sample Reuse in EM-based Policy Search

Hirotaka Hachiya, Tokyo Institute of Technology, Japan
Jan Peters, Max Planck Institute for Biological Cybernetics, Germany
Masashi Sugiyama, Tokyo Institute of Technology, Japan

Abstract

Direct policy search is a promising reinforcement learning framework, particularly for control in continuous, high-dimensional systems such as anthropomorphic robots. Because of its high flexibility, policy search often requires a large number of samples to obtain a stable estimator of the policy update. However, this is prohibitive when sampling is expensive. In this paper, we extend an EM-based policy search method so that previously collected samples can be efficiently reused. The usefulness of the proposed method, called Reward-weighted Regression with sample Reuse, is demonstrated through a robot learning experiment.