
Journal of University of Chinese Academy of Sciences

   

Model-based explorer-learner joint optimization via uncertainty augmentation

XIAO Shixiang1, HUANG Wenzhen2, JIAO Jianbin1   

  1 School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100049, China;
  2 School of Information Science and Technology, Tsinghua University, Beijing 100084, China
  • Received: 2023-12-04; Revised: 2024-09-05

Abstract: Existing model-based reinforcement learning methods adopt a single policy to interact with both the real environment and the environment model, which makes it hard for the agent to balance the efficiency of exploring the environment against the stability of policy updates. To address this issue, this paper proposes Model-based Explorer-Learner Joint Optimization via Uncertainty Augmentation (MELO-UA). MELO-UA simultaneously optimizes a pair of policies: an explorer policy that interacts with the real environment, and a learner policy that interacts with the environment model. When optimizing the explorer, an implicit bonus based on model uncertainty is introduced to improve the efficiency of exploring the real environment; when optimizing the learner, model uncertainty is used as a constraint to keep policy optimization stable. Experimental results on multiple continuous control tasks show that the proposed method has significant advantages in asymptotic performance and sample efficiency over state-of-the-art methods.
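The abstract does not give the concrete form of the uncertainty bonus or the learner constraint. The following is a minimal sketch of the general idea only, assuming an ensemble-disagreement uncertainty estimate, a bonus coefficient beta, and an uncertainty threshold u_max; these names and choices are illustrative assumptions, not the paper's actual MELO-UA formulation.

    # Illustrative sketch only: ensemble disagreement as model uncertainty,
    # an uncertainty bonus for the explorer, and an uncertainty threshold
    # acting as a constraint for the learner. All constants are assumptions.
    import numpy as np

    rng = np.random.default_rng(0)

    def ensemble_predict(state, action, n_models=5):
        # Stand-in for an ensemble of learned dynamics models; each "model"
        # here is random noise around a nominal next state.
        nominal = state + 0.1 * action
        return np.stack([nominal + 0.05 * rng.normal(size=state.shape)
                         for _ in range(n_models)])

    def model_uncertainty(preds):
        # Disagreement across ensemble predictions, a common uncertainty proxy.
        return float(np.mean(np.std(preds, axis=0)))

    def explorer_reward(env_reward, uncertainty, beta=1.0):
        # Explorer policy: implicit bonus proportional to model uncertainty.
        return env_reward + beta * uncertainty

    def learner_uses_sample(uncertainty, u_max=0.1):
        # Learner policy: uncertainty as a constraint; drop (or down-weight)
        # model-generated transitions whose uncertainty exceeds the threshold.
        return uncertainty <= u_max

    # Toy usage
    state, action, env_reward = np.zeros(3), np.ones(3), 1.0
    preds = ensemble_predict(state, action)
    u = model_uncertainty(preds)
    print("uncertainty:", u)
    print("explorer reward:", explorer_reward(env_reward, u))
    print("learner keeps sample:", learner_uses_sample(u))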

Key words: deep reinforcement learning, model-based reinforcement learning, sample efficiency, exploration in reinforcement learning, uncertainty, model errors
