Journal of University of Chinese Academy of Sciences, 2024, Vol. 41, Issue (2): 231-240. DOI: 10.7523/j.ucas.2022.047

Uncertainty-based credit assignment for cooperative multi-agent reinforcement learning

YANG Guangkai1,2, CHEN Hao1,2, ZHANG Mingyi1, YIN Qiyue1,2, HUANG Kaiqi1,2,3   

    1. Center for Research on Intelligence System and Engineering, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China;
    2. School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China;
    3. Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai 200031, China
  • Received: 2022-03-18  Revised: 2022-04-26  Online: 2024-03-15

Abstract: Multi-agent cooperation under partial observability has attracted extensive attention in recent years. Centralized training with decentralized execution is the general paradigm for such tasks, and its core problem is credit assignment. Value decomposition is a representative method within this paradigm: a mixing network decomposes the joint state action-value function into multiple local observation action-value functions to realize credit assignment, and it performs well on many problems. However, these methods maintain only a single point estimate of the mixing network parameters. Such an estimate carries no representation of uncertainty, which makes it difficult to handle the random factors in the environment and leads to convergence to suboptimal strategies. To alleviate this problem, this paper performs a Bayesian analysis of the mixing network and proposes an uncertainty-based method for multi-agent credit assignment, which guides credit assignment by explicitly quantifying parameter uncertainty. Considering the complex interactions among agents, the paper uses a Bayesian hypernetwork to implicitly model an arbitrarily complex posterior distribution over the mixing network parameters, avoiding the local optima that can result from specifying the distribution type a priori. The paper compares the proposed algorithm with representative algorithms on multiple maps in the StarCraft multi-agent challenge (SMAC), and the results verify its effectiveness.

Key words: multi-agent cooperation, deep reinforcement learning, credit assignment, Bayesian hypernetwork
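
The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the linear hypernetwork, the Gaussian noise input, and every name below (`init_hypernet`, `sample_q_tot`, `W_w`, `W_b`) are assumptions made for illustration only. A hypernetwork conditioned on the global state and a random noise vector generates the mixing weights, so resampling the noise yields different mixing parameters and the spread of the resulting joint values gives a rough uncertainty estimate. Taking absolute values of the weights is a common way in value-decomposition methods to keep the joint value monotonic in each agent's local value; the abstract does not state that the paper does this.

```python
import numpy as np


def init_hypernet(rng, state_dim, noise_dim, n_agents, scale=0.1):
    """Hypothetical linear hypernetwork mapping [state, noise] to mixing parameters."""
    in_dim = state_dim + noise_dim
    return {
        "W_w": rng.normal(0.0, scale, size=(in_dim, n_agents)),  # produces mixing weights
        "W_b": rng.normal(0.0, scale, size=(in_dim,)),           # produces a scalar bias
    }


def sample_q_tot(params, agent_qs, state, rng, noise_dim, n_samples=32):
    """Draw samples of the joint value by resampling the hypernetwork noise input."""
    samples = np.empty(n_samples)
    for i in range(n_samples):
        eps = rng.standard_normal(noise_dim)   # one noise draw = one set of mixing parameters
        h = np.concatenate([state, eps])
        w = np.abs(h @ params["W_w"])          # non-negative weights keep the mixing monotonic
        b = h @ params["W_b"]
        samples[i] = w @ agent_qs + b          # linear mixing of the local action values
    return samples


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = init_hypernet(rng, state_dim=4, noise_dim=3, n_agents=2)
    q_samples = sample_q_tot(params, np.array([1.0, 2.0]), np.ones(4), rng, noise_dim=3)
    # The sample mean estimates the joint value; the standard deviation
    # reflects the spread induced by the sampled mixing parameters.
    print(q_samples.mean(), q_samples.std())
```

In a trained system the hypernetwork parameters would be learned so that the induced distribution approximates a posterior over mixing parameters; here they are random, so the numbers only show the mechanics.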
