
Journal of University of Chinese Academy of Sciences ›› 2024, Vol. 41 ›› Issue (1): 117-126. DOI: 10.7523/j.ucas.2022.028

• Electronic Information and Computer Science •

Local observation reconstruction for Ad-Hoc cooperation

CHEN Hao1,2, YANG Likun1,2, YIN Qiyue1,2, HUANG Kaiqi1,2,3

  1. CRISE, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China;
  2. School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China;
  3. CAS Center for Excellence in Brain Science and Intelligence Technology, Shanghai 200031, China
  • Received: 2022-03-02  Revised: 2022-04-01  Published: 2022-04-07
  • Corresponding author: HUANG Kaiqi, E-mail: kqhuang@nlpr.ia.ac.cn
  • Supported by: the National Natural Science Foundation of China (61876181), the Beijing Science and Technology Innovation Plan (Z19110000119043), and the Youth Innovation Promotion Association and projects of the Chinese Academy of Sciences (QYZDB-SSWJSC006)

Abstract: In recent years, multi-agent reinforcement learning has attracted considerable attention from researchers. Within this field, a key problem is how to perform ad-hoc cooperation, i.e., how to adapt to teammates that vary in type and number. Existing methods either rest on strong prior-knowledge assumptions or rely on hard-coded protocols for cooperation; they therefore lack generality and cannot be generalized to broader ad-hoc cooperation scenarios. To address this problem, this paper proposes a local observation reconstruction algorithm for ad-hoc cooperation, which uses an attention mechanism and a sampling network to reconstruct local observations. This enables the algorithm to recognize and fully exploit the high-dimensional state representations of different situations, achieving zero-shot generalization in ad-hoc cooperation scenarios. The performance of the algorithm is compared and analyzed against representative algorithms on the StarCraft micromanagement environment and on ad-hoc cooperation scenarios, verifying its effectiveness.

Key words: multi-agent, deep reinforcement learning, credit assignment, Ad-Hoc cooperation
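
To make the mechanism described in the abstract concrete, below is a minimal PyTorch sketch of attention-based pooling over a variable number of teammates. It is not the authors' implementation: the class and names such as ObsReconstructor, entity_dim, and teammate_mask are illustrative assumptions, and the sampling network mentioned in the abstract is omitted. It shows how attention can turn an entity-factored local observation containing any number of teammate feature vectors into a fixed-size reconstructed observation, which is what allows a single policy to face teams of changing size.

import torch
import torch.nn as nn

class ObsReconstructor(nn.Module):
    """Pools a variable-size set of teammate features into a fixed-size vector."""

    def __init__(self, entity_dim: int, hidden_dim: int = 64, n_heads: int = 4):
        super().__init__()
        self.embed = nn.Linear(entity_dim, hidden_dim)
        # Multi-head attention aggregates however many teammate entities
        # are visible, so the output size never depends on team size.
        self.attn = nn.MultiheadAttention(hidden_dim, n_heads, batch_first=True)
        self.out = nn.Linear(2 * hidden_dim, hidden_dim)

    def forward(self, self_feats, teammate_feats, teammate_mask=None):
        # self_feats:     (batch, entity_dim) - the agent's own features
        # teammate_feats: (batch, n_teammates, entity_dim), n_teammates varies
        # teammate_mask:  (batch, n_teammates), True marks padded slots
        q = self.embed(self_feats).unsqueeze(1)   # (batch, 1, hidden_dim)
        kv = self.embed(teammate_feats)           # (batch, n, hidden_dim)
        pooled, _ = self.attn(q, kv, kv, key_padding_mask=teammate_mask)
        # Concatenate the agent's own embedding with the attention summary
        # to form the reconstructed, fixed-size local observation.
        return self.out(torch.cat([q.squeeze(1), pooled.squeeze(1)], dim=-1))

# The same module handles 5 or 8 teammates without any architectural change:
recon = ObsReconstructor(entity_dim=10)
obs_a = recon(torch.randn(2, 10), torch.randn(2, 5, 10))
obs_b = recon(torch.randn(2, 10), torch.randn(2, 8, 10))
assert obs_a.shape == obs_b.shape == (2, 64)

Because the pooled summary has a fixed width, the same network can be evaluated zero-shot on team sizes never encountered during training, which matches the generalization setting the abstract targets.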
