[1] Gao Y Q, Wang W, Yu N P. Consensus multi-agent reinforcement learning for volt-VAR control in power distribution networks[J]. IEEE Transactions on Smart Grid, 2021, 12(4): 3594-3604. DOI:10.1109/TSG.2021.3058996.
[2] Bhalla S, Ganapathi Subramanian S, Crowley M. Deep multi agent reinforcement learning for autonomous driving[C]//Advances in Artificial Intelligence, 2020: 67-78. DOI:10.1007/978-3-030-47358-7_7.
[3] Ye D Y, Zhang M J, Yang Y. A multi-agent framework for packet routing in wireless sensor networks[J]. Sensors, 2015, 15(5): 10026-10047. DOI:10.3390/s150510026.
[4] Oliehoek F A, Spaan M T J, Vlassis N. Optimal and approximate Q-value functions for decentralized POMDPs[J]. Journal of Artificial Intelligence Research, 2008, 32: 289-353. DOI:10.1613/jair.2447.
[5] Kraemer L, Banerjee B. Multi-agent reinforcement learning as a rehearsal for decentralized planning[J]. Neurocomputing, 2016, 190: 82-94. DOI:10.1016/j.neucom.2016.01.031.
[6] Foerster J, Farquhar G, Afouras T, et al. Counterfactual multi-agent policy gradients[EB/OL]. arXiv:1705.08926 (2017-05-24) [2022-03-28]. https://arxiv.org/abs/1705.08926.
[7] Sunehag P, Lever G, Gruslys A, et al. Value-decomposition networks for cooperative multi-agent learning[EB/OL]. arXiv:1706.05296 (2017-06-16) [2022-03-28]. https://arxiv.org/abs/1706.05296.
[8] Rashid T, Samvelyan M, de Witt C S, et al. QMIX: monotonic value function factorisation for deep multi-agent reinforcement learning[C]//International Conference on Machine Learning, 2018: 4295-4304. DOI:10.48550/arXiv.1803.11485.
[9] Son K, Kim D, Kang W J, et al. QTRAN: learning to factorize with transformation for cooperative multi-agent reinforcement learning[J]. CoRR, 2019, abs/1905.05408.
[10] Vinyals O, Babuschkin I, Czarnecki W M, et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning[J]. Nature, 2019, 575(7782): 350-354. DOI:10.1038/s41586-019-1724-z.
[11] Baker B, Kanitscheider I, Markov T, et al. Emergent tool use from multi-agent autocurricula[J]. CoRR, 2019, abs/1909.07528.
[12] van der Vaart P, Mahajan A, Whiteson S. Model based multi-agent reinforcement learning with tensor decompositions[EB/OL]. arXiv:2110.14524 (2021-10-27) [2022-03-28]. https://arxiv.org/abs/2110.14524.
[13] Stone P, Kaminka G A, Kraus S, et al. Ad hoc autonomous agent teams: collaboration without pre-coordination[C/OL]//Twenty-Fourth AAAI Conference on Artificial Intelligence, 2010. (2010-07-11) [2022-03-28]. https://dl.acm.org/doi/10.5555/2898607.2898847.
[14] Samvelyan M, Rashid T, de Witt C S, et al. The StarCraft multi-agent challenge[EB/OL]. arXiv:1902.04043 (2019-02-11) [2022-03-28]. https://arxiv.org/abs/1902.04043.
[15] Oliehoek F A, Amato C. A concise introduction to decentralized POMDPs[M]. Cham: Springer International Publishing, 2016. DOI:10.1007/978-3-319-28929-8.
[16] Silver D, Huang A, Maddison C J, et al. Mastering the game of Go with deep neural networks and tree search[J]. Nature, 2016, 529(7587): 484-489. DOI:10.1038/nature16961.
[17] Moravčík M, Schmid M, Burch N, et al. DeepStack: expert-level artificial intelligence in heads-up no-limit poker[J]. Science, 2017, 356(6337): 508-513. DOI:10.1126/science.aam6960.
[18] Mnih V, Kavukcuoglu K, Silver D, et al. Playing Atari with deep reinforcement learning[J]. CoRR, 2013, abs/1312.5602.
[19] Mnih V, Kavukcuoglu K, Silver D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533. DOI:10.1038/nature14236.
[20] Yang Y D, Wang J. An overview of multi-agent reinforcement learning from game theoretical perspective[EB/OL]. arXiv:2011.00583 (2020-11-01) [2022-03-28]. https://arxiv.org/abs/2011.00583.
[21] Hernandez-Leal P, Kartal B, Taylor M E. A survey and critique of multiagent deep reinforcement learning[J]. Autonomous Agents and Multi-Agent Systems, 2019, 33(6): 750-797. DOI:10.1007/s10458-019-09421-1.
[22] Yang Y D, Luo J, Wen Y, et al. Diverse auto-curriculum is critical for successful real-world multiagent learning systems[EB/OL]. arXiv:2102.07659 (2021-02-15) [2022-03-28]. https://arxiv.org/abs/2102.07659.
[23] Tan M. Multi-agent reinforcement learning: independent vs. cooperative agents[C/OL]//Proceedings of the Tenth International Conference on Machine Learning, 1993: 330-337. (1993-06-27) [2022-03-28]. https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.55.8066&rep=rep1&type=pdf.
[24] Raghu M, Irpan A, Andreas J, et al. Can deep reinforcement learning solve Erdos-Selfridge-Spencer games?[EB/OL]. arXiv:1711.02301 (2017-11-07) [2022-03-28]. https://arxiv.org/abs/1711.02301.
[25] Bansal T, Pachocki J, Sidor S, et al. Emergent complexity via multi-agent competition[EB/OL]. arXiv:1710.03748 (2017-10-10) [2022-03-28]. https://arxiv.org/abs/1710.03748.
[26] Leibo J Z, Perolat J, Hughes E, et al. Malthusian reinforcement learning[EB/OL]. arXiv:1812.07019 (2018-12-17) [2022-03-28]. https://arxiv.org/abs/1812.07019.
[27] Leibo J Z, Zambaldi V, Lanctot M, et al. Multi-agent reinforcement learning in sequential social dilemmas[EB/OL]. arXiv:1702.03037 (2017-02-10) [2022-03-28]. https://arxiv.org/abs/1702.03037.
[28] Mordatch I, Abbeel P. Emergence of grounded compositional language in multi-agent populations[EB/OL]. arXiv:1703.04908 (2017-03-15) [2022-03-28]. https://arxiv.org/abs/1703.04908.
[29] Lazaridou A, Peysakhovich A, Baroni M. Multi-agent cooperation and the emergence of (natural) language[EB/OL]. arXiv:1612.07182 (2016-12-21) [2022-03-28]. https://arxiv.org/abs/1612.07182.
[30] Foerster J N, Assael Y M, de Freitas N, et al. Learning to communicate with deep multi-agent reinforcement learning[J]. CoRR, 2016, abs/1605.06676.
[31] Sukhbaatar S, Szlam A, Fergus R. Learning multiagent communication with backpropagation[J]. CoRR, 2016, abs/1605.07736.
[32] Peng P, Wen Y, Yang Y D, et al. Multiagent bidirectionally-coordinated nets: emergence of human-level coordination in learning to play StarCraft combat games[EB/OL]. arXiv:1703.10069 (2017-03-29) [2022-03-28]. https://arxiv.org/abs/1703.10069.
[33] Palmer G, Tuyls K, Bloembergen D, et al. Lenient multi-agent deep reinforcement learning[EB/OL]. arXiv:1707.04402 (2017-07-14) [2022-03-28]. https://arxiv.org/abs/1707.04402.
[34] Omidshafiei S, Pazis J, Amato C, et al. Deep decentralized multi-task multi-agent reinforcement learning under partial observability[EB/OL]. arXiv:1703.06182 (2017-03-17) [2022-03-28]. https://arxiv.org/abs/1703.06182.
[35] Lanctot M, Zambaldi V, Gruslys A, et al. A unified game-theoretic approach to multiagent reinforcement learning[EB/OL]. arXiv:1711.00832 (2017-10-02) [2022-03-28]. https://arxiv.org/abs/1711.00832.
[36] Hong Z W, Su S Y, Shann T Y, et al. A deep policy inference Q-network for multi-agent systems[EB/OL]. arXiv:1712.07893 (2017-12-21) [2022-03-28]. https://arxiv.org/abs/1712.07893.
[37] Heinrich J, Silver D. Deep reinforcement learning from self-play in imperfect-information games[J]. CoRR, 2016, abs/1603.01121.
[38] Chen S, Andrejczuk E, Cao Z G, et al. AATEAM: achieving the ad hoc teamwork by employing the attention mechanism[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34(5): 7095-7102. DOI:10.1609/aaai.v34i05.6196.
[39] Zhang T, Xu H, Wang X, et al. Multi-agent collaboration via reward attribution decomposition[EB/OL]. arXiv:2010.08531 (2020-10-16) [2022-03-28]. https://arxiv.org/abs/2010.08531.
[40] Mahajan A, Samvelyan M, Gupta T, et al. Generalization in cooperative multi-agent systems[EB/OL]. arXiv:2202.00104 (2022-01-31) [2022-03-28]. https://arxiv.org/abs/2202.00104.
[41] Agmon N, Stone P. Leading ad hoc agents in joint action settings with multiple teammates[C]//AAMAS, 2012: 341-348. DOI:10.5555/2343576.2343625.
[42] Stone P, Kaminka G A, Rosenschein J S. Leading a best-response teammate in an ad hoc team[C]//Agent-Mediated Electronic Commerce. Designing Trading Strategies and Mechanisms for Electronic Markets, 2010: 132-146. DOI:10.1007/978-3-642-15117-0_10.
[43] Tambe M. Towards flexible teamwork[J]. Journal of Artificial Intelligence Research, 1997, 7: 83-124. DOI:10.1613/jair.433.
[44] Grosz B J, Kraus S. Collaborative plans for complex group action[J]. Artificial Intelligence, 1996, 86(2): 269-357. DOI:10.1016/0004-3702(95)00103-4.