[1] Gao Y Q, Wang W, Yu N P. Consensus multi-agent reinforcement learning for volt-VAR control in power distribution networks[J]. IEEE Transactions on Smart Grid, 2021, 12(4): 3594-3604. DOI:10.1109/TSG.2021.3058996.
[2] Bhalla S, Ganapathi Subramanian S, Crowley M. Deep multi agent reinforcement learning for autonomous driving[C]//Advances in Artificial Intelligence, 2020: 67-78. DOI:10.1007/978-3-030-47358-7_7.
[3] Ye D Y, Zhang M J, Yang Y. A multi-agent framework for packet routing in wireless sensor networks[J]. Sensors, 2015, 15(5): 10026-10047. DOI:10.3390/s150510026.
[4] Oliehoek F A, Spaan M T J, Vlassis N. Optimal and approximate Q-value functions for decentralized POMDPs[J]. Journal of Artificial Intelligence Research, 2008, 32: 289-353. DOI:10.1613/jair.2447.
[5] Kraemer L, Banerjee B. Multi-agent reinforcement learning as a rehearsal for decentralized planning[J]. Neurocomputing, 2016, 190: 82-94. DOI:10.1016/j.neucom.2016.01.031.
[6] Foerster J, Farquhar G, Afouras T, et al. Counterfactual multi-agent policy gradients[EB/OL]. arXiv:1705.08926 (2017-05-24) [2022-03-28]. https://arxiv.org/abs/1705.08926.
[7] Sunehag P, Lever G, Gruslys A, et al. Value-decomposition networks for cooperative multi-agent learning[EB/OL]. arXiv:1706.05296 (2017-06-16) [2022-03-28]. https://arxiv.org/abs/1706.05296.
[8] Rashid T, Samvelyan M, de Witt C S, et al. QMIX: monotonic value function factorisation for deep multi-agent reinforcement learning[C]//International Conference on Machine Learning, 2018: 4295-4304. DOI:10.48550/arXiv.1803.11485.
[9] Son K, Kim D, Kang W J, et al. QTRAN: learning to factorize with transformation for cooperative multi-agent reinforcement learning[J]. CoRR, 2019, abs/1905.05408.
[10] Vinyals O, Babuschkin I, Czarnecki W M, et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning[J]. Nature, 2019, 575(7782): 350-354. DOI:10.1038/s41586-019-1724-z.
[11] Baker B, Kanitscheider I, Markov T, et al. Emergent tool use from multi-agent autocurricula[J]. CoRR, 2019, abs/1909.07528.
[12] van der Vaart P, Mahajan A, Whiteson S. Model based multi-agent reinforcement learning with tensor decompositions[EB/OL]. arXiv:2110.14524 (2021-10-27) [2022-03-28]. https://arxiv.org/abs/2110.14524.
[13] Stone P, Kaminka G A, Kraus S, et al. Ad hoc autonomous agent teams: collaboration without pre-coordination[C/OL]//Twenty-Fourth AAAI Conference on Artificial Intelligence, 2010. (2010-07-11) [2022-03-28]. https://dl.acm.org/doi/10.5555/2898607.2898847.
[14] Samvelyan M, Rashid T, de Witt C S, et al. The StarCraft multi-agent challenge[EB/OL]. arXiv:1902.04043 (2019-02-11) [2022-03-28]. https://arxiv.org/abs/1902.04043.
[15] Oliehoek F A, Amato C. A concise introduction to decentralized POMDPs[M]. Cham: Springer International Publishing, 2016. DOI:10.1007/978-3-319-28929-8.
[16] Silver D, Huang A, Maddison C J, et al. Mastering the game of Go with deep neural networks and tree search[J]. Nature, 2016, 529(7587): 484-489. DOI:10.1038/nature16961.
[17] Moravčík M, Schmid M, Burch N, et al. DeepStack: expert-level artificial intelligence in heads-up no-limit poker[J]. Science, 2017, 356(6337): 508-513. DOI:10.1126/science.aam6960.
[18] Mnih V, Kavukcuoglu K, Silver D, et al. Playing Atari with deep reinforcement learning[J]. CoRR, 2013, abs/1312.5602.
[19] Mnih V, Kavukcuoglu K, Silver D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533. DOI:10.1038/nature14236.
[20] Yang Y D, Wang J. An overview of multi-agent reinforcement learning from game theoretical perspective[EB/OL]. arXiv:2011.00583 (2020-11-01) [2022-03-28]. https://arxiv.org/abs/2011.00583.
[21] Hernandez-Leal P, Kartal B, Taylor M E. A survey and critique of multiagent deep reinforcement learning[J]. Autonomous Agents and Multi-Agent Systems, 2019, 33(6): 750-797. DOI:10.1007/s10458-019-09421-1.
[22] Yang Y D, Luo J, Wen Y, et al. Diverse auto-curriculum is critical for successful real-world multiagent learning systems[EB/OL]. arXiv:2102.07659 (2021-02-15) [2022-03-28]. https://arxiv.org/abs/2102.07659.
[23] Tan M. Multi-agent reinforcement learning: independent vs. cooperative agents[C/OL]//Proceedings of the Tenth International Conference on Machine Learning, 1993: 330-337. (1993-06-27) [2022-03-28]. https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.55.8066&rep=rep1&type=pdf.
[24] Raghu M, Irpan A, Andreas J, et al. Can deep reinforcement learning solve Erdos-Selfridge-Spencer games?[EB/OL]. arXiv:1711.02301 (2017-11-07) [2022-03-28]. https://arxiv.org/abs/1711.02301.
[25] Bansal T, Pachocki J, Sidor S, et al. Emergent complexity via multi-agent competition[EB/OL]. arXiv:1710.03748 (2017-10-10) [2022-03-28]. https://arxiv.org/abs/1710.03748.
[26] Leibo J Z, Perolat J, Hughes E, et al. Malthusian reinforcement learning[EB/OL]. arXiv:1812.07019 (2018-12-17) [2022-03-28]. https://arxiv.org/abs/1812.07019.
[27] Leibo J Z, Zambaldi V, Lanctot M, et al. Multi-agent reinforcement learning in sequential social dilemmas[EB/OL]. arXiv:1702.03037 (2017-02-10) [2022-03-28]. https://arxiv.org/abs/1702.03037.
[28] Mordatch I, Abbeel P. Emergence of grounded compositional language in multi-agent populations[EB/OL]. arXiv:1703.04908 (2017-03-15) [2022-03-28]. https://arxiv.org/abs/1703.04908.
[29] Lazaridou A, Peysakhovich A, Baroni M. Multi-agent cooperation and the emergence of (natural) language[EB/OL]. arXiv:1612.07182 (2016-12-21) [2022-03-28]. https://arxiv.org/abs/1612.07182.
[30] Foerster J N, Assael Y M, de Freitas N, et al. Learning to communicate with deep multi-agent reinforcement learning[J]. CoRR, 2016, abs/1605.06676.
[31] Sukhbaatar S, Szlam A, Fergus R. Learning multiagent communication with backpropagation[J]. CoRR, 2016, abs/1605.07736.
[32] Peng P, Wen Y, Yang Y D, et al. Multiagent bidirectionally-coordinated nets: emergence of human-level coordination in learning to play StarCraft combat games[EB/OL]. arXiv:1703.10069 (2017-03-29) [2022-03-28]. https://arxiv.org/abs/1703.10069.
[33] Palmer G, Tuyls K, Bloembergen D, et al. Lenient multi-agent deep reinforcement learning[EB/OL]. arXiv:1707.04402 (2017-07-14) [2022-03-28]. https://arxiv.org/abs/1707.04402.
[34] Omidshafiei S, Pazis J, Amato C, et al. Deep decentralized multi-task multi-agent reinforcement learning under partial observability[EB/OL]. arXiv:1703.06182 (2017-03-17) [2022-03-28]. https://arxiv.org/abs/1703.06182.
[35] Lanctot M, Zambaldi V, Gruslys A, et al. A unified game-theoretic approach to multiagent reinforcement learning[EB/OL]. arXiv:1711.00832 (2017-10-02) [2022-03-28]. https://arxiv.org/abs/1711.00832.
[36] Hong Z W, Su S Y, Shann T Y, et al. A deep policy inference Q-network for multi-agent systems[EB/OL]. arXiv:1712.07893 (2017-12-21) [2022-03-28]. https://arxiv.org/abs/1712.07893.
[37] Heinrich J, Silver D. Deep reinforcement learning from self-play in imperfect-information games[J]. CoRR, 2016, abs/1603.01121.
[38] Chen S, Andrejczuk E, Cao Z G, et al. AATEAM: achieving the ad hoc teamwork by employing the attention mechanism[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34(5): 7095-7102. DOI:10.1609/aaai.v34i05.6196.
[39] Zhang T, Xu H, Wang X, et al. Multi-agent collaboration via reward attribution decomposition[EB/OL]. arXiv:2010.08531 (2020-10-16) [2022-03-28]. https://arxiv.org/abs/2010.08531.
[40] Mahajan A, Samvelyan M, Gupta T, et al. Generalization in cooperative multi-agent systems[EB/OL]. arXiv:2202.00104 (2022-01-31) [2022-03-28]. https://arxiv.org/abs/2202.00104.
[41] Agmon N, Stone P. Leading ad hoc agents in joint action settings with multiple teammates[C]//AAMAS, 2012: 341-348. DOI:10.5555/2343576.2343625.
[42] Stone P, Kaminka G A, Rosenschein J S. Leading a best-response teammate in an ad hoc team[C]//Agent-Mediated Electronic Commerce. Designing Trading Strategies and Mechanisms for Electronic Markets, 2010: 132-146. DOI:10.1007/978-3-642-15117-0_10.
[43] Tambe M. Towards flexible teamwork[J]. Journal of Artificial Intelligence Research, 1997, 7: 83-124. DOI:10.1613/jair.433.
[44] Grosz B J, Kraus S. Collaborative plans for complex group action[J]. Artificial Intelligence, 1996, 86(2): 269-357. DOI:10.1016/0004-3702(95)00103-4.