[1] Bhalla S, Ganapathi Subramanian S, Crowley M. Deep multi agent reinforcement learning for autonomous driving[M]//Advances in Artificial Intelligence. Cham: Springer International Publishing, 2020: 67-78. DOI:10.1007/978-3-030-47358-7_7.
[2] Ye D Y, Zhang M J, Yang Y. A multi-agent framework for packet routing in wireless sensor networks[J]. Sensors (Basel, Switzerland), 2015, 15(5): 10026-10047. DOI:10.3390/s150510026.
[3] Hüttenrauch M, Šošić A, Neumann G. Guided deep reinforcement learning for swarm systems[EB/OL]. arXiv: 1709.06011 (2017-09-18)[2022-04-15]. https://arxiv.org/abs/1709.06011.
[4] Berner C, Brockman G, Chan B, et al. Dota 2 with large scale deep reinforcement learning[EB/OL]. arXiv: 1912.06680 (2019-12-13)[2022-04-15]. https://arxiv.org/abs/1912.06680.
[5] Vinyals O, Ewalds T, Bartunov S, et al. StarCraft II: a new challenge for reinforcement learning[EB/OL]. arXiv: 1708.04782 (2017-08-16)[2022-04-15]. https://arxiv.org/abs/1708.04782.
[6] Ye D H, Chen G B, Zhang W, et al. Towards playing full MOBA games with deep reinforcement learning[EB/OL]. arXiv: 2011.12692 (2020-12-31)[2022-04-15]. https://arxiv.org/abs/2011.12692.
[7] Gupta J K, Egorov M, Kochenderfer M. Cooperative multi-agent control using deep reinforcement learning[M]//Autonomous Agents and Multiagent Systems. Cham: Springer International Publishing, 2017: 66-83. DOI:10.1007/978-3-319-71682-4_5.
[8] Rashid T, Samvelyan M, Witt C S D, et al. QMIX: monotonic value function factorisation for deep multi-agent reinforcement learning[EB/OL]. arXiv: 1803.11485 (2018-06-06)[2022-04-15]. https://arxiv.org/abs/1803.11485.
[9] Tan M. Multi-agent reinforcement learning: independent vs. cooperative agents[M]//Machine Learning Proceedings 1993. Amsterdam: Elsevier, 1993: 330-337. DOI:10.1016/b978-1-55860-307-3.50049-6.
[10] Du Y L, Han L, Fang M, et al. LIIR: learning individual intrinsic reward in multi-agent reinforcement learning[C/OL]//Advances in Neural Information Processing Systems. Cambridge: MIT Press, 2019: 4405-4416. (2021-06-15)[2022-04-18]. https://dl.acm.org/doi/10.5555/3454287.3454683.
[11] Foerster J, Farquhar G, Afouras T, et al. Counterfactual multi-agent policy gradients[EB/OL]. arXiv: 1705.08926 (2017-12-14)[2022-04-15]. https://arxiv.org/abs/1705.08926.
[12] Kraemer L, Banerjee B. Multi-agent reinforcement learning as a rehearsal for decentralized planning[J]. Neurocomputing, 2016, 190: 82-94. DOI:10.1016/j.neucom.2016.01.031.
[13] Mahajan A, Rashid T, Samvelyan M, et al. MAVEN: multi-agent variational exploration[EB/OL]. arXiv: 1910.07483v2 (2020-01-20)[2022-04-15]. https://arxiv.org/abs/1910.07483v2.
[14] Oliehoek F A, Spaan M T J, Vlassis N. Optimal and approximate Q-value functions for decentralized POMDPs[J]. Journal of Artificial Intelligence Research, 2008, 32: 289-353. DOI:10.1613/jair.2447.
[15] Wang S C, Li B. Implicit posterior sampling reinforcement learning for continuous control[M]//Neural Information Processing. Cham: Springer International Publishing, 2020: 452-460. DOI:10.1007/978-3-030-63833-7_38.
[16] Blundell C, Cornebise J, Kavukcuoglu K, et al. Weight uncertainty in neural network[EB/OL]. arXiv: 1505.05424 (2015-05-21)[2022-04-18]. https://arxiv.org/abs/1505.05424.
[17] Krueger D, Huang C W, Islam R, et al. Bayesian hypernetworks[EB/OL]. arXiv: 1710.04759 (2018-04-24)[2022-04-15]. https://arxiv.org/abs/1710.04759.
[18] Pawlowski N, Brock A, Lee M C H, et al. Implicit weight uncertainty in neural networks[EB/OL]. arXiv: 1711.01297 (2018-05-25)[2022-04-15]. https://arxiv.org/abs/1711.01297.
[19] Oliehoek F A, Amato C. A concise introduction to decentralized POMDPs[M]. Cham: Springer International Publishing, 2016. DOI:10.1007/978-3-319-28929-8.
[20] Wolpert D H, Tumer K. Optimal payoff functions for members of collectives[M]//Modeling Complexity in Economic and Social Systems. World Scientific, 2002: 355-369. DOI:10.1142/9789812777263_0020.
[21] Sunehag P, Lever G, Gruslys A, et al. Value-decomposition networks for cooperative multi-agent learning[EB/OL]. arXiv: 1706.05296 (2017-06-16)[2022-04-15]. https://arxiv.org/abs/1706.05296.
[22] Wang J H, Zhang Y, Kim T K, et al. Shapley Q-value: a local reward approach to solve global reward games[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34(5): 7285-7292. DOI:10.1609/aaai.v34i05.6220.
[23] Chalkiadakis G, Elkind E, Wooldridge M. Computational aspects of cooperative game theory[J]. Synthesis Lectures on Artificial Intelligence and Machine Learning, 2011, 5(6): 1-168. DOI:10.2200/s00355ed1v01y201107aim016.
[24] Shapley L S. A value for n-person games[J]. Annals of Mathematics Studies, 1953, 2: 307-318. DOI:10.7249/P0295.
[25] Son K, Kim D, Kang W J, et al. QTRAN: learning to factorize with transformation for cooperative multi-agent reinforcement learning[EB/OL]. arXiv: 1905.05408v1 (2019-05-14)[2022-04-15]. https://arxiv.org/abs/1905.05408v1.
[26] Zhou M, Liu Z Y, Sui P W, et al. Learning implicit credit assignment for cooperative multi-agent reinforcement learning[EB/OL]. arXiv: 2007.02529 (2020-10-22)[2022-04-15]. https://arxiv.org/abs/2007.02529.
[27] Wu Z F, Yu C, Ye D H, et al. Coordinated proximal policy optimization[EB/OL]. arXiv: 2111.04051 (2021-11-07)[2022-04-15]. https://arxiv.org/abs/2111.04051.
[28] Schulman J, Wolski F, Dhariwal P, et al. Proximal policy optimization algorithms[EB/OL]. arXiv: 1707.06347 (2017-08-28)[2022-04-15]. https://arxiv.org/abs/1707.06347.
[29] Shao J Z, Zhang H C, Jiang Y C, et al. Credit assignment with meta-policy gradient for multi-agent reinforcement learning[EB/OL]. arXiv: 2102.12957 (2021-02-24)[2022-04-15]. https://arxiv.org/abs/2102.12957.
[30] Xu Z W, van Hasselt H, Silver D. Meta-gradient reinforcement learning[C/OL]//Advances in Neural Information Processing Systems. Cambridge: MIT Press, 2018: 2396-2407. (2018-12-03)[2022-04-18]. https://dl.acm.org/doi/10.5555/3327144.3327166.
[31] Xu Z W, Li D P, Bai Y P, et al. MMD-MIX: value function factorisation with maximum mean discrepancy for cooperative multi-agent reinforcement learning[C]//2021 International Joint Conference on Neural Networks (IJCNN), July 18-22, 2021, Shenzhen, China. IEEE, 2021: 1-7. DOI:10.1109/IJCNN52387.2021.9533636.
[32] Bellemare M G, Dabney W, Munos R. A distributional perspective on reinforcement learning[EB/OL]. arXiv: 1707.06887 (2017-07-21)[2022-04-15]. https://arxiv.org/abs/1707.06887.
[33] Ahn H, Lee D, Cha S, et al. Uncertainty-based continual learning with adaptive regularization[EB/OL]. arXiv: 1905.11614 (2019-11-14)[2022-04-18]. https://arxiv.org/abs/1905.11614.
[34] Lipton Z C, Li X J, Gao J F, et al. BBQ-networks: efficient exploration in deep reinforcement learning for task-oriented dialogue systems[EB/OL]. arXiv: 1608.05081 (2017-11-23)[2022-04-15]. https://arxiv.org/abs/1608.05081.
[35] Hinton G E, van Camp D. Keeping the neural networks simple by minimizing the description length of the weights[C]//Proceedings of the Sixth Annual Conference on Computational Learning Theory (COLT '93), July 26-28, 1993, Santa Cruz, California, USA. New York: ACM Press, 1993: 5-13. DOI:10.1145/168304.168306.
[36] Moerland T M, Broekens J, Jonker C. Efficient exploration with double uncertain value networks[EB/OL]. arXiv: 1711.10789 (2017-11-29)[2022-04-15]. https://arxiv.org/abs/1711.10789.
[37] Fortunato M, Azar M G, Piot B, et al. Noisy networks for exploration[EB/OL]. arXiv: 1706.10295 (2019-07-09)[2022-04-15]. https://arxiv.org/abs/1706.10295.
[38] Jiang B, Xu T Y, Wong W H. Approximate Bayesian computation with Kullback-Leibler divergence as data discrepancy[C/OL]//Proceedings of the 21st International Conference on Artificial Intelligence and Statistics (AISTATS 2018), Lanzarote, Spain. PMLR: Volume 84, 2018: 1711-1721. (2018-03-31)[2022-04-18]. http://proceedings.mlr.press/v84/jiang18a/jiang18a.pdf.
[39] Samvelyan M, Rashid T, de Witt C S, et al. The StarCraft multi-agent challenge[EB/OL]. arXiv: 1902.04043 (2019-12-09)[2022-04-18]. https://arxiv.org/abs/1902.04043.
[40] Hu J, Wu H, Harding S A, et al. RIIT: rethinking the importance of implementation tricks in multi-agent reinforcement learning[EB/OL]. arXiv: 2102.03479 (2022-01-01)[2022-04-15]. https://arxiv.org/abs/2102.03479.
[41] Yao M, Yin Q Y, Yu T T, et al. The partially observable asynchronous multi-agent cooperation challenge[EB/OL]. arXiv: 2112.03809 (2021-12-07)[2022-04-15]. https://arxiv.org/abs/2112.03809.