[1] Silver D, Schrittwieser J, Simonyan K, et al. Mastering the game of Go without human knowledge[J]. Nature, 2017, 550(7676): 354-359. DOI:10.1038/nature24270.
[2] Silver D, Hubert T, Schrittwieser J, et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play[J]. Science, 2018, 362(6419): 1140-1144. DOI:10.1126/science.aar6404.
[3] Moravčík M, Schmid M, Burch N, et al. DeepStack: Expert-level artificial intelligence in heads-up no-limit poker[J]. Science, 2017, 356(6337): 508-513. DOI:10.1126/science.aam6960.
[4] Brown N, Sandholm T. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals[J]. Science, 2018, 359(6374): 418-424. DOI:10.1126/science.aao1733.
[5] Brown N, Sandholm T. Superhuman AI for multiplayer poker[J]. Science, 2019, 365(6456): 885-890. DOI:10.1126/science.aay2400.
[6] Perolat J, De Vylder B, Hennes D, et al. Mastering the game of Stratego with model-free multiagent reinforcement learning[J]. Science, 2022, 378(6623): 990-996. DOI:10.1126/science.add4679.
[7] Burch N, Johanson M, Bowling M. Solving imperfect information games using decomposition[C]//Proceedings of the AAAI Conference on Artificial Intelligence, 2014, 28(1). DOI:10.1609/aaai.v28i1.8810.
[8] Schmid M, Moravčík M, Burch N, et al. Student of Games: A unified learning algorithm for both perfect and imperfect information games[J]. Science Advances, 2023, 9(46). DOI:10.1126/sciadv.adg3256.
[9] Von Neumann J, Morgenstern O. Theory of games and economic behavior, 2nd rev. ed[M]. Princeton: Princeton University Press, 1947. DOI:10.1007/bf02313433.
[10] Kuhn H W. Simplified two-person poker[J]. Contributions to the Theory of Games, 1950: 97-103. DOI:10.1515/9781400881727-010.
[11] Kuhn H W. Extensive games and the problem of information[J]. Contributions to the Theory of Games, 1953, 2(28): 193-216. DOI:10.1515/9781400881970-012.
[12] Aumann R J. Mixed and behavior strategies in infinite extensive games[M]. Princeton: Princeton University Press, 1961. DOI:10.1515/9781400882014-029.
[13] Bowling M. Multiagent learning in the presence of agents with limitations[D]. Pittsburgh: Carnegie Mellon University, 2003.
[14] Selten R. Reexamination of the perfectness concept for equilibrium points in extensive games[J]. Economics, 1974: 317-354. DOI:10.2307/j.ctv173f1fh.23.
[15] Von Stengel B. Efficient computation of behavior strategies[J]. Games and Economic Behavior, 1996, 14(2): 220-246. DOI:10.1006/game.1996.0050.
[16] Johanson M, Waugh K, Bowling M, et al. Accelerating best response calculation in large extensive games[C]//IJCAI, 2011: 258-265.
[17] Lisý V, Bowling M. Equilibrium approximation quality of current no-limit poker bots[C]//AAAI Workshops, 2017.
[18] Greenwald A, Li J C, Sodomka E. Solving for best responses and equilibria in extensive-form games with reinforcement learning methods[M]//Rohit Parikh on Logic, Language and Society. Cham: Springer International Publishing, 2017: 185-226. DOI:10.1007/978-3-319-47843-2_11.
[19] Timbers F, Bard N, Lockhart E, et al. Approximate exploitability: Learning a best response[C]//Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2022: 3487-3493. DOI:10.24963/ijcai.2022/484.
[20] Qiuyu Y, Kai X, Jifu G, et al. Long-term multi-vehicle trajectory prediction with scene contextual information[J]. Journal of University of Chinese Academy of Sciences, 2024: 240717-. DOI:10.7523/j.ucas.2024.066.
[21] Xiao C S, Xiao P L, Jie L. An artificial-potential-field method for real-time UAV navigation in unknown environments[J]. Journal of University of Chinese Academy of Sciences, 2022, 39(3): 393-402. DOI:10.7523/j.ucas.2020.0022. (in Chinese)
[22] Yu J G, Xiao C S, Xiao P L, et al. Path planning and obstacle avoidance for UAV based on Laplacian potential field[J]. Journal of University of Chinese Academy of Sciences, 2020, 37(5): 681-687. DOI:10.7523/j.issn.2095-6134.2020.05.013. (in Chinese)
[23] Yang G K, Chen H, Zhang M Y, et al. Uncertainty-based credit assignment for cooperative multi-agent reinforcement learning[J]. Journal of University of Chinese Academy of Sciences, 2024, 41(2): 231-240. DOI:10.7523/j.ucas.2022.047. (in Chinese)
[24] Chen H, Yang L K, Yin Q Y, et al. Local observation reconstruction for ad hoc cooperation[J]. Journal of University of Chinese Academy of Sciences, 2024, 41(1): 117-126. DOI:10.7523/j.ucas.2022.028. (in Chinese)
[25] Waugh K, Zinkevich M, Johanson M, et al. A practical use of imperfect recall[C]//Symposium on Abstraction, Reformulation and Approximation (SARA), 2009.