Direct Best Response Computation Based on Backward Induction in Imperfect-Information Games

doi:10.7523/j.ucas.2026.006

Abstract

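Abstract: Best response computation is a core problem in strategy evaluation in game theory, but general algorithms for imperfect-information extensive-form games face exponential complexity. For games with perfect recall, von Stengel (1996) used the sequence-form linear program to compute best responses in linear time under the perfect-recall assumption. This approach implicitly embodies backward induction, but the connection is reflected only indirectly, through abstract linear programming machinery. Starting directly from the structure of the game tree, this paper gives an explicit, constructive proof that backward induction computes best responses in imperfect-information games. Counterfactual best response values are introduced as the core inductive quantity, and they are proved to satisfy a recursive relationship consistent with the hierarchical structure of the game tree. Based on this recursive property, the algorithm starts from the information sets closest to the leaves, determines optimal actions through local computations on subtrees, and propagates upward level by level to the root. The method unifies the computational procedure with its theoretical proof and, while retaining O(n) time complexity in the size n of the game tree, offers an alternative to abstract LP-based methods that is more intuitive and closer to the native structure of the game tree. This work clarifies the essential connection between backward induction and best response computation in imperfect-information games.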

Key words: best response, imperfect-information games, backward induction, counterfactual best response, perfect recall
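The recursion on counterfactual best response values that the abstract refers to can be written, in one standard form, as below. This is a hedged sketch: the notation ($\pi^{\sigma}_{-i}$, $Z(I,a)$, $\mathrm{Succ}(I,a)$, $V_i$, $\mathrm{BRV}_i$) is assumed here and is not taken from the paper. $V_i(I)$ stands for the best value player $i$ can obtain from information set $I$ onward, weighted by the probability that chance and the opponents reach each history in $I$. Under perfect recall every information set $I'$ of player $i$ has a unique predecessor pair $(I,a)$, so the sets $\mathrm{Succ}(I,a)$ partition player $i$'s information sets and the maxima at different successors can be taken independently.

```latex
% Fixed opponent/chance behavior \sigma_{-i}; \pi^{\sigma}_{-i}(z): probability that chance
% and the opponents play towards terminal history z; u_i(z): payoff to player i.
% Z(I,a): terminal histories reached after a at I without meeting another
% information set of player i; Succ(I,a): player i's information sets reached next.
% Z(\varnothing), Succ(\varnothing): the same sets before player i's first decision.
\begin{aligned}
V_i(I) &= \max_{a \in A(I)} \Big[ \sum_{z \in Z(I,a)} \pi^{\sigma}_{-i}(z)\, u_i(z)
          + \sum_{I' \in \mathrm{Succ}(I,a)} V_i(I') \Big], \\
\mathrm{BRV}_i(\sigma_{-i}) &= \sum_{z \in Z(\varnothing)} \pi^{\sigma}_{-i}(z)\, u_i(z)
          + \sum_{I' \in \mathrm{Succ}(\varnothing)} V_i(I').
\end{aligned}
```

Evaluating $V_i$ from the deepest information sets upward touches each information set once, which is where a bound linear in the size of the game tree comes from.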

Abstract: Computing best responses is essential to strategy evaluation in game theory, yet general algorithms for imperfect-information extensive-form games suffer from exponential complexity. For games with perfect recall, von Stengel (1996) established that best responses can be computed in linear time through a sequence-form linear program, which implicitly performs backward induction; however, this connection remains indirect and embedded within abstract LP machinery. This paper provides an explicit, constructive proof that backward induction directly computes best responses in imperfect-information games with perfect recall. We introduce counterfactual best response values as the core inductive quantity and prove that they satisfy a recursive relationship aligned with the game tree structure. This yields a direct backward induction algorithm that traverses information sets from the leaves to the root, selecting optimal actions from local subtree computations. Our approach unifies the computational procedure with its theoretical justification and offers a more intuitive, tree-structured alternative to LP-based methods while maintaining O(n) time complexity in the size n of the game tree. This work clarifies the fundamental connection between backward induction and best response computation in imperfect-information settings.

Key words: best response, imperfect-information games, backward induction, counterfactual best response, perfect recall
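The leaf-to-root procedure described in the abstract can be illustrated with the minimal Python sketch below. It is not the authors' implementation; it assumes a two-player setting in which chance probabilities and the opponent's behavior strategy are fixed, and it uses a hypothetical `Node` tree representation with hypothetical names (`best_response`, `ROOT`, the node kinds `"terminal"`, `"chance"`, `"opponent"`, `"player"`). A forward pass accumulates only chance and opponent reach probabilities and links each of the responder's information sets to the unique (information set, action) pair preceding it. A backward pass then takes, at every information set, the action maximizing the local terminal payoff plus the values of the successor information sets. The perfect-recall check in the code is partial (it only tests that all histories in an information set share one predecessor).

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical game-tree node used only for this sketch."""
    kind: str                   # "terminal", "chance", "opponent" or "player"
    utility: float = 0.0        # u_i(z) of the responding player (terminals only)
    infoset: str | None = None  # information-set label (decision nodes only)
    probs: dict[str, float] = field(default_factory=dict)    # chance: action -> prob
    children: dict[str, Node] = field(default_factory=dict)  # action -> child node


ROOT = ("<root>", None)  # stands for "before the responder's first decision"


def best_response(root: Node, opp_strategy: dict[str, dict[str, float]]):
    """Best-response value and pure policy of the "player" against a fixed
    opponent behavior strategy; assumes a perfect-recall game tree."""
    direct = defaultdict(float)  # (I, a) -> sum of pi_{-i}(z) * u_i(z) over Z(I, a)
    succ = defaultdict(list)     # (I, a) -> Succ(I, a), own infosets reached next
    parent_of = {}               # I -> the unique (I', a') that precedes I
    actions = {}                 # I -> actions available at I

    def walk(node: Node, reach: float, parent) -> None:
        """Forward pass: multiply in chance/opponent probabilities only."""
        if node.kind == "terminal":
            direct[parent] += reach * node.utility
        elif node.kind == "chance":
            for a, child in node.children.items():
                walk(child, reach * node.probs[a], parent)
        elif node.kind == "opponent":
            sigma = opp_strategy[node.infoset]
            for a, child in node.children.items():
                walk(child, reach * sigma[a], parent)
        else:  # decision node of the responding player
            I = node.infoset
            if I not in parent_of:
                parent_of[I] = parent
                succ[parent].append(I)
                actions[I] = list(node.children)
            # Partial perfect-recall check: all histories in I share one parent.
            assert parent_of[I] == parent, "game violates perfect recall"
            for a, child in node.children.items():
                walk(child, reach, (I, a))  # own reach probability is left out

    def subtree(key) -> float:
        """Value of one (I, a) branch: local terminals plus successor infosets."""
        return direct[key] + sum(cbv(J) for J in succ[key])

    def cbv(I: str) -> float:
        """Backward pass: V(I) = max over a of subtree((I, a)); records argmax."""
        values = {a: subtree((I, a)) for a in actions[I]}
        policy[I] = max(values, key=values.get)
        return values[policy[I]]

    policy: dict[str, str] = {}
    walk(root, 1.0, ROOT)
    return subtree(ROOT), policy
```

For instance, when chance selects one of two histories with probability 0.5 each and the responder cannot distinguish them, `best_response` picks the action with the largest reach-weighted sum over both histories, not the best action at either history alone. The forward pass visits each tree node once and the backward pass visits each information set once, so the sketch runs in time linear in the size of the tree.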

CLC numbers: TP181; O225

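付延昌, 尹奇跃, 刘圣达, 黄凯奇. Direct best response computation based on backward induction in imperfect-information games[J]. Journal of University of Chinese Academy of Sciences, DOI: 10.7523/j.ucas.2026.006.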

FU Yanchang, YIN Qiyue, LIU Shengda, HUANG Kaiqi. Direct backward induction for best response computation in imperfect-information games with perfect recall[J]. Journal of University of Chinese Academy of Sciences, DOI: 10.7523/j.ucas.2026.006.

References

[1] Silver D, Schrittwieser J, Simonyan K, et al. Mastering the game of Go without human knowledge[J]. Nature, 2017, 550(7676): 354-359. DOI:10.1038/nature24270.
[2] Silver D, Hubert T, Schrittwieser J, et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play[J]. Science, 2018, 362(6419): 1140-1144. DOI:10.1126/science.aar6404.
[3] Moravčík M, Schmid M, Burch N, et al. DeepStack: Expert-level artificial intelligence in heads-up no-limit poker[J]. Science, 2017, 356(6337): 508-513. DOI:10.1126/science.aam6960.
[4] Brown N, Sandholm T. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals[J]. Science, 2018, 359(6374): 418-424. DOI:10.1126/science.aao1733.
[5] Brown N, Sandholm T. Superhuman AI for multiplayer poker[J]. Science, 2019, 365(6456): 885-890. DOI:10.1126/science.aay2400.
[6] Perolat J, De Vylder B, Hennes D, et al. Mastering the game of Stratego with model-free multiagent reinforcement learning[J]. Science, 2022, 378(6623): 990-996. DOI:10.1126/science.add4679.
[7] Burch N, Johanson M, Bowling M. Solving imperfect information games using decomposition[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2014, 28(1). DOI:10.1609/aaai.v28i1.8810.
[8] Schmid M, Moravčík M, Burch N, et al. Student of games: A unified learning algorithm for both perfect and imperfect information games[J]. Science Advances, 2023, 9(46). DOI:10.1126/sciadv.adg3256.
[9] Von Neumann J, Morgenstern O. Theory of games and economic behavior, 2nd rev. ed.[M]. Princeton: Princeton University Press, 1947. DOI:10.1007/bf02313433.
[10] Kuhn H W. Simplified two-person poker[J]. Contributions to the Theory of Games, 1950: 97-103. DOI:10.1515/9781400881727-010.
[11] Kuhn H W. Extensive games and the problem of information[J]. Contributions to the Theory of Games, 1953, 2(28): 193-216. DOI:10.1515/9781400881970-012.
[12] Aumann R J. Mixed and behavior strategies in infinite extensive games[M]. Princeton: Princeton University, 1961. DOI:10.1515/9781400882014-029.
[13] Bowling M. Multiagent learning in the presence of agents with limitations[D]. Pittsburgh: Carnegie Mellon University, 2003.
[14] Selten R. Reexamination of the perfectness concept for equilibrium points in extensive games[J]. Economics, 1974: 317-354. DOI:10.2307/j.ctv173f1fh.23.
[15] Von Stengel B. Efficient computation of behavior strategies[J]. Games and Economic Behavior, 1996, 14(2): 220-246. DOI:10.1006/game.1996.0050.
[16] Johanson M, Waugh K, Bowling M, et al. Accelerating best response calculation in large extensive games[C]//IJCAI. 2011: 258-265.
[17] Lisý V, Bowling M H. Equilibrium approximation quality of current no-limit poker bots[C]//AAAI Workshops. 2017.
[18] Greenwald A, Li J C, Sodomka E. Solving for best responses and equilibria in extensive-form games with reinforcement learning methods[M]//Rohit Parikh on Logic, Language and Society. Cham: Springer International Publishing, 2017: 185-226. DOI:10.1007/978-3-319-47843-2_11.
[19] Timbers F, Bard N, Lockhart E, et al. Approximate exploitability: learning a best response[C]//Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI). 2022: 3487-3493. DOI:10.24963/ijcai.2022/484.
[20] Qiuyu Y, Kai X, Jifu G, et al. Long-term multi-vehicle trajectory prediction with scene contextual information[J]. Journal of University of Chinese Academy of Sciences, 2024:240717-. DOI:10.7523/j.ucas.2024.066.
[21] Xiao C S, Xiao P L, Jie L. An artificial-potential-field method for real-time UAV navigation in unknown environments[J]. Journal of University of Chinese Academy of Sciences, 2022, 39(3): 393-402. DOI:10.7523/j.ucas.2020.0022. (in Chinese)
[22] Yu J G, Xiao C S, Xiao P L, et al. Path planning and obstacle avoidance for UAV based on Laplacian potential field[J]. Journal of University of Chinese Academy of Sciences, 2020, 37(5): 681-687. DOI:10.7523/j.issn.2095-6134.2020.05.013. (in Chinese)
[23] Yang G K, Chen H, Zhang M Y, et al. Uncertainty-based credit assignment for cooperative multi-agent reinforcement learning[J]. Journal of University of Chinese Academy of Sciences, 2024, 41(2): 231-240. DOI:10.7523/j.ucas.2022.047. (in Chinese)
[24] Chen H, Yang L K, Yin Q Y, et al. Local observation reconstruction for Ad-Hoc cooperation[J]. Journal of University of Chinese Academy of Sciences, 2024, 41(1): 117-126. DOI:10.7523/j.ucas.2022.028. (in Chinese)
[25] Waugh K, Zinkevich M, Johanson M, et al. A practical use of imperfect recall[C]//Symposium on Abstraction, Reformulation and Approximation (SARA). 2009.