欢迎访问中国科学院大学学报,今天是

中国科学院大学学报

• • 上一篇    下一篇

不完美信息博弈中基于逆向归纳的最佳响应直接计算方法

付延昌1,2, 尹奇跃2, 刘圣达2, 黄凯奇1,2   

  1. 1.中国科学院大学人工智能学院,北京 101408;
    2.中国科学院自动化研究所,北京 100190
  • 收稿日期:2025-09-15 修回日期:2026-02-04 发布日期:2026-02-04

Direct backward induction for best response computation in imperfect-information games with perfect recall*

FU Yanchang1,2, YIN Qiyue2†, LIU Shengda2, HUANG Kaiqi1,2   

  1. 1 School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 101408, China;
    2 Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
  • Received:2025-09-15 Revised:2026-02-04 Published:2026-02-04
  • Contact: †E-mail: qyyin@nlpr.ia.ac.cn
  • Supported by:
    *Supported by the Key Research Project of Chinese Academy of Sciences (No. RCJJ-145-24-15)

摘要: 最佳响应计算是博弈论策略评估的核心问题,但不完美信息扩展式博弈的通用算法面临指数级复杂度难题。针对具有完美回忆的博弈,Von Stengel(1996)通过序列形式线性规划实现了在完美回忆假设下线性时间复杂度的最佳响应计算,其本质隐含了逆向归纳思想,但这种联系仅通过抽象的线性规划机制间接体现。本文直接从博弈树结构出发,为逆向归纳法在不完美信息博弈中计算最佳响应提供了显式的构造性证明。通过引入反事实最佳响应值作为核心归纳度量,其与博弈树层级结构一致的递归关系得到了证明。基于此递归性质,算法可从叶节点邻近的信息集开始,通过子树的局部计算确定最优动作,逐层向上传播至根节点。该方法将计算过程与理论证明有机统一,在保持O(n)时间复杂度的同时,为基于线性规划的抽象方法提供了更直观、更贴近博弈树本质的替代方案。本研究阐明了逆向归纳与不完美信息博弈最佳响应计算之间的本质联系。

关键词: 最佳响应, 不完美信息博弈, 逆向归纳, 反事实最佳响应, 完美回忆

Abstract: Computing best responses is essential to strategy evaluation in game theory, yet general algorithms for imperfect-information extensive-form games suffer from exponential complexity. For games with perfect recall, Von Stengel (1996) established that best responses can be computed in linear time through a sequence-form linear program, which implicitly performs backward induction—though this connection remains indirect and embedded within abstract LP machinery. This paper provides an explicit, constructive proof that backward induction directly computes best responses in imperfect-information games with perfect recall. We introduce counterfactual best response values as the core inductive quantity and prove they satisfy a recursive relationship aligned with the game tree structure. This enables a direct backward induction algorithm that traverses information sets from leaves to root, making optimal action selections based on local subtree computations. Our approach unifies the computational procedure with its theoretical justification, offering a more intuitive, tree-structured alternative to LP-based methods while maintaining O(n) time complexity. This work clarifies the fundamental connection between backward induction and best response computation in imperfect-information settings.

Key words: best response, imperfect-information games, backward induction, counterfactual best response, perfect recall

中图分类号: