
Who Sees What? Structured Thought-Action Sequences for Epistemic Reasoning in LLMs

2508.14564v1

Title#

Who Sees What? Structured Thought-Action Sequences for Epistemic Reasoning in LLMs

Abstract#

Recent advances in large language models (LLMs) and reasoning frameworks have opened new possibilities for improving the perspective-taking capabilities of autonomous agents. However, tasks that involve active perception, collaborative reasoning, and perspective taking (understanding what another agent can see or knows) pose persistent challenges for current LLM-based systems. This study investigates the potential of structured examples, derived from transformed solution graphs generated by the Fast Downward planner, to improve the performance of LLM-based agents within a ReAct framework. We propose a structured solution-processing pipeline that generates three distinct categories of examples: optimal goal paths (G-type), informative node paths (E-type), and step-by-step optimal decision sequences contrasting alternative actions (L-type). These solutions are further converted into "thought-action" examples by prompting an LLM to explicitly articulate the reasoning behind each decision. While L-type examples slightly reduce clarification requests and overall action steps, they do not yield consistent improvements. Agents succeed in tasks requiring basic attentional filtering but struggle in scenarios that require mentalising about occluded spaces or weighing the costs of epistemic actions. These findings suggest that structured examples alone are insufficient for robust perspective-taking, underscoring the need for explicit belief tracking, cost modelling, and richer environments to enable socially grounded collaboration in LLM-based agents.
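
To make the pipeline described in the abstract concrete, the following is a minimal sketch, not the paper's actual implementation: all identifiers (PlanStep, build_prompt, to_thought_action_example, and the llm callable) are hypothetical. It illustrates how a Fast Downward solution path could be rendered as a ReAct-style thought-action example by prompting an LLM to verbalise each decision, with rejected alternatives included to support L-type contrasts.

```python
# Hypothetical sketch: turn a planner solution path into a ReAct-style
# "thought-action" few-shot example by asking an LLM to articulate the
# reasoning behind each step. Names and prompt wording are illustrative only.

from dataclasses import dataclass, field


@dataclass
class PlanStep:
    state: str                              # textual description of the current state
    action: str                             # the planner's chosen action at this step
    alternatives: list[str] = field(default_factory=list)  # rejected actions (L-type)


def build_prompt(task: str, step: PlanStep) -> str:
    """Compose a prompt asking the LLM to justify the planner's action,
    optionally contrasting it with the rejected alternatives."""
    lines = [
        f"Task: {task}",
        f"Current state: {step.state}",
        f"Chosen action: {step.action}",
    ]
    if step.alternatives:
        lines.append(f"Rejected alternatives: {', '.join(step.alternatives)}")
    lines.append("Explain in one sentence why the chosen action is best here.")
    return "\n".join(lines)


def to_thought_action_example(task: str, plan: list[PlanStep], llm) -> str:
    """Render a plan as interleaved Thought/Action lines, ReAct style.
    `llm` is any callable mapping a prompt string to a completion string."""
    lines = []
    for step in plan:
        thought = llm(build_prompt(task, step)).strip()
        lines.append(f"Thought: {thought}")
        lines.append(f"Action: {step.action}")
    return "\n".join(lines)
```

Under these assumptions, the returned text (alternating "Thought: ..." and "Action: ..." lines) would be placed in the agent's few-shot context as a G-, E-, or L-type example, depending on which solution path was supplied.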

Article Page#

Who Sees What? Structured Thought-Action Sequences for Epistemic Reasoning in LLMs

PDF Access#

View the Chinese PDF - 2508.14564v1
