Who Sees What? Structured Thought-Action Sequences for Epistemic Reasoning in LLMs

2508.14564v1

Title

Who Sees What? Structured Thought-Action Sequences for Epistemic Reasoning in LLMs

Abstract

Recent advances in large language models (LLMs) and reasoning frameworks have opened new possibilities for improving the perspective-taking capabilities of autonomous agents. However, tasks that involve active perception, collaborative reasoning, and perspective taking (understanding what another agent can see or knows) remain a persistent challenge for current LLM-based systems. This study investigates whether structured examples, derived from transformed solution graphs generated by the Fast Downward planner, can improve the performance of LLM-based agents within a ReAct framework. We propose a structured solution-processing pipeline that generates three distinct categories of examples: optimal goal paths (G-type), informative node paths (E-type), and step-by-step optimal decision sequences contrasting alternative actions (L-type). These solutions are further converted into "thought-action" examples by prompting an LLM to explicitly articulate the reasoning behind each decision. While L-type examples slightly reduce clarification requests and overall action steps, they do not yield consistent improvements. Agents succeed in tasks requiring basic attentional filtering but struggle in scenarios that require mentalising about occluded spaces or weighing the costs of epistemic actions. These findings suggest that structured examples alone are insufficient for robust perspective taking, underscoring the need for explicit belief tracking, cost modelling, and richer environments to enable socially grounded collaboration in LLM-based agents.

