階層化と意味的正規化された知識グラフによる視覚的物語のロバストシンボル推論

2508.14941v1

日本語タイトル#

階層的かつ意味的に正規化された知識グラフによる視覚的物語の堅牢な記号推論

英文タイトル#

Robust Symbolic Reasoning for Visual Narratives via Hierarchical and Semantically Normalized Knowledge Graphs

日本語摘要#

漫画のような視覚的物語を理解するには、ストーリーの構造を捉える構造化された表現が必要です。これには、複数のレベルのストーリー構成におけるイベント、キャラクター、およびそれらの関係が含まれます。しかし、記号的な物語グラフは、類似のアクションやイベントが異なる注釈や文脈で異なるラベル付けされるため、一貫性や冗長性の問題に悩まされることがよくあります。このような変動は、推論や一般化の効果を制限します。本論文では、階層的な物語知識グラフのための意味的正規化フレームワークを紹介します。認知に基づいた物語理解モデルに基づいて、語彙の類似性と埋め込みベースのクラスタリングを使用して意味的に関連するアクションとイベントを統合する方法を提案します。正規化プロセスは、注釈ノイズを減少させ、物語レベル間の記号カテゴリを整列させ、解釈可能性を保持します。私たちは、Manga109 データセットの注釈付き漫画ストーリーにおいてこのフレームワークを示し、パネルレベル、イベントレベル、ストーリーレベルのグラフに正規化を適用しました。アクション検索、キャラクターの位置付け、イベント要約などの物語推論タスクにおける初期評価は、意味的正規化が一貫性と堅牢性を向上させ、記号の透明性を維持することを示しています。これらの発見は、正規化がスケーラブルで認知に触発されたマルチモーダル物語理解グラフモデルへの重要なステップであることを示唆しています。

英文摘要#

Understanding visual narratives such as comics requires structured representations that capture events, characters, and their relations across multiple levels of story organization. However, symbolic narrative graphs often suffer from inconsistency and redundancy, where similar actions or events are labeled differently across annotations or contexts. Such variance limits the effectiveness of reasoning and generalization. This paper introduces a semantic normalization framework for hierarchical narrative knowledge graphs. Building on cognitively grounded models of narrative comprehension, we propose methods that consolidate semantically related actions and events using lexical similarity and embedding-based clustering. The normalization process reduces annotation noise, aligns symbolic categories across narrative levels, and preserves interpretability. We demonstrate the framework on annotated manga stories from the Manga109 dataset, applying normalization to panel-, event-, and story-level graphs. Preliminary evaluations across narrative reasoning tasks, such as action retrieval, character grounding, and event summarization, show that semantic normalization improves coherence and robustness, while maintaining symbolic transparency. These findings suggest that normalization is a key step toward scalable, cognitively inspired graph models for multimodal narrative understanding.

文章ページ#

階層的かつ意味的に正規化された知識グラフによる視覚的物語の堅牢な記号推論

PDF 获取#

日本語 PDF を表示 - 2508.14941v1

スマート達人の抖店 QR コード

抖音でさらに素晴らしいコンテンツをスキャンして見る