zikele

Memory-Anchored Multimodal Reasoning for Explainable Video Forensics

2508.14581v1

Title#

Memory-Anchored Multimodal Reasoning for Explainable Video Forensics

Abstract#

We address multimodal deepfake detection that requires both robustness and interpretability by proposing FakeHunter, a unified framework that combines memory-guided retrieval, a structured Observation-Thought-Action reasoning loop, and adaptive forensic tool invocation. Visual representations from a Contrastive Language-Image Pretraining (CLIP) model and audio representations from a Contrastive Language-Audio Pretraining (CLAP) model retrieve semantically aligned authentic exemplars from a large-scale memory, providing contextual anchors that guide iterative localization and explanation of suspected manipulations. When internal confidence is low, the framework selectively triggers fine-grained analyses such as spatial region zoom and mel spectrogram inspection to gather discriminative evidence instead of relying on opaque marginal scores. We also release X-AVFake, a comprehensive audio-visual forgery benchmark with fine-grained annotations of manipulation type, affected region or entity, reasoning category, and explanatory justification, designed to stress contextual grounding and explanation fidelity. Extensive experiments show that FakeHunter surpasses strong multimodal baselines, and ablation studies confirm that both contextual retrieval and selective tool activation are indispensable for improved robustness and explanatory precision.
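
The two mechanisms named in the abstract, retrieval of authentic exemplars by embedding similarity and tool calls gated by low confidence, can be illustrated with a minimal sketch. This is not FakeHunter's implementation: the function names (`cosine`, `retrieve`, `gated_tools`), the 0.8 threshold, the rule that keeps the highest confidence seen, and the toy two-dimensional vectors standing in for CLIP/CLAP embeddings are all assumptions made for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(
    query: Sequence[float],
    memory: List[Tuple[str, Sequence[float]]],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the k memory entries most similar to the query, best first."""
    scored = [(label, cosine(query, emb)) for label, emb in memory]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


def gated_tools(
    confidence: float,
    tools: List[Tuple[str, Callable[[], float]]],
    threshold: float = 0.8,  # assumed value, not taken from the paper
) -> Tuple[float, List[str]]:
    """Call tools in order only while confidence stays below the threshold.

    Each tool is a hypothetical stub returning an updated confidence in [0, 1];
    the highest confidence seen so far is kept.
    """
    called: List[str] = []
    for name, tool in tools:
        if confidence >= threshold:
            break  # confident enough: skip the remaining fine-grained analyses
        confidence = max(confidence, tool())
        called.append(name)
    return confidence, called


# Toy memory of "authentic" exemplars; real embeddings would come from CLIP/CLAP.
MEMORY = [("real_a", [1.0, 0.0]), ("real_b", [0.0, 1.0]), ("real_c", [1.0, 1.0])]

# Hypothetical tool stubs named after the analyses mentioned in the abstract.
TOOLS = [("zoom_region", lambda: 0.6), ("mel_spectrogram", lambda: 0.85)]
```

Calling `retrieve([1.0, 0.1], MEMORY)` ranks the exemplars by similarity to the query, and `gated_tools(0.5, TOOLS)` runs the stubs in order until the threshold is reached. With an initial confidence at or above the threshold, no tool is called at all, which is the selective behaviour the abstract describes.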

Article page#

Memory-Anchored Multimodal Reasoning for Explainable Video Forensics

PDF download#

View Chinese PDF - 2508.14581v1
