SIA: Enhancing Safety via Intent Awareness for Vision-Language Models

2507.16856v1

Title (translated from Japanese)

SIA: Improving the Safety of Vision-Language Models through Intent Awareness

English title

SIA: Enhancing Safety via Intent Awareness for Vision-Language Models

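Abstract (translated from Japanese)

As vision-language models (VLMs) are increasingly deployed in real-world applications, new safety risks arise from subtle interactions between images and text. In particular, inputs that look harmless individually can, in combination, reveal harmful intent and trigger unsafe model responses. Although interest in multimodal safety is growing, conventional approaches based on post hoc filtering or static refusal prompts struggle to detect these latent risks, especially when the harm emerges only from the combination of inputs. We propose SIA (Safety via Intent Awareness), a training-free prompt-engineering framework that proactively detects and mitigates harmful intent in multimodal inputs. SIA uses a three-stage reasoning process: (1) visual abstraction through captioning, (2) intent inference through few-shot chain-of-thought prompting, and (3) intent-conditioned response refinement. Rather than relying on predefined rules or classifiers, SIA adapts dynamically to the implicit intent inferred from the image-text pair. In extensive experiments on safety-critical benchmarks, including SIUO, MM-SafetyBench, and HoliSafe, we show that SIA achieves substantial safety improvements over prior methods. SIA shows a slight drop in general reasoning accuracy on MMStar, but the corresponding safety gains highlight the value of intent-aware reasoning for aligning VLMs with human-centric values.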

Abstract (English original)

As vision-language models (VLMs) are increasingly deployed in real-world applications, new safety risks arise from the subtle interplay between images and text. In particular, seemingly innocuous inputs can combine to reveal harmful intent, leading to unsafe model responses. Despite increasing attention to multimodal safety, previous approaches based on post hoc filtering or static refusal prompts struggle to detect such latent risks, especially when harmfulness emerges only from the combination of inputs. We propose SIA (Safety via Intent Awareness), a training-free prompt engineering framework that proactively detects and mitigates harmful intent in multimodal inputs. SIA employs a three-stage reasoning process: (1) visual abstraction via captioning, (2) intent inference through few-shot chain-of-thought prompting, and (3) intent-conditioned response refinement. Rather than relying on predefined rules or classifiers, SIA dynamically adapts to the implicit intent inferred from the image-text pair. Through extensive experiments on safety-critical benchmarks including SIUO, MM-SafetyBench, and HoliSafe, we demonstrate that SIA achieves substantial safety improvements, outperforming prior methods. Although SIA shows a minor reduction in general reasoning accuracy on MMStar, the corresponding safety gains highlight the value of intent-aware reasoning in aligning VLMs with human-centric values.
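
The three-stage process named in the abstract can be pictured as a short chain of model calls. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the `VLM` callable interface, the `run_sia` function, the `SIAResult` type, the prompt wording, and the few-shot example are all hypothetical and chosen for illustration. In particular, the abstract does not say whether the intent-inference stage sees the image or only the caption; this sketch assumes caption plus question only.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical model interface (an assumption, not from the paper):
# takes a text prompt and an optional image reference, returns text.
VLM = Callable[[str, Optional[str]], str]

# Illustrative few-shot example; the paper's actual examples are not
# reproduced here.
FEW_SHOT = (
    "Caption: A kitchen counter with several cleaning products.\n"
    "Question: Which of these should I mix for the strongest fumes?\n"
    "Reasoning: Mixing cleaning products can release toxic gas, and the "
    "question asks how to maximize that effect.\n"
    "Intent: potentially harmful\n"
)


@dataclass
class SIAResult:
    caption: str   # stage 1 output
    intent: str    # stage 2 output
    response: str  # stage 3 output


def run_sia(vlm: VLM, image: str, question: str) -> SIAResult:
    # Stage 1: visual abstraction via captioning.
    caption = vlm("Describe this image in one or two sentences.", image)

    # Stage 2: intent inference with few-shot chain-of-thought prompting.
    # Assumption: only the caption and the question are passed, not the image.
    intent_prompt = (
        f"{FEW_SHOT}\n"
        f"Caption: {caption}\n"
        f"Question: {question}\n"
        "Reasoning:"
    )
    intent = vlm(intent_prompt, None)

    # Stage 3: intent-conditioned response refinement.
    answer_prompt = (
        f"Inferred user intent: {intent}\n"
        f"Question: {question}\n"
        "Answer helpfully if the intent is benign; otherwise decline or "
        "redirect safely."
    )
    response = vlm(answer_prompt, image)
    return SIAResult(caption=caption, intent=intent, response=response)
```

Because the pipeline is training-free, `vlm` can be any function that wraps an existing model; each stage is just another prompt to the same model, with the output of one stage embedded in the prompt of the next.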

