zikele

zikele

人生如此自可乐

使用大語言模型的顯式漏洞生成:超越對抗攻擊的研究

2507.10054v2

中文标题#

使用大語言模型的顯式漏洞生成:超越對抗攻擊的研究

英文标题#

Explicit Vulnerability Generation with LLMs: An Investigation Beyond Adversarial Attacks

中文摘要#

大型語言模型(LLMs)越來越多地被用作代碼助手,但當明確要求生成不安全代碼時,它們的行為仍知之甚少。 儘管之前的研究集中在意外的漏洞上,但本研究考察了一個更直接的威脅:當被提示時,開源 LLMs 生成易受攻擊的代碼。 我們提出了一種雙實驗設計:(1)動態提示,系統地在結構化模板中變化漏洞類型、用戶角色和提示措辭;以及(2)反向提示,從真實的易受攻擊代碼樣本中推導出自然語言提示。 我們使用靜態分析評估了三個開源 7B 參數模型(Qwen2、Mistral、Gemma),以評估生成漏洞的存在性和正確性。 我們的結果表明,所有模型經常生成請求的漏洞,儘管性能差異顯著。 在動態提示下,Gemma 在內存漏洞方面的正確性最高(例如,緩衝區溢出的正確率為 98.6%),而 Qwen2 在所有任務中表現出最平衡的性能。 我們發現,專業角色(例如 “DevOps 工程師”)始終比學生角色產生更高的成功率,並且直接與間接措辭的有效性取決於提示策略而反轉。 漏洞重現準確性與代碼複雜度呈非線性關係,在中等範圍內達到峰值。 我們的發現揭示了 LLMs 依賴模式回憶而非語義推理如何在其安全對齊中造成重大盲點,特別是在以看似合理的專業任務形式提出的請求中。

英文摘要#

Large Language Models (LLMs) are increasingly used as code assistants, yet their behavior when explicitly asked to generate insecure code remains poorly understood. While prior research has focused on unintended vulnerabilities, this study examines a more direct threat: open-source LLMs generating vulnerable code when prompted. We propose a dual experimental design: (1) Dynamic Prompting, which systematically varies vulnerability type, user persona, and prompt phrasing across structured templates; and (2) Reverse Prompting, which derives natural-language prompts from real vulnerable code samples. We evaluate three open-source 7B-parameter models (Qwen2, Mistral, Gemma) using static analysis to assess both the presence and correctness of generated vulnerabilities. Our results show that all models frequently generate the requested vulnerabilities, though with significant performance differences. Gemma achieves the highest correctness for memory vulnerabilities under Dynamic Prompting (e.g., 98.6% for buffer overflows), while Qwen2 demonstrates the most balanced performance across all tasks. We find that professional personas (e.g., "DevOps Engineer") consistently elicit higher success rates than student personas, and that the effectiveness of direct versus indirect phrasing is inverted depending on the prompting strategy. Vulnerability reproduction accuracy follows a non-linear pattern with code complexity, peaking in a moderate range. Our findings expose how LLMs' reliance on pattern recall over semantic reasoning creates significant blind spots in their safety alignments, particularly for requests framed as plausible professional tasks.

PDF 获取#

查看中文 PDF - 2507.10054v2

智能達人抖店二維碼

抖音掃碼查看更多精彩內容

載入中......
此文章數據所有權由區塊鏈加密技術和智能合約保障僅歸創作者所有。