Chinese Title#
Weak Links in LinkedIn: Enhancing Fake Profile Detection in the Age of Large Language Models
English Title#
Weak Links in LinkedIn: Enhancing Fake Profile Detection in the Age of LLMs
Chinese Abstract#
Large Language Models (LLMs) have made it much easier to create realistic fake profiles on platforms such as LinkedIn, posing a significant risk to text-based fake profile detectors. In this study, we evaluate the robustness of existing detectors against LLM-generated profiles. While highly effective at detecting manually created fake profiles (False Accept Rate: 6-7%), existing detectors fail to identify GPT-generated profiles (False Accept Rate: 42-52%). We propose GPT-assisted adversarial training as a countermeasure, restoring the False Accept Rate to 1-7% without affecting the False Reject Rate (0.5-2%). Ablation studies show that detectors trained on combined numerical and textual embeddings are the most robust, followed by those using numerical-only embeddings, and lastly those using textual-only embeddings. Further analysis of prompt-based GPT-4 Turbo and human evaluators confirms the need for robust automated detectors such as the one proposed in this study.
English Abstract#
Large Language Models (LLMs) have made it easier to create realistic fake profiles on platforms like LinkedIn. This poses a significant risk for text-based fake profile detectors. In this study, we evaluate the robustness of existing detectors against LLM-generated profiles. While highly effective in detecting manually created fake profiles (False Accept Rate: 6-7%), the existing detectors fail to identify GPT-generated profiles (False Accept Rate: 42-52%). We propose GPT-assisted adversarial training as a countermeasure, restoring the False Accept Rate to 1-7% without impacting the False Reject Rate (0.5-2%). Ablation studies revealed that detectors trained on combined numerical and textual embeddings exhibit the highest robustness, followed by those using numerical-only embeddings, and lastly those using textual-only embeddings. Complementary analysis of the ability of prompt-based GPT-4 Turbo and human evaluators to identify fake profiles affirms the need for robust automated detectors such as the one proposed in this study.
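The abstract reports that detectors trained on combined numerical and textual embeddings were the most robust configuration. A minimal sketch of that idea, not the paper's implementation, is shown below: synthetic profile data stands in for real LinkedIn profiles, TF-IDF stands in for whatever text encoder the actual system uses, and logistic regression stands in for the actual detector model. The feature names (connection count, account age) are illustrative assumptions.

```python
# Sketch (assumed, not the paper's code): a fake-profile detector that
# concatenates numerical profile features with textual embeddings,
# the combined-embedding setup the abstract reports as most robust.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic profiles: a text bio plus numerical features,
# labeled 1 = fake, 0 = genuine. Entirely made-up data.
bios = [
    "Seasoned growth hacker passionate about synergy and innovation",
    "Software engineer working on distributed systems since 2012",
    "Visionary thought leader disrupting every industry at once",
    "Accountant at a regional firm, CPA, ten years of experience",
] * 25
labels = np.array([1, 0, 1, 0] * 25)
numerical = np.column_stack([
    rng.normal(50, 10, size=100) + 400 * (1 - labels),   # connection count
    rng.normal(30, 5, size=100) + 1000 * (1 - labels),   # account age (days)
])

# Textual embeddings via TF-IDF, concatenated with the
# standardized numerical block into one feature matrix.
text_emb = TfidfVectorizer().fit_transform(bios).toarray()
num_std = (numerical - numerical.mean(axis=0)) / numerical.std(axis=0)
combined = np.hstack([num_std, text_emb])

clf = LogisticRegression(max_iter=1000).fit(combined, labels)
accuracy = clf.score(combined, labels)
print(f"training accuracy: {accuracy:.2f}")
```

An ablation like the paper's would retrain `clf` on `num_std` alone and on `text_emb` alone and compare False Accept / False Reject Rates on held-out adversarial (LLM-generated) profiles, which this toy setup does not include.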