zikele

RadReason: Radiology Report Evaluation Metric with Reasons and Sub-Scores

2508.15464v1

Title

RadReason: Radiology Report Evaluation Metric with Reasons and Sub-Scores

Abstract

Evaluating automatically generated radiology reports remains a fundamental challenge due to the lack of clinically grounded, interpretable, and fine-grained metrics. Existing methods either produce coarse overall scores or rely on opaque black-box models, limiting their usefulness in real-world clinical workflows. We introduce RadReason, a novel evaluation framework for radiology reports that not only outputs fine-grained sub-scores across six clinically defined error types, but also produces human-readable justifications that explain the rationale behind each score. Our method builds on Group Relative Policy Optimization and incorporates two key innovations: (1) Sub-score Dynamic Weighting, which adaptively prioritizes clinically challenging error types based on live F1 statistics; and (2) Majority-Guided Advantage Scaling, which adjusts policy gradient updates based on prompt difficulty derived from sub-score agreement. Together, these components enable more stable optimization and better alignment with expert clinical judgment. Experiments on the ReXVal benchmark show that RadReason surpasses all prior offline metrics and achieves parity with GPT-4-based evaluations, while remaining explainable, cost-efficient, and suitable for clinical deployment. Code will be released upon publication.
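
The abstract names the two components but does not give their formulas, so the following is only a minimal sketch of how they could fit around a GRPO-style group-relative advantage. Everything specific here is an assumption made for illustration, not the authors' method: the function names, the `1 - F1` weighting rule, the exact-match reward, and the `1 + difficulty` scaling (including its direction) are all hypothetical. Only the within-group mean/std normalization reflects how Group Relative Policy Optimization is commonly described.

```python
from collections import Counter
from statistics import mean, pstdev


def dynamic_weights(f1_per_type, eps=1e-6):
    """Hypothetical rule: error types with lower running F1 receive more weight."""
    raw = [1.0 - f1 + eps for f1 in f1_per_type]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_reward(pred, target, weights):
    """Assumed reward: total weight of the sub-scores that match the reference."""
    return sum(w for p, t, w in zip(pred, target, weights) if p == t)


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style normalization of rewards within one prompt's sample group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def majority_agreement(group_preds):
    """Mean, over error types, of the share of samples equal to the modal sub-score."""
    n = len(group_preds)
    shares = []
    for k in range(len(group_preds[0])):
        counts = Counter(pred[k] for pred in group_preds)
        shares.append(counts.most_common(1)[0][1] / n)
    return mean(shares)


def scaled_advantages(group_preds, target, f1_per_type):
    """Combine both hypothetical components into per-sample advantages."""
    weights = dynamic_weights(f1_per_type)
    rewards = [weighted_reward(p, target, weights) for p in group_preds]
    advantages = group_relative_advantages(rewards)
    # Assumption: low agreement among samples marks a harder prompt,
    # and harder prompts get proportionally larger updates.
    difficulty = 1.0 - majority_agreement(group_preds)
    scale = 1.0 + difficulty
    return [a * scale for a in advantages]


# Toy group: four sampled evaluations, each with six error-type counts.
group = [
    [1, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
]
target = [1, 0, 0, 0, 0, 0]
running_f1 = [0.9, 0.5, 0.8, 0.8, 0.8, 0.8]
print(scaled_advantages(group, target, running_f1))
```

In this toy group, the sample that errs on the second error type (the one with the lowest running F1) is penalized more than the sample that errs on the first, which is the behavior the weighting is meant to illustrate. A real implementation would take the reward definition, the F1 update schedule, and the scaling function from the paper once the code is released.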

Article Page

RadReason: Radiology Report Evaluation Metric with Reasons and Sub-Scores

Get the PDF

View the Chinese PDF - 2508.15464v1
