Abstract
Automatic radiology report summarization is a crucial clinical task whose key challenge is to maintain factual consistency between the generated summaries and the ground-truth radiology findings. Existing research adopts reinforcement learning to directly optimize factual consistency metrics such as the CheXBert or RadGraph score. However, the decoding methods these approaches use, greedy search or beam search, do not consider factual consistency when picking the optimal candidate, which limits the improvement in factual consistency. To address this, we propose FactReranker, a novel second-stage summarization approach and the first attempt to learn to choose the best summary from all candidates based on their estimated factual consistency scores. We propose to extract the medical facts of the input medical report, its gold summary, and the candidate summaries based on the RadGraph schema, and we design a fact-guided reranker that efficiently incorporates the extracted medical facts to select the optimal summary. We decompose the fact-guided reranker into factual knowledge graph generation and a factual scorer. This allows the reranker to model the mapping between the medical facts of the input text and those of its gold summary, so it can select the optimal summary even though the gold summary cannot be observed during inference. We also present a fact-based ranking metric (RadMRR) that measures the reranker's ability to select factually consistent candidates. Experimental results on two benchmark datasets demonstrate that our method generates summaries with higher factual consistency scores than existing methods.
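The second-stage selection described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the factual scorer or RadMRR, so this sketch assumes facts are plain (entity, relation, entity) triples, substitutes a set-overlap F1 for the learned scorer, and uses the standard mean reciprocal rank as a stand-in for RadMRR. All names (`Fact`, `fact_f1`, `rerank`, `mean_reciprocal_rank`) and the example facts are hypothetical.

```python
from typing import List, Set, Tuple

# Assumption: a medical fact is a simple (entity, relation, entity) triple.
Fact = Tuple[str, str, str]


def fact_f1(candidate: Set[Fact], reference: Set[Fact]) -> float:
    """F1 overlap between two fact sets (stand-in for a learned factual scorer)."""
    overlap = len(candidate & reference)
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def rerank(candidates: List[Set[Fact]], predicted_facts: Set[Fact]) -> List[int]:
    """Return candidate indices ordered from highest to lowest estimated score.

    `predicted_facts` plays the role of the facts the model expects the gold
    summary to contain; the gold summary itself is not available at inference.
    """
    scores = [fact_f1(c, predicted_facts) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


def reciprocal_rank(ranking: List[int], best_index: int) -> float:
    """1 / (1-based position of the truly best candidate in the ranking)."""
    return 1.0 / (ranking.index(best_index) + 1)


def mean_reciprocal_rank(rankings: List[List[int]], best_indices: List[int]) -> float:
    """Average reciprocal rank over reports (assumed analogue of RadMRR)."""
    total = sum(reciprocal_rank(r, b) for r, b in zip(rankings, best_indices))
    return total / len(rankings)


# Hypothetical example: three candidate summaries for one report.
predicted = {("effusion", "located_at", "pleural"), ("pneumothorax", "absent", "")}
candidates = [
    {("effusion", "located_at", "pleural")},
    {("effusion", "located_at", "pleural"), ("pneumothorax", "absent", "")},
    {("cardiomegaly", "present", "")},
]
ranking = rerank(candidates, predicted)  # [1, 0, 2]
best_summary_index = ranking[0]
```

Under these assumptions, the reranker picks candidate 1 because its facts match the predicted facts exactly, and a ranking metric of this kind rewards the reranker for placing the most factually consistent candidate near the top.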
URL
https://arxiv.org/abs/2303.08335