Abstract
Recent advancements in Computer Assisted Diagnosis have shown promising performance in medical imaging tasks, particularly in chest X-ray analysis. However, the interaction between these models and radiologists has been primarily limited to input images. This work proposes a novel approach to enhance human-computer interaction in chest X-ray analysis using Vision-Language Models (VLMs) enriched with radiologists' attention by incorporating eye gaze data alongside textual prompts. Our approach leverages heatmaps generated from eye gaze data, overlaying them onto medical images to highlight areas of intense radiologist focus during chest X-ray evaluation. We evaluate this methodology on tasks such as visual question answering, chest X-ray report automation, error detection, and differential diagnosis. Our results demonstrate that the inclusion of eye gaze information significantly enhances the accuracy of chest X-ray analysis. The impact of eye gaze on fine-tuning was also confirmed, as the fine-tuned model outperformed other medical VLMs in all tasks except visual question answering. This work highlights the potential of leveraging both the VLM's capabilities and the radiologist's domain knowledge to improve AI models in medical imaging, paving a novel way for Computer Assisted Diagnosis with human-centred AI.
URL
https://arxiv.org/abs/2404.02370