Abstract
The growing use of 3D medical imaging in healthcare has substantially increased the workload of medical professionals. A robust system for retrieving similar case studies could assist clinicians in diagnosis and ease that workload. Despite this promise, the field of 3D medical text-image retrieval is currently limited by the absence of robust evaluation benchmarks and curated datasets. To address this, we present a new dataset, BIMCV-R (to be released upon acceptance), which comprises 8,069 3D CT volumes, totaling over 2 million slices, paired with their respective radiological reports. Building on this dataset, we develop a retrieval strategy, MedFinder. It employs a dual-stream network architecture and leverages large language models to advance medical image retrieval beyond existing text-image retrieval solutions. This is our preliminary step towards a system that supports text-to-image, image-to-text, and keyword-based retrieval tasks.
URL
https://arxiv.org/abs/2403.15992