RadGenome-Chest CT: A Grounded Vision-Language Dataset for Chest CT Analysis

Abstract

Developing generalist foundation models has recently attracted tremendous attention among researchers in the field of AI for Medicine (AI4Medicine). A pivotal insight in developing these models is their reliance on dataset scaling, which underscores the need for open-source medical image datasets that incorporate diverse supervision signals across various imaging modalities. In this paper, we introduce RadGenome-Chest CT, a comprehensive, large-scale, region-guided 3D chest CT interpretation dataset based on CT-RATE. Specifically, we leverage the latest powerful universal segmentation and large language models to extend the original dataset (over 25,692 non-contrast 3D chest CT volumes and reports from 20,000 patients) in the following aspects: (i) organ-level segmentation masks covering 197 categories, which provide intermediate visual reasoning clues for interpretation; (ii) 665K multi-granularity grounded reports, where each sentence of the report is linked to the corresponding anatomical region of the CT volume in the form of a segmentation mask; (iii) 1.3M grounded VQA pairs, where questions and answers are all linked with reference segmentation masks, enabling models to associate visual evidence with textual explanations. All grounded reports and VQA pairs in the validation set have gone through manual verification to ensure dataset quality. We believe that RadGenome-Chest CT can significantly advance the development of multimodal medical foundation models by enabling training to generate text based on given segmentation regions, which is unattainable with previous relevant datasets. We will release all segmentation masks, grounded reports, and VQA pairs to facilitate further research and development in this field.
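The link between report sentences and segmentation masks described in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the `GroundedSentence` type, its field names, the label ids, and the nested-list label map are hypothetical and do not reflect the released file format or schema of RadGenome-Chest CT.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Toy 3D label map indexed as [z][y][x]; each voxel holds an integer region id.
# Hypothetical stand-in for an organ-level segmentation mask.
LabelMap = List[List[List[int]]]


@dataclass
class GroundedSentence:
    text: str      # one sentence of a report
    region: str    # anatomical region name (hypothetical label)
    label_id: int  # id of that region in the label map (hypothetical)


def region_voxels(label_map: LabelMap, label_id: int) -> List[Tuple[int, int, int]]:
    """Return the (z, y, x) coordinates of every voxel labeled label_id."""
    return [
        (z, y, x)
        for z, plane in enumerate(label_map)
        for y, row in enumerate(plane)
        for x, value in enumerate(row)
        if value == label_id
    ]


def ground_report(sentences: List[GroundedSentence], label_map: LabelMap):
    """Pair each sentence with the voxels of the region it refers to."""
    return [(s.text, region_voxels(label_map, s.label_id)) for s in sentences]


# Toy 2x2x2 volume: 0 = background, 1 = "lung", 2 = "heart" (made-up ids).
toy_map = [
    [[1, 1], [0, 2]],
    [[1, 0], [2, 2]],
]
report = [
    GroundedSentence("No nodules in the lung.", "lung", 1),
    GroundedSentence("Heart size is normal.", "heart", 2),
]
grounded = ground_report(report, toy_map)
```

In this sketch each sentence carries the id of one region, so a model could be given the voxels of that region as visual evidence for the sentence. A grounded VQA pair could be represented the same way, with a question and an answer sharing one reference region.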

Abstract (translated)

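Developing generalist foundation models has recently attracted great attention among researchers in AI for medicine (AI4Medicine). The key to these models is their reliance on dataset scaling, which underscores the need to develop open medical image datasets that contain diverse supervision signals across imaging modalities. In this paper, we introduce RadGenome-Chest CT, a comprehensive, large-scale, region-guided 3D chest CT interpretation dataset based on CT-RATE. Specifically, we use the latest powerful universal segmentation models and large language models to extend the original dataset in the following aspects: (i) organ-level segmentation masks covering 197 categories, which provide intermediate visual reasoning clues for interpretation; (ii) 665K multi-granularity grounded reports, where each sentence of a report is linked to the corresponding anatomical region of the CT volume through a segmentation mask; (iii) 1.3M grounded VQA pairs, where questions and answers are linked to reference segmentation masks, enabling models to associate visual evidence with textual explanations. All grounded reports and VQA pairs in the validation set have been manually verified to ensure dataset quality. We believe that RadGenome-Chest CT can significantly advance the development of multimodal medical foundation models by enabling text generation based on given segmentation regions, which previous related datasets could not support. We will release all segmentation masks, grounded reports, and VQA pairs to facilitate further research and development in this field.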

URL

https://arxiv.org/abs/2404.16754

PDF

https://arxiv.org/pdf/2404.16754.pdf
