Abstract
Deep learning models can be applied successfully to real-world problems; however, training most of these models requires massive amounts of data. Recent methods combine language and vision, but unfortunately they rely on datasets that are not usually publicly available. Here we pave the way for further research in the multimodal language-vision domain for radiology. In this paper, we train a representation learning method that uses local and global representations of language and vision through an attention mechanism, based on the publicly available Indiana University Radiology Report (IU-RR) dataset. Furthermore, we use the learned representations to diagnose five lung pathologies: atelectasis, cardiomegaly, edema, pleural effusion, and consolidation. Finally, we use both supervised and zero-shot classification to extensively analyze the performance of the representation learning on the IU-RR dataset. The average Area Under the Curve (AUC) is used to evaluate the accuracy of the classifiers for the five lung pathologies. The average AUC for classifying the five lung pathologies on the IU-RR test set ranged from 0.85 to 0.87 across the different training datasets, namely CheXpert and CheXphoto. These results compare favorably to other studies using IU-RR. Extensive experiments confirm consistent results for classifying lung pathologies using the multimodal global-local representations of language and vision information.
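The abstract's evaluation metric is the average AUC over the five pathologies. A minimal sketch (not the authors' code) of how such a macro-averaged AUC can be computed: the per-label AUC is the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, and the five per-pathology AUCs are then averaged. The labels and scores below are made-up placeholder data:

```python
def auc(y_true, y_score):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    # Count wins; ties between a positive and a negative count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

PATHOLOGIES = ["atelectasis", "cardiomegaly", "edema",
               "pleural effusion", "consolidation"]

# Hypothetical per-pathology binary labels and classifier scores.
labels = {p: [0, 0, 1, 1] for p in PATHOLOGIES}
scores = {p: [0.1, 0.4, 0.35, 0.8] for p in PATHOLOGIES}

# Macro-average: mean of the per-pathology AUCs.
per_label = {p: auc(labels[p], scores[p]) for p in PATHOLOGIES}
average_auc = sum(per_label.values()) / len(PATHOLOGIES)
print(average_auc)
```

In practice a library routine such as scikit-learn's `roc_auc_score` with macro averaging would be used instead of the hand-rolled function above.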
URL
https://arxiv.org/abs/2301.10951