Paper Reading AI Learner

Cross Modal Global Local Representation Learning from Radiology Reports and X-Ray Chest Images

2023-01-26 06:02:28
Nathan Hadjiyski, Ali Vosoughi, Axel Wismueller

Abstract

Deep learning models can be applied successfully to real-world problems; however, training most of these models requires massive data. Recent methods use both language and vision, but unfortunately they rely on datasets that are not usually publicly available. Here we pave the way for further research in the multimodal language-vision domain for radiology. In this paper, we train a representation learning method that uses local and global representations of language and vision through an attention mechanism, based on the publicly available Indiana University Radiology Report (IU-RR) dataset. Furthermore, we use the learned representations to diagnose five lung pathologies: atelectasis, cardiomegaly, edema, pleural effusion, and consolidation. Finally, we use both supervised and zero-shot classification to extensively analyze the performance of the representation learning on the IU-RR dataset. Average Area Under the Curve (AUC) is used to evaluate the accuracy of the classifiers for the five lung pathologies. The average AUC for classifying the five lung pathologies on the IU-RR test set ranged from 0.85 to 0.87 across the different training datasets, namely CheXpert and CheXphoto. These results compare favorably to other studies using IU-RR. Extensive experiments confirm consistent results for classifying lung pathologies using the multimodal global-local representations of language and vision information.
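The evaluation metric above, average AUC over the five pathologies, can be sketched in plain Python. This is a minimal illustration, not the paper's code: the AUC is computed via the rank-sum (Mann-Whitney U) statistic, and all labels and scores below are made-up toy data, not results from the paper.

```python
def auc(labels, scores):
    """AUC via the rank-sum statistic: the probability that a randomly
    chosen positive example is scored above a random negative one
    (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need both positive and negative examples")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy per-pathology ground-truth labels and classifier scores
# (illustrative only; not from the IU-RR experiments).
toy = {
    "atelectasis":      ([1, 0, 1, 0], [0.9, 0.2, 0.7, 0.4]),
    "cardiomegaly":     ([1, 1, 0, 0], [0.8, 0.6, 0.3, 0.1]),
    "edema":            ([0, 1, 0, 1], [0.2, 0.9, 0.6, 0.5]),
    "pleural effusion": ([1, 0, 0, 1], [0.7, 0.3, 0.6, 0.8]),
    "consolidation":    ([0, 0, 1, 1], [0.1, 0.4, 0.9, 0.5]),
}

per_label = {p: auc(y, s) for p, (y, s) in toy.items()}
average_auc = sum(per_label.values()) / len(per_label)
print(f"average AUC over {len(per_label)} pathologies: {average_auc:.3f}")
```

In practice one would use a library implementation (e.g. `sklearn.metrics.roc_auc_score`) per pathology and average the five values, which is what "average AUC" denotes in the abstract.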

URL

https://arxiv.org/abs/2301.10951

PDF

https://arxiv.org/pdf/2301.10951.pdf

