Paper Reading AI Learner

ChEX: Interactive Localization and Region Description in Chest X-rays

2024-04-24 09:44:44
Philip Müller, Georgios Kaissis, Daniel Rueckert

Abstract

Report generation models offer fine-grained textual interpretations of medical images like chest X-rays, yet they often lack interactivity (i.e. the ability to steer the generation process through user queries) and localized interpretability (i.e. visually grounding their predictions), which we deem essential for future adoption in clinical practice. While there have been efforts to tackle these issues, they are either limited in their interactivity by not supporting textual queries or fail to also offer localized interpretability. Therefore, we propose a novel multitask architecture and training paradigm integrating textual prompts and bounding boxes for diverse aspects like anatomical regions and pathologies. We call this approach the Chest X-Ray Explainer (ChEX). Evaluations across a heterogeneous set of 9 chest X-ray tasks, including localized image interpretation and report generation, showcase its competitiveness with SOTA models, while additional analysis demonstrates ChEX's interactive capabilities.
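To make the interaction pattern described in the abstract concrete, the sketch below illustrates how a ChEX-style model might be queried: the user supplies textual prompts (e.g. anatomical regions or pathologies) or bounding boxes, and the model returns grounded region descriptions. This is a minimal illustration only; the class and method names (InteractiveExplainer, describe_prompts, describe_boxes) are hypothetical placeholders and not the paper's actual API.

```python
# Hypothetical sketch of the query interface implied by the abstract:
# textual prompts and/or bounding boxes in, localized descriptions out.
# Names below are illustrative, not the actual ChEX implementation.

from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RegionPrediction:
    prompt: str                              # the query that produced this prediction
    box: Tuple[float, float, float, float]   # (x1, y1, x2, y2) in image coordinates
    description: str                         # generated text for the region


class InteractiveExplainer:
    """Stand-in for a ChEX-like model: prompts or boxes in, grounded text out."""

    def describe_prompts(self, image, prompts: List[str]) -> List[RegionPrediction]:
        # A real model would encode the image, attend to each textual prompt,
        # predict a bounding box, and decode a sentence for that region.
        raise NotImplementedError

    def describe_boxes(self, image, boxes: List[Tuple[float, float, float, float]]) -> List[str]:
        # Alternatively, the user supplies the boxes and only the
        # corresponding region descriptions are generated.
        raise NotImplementedError


if __name__ == "__main__":
    model = InteractiveExplainer()
    prompts = ["left lung", "cardiac silhouette", "pleural effusion"]
    try:
        for pred in model.describe_prompts(image=None, prompts=prompts):
            print(pred.prompt, pred.box, pred.description)
    except NotImplementedError:
        print("Placeholder interface only; plug in a trained model.")
```

Queries over anatomical regions and pathologies would share the same interface; only the prompt strings differ, which is what gives the approach its interactivity.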

URL

https://arxiv.org/abs/2404.15770

PDF

https://arxiv.org/pdf/2404.15770.pdf

