Paper Reading AI Learner

LaPA: Latent Prompt Assist Model For Medical Visual Question Answering

2024-04-19 17:51:52
Tiancheng Gu, Kaicheng Yang, Dongnan Liu, Weidong Cai

Abstract

Medical visual question answering (Med-VQA) aims to automate the prediction of correct answers for medical images and questions, thereby assisting physicians in reducing repetitive tasks and alleviating their workload. Existing approaches primarily focus on pre-training models with additional, comprehensive datasets, followed by fine-tuning to enhance performance on downstream tasks. However, there is also significant value in exploring existing models to extract clinically relevant information. In this paper, we propose the Latent Prompt Assist model (LaPA) for medical visual question answering. First, we design a latent prompt generation module to generate the latent prompt under the constraint of the target answer. Subsequently, we propose a multi-modal fusion block with a latent prompt fusion module that utilizes the latent prompt to extract clinically relevant information from uni-modal and multi-modal features. Additionally, we introduce a prior knowledge fusion module to integrate the relationship between diseases and organs with the clinically relevant information. Finally, we combine the integrated information with image-language cross-modal information to predict the final answers. Experimental results on three publicly available Med-VQA datasets demonstrate that LaPA outperforms the state-of-the-art model ARL, achieving improvements of 1.83%, 0.63%, and 1.80% on VQA-RAD, SLAKE, and VQA-2019, respectively. The code is publicly available at this https URL.
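The pipeline described in the abstract (latent prompt generation, latent prompt fusion, prior knowledge fusion, final answer prediction) can be sketched in toy form as below. All function names, dimensions, and fusion rules here are illustrative assumptions, not the paper's actual implementation:

```python
# A minimal, dependency-free sketch of the LaPA pipeline stages named in
# the abstract. Every name and operation below is a toy assumption made
# for illustration; the real model uses learned neural modules.

def generate_latent_prompt(dim):
    """Latent prompt generation: a vector constrained by the target-answer
    space (here: a fixed toy initialization)."""
    return [0.1] * dim

def latent_prompt_fusion(latent_prompt, image_feat, text_feat):
    """Use the latent prompt to extract clinically relevant information
    from uni-modal features (toy elementwise gating)."""
    return [p * (i + t) for p, i, t in zip(latent_prompt, image_feat, text_feat)]

def prior_knowledge_fusion(fused, organ_disease_prior):
    """Integrate a disease-organ relation prior (toy additive bias)."""
    return [f + b for f, b in zip(fused, organ_disease_prior)]

def predict_answer(final_feat, cross_modal_feat, answers):
    """Combine the integrated information with image-language cross-modal
    information and pick the highest-scoring candidate answer."""
    scores = [f + c for f, c in zip(final_feat, cross_modal_feat)]
    return answers[max(range(len(scores)), key=scores.__getitem__)]

# Toy end-to-end example with hypothetical 3-dim features
prompt = generate_latent_prompt(dim=3)
fused = latent_prompt_fusion(prompt, image_feat=[1.0, 2.0, 0.5],
                             text_feat=[0.5, 0.5, 0.5])
final = prior_knowledge_fusion(fused, organ_disease_prior=[0.0, 0.2, 0.0])
answer = predict_answer(final, cross_modal_feat=[0.1, 0.1, 0.1],
                        answers=["no", "yes", "unsure"])
print(answer)  # -> "yes"
```

The sketch only mirrors the data flow between the four modules; in the paper each stage is a learned network trained end-to-end with the VQA objective.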


URL

https://arxiv.org/abs/2404.13039

PDF

https://arxiv.org/pdf/2404.13039.pdf
