Paper Reading AI Learner

Data Alignment for Zero-Shot Concept Generation in Dermatology AI

2024-04-19 17:57:29
Soham Gadgil, Mahtab Bigverdi

Abstract

AI in dermatology is evolving rapidly, but a major limitation to training trustworthy classifiers is the scarcity of data with ground-truth concept-level labels, which are meta-labels semantically meaningful to humans. Foundation models like CLIP, which provide zero-shot capabilities, can help alleviate this challenge by leveraging the vast number of image-caption pairs available on the internet. CLIP can be fine-tuned on domain-specific image-caption pairs to improve classification performance. However, CLIP's pre-training data is not well aligned with the medical jargon that clinicians use to perform diagnoses. The development of large language models (LLMs) in recent years has made it possible to leverage the expressive nature of these models to generate rich text. Our goal is to use these models to generate caption text that aligns well with both the clinical lexicon and the natural human language used in CLIP's pre-training data. Starting with captions used for images in PubMed articles, we extend them by passing the raw captions through an LLM fine-tuned on several of the field's textbooks. We find that using captions generated by an expressive fine-tuned LLM like GPT-3.5 improves downstream zero-shot concept classification performance.
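The zero-shot concept classification the abstract describes can be sketched as follows. This is a minimal illustration of CLIP-style scoring only: the embeddings here are random placeholders standing in for the outputs of CLIP's image and text encoders, and the concept names and prompt template are hypothetical, not taken from the paper.

```python
import numpy as np

def zero_shot_concept_scores(image_emb, text_embs):
    """Score one image embedding against a set of concept-prompt embeddings
    by cosine similarity, then softmax over concepts (CLIP-style)."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = txt @ img                 # cosine similarity per concept
    logits = 100.0 * sims            # CLIP applies a learned logit scale (~100)
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

# Hypothetical setup: in practice image_emb comes from CLIP's image encoder
# applied to a skin-lesion image, and text_embs from its text encoder applied
# to prompts such as "a dermoscopy image showing <concept>".
concepts = ["papule", "plaque", "ulcer"]
rng = np.random.default_rng(0)
image_emb = rng.normal(size=512)
text_embs = rng.normal(size=(3, 512))

probs = zero_shot_concept_scores(image_emb, text_embs)
predicted = concepts[int(np.argmax(probs))]
```

The paper's contribution targets the text side of this pipeline: better-aligned captions (LLM-extended PubMed captions) are used to fine-tune CLIP so that the text embeddings sit closer to clinical concept language, improving the similarity scores above.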


URL

https://arxiv.org/abs/2404.13043

PDF

https://arxiv.org/pdf/2404.13043.pdf

