Paper Reading AI Learner

Conquering the Retina: Bringing Visual in-Context Learning to OCT

2025-06-18 07:28:47
Alessio Negrini, Simon Reiß

Abstract

Recent advancements in medical image analysis have led to the development of highly specialized models tailored to specific clinical tasks. These models have demonstrated exceptional performance and remain a crucial research direction. Yet, their applicability is limited to predefined tasks, requiring expertise and extensive resources for development and adaptation. In contrast, generalist models offer a different form of utility: allowing medical practitioners to define tasks on the fly without the need for task-specific model development. In this work, we explore how to train generalist models for the domain of retinal optical coherence tomography using visual in-context learning (VICL), i.e., training models to generalize across tasks based on a few examples provided at inference time. To facilitate rigorous assessment, we propose a broad evaluation protocol tailored to VICL in OCT. We extensively evaluate a state-of-the-art medical VICL approach on multiple retinal OCT datasets, establishing a first baseline to highlight the potential and current limitations of in-context learning for OCT. To foster further research and practical adoption, we openly release our code.
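The core idea of visual in-context learning described above — letting a model infer the task from a few support image/label pairs supplied at inference time — can be illustrated with a deliberately simple sketch. The snippet below is a toy, non-parametric stand-in (per-pixel nearest-neighbor label transfer from the support set), not the paper's actual method; the function name and interface are hypothetical.

```python
import numpy as np

def in_context_segment(support_images, support_masks, query_image):
    """Toy in-context predictor: for each query pixel, copy the label of
    the most similar support pixel (by intensity). Illustrative only --
    real VICL models learn this task inference from data."""
    # Flatten all support pixels and their labels into one "context" pool.
    sup_pix = np.concatenate([im.reshape(-1) for im in support_images])
    sup_lab = np.concatenate([m.reshape(-1) for m in support_masks])
    q = query_image.reshape(-1)
    # Nearest support pixel by absolute intensity difference.
    idx = np.abs(q[:, None] - sup_pix[None, :]).argmin(axis=1)
    return sup_lab[idx].reshape(query_image.shape)
```

Swapping in a different support set changes the predicted task without retraining — the property that distinguishes generalist in-context models from task-specific ones.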

Abstract (translated)

Recent advances in medical image analysis have driven the development of highly specialized models tailored to specific clinical tasks. These models have demonstrated outstanding performance and remain an important research direction. However, their applicability is limited to predefined tasks, and developing or adapting them requires expertise and substantial resources. In contrast, generalist models offer a different form of utility: they allow medical practitioners to define tasks on the fly, without task-specific model development. In this work, we explore how to train generalist models for the domain of retinal optical coherence tomography using visual in-context learning (VICL), i.e., training models to generalize across tasks based on a few examples provided at inference time. To enable rigorous assessment, we propose a broad evaluation protocol tailored to VICL in OCT. We comprehensively evaluate a state-of-the-art medical VICL method on multiple retinal OCT datasets and establish a first baseline, highlighting the potential and current limitations of in-context learning for OCT. To foster further research and practical adoption, we openly release our code.

URL

https://arxiv.org/abs/2506.15200

PDF

https://arxiv.org/pdf/2506.15200.pdf

