Abstract
Recent advances in medical image analysis have led to highly specialized models tailored to specific clinical tasks. These models achieve exceptional performance and remain a crucial research direction. However, their applicability is limited to predefined tasks, and developing or adapting them requires expertise and extensive resources. In contrast, generalist models offer a different form of utility: they allow medical practitioners to define tasks on the fly, without task-specific model development. In this work, we explore how to train generalist models for the domain of retinal optical coherence tomography (OCT) using visual in-context learning (VICL), i.e., training models to generalize across tasks based on a few examples provided at inference time. To enable rigorous assessment, we propose a broad evaluation protocol tailored to VICL in OCT. We extensively evaluate a state-of-the-art medical VICL approach on multiple retinal OCT datasets, establishing a first baseline that highlights both the potential and the current limitations of in-context learning for OCT. To foster further research and practical adoption, we openly release our code.
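The core idea of in-context learning — that the task is defined entirely by a few support examples supplied at inference time, with no task-specific training — can be illustrated with a deliberately toy sketch. The function name and the 1-nearest-neighbour rule below are illustrative assumptions, not the paper's actual VICL model; a real VICL system would instead condition a large vision model on support image–label pairs.

```python
import numpy as np

def in_context_predict(support_x, support_y, query_x):
    """Toy stand-in for an in-context learner: the task is defined purely by
    the support set passed at inference time. Here the 'model' is simply
    1-nearest-neighbour matching over flattened feature vectors."""
    dists = np.linalg.norm(support_x - query_x, axis=1)  # distance to each support example
    return support_y[int(np.argmin(dists))]              # label of the closest one

# The same function solves whatever task the support set encodes,
# e.g. a binary "dark vs. bright image" labelling task:
support_x = np.array([[0.1, 0.2], [0.9, 0.8]])  # two flattened toy "images"
support_y = np.array(["dark", "bright"])         # the labels define the task
print(in_context_predict(support_x, support_y, np.array([0.85, 0.9])))
```

Swapping in a different support set (e.g. layer-segmentation masks instead of class labels) redefines the task without retraining — the property the abstract attributes to generalist models.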
URL
https://arxiv.org/abs/2506.15200