Abstract
Artificial Intelligence holds tremendous potential in medicine, but its use is traditionally limited by the lack of massive datasets for training models. Foundation models, pre-trained models that can be adapted to downstream tasks with small datasets, could alleviate this problem. Researchers at Moorfields Eye Hospital (MEH) proposed RETFound-MEH, a foundation model for retinal imaging that was trained on 900,000 images, including private hospital data. Recently, the data-efficient DERETFound was proposed, which provides comparable performance while being trained on only 150,000 images, all of which are publicly available. However, both of these models required very substantial resources to train initially and are resource-intensive in downstream use. We propose a novel Token Reconstruction objective that we use to train RETFound-Green, a retinal foundation model trained using only 75,000 publicly available images and 400 times less compute. We estimate the cost of training RETFound-MEH and DERETFound at $10,000 and $14,000, respectively, while RETFound-Green could be trained for less than $100, with a correspondingly reduced environmental impact. RETFound-Green is also far more efficient in downstream use: it can be downloaded 14 times faster and computes vector embeddings 2.7 times faster, and those embeddings require 2.6 times less storage space. Despite this, RETFound-Green does not perform systematically worse. In fact, it performs best on 14 tasks, compared to six for DERETFound and two for RETFound-MEH. Our results suggest that RETFound-Green is a very efficient, high-performance retinal foundation model. We anticipate that our Token Reconstruction objective could be scaled up for even higher performance and be applied to other domains beyond retinal imaging.
URL
https://arxiv.org/abs/2405.00117