Abstract
Foundation models, including language models such as GPT and vision models such as CLIP, have significantly advanced numerous biomedical tasks. Despite these advances, high inference latency and the "overthinking" issue in model inference impair both the efficiency and the effectiveness of foundation models, limiting their use in real-time clinical settings. To address these challenges, we propose EPEE (Entropy- and Patience-based Early Exiting), a novel hybrid strategy designed to improve the inference efficiency of foundation models. The core idea is to combine entropy-based and patience-based early exiting so that the strengths of each method offset the weaknesses of the other. To evaluate EPEE, we conducted experiments on three core biomedical tasks (classification, relation extraction, and event extraction) using four foundation models (BERT, ALBERT, GPT-2, and ViT) across twelve datasets, including clinical notes and medical images. The results showed that EPEE significantly reduced inference time while maintaining or improving accuracy, demonstrating its adaptability to diverse datasets and tasks. By balancing efficiency and effectiveness, EPEE addresses critical barriers to deploying foundation models in healthcare. It offers a potentially practical solution for real-time clinical decision-making with foundation models, supporting reliable and efficient workflows.
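The abstract does not spell out how the entropy and patience criteria are combined, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes each layer has its own classifier producing logits, and that inference stops at the first layer where either the prediction entropy falls below a threshold or the predicted class has stayed unchanged for a set number of consecutive layers. The function name `hybrid_early_exit` and the default values of `entropy_threshold` and `patience` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one layer's logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def hybrid_early_exit(
    layer_logits: Sequence[Sequence[float]],
    entropy_threshold: float = 0.3,  # assumed value, not from the paper
    patience: int = 2,               # assumed value, not from the paper
) -> Tuple[int, int]:
    """Return (predicted_class, exit_layer), with exit_layer counted from 1.

    Assumption: exit when EITHER criterion fires.
      - entropy criterion: the current layer's prediction is confident
      - patience criterion: the predicted class matched the previous
        layer's prediction for `patience` consecutive layers
    """
    if not layer_logits:
        raise ValueError("layer_logits must contain at least one layer")

    streak = 0        # consecutive layers agreeing with the previous one
    prev_pred = None
    pred = -1
    for depth, logits in enumerate(layer_logits, start=1):
        probs = softmax(logits)
        pred = max(range(len(probs)), key=probs.__getitem__)

        # Patience bookkeeping: extend the streak on agreement, else reset.
        streak = streak + 1 if pred == prev_pred else 0
        prev_pred = pred

        if entropy(probs) < entropy_threshold:
            return pred, depth
        if streak >= patience:
            return pred, depth

    # No criterion fired: fall back to the final layer's prediction.
    return pred, len(layer_logits)
```

In this sketch, the entropy check lets easy inputs exit as soon as one layer is confident, while the patience check lets inputs exit when the prediction is stable even though no single layer reaches the confidence threshold. In a real model the per-layer logits would be computed lazily, one layer at a time, so that layers after the exit point are never executed.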
URL
https://arxiv.org/abs/2503.02053