Abstract
In the realm of Medical Visual Language Models (Med-VLMs), universal and efficient fine-tuning mechanisms remain essential yet largely unexplored, especially since researchers in interdisciplinary fields often have very limited training resources. Given the unique challenges of the medical domain, such as limited data scope and significant domain-specific requirements, evaluating and adapting Parameter-Efficient Fine-Tuning (PEFT) methods specifically for Med-VLMs is essential. Most current PEFT methods for Med-VLMs have yet to be comprehensively investigated and mainly focus on adding components to the model's structure or input. However, fine-tuning intrinsic model components often yields better generality and consistency, and its impact on the ultimate performance of Med-VLMs has been widely overlooked and remains understudied. In this paper, we explore an alternative to traditional PEFT methods, focusing on the impact of fine-tuning LayerNorm layers, FFNs, and Attention layers in Med-VLMs. Our comprehensive study spans both small-scale and large-scale Med-VLMs, evaluating their performance under various fine-tuning paradigms on tasks such as Medical Visual Question Answering and Medical Imaging Report Generation. The findings reveal unique insights into the effects of intrinsic-parameter fine-tuning when adapting Med-VLMs to downstream tasks, and show that fine-tuning solely the LayerNorm layers not only surpasses the efficiency of traditional PEFT methods but also preserves the model's accuracy and generalization capabilities across a spectrum of medical downstream tasks. The experiments demonstrate LayerNorm fine-tuning's superior adaptability and scalability, particularly in the context of large-scale Med-VLMs.
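The core technique the abstract describes, updating only the LayerNorm parameters while freezing everything else, can be sketched in a few lines of PyTorch. This is a minimal illustration, not the paper's code; the `TransformerEncoderLayer` here is a hypothetical stand-in for a Med-VLM backbone.

```python
import torch.nn as nn

def freeze_all_but_layernorm(model: nn.Module) -> int:
    """Freeze every parameter except those inside LayerNorm modules.

    Returns the number of trainable parameters that remain.
    """
    # First freeze the entire model.
    for p in model.parameters():
        p.requires_grad = False
    # Then re-enable gradients only for LayerNorm weights and biases.
    for m in model.modules():
        if isinstance(m, nn.LayerNorm):
            for p in m.parameters():
                p.requires_grad = True
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Illustrative stand-in model: one Transformer encoder layer.
layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
trainable = freeze_all_but_layernorm(layer)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable: {trainable} / {total}")
```

Only the LayerNorm affine parameters (a tiny fraction of the total) would then be updated by the optimizer, which is what makes this approach far cheaper than full fine-tuning.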
URL
https://arxiv.org/abs/2404.16385