Abstract
Large language models such as ChatGPT have shown substantial progress in natural language understanding and generation, proving valuable across many disciplines, including medicine. Despite these advances, challenges persist because of the complexity and diversity of medical tasks, which often demand multi-task learning capabilities. Previous approaches, although beneficial, fall short in real-world applications because they require task-specific annotations at inference time, limiting broader generalization. This paper introduces MING-MOE, a novel Mixture-of-Experts (MoE)-based medical large language model designed to handle diverse and complex medical tasks without task-specific annotations, enhancing its usability across extensive datasets. MING-MOE employs a Mixture of Low-Rank Adaptation (MoLoRA) technique that keeps the base model's parameters frozen and adapts through a minimal set of trainable parameters, enabling efficient parameter usage. We demonstrate that MING-MOE achieves state-of-the-art (SOTA) performance on over 20 medical tasks, a significant improvement over existing models. This approach not only extends the capabilities of medical language models but also improves inference efficiency.
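The abstract describes MoLoRA as a mixture of low-rank adapters over a frozen base model. The sketch below illustrates the general idea of such a layer, not the paper's actual implementation: all shapes, names, and the scaling factor are illustrative assumptions, and the router here is a simple per-token softmax over LoRA experts.

```python
import numpy as np

# Hypothetical sketch of a MoLoRA-style layer: the base weight W is frozen,
# each expert k holds a trainable low-rank pair (A[k], B[k]), and a router
# mixes the experts' low-rank updates per token.
rng = np.random.default_rng(0)
d_in, d_out, rank, n_experts = 8, 8, 2, 4

W = rng.normal(size=(d_in, d_out))             # frozen base weight (not trained)
A = rng.normal(size=(n_experts, d_in, rank))   # trainable down-projections
B = np.zeros((n_experts, rank, d_out))         # trainable up-projections, zero-init
W_router = rng.normal(size=(d_in, n_experts))  # trainable router weights

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def molora_forward(x, scaling=1.0):
    """x: (tokens, d_in) -> (tokens, d_out)."""
    base = x @ W                          # frozen path
    gates = softmax(x @ W_router)         # (tokens, n_experts) mixing weights
    delta = np.zeros_like(base)
    for k in range(n_experts):
        # each expert contributes a rank-`rank` update, weighted by its gate
        delta += gates[:, k:k + 1] * (x @ A[k] @ B[k])
    return base + scaling * delta

x = rng.normal(size=(3, d_in))
y = molora_forward(x)
# With B zero-initialized, training starts exactly at the frozen base model.
assert np.allclose(y, x @ W)
```

Only `A`, `B`, and `W_router` would be trained, which is what keeps the trainable-parameter count small relative to the frozen base model.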
URL
https://arxiv.org/abs/2404.09027