Abstract
Continual learning, the ability to acquire knowledge from new data while retaining previously learned information, is a fundamental challenge in machine learning. Various approaches, including memory replay, knowledge distillation, model regularization, and dynamic network expansion, have been proposed to address it. To date, dynamic network expansion methods have achieved state-of-the-art performance, but at the cost of significant computational overhead: they require additional model buffers, which makes them less feasible in resource-constrained settings, particularly in the medical domain. To overcome this challenge, we propose Dynamic Model Merging (DynaMMo), a method that merges multiple networks at different stages of model training to achieve better computational efficiency. Specifically, we employ lightweight learnable modules for each task and combine them into a unified model to minimize computational overhead. DynaMMo achieves this without compromising performance, offering a cost-effective solution for continual learning in medical applications. We evaluate DynaMMo on three publicly available datasets and demonstrate its effectiveness compared to existing approaches. DynaMMo offers an approximately 10-fold reduction in GFLOPs with a small drop of 2.76 in average accuracy compared to state-of-the-art dynamic-expansion approaches. The code for this work will be made available upon acceptance at this https URL.
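The core idea above, combining per-task lightweight modules into a single unified model, can be sketched as simple parameter averaging. This is an illustrative assumption, not the paper's actual merging rule: the module names, parameter shapes, and the uniform-mean combination below are all hypothetical.

```python
# Hedged sketch: merge several task-specific "adapter" modules into one
# unified module by element-wise averaging of their parameters.
# Each module is represented as a dict mapping parameter names to flat
# lists of floats; real implementations would use framework tensors.

def merge_modules(task_modules):
    """Average the parameters of per-task modules into one unified module."""
    n = len(task_modules)
    merged = {}
    for name in task_modules[0]:
        length = len(task_modules[0][name])
        # element-wise mean across all task modules for this parameter
        merged[name] = [
            sum(module[name][i] for module in task_modules) / n
            for i in range(length)
        ]
    return merged

# Two toy per-task modules with hypothetical parameter names.
task_a = {"adapter.weight": [1.0, 2.0], "adapter.bias": [0.0]}
task_b = {"adapter.weight": [3.0, 4.0], "adapter.bias": [2.0]}

unified = merge_modules([task_a, task_b])
```

Because the unified model replaces the collection of per-task modules at inference time, only one set of parameters needs to be kept in memory and executed, which is where the computational savings over keeping every task's network would come from.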
URL
https://arxiv.org/abs/2404.14099