Abstract
In this paper, we empirically study the optimization dynamics of multi-task learning, particularly focusing on those that govern a collection of tasks with significant data imbalance. We present a simple yet effective method: pre-training on high-resource tasks, followed by fine-tuning on a mixture of high- and low-resource tasks. We provide a thorough empirical study and analysis of this method's benefits, showing that it achieves consistent improvements relative to the performance trade-off profile of standard static weighting. We analyze under which data regimes this method is applicable and demonstrate its improvements empirically in neural machine translation (NMT) and multilingual language modeling.
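The two-stage schedule described above can be sketched as a task-sampling procedure: during pre-training only high-resource tasks receive sampling weight, and after a switch point the sampler moves to a mixture over all tasks. This is a minimal illustrative sketch, not the paper's implementation; the function names, the step-based switch point, and the specific weight vectors are all assumptions made for illustration.

```python
import random

def sample_task(weights):
    """Draw a task index in proportion to the given (unnormalized) weights."""
    total = sum(weights)
    r = random.random() * total
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if r < acc:
            return i
    return len(weights) - 1  # guard against floating-point edge cases

def training_schedule(steps, pretrain_frac, pretrain_weights, mixture_weights):
    """Hypothetical two-stage schedule: pre-train on high-resource tasks only
    (low-resource weights set to 0), then fine-tune on a mixture of all tasks."""
    switch = int(steps * pretrain_frac)
    schedule = []
    for step in range(steps):
        weights = pretrain_weights if step < switch else mixture_weights
        schedule.append(sample_task(weights))
    return schedule

# Example: tasks 0 and 1 are high-resource, task 2 is low-resource.
# Pre-train for the first half of training, then mix in task 2.
sched = training_schedule(
    steps=1000,
    pretrain_frac=0.5,
    pretrain_weights=[1.0, 1.0, 0.0],
    mixture_weights=[1.0, 1.0, 1.0],
)
```

The contrast with static weighting is that a static scheme would use a single fixed weight vector for all steps, forcing one point on the high/low-resource trade-off curve, whereas the two-stage schedule changes the sampling distribution partway through training.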
URL
https://arxiv.org/abs/2312.06134