Abstract
Training deep networks and tuning hyperparameters on large datasets is computationally intensive. One of the primary research directions for efficient training is to reduce training costs by selecting well-generalizing subsets of the training data. Compared to simple adaptive random subset selection baselines, existing intelligent subset selection approaches are not competitive because of their time-consuming subset selection step, which computes model-dependent gradients and feature embeddings and applies greedy maximization of submodular objectives. Our key insight is that removing the reliance on downstream model parameters allows subset selection to be performed as a pre-processing step, so multiple models can be trained at no additional selection cost. In this work, we propose MILO, a model-agnostic subset selection framework that decouples subset selection from model training while enabling superior model convergence and performance through an easy-to-hard curriculum. Our empirical results indicate that MILO can train models $3\times - 10\times$ faster and tune hyperparameters $20\times - 75\times$ faster than full-dataset training or tuning, without compromising performance.
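The abstract describes two ingredients: model-agnostic subset selection done once as a pre-processing step, and an easy-to-hard curriculum over the selected data during training. The following is a minimal, hypothetical Python sketch of that general idea only; the class-centroid difficulty proxy, the curriculum schedule, and all function names are illustrative assumptions and not MILO's actual algorithm (see the paper at the URL below for the real method).

```python
# Hypothetical sketch: model-agnostic, curriculum-ordered subset selection.
# Difficulty scores come from a fixed, model-independent proxy (distance of
# each example's feature vector to its class centroid), so subsets can be
# pre-computed once and reused to train any number of downstream models.

import numpy as np


def difficulty_scores(features: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Score each example by its distance to the mean feature of its class.

    Smaller distance = closer to the class prototype = "easier" example.
    """
    scores = np.empty(len(features))
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        centroid = features[idx].mean(axis=0)
        scores[idx] = np.linalg.norm(features[idx] - centroid, axis=1)
    return scores


def curriculum_subsets(scores: np.ndarray, subset_size: int, num_epochs: int,
                       seed: int = 0) -> list:
    """Pre-compute one subset of indices per epoch, moving from easy to hard.

    Early epochs sample from the easiest examples only; later epochs widen
    the pool so harder examples are gradually included.
    """
    rng = np.random.default_rng(seed)
    order = np.argsort(scores)  # easiest first
    subsets = []
    for epoch in range(num_epochs):
        # Fraction of the sorted dataset exposed at this stage of the curriculum.
        frac = (epoch + 1) / num_epochs
        pool = order[: max(subset_size, int(frac * len(order)))]
        subsets.append(rng.choice(pool, size=subset_size, replace=False))
    return subsets


if __name__ == "__main__":
    # Toy stand-in for pretrained (model-independent) feature embeddings.
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(1000, 32))
    labels = rng.integers(0, 10, size=1000)

    scores = difficulty_scores(feats, labels)
    per_epoch = curriculum_subsets(scores, subset_size=100, num_epochs=20)
    # per_epoch[e] gives the training indices for epoch e, for any model.
    print(len(per_epoch), per_epoch[0][:5])
```

Because nothing above depends on a downstream model's parameters, the same pre-computed subsets can be reused across architectures and hyperparameter trials, which is the source of the claimed training and tuning speedups.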
URL
https://arxiv.org/abs/2301.13287