Abstract
Loss function learning is a new meta-learning paradigm that aims to automate the essential task of designing a loss function for a machine learning model. Existing techniques for loss function learning have shown promising results, often improving a model's training dynamics and final inference performance. However, a significant limitation of these techniques is that the loss functions are meta-learned in an offline fashion, where the meta-objective only considers the very first few steps of training, a significantly shorter time horizon than the one typically used for training deep neural networks. This causes a significant bias towards loss functions that perform well at the very start of training but poorly at the end of training. To address this issue, we propose a new loss function learning technique that adaptively updates the loss function online after each update to the base model parameters. The experimental results show that our proposed method consistently outperforms the cross-entropy loss and offline loss function learning techniques on a diverse range of neural network architectures and datasets.
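To make the online-update idea concrete, here is a minimal, illustrative sketch (not the paper's actual method): a scalar linear model is trained under a small parametric loss family, and after every base-model step one meta-step adjusts the loss parameters `phi` so that the updated model improves a held-out-style meta-objective. The loss family, finite-difference meta-gradient, and all hyperparameter names are assumptions made for illustration.

```python
import numpy as np

# Illustrative sketch of online loss-function learning (names and setup are
# hypothetical, not the paper's algorithm). The learned loss is a parametric
# family  L_phi(p, y) = phi[0] * (p - y)^2 + phi[1] * |p - y|,
# and after every base step we take one meta-step on phi.

rng = np.random.default_rng(0)
X = rng.normal(size=200)
Y = 3.0 * X + rng.normal(scale=0.1, size=200)  # ground truth: y = 3x

def base_grad(phi, w, x, y):
    # Gradient of the learned loss w.r.t. the base parameter w, via chain rule.
    p = w * x
    dLdp = 2.0 * phi[0] * (p - y) + phi[1] * np.sign(p - y)
    return dLdp * x

def meta_objective(w):
    # Stand-in meta-objective: squared error of the (updated) base model.
    return np.mean((w * X - Y) ** 2)

w = 0.0
phi = np.array([0.5, 0.5])
alpha, eta, eps = 0.05, 0.01, 1e-4  # base lr, meta lr, finite-diff step

for x, y in zip(X, Y):
    # (1) Base step: update w under the current learned loss.
    w = w - alpha * base_grad(phi, w, x, y)
    # (2) Meta step: finite-difference gradient of the meta-objective w.r.t.
    #     phi, taken *through* one hypothetical base update.
    g = np.zeros_like(phi)
    w_base = w - alpha * base_grad(phi, w, x, y)
    for i in range(len(phi)):
        phi_p = phi.copy()
        phi_p[i] += eps
        w_p = w - alpha * base_grad(phi_p, w, x, y)
        g[i] = (meta_objective(w_p) - meta_objective(w_base)) / eps
    phi = phi - eta * g

print(w)  # the base model should approach the true slope of 3
```

In practice the meta-gradient through the base update would be computed with automatic differentiation rather than finite differences; the loop structure, alternating one base update with one loss-function update, is the point being illustrated.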
URL
https://arxiv.org/abs/2301.13247