Abstract
In recent studies, line search methods have been shown to significantly improve the performance of traditional stochastic gradient descent techniques, eliminating the need for a specific learning rate schedule. In this paper, we identify existing issues in state-of-the-art line search methods, propose enhancements, and rigorously evaluate their effectiveness. We test these methods on larger datasets and more complex data domains than before. Specifically, we improve the Armijo line search by integrating the momentum term from Adam into its search direction, enabling efficient large-scale training, a task at which previous Armijo line search methods were prone to fail. Our optimization approach outperforms both the previous Armijo implementation and tuned learning rate schedules for Adam. Our evaluation focuses on Transformers and CNNs in the domains of NLP and image data. Our work is publicly available as a Python package, which provides a hyperparameter-free PyTorch optimizer.
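The core idea in the abstract can be illustrated with a minimal sketch: a backtracking Armijo line search whose search direction is an Adam-style momentum (first-moment) estimate rather than the raw gradient. This is an illustrative simplification under assumed defaults (the function and parameter names below, the fallback to steepest descent when the momentum direction is not a descent direction, and the constants `c`, `beta`, `beta1` are choices for the example, not the paper's exact algorithm):

```python
import numpy as np

def armijo_eta(f, x, g, d, eta0=1.0, c=1e-4, beta=0.5, max_backtracks=30):
    """Backtrack eta until the Armijo sufficient-decrease condition holds:
    f(x + eta*d) <= f(x) + c * eta * g@d."""
    fx, slope = f(x), g @ d
    eta = eta0
    for _ in range(max_backtracks):
        if f(x + eta * d) <= fx + c * eta * slope:
            return eta
        eta *= beta
    return eta  # smallest trial step if no backtrack succeeded

def momentum_armijo(f, grad_f, x0, steps=200, beta1=0.9):
    """Gradient descent where the Armijo search runs along an
    Adam-style momentum direction instead of the raw gradient."""
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)  # first-moment (momentum) estimate, as in Adam
    for _ in range(steps):
        g = grad_f(x)
        m = beta1 * m + (1 - beta1) * g
        d = -m
        if g @ d >= 0:  # momentum is not a descent direction: fall back to -g
            d = -g
        x = x + armijo_eta(f, x, g, d) * d
    return x

# Toy usage: minimize a simple quadratic, f(x) = ||x||^2.
f = lambda x: float(x @ x)
grad_f = lambda x: 2.0 * x
x_star = momentum_armijo(f, grad_f, [3.0, -2.0])
```

The Armijo condition makes the loss monotonically non-increasing regardless of how stale the momentum estimate is, which is one intuition for why such a combination can train without a hand-tuned learning rate schedule.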
URL
https://arxiv.org/abs/2403.18519