Abstract
Modern machine learning applications have seen remarkable success from optimization algorithms designed to find flat minima. Motivated by this paradigm, this work formulates and studies the algorithmic question of how to find flat minima. As an initial effort, it adopts the trace of the Hessian of the cost function as the measure of flatness and formally defines the notion of an approximate flat minimum. Under this notion, we design algorithms that find approximate flat minima efficiently. For general cost functions, we present a gradient-based algorithm that efficiently finds an approximate flat local minimum. The main component of the algorithm uses gradients computed at randomly perturbed iterates to estimate a direction that leads to flatter minima. For the setting where the cost function is an empirical risk over training data, we present a faster algorithm inspired by a recently proposed practical method, sharpness-aware minimization, lending support to its success in practice.
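The perturbed-gradient idea can be illustrated with a short, hedged sketch: a Hutchinson-style estimator that approximates the trace of the Hessian using only gradient evaluations at randomly perturbed points. This is an illustrative construction under standard assumptions, not the paper's exact algorithm; the function name, step size `sigma`, and sample count are all hypothetical choices. It relies on the identity u·(∇f(x+σu) − ∇f(x−σu))/(2σ) ≈ uᵀ∇²f(x)u, whose expectation over u ~ N(0, I) equals tr(∇²f(x)).

```python
import numpy as np

def trace_hessian_estimate(grad, x, sigma=1e-3, num_samples=100, rng=None):
    """Estimate tr(Hessian of f at x) using only gradient evaluations.

    grad: callable returning the gradient of f at a point.
    For u ~ N(0, I), the central difference
        u . (grad(x + sigma*u) - grad(x - sigma*u)) / (2*sigma)
    approximates the quadratic form u^T H u, whose expectation is tr(H).
    """
    rng = np.random.default_rng(rng)
    total = 0.0
    for _ in range(num_samples):
        u = rng.standard_normal(x.shape)
        # Finite difference of gradients along the random direction u.
        total += u @ (grad(x + sigma * u) - grad(x - sigma * u)) / (2 * sigma)
    return total / num_samples
```

For a sanity check, with f(x) = ‖x‖² in 5 dimensions the Hessian is 2I, so the estimator should concentrate around a trace of 10. A flatness-seeking method could add (a stochastic estimate of) the gradient of this quantity as a regularizing term in its updates.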
URL
https://arxiv.org/abs/2305.15659