Abstract
We consider stochastic unconstrained bilevel optimization problems when only first-order gradient oracles are available. While numerous optimization methods have been proposed for tackling bilevel problems, existing methods either tend to require possibly expensive calculations involving Hessians of lower-level objectives, or lack rigorous finite-time performance guarantees. In this work, we propose a Fully First-order Stochastic Approximation (F2SA) method, and study its non-asymptotic convergence properties. Specifically, we show that F2SA converges to an $\epsilon$-stationary solution of the bilevel problem after $\epsilon^{-7/2}, \epsilon^{-5/2}$, and $\epsilon^{-3/2}$ iterations (each iteration using $O(1)$ samples) when stochastic noise is present in both level objectives, only in the upper-level objective, and not present (deterministic settings), respectively. We further show that if we employ momentum-assisted gradient estimators, the iteration complexities can be improved to $\epsilon^{-5/2}, \epsilon^{-4/2}$, and $\epsilon^{-3/2}$, respectively. We demonstrate the superior practical performance of the proposed method over existing second-order-based approaches on MNIST data-hypercleaning experiments.
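The key idea of a fully first-order bilevel scheme is to build a hypergradient surrogate from gradient differences instead of Hessian-vector products. The sketch below is a minimal illustration of this idea on a toy deterministic problem, not the authors' exact F2SA algorithm: the problem instance, step sizes, and penalty weight `lam` are all assumptions chosen for clarity.

```python
# Toy bilevel problem (illustrative, not from the paper):
#   upper:  f(x, y) = 0.5 * (y - 1)^2
#   lower:  g(x, y) = 0.5 * (y - x)^2  ->  y*(x) = x
# True hyper-objective: F(x) = 0.5 * (x - 1)^2, minimized at x = 1.

def grad_f_y(x, y): return y - 1.0
def grad_g_x(x, y): return x - y
def grad_g_y(x, y): return y - x

lam, alpha, beta = 10.0, 0.1, 0.05   # penalty weight, inner/outer step sizes
x, y, z = 0.0, 0.0, 0.0

for _ in range(2000):
    # Inner updates use only first-order gradients:
    # y tracks argmin_y g(x, y); z tracks argmin_z lam*g(x, z) + f(x, z).
    for _ in range(10):
        y -= alpha * grad_g_y(x, y)
        z -= alpha * (lam * grad_g_y(x, z) + grad_f_y(x, z))
    # Hypergradient surrogate from a difference of gradients (no Hessians);
    # grad_x f = 0 for this toy f, so only the penalty term remains.
    h = lam * (grad_g_x(x, z) - grad_g_x(x, y))
    x -= beta * h

print(round(x, 3))  # x approaches the bilevel solution x = 1
```

Here the surrogate equals `lam * (y - z)`, which is approximately `lam/(lam+1) * (x - 1)`: a rescaled but unbiased-in-direction estimate of the true hypergradient `x - 1`, vanishing exactly at the bilevel solution.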
URL
https://arxiv.org/abs/2301.10945