Paper Reading AI Learner

Data Sketching for Faster Training of Machine Learning Models

2019-06-05 05:10:37
Baharan Mirzasoleiman, Jeff Bilmes, Jure Leskovec

Abstract

Many machine learning problems reduce to the problem of minimizing an expected risk, defined as the sum of a large number of (often convex) component functions. Iterative gradient methods are popular techniques for such problems. However, they are in general slow to converge, in particular for large data sets. In this work, we develop an analysis for selecting a subset (or sketch) of training data points, with their corresponding learning rates, in order to provide faster convergence to a close neighborhood of the optimal solution. We show that subsets that minimize the upper bound on the estimation error of the full gradient maximize a submodular facility location function. As a result, by greedily maximizing the facility location function we obtain subsets that yield faster convergence to a close neighborhood of the optimum solution. We demonstrate the real-world effectiveness of our algorithm, SIG, confirming our analysis, through an extensive set of experiments on several applications, including logistic regression and training neural networks. We also include a method that provides a deliberate deterministic ordering of the data subset that is quite effective in practice. We observe that our method, while achieving practically the same loss, speeds up gradient methods by up to 10x for convex and 3x for non-convex (deep) functions.
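The core computational step the abstract describes is greedily maximizing a facility location function, F(A) = Σ_i max_{j∈A} S[i, j], over a pairwise similarity matrix S to pick the data sketch. Below is a minimal sketch of such a greedy selection, not the authors' SIG implementation; the similarity matrix and the function name are illustrative assumptions.

```python
import numpy as np

def greedy_facility_location(S, k):
    """Greedily pick k indices maximizing the facility location
    objective F(A) = sum_i max_{j in A} S[i, j].

    S : (n, n) nonnegative similarity matrix
    k : size of the subset (sketch) to select
    """
    n = S.shape[0]
    selected = []
    # cover[i] = similarity of point i to its closest selected point so far
    cover = np.zeros(n)
    for _ in range(k):
        # Marginal gain of adding each candidate j:
        # sum_i max(cover[i], S[i, j]) - sum_i cover[i]
        gains = np.maximum(S, cover[:, None]).sum(axis=0) - cover.sum()
        gains[selected] = -np.inf  # never re-pick an element
        j = int(np.argmax(gains))
        selected.append(j)
        cover = np.maximum(cover, S[:, j])
    return selected
```

Because facility location is monotone submodular, this lazy-free greedy loop enjoys the classic (1 - 1/e) approximation guarantee; the selected subset would then be fed to the iterative gradient method in place of the full data set, with per-point learning rates as analyzed in the paper.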

URL

https://arxiv.org/abs/1906.01827

PDF

https://arxiv.org/pdf/1906.01827.pdf
