Paper Reading AI Learner

Data Sketching for Faster Training of Machine Learning Models

2019-06-05 05:10:37
Baharan Mirzasoleiman, Jeff Bilmes, Jure Leskovec

Abstract

Many machine learning problems reduce to the problem of minimizing an expected risk, defined as the sum of a large number of (often convex) component functions. Iterative gradient methods are popular techniques for such problems. However, they are in general slow to converge, in particular for large data sets. In this work, we develop an analysis for selecting a subset (or sketch) of training data points, with their corresponding learning rates, in order to provide faster convergence to a close neighborhood of the optimal solution. We show that subsets that minimize the upper bound on the estimation error of the full gradient maximize a submodular facility location function. As a result, by greedily maximizing the facility location function we obtain subsets that yield faster convergence to a close neighborhood of the optimum solution. We demonstrate the real-world effectiveness of our algorithm, SIG, confirming our analysis, through an extensive set of experiments on several applications, including logistic regression and training neural networks. We also include a method that provides a deliberate deterministic ordering of the data subset that is quite effective in practice. We observe that our method, while achieving practically the same loss, speeds up gradient methods by up to 10x for convex and 3x for non-convex (deep) functions.
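The core computational step the abstract describes is greedily maximizing a facility location function, F(A) = Σ_i max_{j∈A} S[i, j], over a pairwise similarity matrix S to pick the data sketch. Below is a minimal sketch of such a greedy selection, not the authors' SIG implementation; the similarity matrix and the function name are illustrative assumptions.

```python
import numpy as np

def greedy_facility_location(S, k):
    """Greedily pick k indices maximizing the facility location
    objective F(A) = sum_i max_{j in A} S[i, j].

    S : (n, n) nonnegative similarity matrix
    k : size of the subset (sketch) to select
    """
    n = S.shape[0]
    selected = []
    # cover[i] = similarity of point i to its closest selected point so far
    cover = np.zeros(n)
    for _ in range(k):
        # Marginal gain of adding each candidate j:
        # sum_i max(cover[i], S[i, j]) - sum_i cover[i]
        gains = np.maximum(S, cover[:, None]).sum(axis=0) - cover.sum()
        gains[selected] = -np.inf  # never re-pick an element
        j = int(np.argmax(gains))
        selected.append(j)
        cover = np.maximum(cover, S[:, j])
    return selected
```

Because facility location is monotone submodular, this lazy-free greedy loop enjoys the classic (1 - 1/e) approximation guarantee; the selected subset would then be fed to the iterative gradient method in place of the full data set, with per-point learning rates as analyzed in the paper.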

URL

https://arxiv.org/abs/1906.01827

PDF

https://arxiv.org/pdf/1906.01827.pdf
