
How to escape sharp minima

2023-05-25 02:12:33
Kwangjun Ahn, Ali Jadbabaie, Suvrit Sra

Abstract

Modern machine learning applications have seen remarkable success from optimization algorithms that are designed to find flat minima. Motivated by this paradigm, this work formulates and studies the algorithmic question of how to find flat minima. As an initial effort, this work adopts the trace of the Hessian of the cost function as the measure of flatness, and formally defines the notion of approximate flat minima. Under this notion, we then design algorithms that find approximate flat minima efficiently. For general cost functions, we present a gradient-based algorithm that finds an approximate flat local minimum efficiently. The main component of the algorithm is to use gradients computed from randomly perturbed iterates to estimate a direction that leads to flatter minima. For the setting where the cost function is an empirical risk over training data, we present a faster algorithm that is inspired by a recently proposed practical algorithm called sharpness-aware minimization, supporting its success in practice.
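
The idea that gradients at randomly perturbed points carry flatness information can be illustrated with a small sketch. This is not the paper's algorithm; it is a toy example built on stated assumptions. The cost f(w) = 0.5 * (w0 * w1 - 1)^2, the function names, and all step sizes are hypothetical choices made for illustration. For this cost, every point with w0 * w1 = 1 is a global minimum, the trace of the Hessian is w0^2 + w1^2, and the flattest minima are (1, 1) and (-1, -1). The estimator relies on the standard fact that, for Gaussian u, the average of grad f(w + sigma*u) + grad f(w - sigma*u) - 2 * grad f(w), divided by sigma^2, approximates the gradient of the Hessian trace.

```python
import numpy as np

def grad_f(w):
    """Gradient of the toy cost f(w) = 0.5 * (w[0] * w[1] - 1)**2.

    Accepts an array of shape (..., 2), so a batch of points can be passed.
    """
    r = w[..., 0] * w[..., 1] - 1.0
    return np.stack([r * w[..., 1], r * w[..., 0]], axis=-1)

def trace_grad_estimate(w, sigma=0.1, n_samples=64, rng=None):
    """Monte Carlo estimate of the gradient of trace(Hessian f) at w.

    Uses only gradients evaluated at randomly perturbed copies of w.
    Antithetic pairs (w + sigma*u, w - sigma*u) cancel odd-order terms.
    """
    rng = np.random.default_rng() if rng is None else rng
    u = rng.standard_normal((n_samples, w.shape[0]))
    diff = grad_f(w + sigma * u) + grad_f(w - sigma * u) - 2.0 * grad_f(w)
    return diff.mean(axis=0) / sigma**2

def drift_to_flat_minimum(w0, steps=2000, lr=0.05, lam=0.02, seed=0):
    """Hypothetical loop: gradient step on f plus a small flatness step.

    This minimizes f + lam * trace(Hessian f) in expectation, so the
    iterate settles near, not exactly at, the flattest minimum.
    """
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, dtype=float)
    for _ in range(steps):
        w = w - lr * (grad_f(w) + lam * trace_grad_estimate(w, rng=rng))
    return w

if __name__ == "__main__":
    start = np.array([2.0, 0.5])  # a sharp minimum: trace(Hessian) = 4.25
    end = drift_to_flat_minimum(start)
    print("start:", start, "trace:", start @ start)
    print("end:  ", end, "trace:", end @ end)
```

Starting from the minimum (2, 0.5), plain gradient descent would not move at all, since the gradient there is zero. The perturbed-gradient term supplies the missing signal, and the iterate should drift along the valley of minima toward the neighborhood of (1, 1), where the Hessian trace is smallest.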

URL

https://arxiv.org/abs/2305.15659

PDF

https://arxiv.org/pdf/2305.15659.pdf

