Paper Reading AI Learner

Don't Waste Your Time: Early Stopping Cross-Validation

2024-05-06 11:51:09
Edward Bergman, Lennart Purucker, Frank Hutter

Abstract

State-of-the-art automated machine learning systems for tabular data often employ cross-validation to ensure that measured performances generalize to unseen data, or that subsequent ensembling does not overfit. However, using k-fold cross-validation instead of holdout validation drastically increases the computational cost of validating a single configuration. While it ensures better generalization and, by extension, better performance, the additional cost is often prohibitive for effective model selection within a time budget. We aim to make model selection with cross-validation more effective. Therefore, we study early stopping the process of cross-validation during model selection. We investigate the impact of early stopping on random search for two algorithms, MLP and random forest, across 36 classification datasets. We further analyze the impact of the number of folds by considering 3-, 5-, and 10-fold cross-validation. In addition, we investigate the impact of early stopping with Bayesian optimization instead of random search, and also with repeated cross-validation. Our exploratory study shows that even a simple-to-understand and easy-to-implement method consistently allows model selection to converge faster: in ~94% of all datasets, on average by ~214%. Moreover, stopping cross-validation enables model selection to explore the search space more exhaustively by considering +167% configurations on average within one hour, while also obtaining better overall performance.
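The paper does not spell out its stopping rule here, but the idea of a "simple-to-understand and easy-to-implement" criterion can be sketched as follows: during random search, abandon a configuration's remaining folds as soon as its running mean fold score drops below the best full-CV mean seen so far. This is a minimal sketch under that assumption; `fold_score` is a hypothetical callback standing in for training and scoring one fold.

```python
def early_stopping_cv(configs, fold_score, k=10):
    """Random search over `configs` with early-stopped k-fold CV.

    A configuration is discarded as soon as the mean of its fold
    scores evaluated so far falls below the incumbent's mean score
    (one simple criterion; the paper's exact rule may differ).
    `fold_score(cfg, fold)` returns the validation score of `cfg`
    on one fold; higher is better.
    """
    best_config, best_mean = None, float("-inf")
    folds_evaluated = 0  # total fold evaluations spent, for bookkeeping
    for cfg in configs:
        scores = []
        for fold in range(k):
            scores.append(fold_score(cfg, fold))
            folds_evaluated += 1
            if sum(scores) / len(scores) < best_mean:
                break  # early stop: running mean already below incumbent
        else:
            # all k folds completed; candidate may become the incumbent
            mean = sum(scores) / k
            if mean > best_mean:
                best_config, best_mean = cfg, mean
    return best_config, best_mean, folds_evaluated
```

With a deterministic toy scorer, a poor configuration is dropped after a single fold instead of all k, which is exactly where the reported speed-up comes from: the saved fold evaluations are spent on exploring additional configurations instead.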

URL

https://arxiv.org/abs/2405.03389

PDF

https://arxiv.org/pdf/2405.03389.pdf
