Paper Reading AI Learner

Using J-K fold Cross Validation to Reduce Variance When Tuning NLP Models

2018-06-19 10:12:25
Henry B. Moss, David S. Leslie, Paul Rayson

Abstract

K-fold cross validation (CV) is a popular method for estimating the true performance of machine learning models, allowing model selection and parameter tuning. However, the very process of CV requires random partitioning of the data, so our performance estimates are in fact stochastic, with variability that can be substantial for natural language processing tasks. We demonstrate that these unstable estimates cannot be relied upon for effective parameter tuning. The resulting tuned parameters are highly sensitive to how our data is partitioned, meaning that we often select sub-optimal parameter choices and face serious reproducibility issues. Instead, we propose the less variable J-K-fold CV, in which J independent K-fold cross validations are used to assess performance. Our main contributions are extending J-K-fold CV from performance estimation to parameter tuning and investigating how to choose J and K. We argue that variability is more important than bias for effective tuning, and so advocate lower choices of K than are typically seen in the NLP literature, instead using the saved computation to increase J. To demonstrate the generality of our recommendations, we investigate a wide range of case studies: sentiment classification (both general and target-specific), part-of-speech tagging and document classification.
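The procedure the abstract describes can be sketched in a few lines: run J independent K-fold cross validations, each with its own random partition of the data, and average all of the resulting fold scores. Below is a minimal pure-Python sketch of that idea; the function names and the toy 1-nearest-neighbour classifier are illustrative assumptions, not the paper's actual models or datasets.

```python
import random
import statistics

def k_fold_scores(xs, ys, k, rng, fit, score):
    """One K-fold CV pass: shuffle indices, split into K folds,
    and score the model on each held-out fold."""
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # K roughly equal folds
    out = []
    for held in folds:
        held_set = set(held)
        train = [i for i in idx if i not in held_set]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        out.append(score(model, [xs[i] for i in held], [ys[i] for i in held]))
    return out

def jk_fold_estimate(xs, ys, j, k, seed, fit, score):
    """J-K-fold CV: average J independent K-fold CV estimates,
    each using a different random partition of the data."""
    reps = [
        statistics.mean(k_fold_scores(xs, ys, k, random.Random(seed + r), fit, score))
        for r in range(j)
    ]
    return statistics.mean(reps)

# Toy 1-nearest-neighbour classifier on synthetic 1-D data
# (illustration only; the paper evaluates real NLP models).
def fit_1nn(xs, ys):
    return list(zip(xs, ys))

def score_1nn(model, xs, ys):
    hits = sum(min(model, key=lambda p: abs(p[0] - x))[1] == y
               for x, y in zip(xs, ys))
    return hits / len(xs)

rng = random.Random(0)
xs = [rng.gauss(c, 1.0) for c in (0.0, 4.0) for _ in range(50)]
ys = [label for label in (0, 1) for _ in range(50)]

acc = jk_fold_estimate(xs, ys, j=5, k=3, seed=1, fit=fit_1nn, score=score_1nn)
print(f"J-K-fold accuracy estimate: {acc:.3f}")
```

Because each of the J repetitions re-partitions the data independently, averaging them shrinks the partition-induced variance of the estimate, which is what makes the tuned parameters less sensitive to any single random split.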


URL

https://arxiv.org/abs/1806.07139

PDF

https://arxiv.org/pdf/1806.07139.pdf

