Uncertainty Quantification Metrics for Deep Regression

2024-05-07 12:46:45
Zilian Xiong, Simon Kristoffersson Lind, Per-Erik Forssén, Volker Krüger

Abstract

When deploying deep neural networks on robots or other physical systems, the learned model should reliably quantify predictive uncertainty. A reliable uncertainty estimate allows downstream modules to reason about the safety of their actions. In this work, we address metrics for evaluating such uncertainty. Specifically, we focus on regression tasks and investigate Area Under Sparsification Error (AUSE), Calibration Error, Spearman's Rank Correlation, and Negative Log-Likelihood (NLL). Using synthetic regression datasets, we examine how these metrics behave under four typical types of uncertainty and how stable they are with respect to the size of the test set, and we reveal their strengths and weaknesses. Our results indicate that Calibration Error is the most stable and interpretable metric, but AUSE and NLL also have their respective use cases. We discourage the usage of Spearman's Rank Correlation for evaluating uncertainties and recommend replacing it with AUSE.
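To make the four metrics named in the abstract concrete, here is a minimal Python sketch using commonly used textbook definitions. It is not the paper's implementation: the abstract does not give formulas, so the Gaussian predictive distribution, the PIT-based calibration error, the normalisation of the sparsification curves, and all function names (`gaussian_nll`, `spearman`, `calibration_error`, `ause`) are assumptions made for illustration.

```python
import math
import numpy as np

def gaussian_nll(y, mu, sigma):
    """Mean negative log-likelihood of y under N(mu, sigma^2) (assumed Gaussian)."""
    var = sigma ** 2
    return float(np.mean(0.5 * np.log(2 * np.pi * var) + 0.5 * (y - mu) ** 2 / var))

def _ranks(x):
    """Ordinal ranks; ties are not averaged (adequate for continuous data)."""
    order = np.argsort(x)
    ranks = np.empty(len(x))
    ranks[order] = np.arange(len(x))
    return ranks

def spearman(a, b):
    """Spearman rank correlation as the Pearson correlation of the ranks."""
    return float(np.corrcoef(_ranks(a), _ranks(b))[0, 1])

def calibration_error(y, mu, sigma, n_levels=21):
    """Mean absolute gap between expected and observed cumulative coverage."""
    z = (y - mu) / sigma
    # Probability integral transform under the assumed Gaussian.
    pit = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    levels = np.linspace(0.0, 1.0, n_levels)
    observed = np.array([(pit <= p).mean() for p in levels])
    return float(np.mean(np.abs(observed - levels)))

def _sparsification_curve(err, order_by, fractions):
    """Mean error left after removing the top fraction ranked by `order_by`."""
    sorted_err = err[np.argsort(-order_by)]  # highest score removed first
    n = len(err)
    return np.array([sorted_err[int(f * n):].mean() for f in fractions])

def ause(err, sigma, n_steps=20):
    """Area between the uncertainty-ranked and the oracle (error-ranked) curves."""
    fractions = np.linspace(0.0, 1.0, n_steps, endpoint=False)
    curve = _sparsification_curve(err, sigma, fractions)
    oracle = _sparsification_curve(err, err, fractions)
    diff = (curve - oracle) / curve[0]  # normalise by the error on the full set
    # Trapezoidal rule written out by hand.
    return float(np.sum((diff[1:] + diff[:-1]) / 2.0 * np.diff(fractions)))

# Synthetic check: predicted sigma equals the true noise level.
rng = np.random.default_rng(0)
n = 5000
mu = rng.normal(size=n)
sigma = rng.uniform(0.5, 2.0, size=n)
y = mu + sigma * rng.normal(size=n)
abs_err = np.abs(y - mu)

print("NLL              :", gaussian_nll(y, mu, sigma))
print("Calibration error:", calibration_error(y, mu, sigma))
print("Spearman         :", spearman(sigma, abs_err))
print("AUSE             :", ause(abs_err, sigma))
```

In this sketch, lower is better for NLL, calibration error, and AUSE, while a higher Spearman correlation means the predicted uncertainty ranks the errors more faithfully. An oracle that ranks samples by their true error gives an AUSE of exactly zero.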

URL

https://arxiv.org/abs/2405.04278

PDF

https://arxiv.org/pdf/2405.04278.pdf

