Paper Reading AI Learner

Reliability in Semantic Segmentation: Are We on the Right Track?

2023-03-20 17:38:24
Pau de Jorge, Riccardo Volpi, Philip Torr, Gregory Rogez

Abstract

Motivated by the increasing popularity of transformers in computer vision, in recent times there has been a rapid development of novel architectures. While in-domain performance follows a constant, upward trend, properties like robustness or uncertainty estimation are less explored, leaving doubts about advances in model reliability. Studies along these axes exist, but they are mainly limited to classification models. In contrast, we carry out a study on semantic segmentation, a relevant task for many real-world applications where model reliability is paramount. We analyze a broad variety of models, spanning from older ResNet-based architectures to novel transformers, and assess their reliability based on four metrics: robustness, calibration, misclassification detection and out-of-distribution (OOD) detection. We find that while recent models are significantly more robust, they are not overall more reliable in terms of uncertainty estimation. We further explore methods that can come to the rescue and show that improving calibration can also help with other uncertainty metrics such as misclassification or OOD detection. This is the first study on modern segmentation models focused on both robustness and uncertainty estimation, and we hope it will help practitioners and researchers interested in this fundamental vision task. Code available at this https URL.
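As a rough illustration of the calibration metric named in the abstract, the sketch below computes expected calibration error (ECE), one common calibration measure. The abstract does not say which measure the authors use, so this is an assumption, and the function name, its parameters, and the equal-width binning are hypothetical choices for illustration only. In a segmentation setting, each pixel would contribute one (confidence, correct) pair.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Bin-weighted mean of |accuracy - mean confidence| over equal-width bins.

    Hypothetical helper, not taken from the paper's code.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    total = len(confidences)
    if total == 0:
        return 0.0

    # Per-bin accumulators: sum of confidences, number of correct
    # predictions, and number of samples.
    conf_sum = [0.0] * n_bins
    hits = [0] * n_bins
    count = [0] * n_bins

    for c, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        b = min(int(c * n_bins), n_bins - 1)
        conf_sum[b] += c
        hits[b] += int(ok)
        count[b] += 1

    ece = 0.0
    for b in range(n_bins):
        if count[b] == 0:
            continue  # Empty bins contribute nothing.
        avg_conf = conf_sum[b] / count[b]
        accuracy = hits[b] / count[b]
        ece += (count[b] / total) * abs(accuracy - avg_conf)
    return ece
```

A model that predicts with confidence 0.9 but is right only three times out of four is overconfident, and the gap shows up directly in the score. A lower value means the predicted confidences track the observed accuracy more closely.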

URL

https://arxiv.org/abs/2303.11298

PDF

https://arxiv.org/pdf/2303.11298.pdf

