Abstract
Motivated by the increasing popularity of transformers in computer vision, novel architectures have been developed at a rapid pace in recent years. While in-domain performance follows a steady upward trend, properties such as robustness and uncertainty estimation are less explored, leaving doubts about advances in model reliability. Studies along these axes exist, but they are mainly limited to classification models. In contrast, we carry out a study on semantic segmentation, a relevant task for many real-world applications where model reliability is paramount. We analyze a broad variety of models, spanning from older ResNet-based architectures to novel transformers, and assess their reliability based on four metrics: robustness, calibration, misclassification detection, and out-of-distribution (OOD) detection. We find that while recent models are significantly more robust, they are not overall more reliable in terms of uncertainty estimation. We further explore methods that can mitigate this gap and show that improving calibration can also help with other uncertainty metrics such as misclassification or OOD detection. This is the first study on modern segmentation models focused on both robustness and uncertainty estimation, and we hope it will help practitioners and researchers interested in this fundamental vision task. Code available at this https URL.
URL
https://arxiv.org/abs/2303.11298