Abstract
One of the challenges for neural networks in real-life applications is the overconfident errors these models make when the data is not from the original training distribution. Addressing this issue is known as Out-of-Distribution (OOD) detection. Many state-of-the-art OOD methods employ an auxiliary dataset as a surrogate for OOD data during training to achieve improved performance. However, these methods fail to fully exploit the local information embedded in the auxiliary dataset. In this work, we propose the idea of leveraging the information embedded in the gradient of the loss function during training to enable the network to not only learn a desired OOD score for each sample but also to exhibit similar behavior in a local neighborhood around each sample. We also develop a novel energy-based sampling method to allow the network to be exposed to more informative OOD samples during the training phase. This is especially important when the auxiliary dataset is large. We demonstrate the effectiveness of our method through extensive experiments on several OOD benchmarks, improving the existing state-of-the-art FPR95 by 4% on our ImageNet experiment. We further provide a theoretical analysis through the lens of certified robustness and Lipschitz analysis to showcase the theoretical foundation of our work. We will publicly release our code after the review process.
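As a rough illustration of the two ingredients named above, the sketch below computes an energy-style OOD score, estimates the norm of its gradient with respect to the input, and ranks auxiliary samples by energy. It is a minimal sketch under stated assumptions, not the authors' implementation: the toy linear classifier, the function names, the finite-difference gradient, and the "keep the lowest-energy samples" rule are all hypothetical choices made for illustration. Only the free-energy score E(x) = -T * logsumexp(f(x) / T) is the standard formulation from the energy-based OOD literature.

```python
import math


def energy_score(logits, temperature=1.0):
    """Free-energy score E(x) = -T * logsumexp(logits / T).

    Lower energy is usually read as "more in-distribution".
    """
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    return -temperature * (m + math.log(sum(math.exp(s - m) for s in scaled)))


def linear_logits(weights, x):
    """Toy stand-in for a network (assumption): one weight row per class."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def score_gradient_norm(weights, x, eps=1e-5):
    """Central finite-difference estimate of the L2 norm of dE/dx.

    A small norm means the score changes little in a local
    neighborhood of x, which is the behavior the abstract describes.
    """
    grad = []
    for i in range(len(x)):
        hi, lo = list(x), list(x)
        hi[i] += eps
        lo[i] -= eps
        diff = (energy_score(linear_logits(weights, hi))
                - energy_score(linear_logits(weights, lo)))
        grad.append(diff / (2 * eps))
    return math.sqrt(sum(g * g for g in grad))


def select_informative(aux_samples, weights, k):
    """Hypothetical selection rule: keep the k auxiliary samples with the
    lowest energy, i.e. those the model most confuses with in-distribution data.
    """
    return sorted(
        aux_samples,
        key=lambda x: energy_score(linear_logits(weights, x)),
    )[:k]


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]
    aux = [[0.0, 0.0], [3.0, 0.0], [-1.0, -1.0]]
    print(energy_score(linear_logits(W, aux[0])))
    print(score_gradient_norm(W, aux[0]))
    print(select_informative(aux, W, k=2))
```

In a real training loop the gradient would come from automatic differentiation rather than finite differences, and its norm would presumably enter the loss as a penalty term; the exact form used in the paper is not stated in the abstract.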
URL
https://arxiv.org/abs/2404.12368