Abstract
CAM-based methods are widely used post-hoc interpretability methods that produce a saliency map to explain the decision of an image classification model. The saliency map highlights the areas of the image that are relevant to the prediction. In this paper, we show that most of these methods can incorrectly attribute a positive importance score to parts of the image that the model cannot see. We demonstrate this phenomenon both theoretically and experimentally. On the theory side, we analyze the behavior of GradCAM on a simple masked CNN model at initialization. Experimentally, we train a VGG-like model constrained not to use the lower part of the image and nevertheless observe positive scores in the unseen part of the image. We evaluate this behavior quantitatively on two new datasets. We believe this is problematic, as it can lead to misinterpretation of the model's behavior.
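As a rough illustration of the setting described in the abstract (a sketch under assumptions, not the authors' implementation, architecture, or datasets), the following PyTorch snippet builds a small CNN at random initialization whose prediction can only depend on the upper half of the image, then computes a standard GradCAM map on its last convolutional layer; the map can still take positive values in the lower, unseen half. The names `MaskedCNN` and `gradcam` are illustrative assumptions.

```python
# Minimal sketch of the phenomenon: a masked CNN at initialization plus a
# plain GradCAM computation. Not the paper's exact setup.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedCNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)
        self.head = nn.Linear(32, num_classes)

    def forward(self, x):
        h = F.relu(self.conv1(x))
        # Zero all feature rows from (H/2 - 1) down: the extra row keeps the
        # 3x3 receptive field of the surviving features inside the upper half
        # of the input, so the prediction cannot use the lower half at all.
        cut = h.shape[-2] // 2 - 1
        mask = torch.ones_like(h)
        mask[..., cut:, :] = 0.0
        h = F.max_pool2d(h * mask, 2)
        h = F.relu(self.conv2(h))
        self.last_features = h                        # retained for GradCAM
        return self.head(h.mean(dim=(-2, -1)))        # global average pooling

def gradcam(model, x, class_idx):
    """Plain GradCAM: ReLU of the gradient-weighted sum of the last conv maps."""
    logits = model(x)
    feats = model.last_features
    grads, = torch.autograd.grad(logits[0, class_idx], feats)
    weights = grads.mean(dim=(-2, -1), keepdim=True)  # spatial GAP of gradients
    cam = F.relu((weights * feats).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=x.shape[-2:], mode="bilinear", align_corners=False)
    return cam[0, 0]

torch.manual_seed(0)
model = MaskedCNN()                       # untrained model, as in the analysis
x = torch.rand(1, 3, 32, 32)
cam = gradcam(model, x, class_idx=0)
unseen = cam[16:, :]                      # lower half: invisible to the model
print(f"max GradCAM value in the unseen lower half: {unseen.max().item():.4f}")
```

Because the second convolution mixes masked and unmasked rows (and its bias produces nonzero activations even over all-zero regions), the upsampled GradCAM map often assigns strictly positive values inside the lower half of the image, even though the prediction cannot depend on that region.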
URL
https://arxiv.org/abs/2404.01964