Abstract
Interpreting the decisions of Convolutional Neural Networks (CNNs) is essential for understanding their behavior, yet explainability remains a significant challenge, particularly for self-supervised models. Most existing methods for generating saliency maps rely on ground-truth labels, which restricts their use to supervised tasks. EigenCAM is the only notable label-independent alternative: it uses Singular Value Decomposition to generate saliency maps applicable across CNN models, but it does not fully exploit the tensorial structure of feature maps. In this work, we introduce the Tucker Saliency Map (TSM) method, which applies Tucker tensor decomposition to better capture the inherent structure of feature maps, producing more accurate singular vectors and values. These are used to generate high-fidelity saliency maps that effectively highlight objects of interest in the input. We further extend EigenCAM and TSM into multivector variants, Multivec-EigenCAM and Multivector Tucker Saliency Maps (MTSM), which use all singular vectors and values and further improve saliency map quality. Quantitative evaluations on supervised classification models demonstrate that TSM, Multivec-EigenCAM, and MTSM perform competitively with label-dependent methods. Moreover, TSM enhances explainability by approximately 50% over EigenCAM for both supervised and self-supervised models. Multivec-EigenCAM and MTSM further advance state-of-the-art explainability performance on self-supervised models, with MTSM achieving the best results.
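To make the label-free idea concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a single feature map of shape (C, H, W) taken from some convolutional layer. The function names `eigencam_style_map` and `hosvd_factors` are hypothetical, and the sign orientation, ReLU, and normalization steps are assumptions of this sketch. The first function projects the feature map onto its first right singular vector, in the spirit of EigenCAM. The second computes per-mode factor matrices with a truncated higher-order SVD, one common way to obtain a Tucker decomposition. The abstract does not specify how TSM combines such factors into a saliency map, so that step is left out.

```python
import numpy as np


def eigencam_style_map(features: np.ndarray) -> np.ndarray:
    """Project a (C, H, W) feature map onto its first right singular vector."""
    c, h, w = features.shape
    # One C-dimensional activation vector per spatial position: shape (H*W, C).
    flat = features.reshape(c, h * w).T
    _, _, vt = np.linalg.svd(flat, full_matrices=False)
    saliency = (flat @ vt[0]).reshape(h, w)
    # Singular vectors are defined up to sign; orient the map so that the
    # dominant response is positive (an assumption of this sketch).
    if saliency.sum() < 0:
        saliency = -saliency
    saliency = np.maximum(saliency, 0.0)
    peak = saliency.max()
    return saliency / peak if peak > 0 else saliency


def hosvd_factors(features: np.ndarray, ranks=(1, 1, 1)):
    """Leading left singular vectors of each mode unfolding (truncated HOSVD)."""
    factors = []
    for mode, rank in enumerate(ranks):
        # Mode-n unfolding: bring axis `mode` to the front, flatten the rest.
        unfolded = np.moveaxis(features, mode, 0).reshape(features.shape[mode], -1)
        u, _, _ = np.linalg.svd(unfolded, full_matrices=False)
        factors.append(u[:, :rank])
    return factors


# Toy input: 4 channels on a 6x6 grid with one strongly activated 2x2 region.
toy = np.zeros((4, 6, 6))
toy[:, 2:4, 2:4] = 5.0
toy_map = eigencam_style_map(toy)
toy_factors = hosvd_factors(toy)
```

Neither function uses a class label at any point, which is what allows this family of methods to be applied to self-supervised models as well as supervised ones.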
URL
https://arxiv.org/abs/2410.23072