Abstract
Transparency and explainability in image classification are essential for establishing trust in machine learning models and for detecting biases and errors. State-of-the-art explainability methods generate saliency maps that show where a specific class is identified, without providing a detailed explanation of the model's decision process. To address this need, we introduce a post-hoc method that explains the entire feature extraction process of a Convolutional Neural Network. These explanations include a layer-wise representation of the features the model extracts from the input. Such features are represented as saliency maps, generated by clustering and merging similar feature maps, to which we associate a weight derived by generalizing Grad-CAM to the proposed methodology. To further enhance these explanations, we include a set of textual labels collected through a gamified crowdsourcing activity and processed with NLP techniques and Sentence-BERT. Finally, we show an approach for generating global explanations by aggregating labels across multiple images.
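The clustering-and-weighting step described above can be sketched in a few lines of NumPy. Everything here is illustrative: the random arrays stand in for one layer's activations and class gradients, the cosine-similarity threshold and greedy grouping are placeholder choices (the paper's actual clustering procedure may differ), and only the Grad-CAM-style global-average-pooled gradient weighting follows the original Grad-CAM formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: activations A^k and gradients dY_c/dA^k of one
# conv layer for a single image (channels x height x width). In the real
# method these come from a forward/backward pass of the CNN.
acts = rng.random((8, 4, 4))
grads = rng.random((8, 4, 4))

def merge_similar_maps(acts, threshold=0.9):
    """Greedily group feature maps whose flattened activations have
    cosine similarity >= threshold, then average each group into one map."""
    flat = acts.reshape(acts.shape[0], -1)
    unit = flat / np.clip(np.linalg.norm(flat, axis=1, keepdims=True), 1e-12, None)
    sim = unit @ unit.T
    groups, assigned = [], set()
    for k in range(acts.shape[0]):
        if k in assigned:
            continue
        members = [j for j in range(acts.shape[0])
                   if j not in assigned and sim[k, j] >= threshold]
        assigned.update(members)
        groups.append(members)
    merged = np.stack([acts[g].mean(axis=0) for g in groups])
    return groups, merged

def gradcam_weights(grads, groups):
    """Grad-CAM weights each map by its global-average-pooled gradient;
    here each merged map gets the mean weight of its members (assumption)."""
    alpha = grads.mean(axis=(1, 2))
    return np.array([alpha[g].mean() for g in groups])

groups, merged = merge_similar_maps(acts)
weights = gradcam_weights(grads, groups)
# ReLU over the weighted sum, as in the original Grad-CAM formulation.
saliency = np.maximum((weights[:, None, None] * merged).sum(axis=0), 0)
```

Each entry of `merged` is one candidate layer-level saliency map, and `weights` gives its contribution to the class score; the final `saliency` combines them in the usual Grad-CAM fashion.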
URL
https://arxiv.org/abs/2405.03301