Abstract
The performance of convolutional neural networks has continued to improve over the last decade. At the same time, as model complexity grows, it becomes increasingly difficult to explain model decisions. Such explanations may be of critical importance for reliable operation of human-machine pairing setups, or for model selection when the "best" model among many equally-accurate models must be established. Saliency maps represent one popular way of explaining model decisions by highlighting image regions models deem important when making a prediction. However, examining saliency maps at scale is not practical. In this paper, we propose five novel methods of leveraging model saliency to explain model behavior at scale. These methods ask: (a) what is the average entropy for a model's saliency maps, (b) how does model saliency change when fed out-of-set samples, (c) how closely does model saliency follow geometric transformations, (d) what is the stability of model saliency across independent training runs, and (e) how does model saliency react to saliency-guided image degradations. To assess the proposed measures on a concrete and topical problem, we conducted a series of experiments for the task of synthetic face detection with two types of models: those trained traditionally with cross-entropy loss, and those guided by human saliency when training to increase model generalizability. These two types of models are characterized by different, interpretable properties of their saliency maps, which allows for the evaluation of the correctness of the proposed measures. We offer source code for each measure along with this paper.
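To make measure (a) concrete, the sketch below computes the Shannon entropy of a single saliency map treated as a probability distribution over pixels; averaging over a test set yields the proposed summary statistic. This is a minimal illustration under our own assumptions (non-negative map values, normalization to unit sum); the function name and normalization details are illustrative, not taken from the paper's released code.

```python
import numpy as np

def saliency_entropy(sal_map: np.ndarray, eps: float = 1e-12) -> float:
    """Shannon entropy (bits) of a saliency map viewed as a distribution.

    Illustrative helper: clips negative values, normalizes to unit sum,
    then applies H(p) = -sum(p * log2(p)).
    """
    p = np.clip(sal_map.astype(np.float64).ravel(), 0.0, None)
    p = p / (p.sum() + eps)
    return float(-np.sum(p * np.log2(p + eps)))

# A uniform map has maximal entropy (log2 of the pixel count);
# a map concentrated on one pixel has entropy near zero.
uniform = np.ones((8, 8))
peaked = np.zeros((8, 8))
peaked[0, 0] = 1.0
```

A model whose maps are diffuse (close to uniform) scores high average entropy, while a model that focuses on compact regions scores low, which is what makes the statistic usable for comparing models at scale.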
URL
https://arxiv.org/abs/2303.11969