Abstract
Concept Bottleneck Models (CBMs) ground image classification in human-understandable concepts to allow for interpretable model decisions. Crucially, the CBM design inherently allows for human interventions, in which expert users can modify potentially misaligned concept choices to influence the decision behavior of the model in an interpretable fashion. However, existing approaches often require numerous human interventions per image to achieve strong performance, posing practical challenges in scenarios where obtaining human feedback is expensive. In this paper, we find that this is noticeably driven by an independent treatment of concepts during intervention, wherein a change to one concept does not influence how the other concepts are used in the model's final decision. To address this issue, we introduce a trainable concept intervention realignment module, which leverages concept relations to realign concept assignments post-intervention. Across standard, real-world benchmarks, we find that concept realignment can significantly improve intervention efficacy, reducing the number of interventions needed to reach a target classification performance or concept prediction accuracy. In addition, it integrates easily into existing concept-based architectures without requiring changes to the models themselves. This reduced cost of human-model collaboration is crucial to enhancing the feasibility of CBMs in resource-constrained environments.
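The contrast between independent intervention and post-intervention realignment can be illustrated with a toy sketch. This is not the paper's implementation: the abstract describes a trainable module, whereas the relation weights below are hand-set, and the concept names, the linear-plus-sigmoid form, and the functions `intervene` and `realign` are all assumptions made for illustration only.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def intervene(concepts, true_values, mask):
    """Independent intervention: overwrite only the concepts an expert corrected."""
    return [t if m else c for c, t, m in zip(concepts, true_values, mask)]


def realign(concepts, true_values, mask, weights, bias):
    """Intervene, then re-estimate each non-intervened concept from the whole vector.

    `weights[i][j]` is a hypothetical relation weight from concept j to concept i.
    A real module would learn these; here they are set by hand.
    """
    fixed = intervene(concepts, true_values, mask)
    updated = []
    for i, (row, b) in enumerate(zip(weights, bias)):
        if mask[i]:
            # Expert-provided values are kept exactly as given.
            updated.append(fixed[i])
        else:
            # Center probabilities to [-1, 1] so absent concepts also carry signal.
            logit = b + sum(w * (2.0 * x - 1.0) for w, x in zip(row, fixed))
            updated.append(sigmoid(logit))
    return updated


# Hypothetical concepts: [has_wings, can_fly, has_fur]
predicted = [0.2, 0.3, 0.7]          # model's concept probabilities
truth = [1.0, 1.0, 0.0]              # ground-truth concept values
mask = [True, False, False]          # the expert corrects only has_wings

# Hand-set relations: wings support flying and argue against fur.
weights = [
    [0.0, 0.0, 0.0],                 # unused here: has_wings is intervened on
    [2.0, 1.0, -1.0],                # can_fly
    [-2.0, 0.0, 1.0],                # has_fur
]
bias = [0.0, 0.0, 0.0]

independent = intervene(predicted, truth, mask)
realigned = realign(predicted, truth, mask, weights, bias)
print("independent:", independent)
print("realigned:  ", [round(v, 3) for v in realigned])
```

With independent treatment, correcting has_wings leaves can_fly and has_fur at their original, mistaken values, so two further interventions would be needed. In the realigned vector, the single correction also moves the two related concepts toward their ground-truth values, which is the effect the abstract attributes to concept realignment.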
URL
https://arxiv.org/abs/2405.01531