Abstract
Simplicity bias is the concerning tendency of deep networks to over-depend on simple, weakly predictive features, to the exclusion of stronger, more complex features. This causes biased, incorrect model predictions in many real-world applications, exacerbated by incomplete training data containing spurious feature-label correlations. We propose a direct, interventional method for addressing simplicity bias in DNNs, which we call the feature sieve. We aim to automatically identify and suppress easily computable spurious features in the lower layers of the network, thereby allowing the higher layers to extract and utilize richer, more meaningful representations. We provide concrete evidence of this differential suppression and enhancement of relevant features on both controlled datasets and real-world images, and report substantial gains on many real-world debiasing benchmarks (11.4% relative gain on Imagenet-A; 3.2% on BAR, etc.). Crucially, we outperform many baselines that incorporate knowledge about known spurious or biased attributes, despite our method not using any such information. We believe that our feature sieve work opens up exciting new research directions in automated adversarial feature extraction and representation learning for deep networks.
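The "identify and suppress" mechanism described above can be caricatured as an alternating two-phase game: fit an auxiliary probe on a lower-layer representation to identify what it encodes, then update the lower layer to drive that probe toward an uninformative output. The following is a toy numpy sketch of that idea only; the linear "lower layer", the alternating schedule, and the synthetic data are illustrative assumptions, not the paper's actual feature-sieve architecture or losses.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: feature 0 is an "easy" cue strongly correlated with the label;
# the remaining features are noise standing in for harder signal.
n, d, k = 512, 4, 3                       # samples, input dim, lower-layer width
y = rng.integers(0, 2, size=n).astype(float)
x = rng.normal(size=(n, d))
x[:, 0] = y + 0.1 * rng.normal(size=n)    # easily computable shortcut feature

W1 = rng.normal(scale=0.1, size=(d, k))   # "lower layer" (linear, for clarity)
w_aux = np.zeros(k)                       # auxiliary probe head
lr = 0.1

for step in range(200):
    h = x @ W1                            # lower-layer representation
    p = sigmoid(h @ w_aux)

    # Identify phase: fit the auxiliary probe to the labels.
    w_aux -= lr * h.T @ (p - y) / n

    # Forget phase: update the lower layer so the probe's output is
    # pushed toward 0.5 (uninformative), suppressing the easy feature.
    p = sigmoid((x @ W1) @ w_aux)
    dh = np.outer(p - 0.5, w_aux)         # grad of cross-entropy vs. uniform target
    W1 -= lr * x.T @ dh / n

# After sieving, fit a fresh probe on h to check how decodable the label remains.
h = x @ W1
w_probe = np.zeros(k)
for _ in range(200):
    p = sigmoid(h @ w_probe)
    w_probe -= lr * h.T @ (p - y) / n
acc = ((sigmoid(h @ w_probe) > 0.5) == (y > 0.5)).mean()
print(f"probe accuracy on sieved representation: {acc:.2f}")
```

In the paper's actual setting the higher layers continue to train on the task loss, so suppressing the shortcut in lower layers frees them to rely on the harder, genuinely predictive features; this sketch isolates only the suppression step.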
URL
https://arxiv.org/abs/2301.13293