Abstract
Scene Graph Generation (SGG) aims to detect all the visual relation triplets <sub, pred, obj> in a given image. With the emergence of various advanced techniques for better utilizing both the intrinsic and extrinsic information in each relation triplet, SGG has made great progress in recent years. However, due to the ubiquitous long-tailed predicate distributions, today's SGG models are still easily biased toward the head predicates. Currently, the most prevalent debiasing solutions for SGG are re-balancing methods, e.g., changing the distribution of the original training samples. In this paper, we argue that all existing re-balancing strategies fail to increase the diversity of the relation triplet features of each predicate, which is critical for robust SGG. To this end, we propose a novel Compositional Feature Augmentation (CFA) strategy, which is the first unbiased SGG work to mitigate the bias issue from the perspective of increasing the diversity of triplet features. Specifically, we first decompose each relation triplet feature into two components: an intrinsic feature and an extrinsic feature, which correspond to the intrinsic characteristics and extrinsic contexts of a relation triplet, respectively. Then, we design two different feature augmentation modules that enrich the feature diversity of the original relation triplets by replacing or mixing up either their intrinsic or extrinsic features with those from other samples. Due to its model-agnostic nature, CFA can be seamlessly incorporated into various SGG frameworks. Extensive ablations have shown that CFA achieves new state-of-the-art performance on the trade-off between different metrics.
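As a rough illustration of the two augmentation operations named above (replacing and mixing up feature components), the following minimal sketch treats a triplet feature as a pair of vectors. It is not the paper's implementation: the dictionary layout, the function names `make_triplet`, `replace_intrinsic`, and `mixup_extrinsic`, the mixing weight `lam`, and the choice of which operation is applied to which component are all assumptions made for illustration.

```python
import numpy as np


def make_triplet(intrinsic, extrinsic, predicate):
    """Hypothetical container: one relation triplet feature split into two parts."""
    return {
        "intrinsic": np.asarray(intrinsic, dtype=float),
        "extrinsic": np.asarray(extrinsic, dtype=float),
        "predicate": predicate,
    }


def replace_intrinsic(target, donor):
    """Keep the target's extrinsic part and label, swap in the donor's intrinsic part."""
    return {
        "intrinsic": donor["intrinsic"].copy(),
        "extrinsic": target["extrinsic"].copy(),
        "predicate": target["predicate"],
    }


def mixup_extrinsic(target, donor, lam=0.5):
    """Keep the target's intrinsic part and label, linearly mix the extrinsic parts."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    mixed = lam * target["extrinsic"] + (1.0 - lam) * donor["extrinsic"]
    return {
        "intrinsic": target["intrinsic"].copy(),
        "extrinsic": mixed,
        "predicate": target["predicate"],
    }


# Toy features: a tail-predicate sample augmented with parts of another sample.
tail = make_triplet([1.0, 2.0], [0.0, 4.0], "parked on")
other = make_triplet([5.0, 6.0], [2.0, 0.0], "on")

aug_replaced = replace_intrinsic(tail, other)
aug_mixed = mixup_extrinsic(tail, other, lam=0.75)
```

In both cases the augmented sample keeps the original predicate label, so each call yields one additional, differently composed feature for the same predicate; how donor samples are selected is left open here.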
URL
https://arxiv.org/abs/2308.06712