Compositional Feature Augmentation for Unbiased Scene Graph Generation

2023-08-13 08:02:14
Lin Li, Guikun Chen, Jun Xiao, Yi Yang, Chunping Wang, Long Chen

Abstract

Scene Graph Generation (SGG) aims to detect all the visual relation triplets <sub, pred, obj> in a given image. With the emergence of various advanced techniques for better utilizing both the intrinsic and extrinsic information in each relation triplet, SGG has achieved great progress over the recent years. However, due to the ubiquitous long-tailed predicate distributions, today's SGG models are still easily biased to the head predicates. Currently, the most prevalent debiasing solutions for SGG are re-balancing methods, e.g., changing the distributions of original training samples. In this paper, we argue that all existing re-balancing strategies fail to increase the diversity of the relation triplet features of each predicate, which is critical for robust SGG. To this end, we propose a novel Compositional Feature Augmentation (CFA) strategy, which is the first unbiased SGG work to mitigate the bias issue from the perspective of increasing the diversity of triplet features. Specifically, we first decompose each relation triplet feature into two components: intrinsic feature and extrinsic feature, which correspond to the intrinsic characteristics and extrinsic contexts of a relation triplet, respectively. Then, we design two different feature augmentation modules to enrich the feature diversity of original relation triplets by replacing or mixing up either their intrinsic or extrinsic features from other samples. Due to its model-agnostic nature, CFA can be seamlessly incorporated into various SGG frameworks. Extensive ablations have shown that CFA achieves a new state-of-the-art performance on the trade-off between different metrics.
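
The replace-or-mix-up idea described in the abstract can be illustrated with a minimal sketch. The abstract does not say how a triplet feature is split into its intrinsic and extrinsic parts, how donor samples are chosen, or how mixing is weighted, so everything below is an assumption made for illustration: the dictionary layout, the function name `augment_triplet`, the `lam` mixing weight, and the convex-combination form of mix-up are hypothetical and are not taken from the CFA paper.

```python
import numpy as np

COMPONENTS = ("intrinsic", "extrinsic")


def augment_triplet(target, donor, component="intrinsic", mode="replace", lam=0.5):
    """Return a new triplet feature with one component augmented.

    target, donor: dicts holding 1-D NumPy arrays under the keys
        "intrinsic" and "extrinsic" (an assumed layout, not the paper's).
    component: which part of the target to augment.
    mode: "replace" copies the donor's component;
          "mixup" blends target and donor as lam * target + (1 - lam) * donor.
    """
    if component not in COMPONENTS:
        raise ValueError(f"unknown component: {component!r}")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")

    if mode == "replace":
        new_part = donor[component].copy()
    elif mode == "mixup":
        new_part = lam * target[component] + (1.0 - lam) * donor[component]
    else:
        raise ValueError(f"unknown mode: {mode!r}")

    # Copy the target so the original sample is left untouched.
    augmented = {name: target[name].copy() for name in COMPONENTS}
    augmented[component] = new_part
    return augmented


# Toy features standing in for two relation triplets.
target = {"intrinsic": np.array([1.0, 1.0]), "extrinsic": np.array([2.0, 2.0])}
donor = {"intrinsic": np.array([3.0, 3.0]), "extrinsic": np.array([4.0, 4.0])}

replaced = augment_triplet(target, donor, component="intrinsic", mode="replace")
mixed = augment_triplet(target, donor, component="extrinsic", mode="mixup", lam=0.25)
```

In this toy setup the untouched component always comes from the target sample, so each augmented feature is a new composition of the target's own part with a donor's replaced or blended part. A real pipeline would presumably restrict donors in some principled way (for example by predicate or object category), which this sketch leaves out.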

URL

https://arxiv.org/abs/2308.06712

PDF

https://arxiv.org/pdf/2308.06712.pdf

