Decomposed Prototype Learning for Few-Shot Scene Graph Generation

2023-03-20 04:54:26
Xingchen Li, Long Chen, Guikun Chen, Yinfu Feng, Yi Yang, Jun Xiao

Abstract

Today's scene graph generation (SGG) models typically require abundant manual annotations to learn new predicate types. Thus, it is difficult to apply them to real-world applications with a long-tailed distribution of predicates. In this paper, we focus on a new promising task of SGG: few-shot SGG (FSSGG). FSSGG encourages models to be able to quickly transfer previous knowledge and recognize novel predicates well with only a few examples. Although many advanced approaches have achieved great success on few-shot learning (FSL) tasks, straightforwardly extending them into FSSGG is not applicable due to two intrinsic characteristics of predicate concepts: 1) Each predicate category commonly has multiple semantic meanings under different contexts. 2) The visual appearance of relation triplets with the same predicate differs greatly under different subject-object pairs. Both issues make it hard to model conventional latent representations for predicate categories with state-of-the-art FSL methods. To this end, we propose a novel Decomposed Prototype Learning (DPL). Specifically, we first construct a decomposable prototype space to capture intrinsic visual patterns of subjects and objects for predicates, and enhance their feature representations with these decomposed prototypes. Then, we devise an intelligent metric learner to assign adaptive weights to each support sample by considering the relevance of their subject-object pairs. We further re-split the VG dataset and compare DPL with various FSL methods to benchmark this task. Extensive results show that DPL achieves excellent performance in both base and novel categories.
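The adaptive weighting of support samples described above can be illustrated with a minimal sketch. The abstract does not say how the metric learner computes its weights, so everything here is an assumption made for illustration: the function names (`cosine`, `softmax`, `weighted_prototype`), the use of cosine similarity between subject-object pair embeddings, the softmax normalization, and the `temperature` parameter. It is not the authors' implementation; it only shows the general idea of building a predicate prototype as a relevance-weighted mean of support features rather than a plain mean.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_prototype(query_pair, support_pairs, support_feats, temperature=0.1):
    """Build a predicate prototype as a weighted mean of support features.

    query_pair    : embedding of the query's subject-object pair (assumed given)
    support_pairs : one subject-object pair embedding per support sample
    support_feats : one predicate feature vector per support sample
    temperature   : hypothetical sharpness parameter for the softmax

    Support samples whose subject-object pair is more similar to the query's
    pair receive a larger weight. This weighting rule is an assumption, not
    the rule used in the paper.
    """
    scores = [cosine(query_pair, p) / temperature for p in support_pairs]
    weights = softmax(scores)
    dim = len(support_feats[0])
    prototype = [
        sum(w * feat[d] for w, feat in zip(weights, support_feats))
        for d in range(dim)
    ]
    return prototype, weights


if __name__ == "__main__":
    # Toy data: the first support sample shares the query's subject-object pair.
    proto, w = weighted_prototype(
        query_pair=[1.0, 0.0],
        support_pairs=[[1.0, 0.0], [0.0, 1.0]],
        support_feats=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    )
    print("weights:", w)
    print("prototype:", proto)
```

When all support samples have the same subject-object pair embedding, the weights become uniform and the prototype reduces to the ordinary mean used in standard prototype-based few-shot learning.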

URL

https://arxiv.org/abs/2303.10863

PDF

https://arxiv.org/pdf/2303.10863.pdf
