Abstract
Scene Graph Generation (SGG) is the challenging task of detecting objects and predicting the relationships between them. Since the introduction of DETR, one-stage SGG models built on one-stage object detectors have been actively studied. However, these models rely on complex modeling to predict relationships between objects, while the inherent relationships between object queries learned in the multi-head self-attention of the object detector have been neglected. We propose a lightweight one-stage SGG model that extracts the relation graph from the diverse relationships learned in the multi-head self-attention layers of the DETR decoder. By fully exploiting these self-attention by-products, the relation graph can be extracted effectively with a shallow relation extraction head. Considering the dependency of the relation extraction task on the object detection task, we propose a novel relation smoothing technique that adaptively adjusts the relation labels according to the quality of the detected objects. With relation smoothing, the model is trained under a continuous curriculum that focuses on the object detection task at the beginning of training and shifts to multi-task learning as detection performance gradually improves. Furthermore, we propose a connectivity prediction task, an auxiliary task of relation extraction that predicts whether a relation exists between an object pair. We demonstrate the effectiveness and efficiency of our method on the Visual Genome and Open Image V6 datasets. Our code is publicly available at this https URL.
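The abstract does not give the exact relation smoothing formula, but the idea of scaling relation targets by detection quality can be sketched as follows. This is a minimal illustration, assuming quality is measured per matched object (e.g., box IoU with ground truth, in [0, 1]) and that a pair's quality is the minimum of its subject's and object's qualities; the function name and combination rule are hypothetical, not taken from the paper.

```python
import torch


def smooth_relation_labels(rel_labels: torch.Tensor,
                           subj_quality: torch.Tensor,
                           obj_quality: torch.Tensor) -> torch.Tensor:
    """Adaptively soften ground-truth relation labels by detection quality.

    Early in training, detected boxes match ground truth poorly, so the
    scaled targets are near zero and the relation loss contributes little,
    letting the model focus on object detection. As detection quality
    improves, targets approach their full value and training becomes
    genuinely multi-task -- a continuous curriculum.

    rel_labels:   (P,) binary or soft relation targets for P subject-object pairs
    subj_quality: (P,) detection quality of each subject, in [0, 1]
    obj_quality:  (P,) detection quality of each object, in [0, 1]
    """
    # Assumed combination rule: a relation is only as reliable as its
    # worse-detected endpoint.
    pair_quality = torch.minimum(subj_quality, obj_quality)
    return rel_labels * pair_quality
```

Under this sketch, a pair whose subject is matched at IoU 0.5 and whose object is matched at IoU 0.8 gets its positive relation target softened from 1.0 to 0.5.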
URL
https://arxiv.org/abs/2404.02072