Improving Visual Relation Detection using Depth Maps

2019-05-02 21:14:35
Sahand Sharifzadeh, Max Berrendorf, Volker Tresp

Abstract

State-of-the-art visual relation detection methods rely on features extracted from RGB images, including objects' 2D positions. In this paper, we argue that the 3D positions of objects in space can provide additional valuable information about object relations. This information helps to detect not only spatial relations, such as "standing behind", but also non-spatial relations, such as "holding". Since 3D information about a scene is not easily accessible, we propose incorporating a pre-trained RGB-to-Depth model within visual relation detection frameworks. We discuss different strategies for extracting features from depth maps and show their critical role in relation detection. Our experiments confirm that the performance of state-of-the-art visual relation detection approaches can be significantly improved by utilizing depth map information.
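
To make the idea of depth-derived relation features concrete, here is a minimal sketch. It is not the authors' feature extraction method, which the abstract does not specify. It assumes a depth map is already available as a 2D grid of values (for example, the output of some RGB-to-Depth model) and that larger values mean farther away. The function names `box_depth` and `pair_depth_features`, the box format, and the toy data are all hypothetical, chosen only for illustration.

```python
from statistics import mean


def box_depth(depth_map, box):
    """Mean depth inside box = (x0, y0, x1, y1), with end-exclusive pixel coordinates."""
    x0, y0, x1, y1 = box
    return mean(depth_map[y][x] for y in range(y0, y1) for x in range(x0, x1))


def pair_depth_features(depth_map, subj_box, obj_box):
    """Hypothetical pairwise features: mean depth of each box and their difference."""
    d_subj = box_depth(depth_map, subj_box)
    d_obj = box_depth(depth_map, obj_box)
    return {"subj_depth": d_subj, "obj_depth": d_obj, "depth_gap": d_subj - d_obj}


# Toy 4x6 depth map (assumption: larger value = farther from the camera).
# The left half of the image is near (1.0), the right half is far (3.0).
depth_map = [[1.0, 1.0, 1.0, 3.0, 3.0, 3.0] for _ in range(4)]

person = (3, 0, 6, 4)  # box covering the far region
car = (0, 0, 3, 4)     # box covering the near region

features = pair_depth_features(depth_map, person, car)
print(features)
```

In this toy case the two boxes sit side by side in 2D, so their image positions alone say nothing about which object is in front. The positive `depth_gap` places the person farther from the camera than the car, which is the kind of cue that could support a relation such as "standing behind".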

URL

https://arxiv.org/abs/1905.00966

PDF

https://arxiv.org/pdf/1905.00966.pdf
