On Class Imbalance and Background Filtering in Visual Relationship Detection

2019-03-20 11:46:24
Alessio Sarullo, Tingting Mu

Abstract

In this paper we investigate the problems of class imbalance and irrelevant relationships in Visual Relationship Detection (VRD). State-of-the-art deep VRD models still struggle to predict uncommon classes, limiting their applicability. Moreover, many methods are incapable of properly filtering out background relationships while predicting relevant ones. Although these problems are very apparent, they have both been overlooked so far. We analyse why this is the case and propose modifications to both model and training to alleviate the aforementioned issues. We also suggest new measures that complement existing ones and give a more holistic picture of the efficacy of a model.

URL

https://arxiv.org/abs/1903.08456

PDF

https://arxiv.org/pdf/1903.08456.pdf

