
BAOD: Budget-Aware Object Detection

2019-04-10 21:13:08
Alejandro Pardo, Mengmeng Xu, Ali Thabet, Pablo Arbelaez, Bernard Ghanem

Abstract

We study the problem of object detection from a novel perspective in which annotation budget constraints are taken into consideration, appropriately coined Budget-Aware Object Detection (BAOD). When provided with a fixed budget, we propose a strategy for building a diverse and informative dataset that can be used to optimally train a robust detector. We investigate both optimization and learning-based methods to sample which images to annotate and what type of annotation (strongly or weakly supervised) to annotate them with. We adopt a hybrid supervised learning framework to train the object detector from both these types of annotation. We conduct a comprehensive empirical study showing that a handcrafted optimization method outperforms other selection techniques, including random sampling, uncertainty sampling, and active learning. By combining an optimal image/annotation selection scheme with hybrid supervised learning to solve the BAOD problem, we show that one can achieve the performance of a strongly supervised detector on PASCAL-VOC 2007 while saving 12.8% of its original annotation budget. Furthermore, when 100% of the budget is used, it surpasses this performance by 2.0 mAP percentage points.
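
The abstract's core idea is spending a fixed annotation budget across cheap weak (image-level) labels and expensive strong (bounding-box) labels. The sketch below is a hypothetical greedy selector meant only to make that trade-off concrete; it is not the paper's actual optimization or learning-based method, and the cost ratio, the informativeness score, and the 0.8 threshold are illustrative assumptions.

```python
import random

# Assumed relative annotation costs: a weak (image-level) label vs. a strong
# (bounding-box) label; the 1:7 ratio is an illustrative assumption.
COST_WEAK, COST_STRONG = 1.0, 7.0
STRONG_THRESHOLD = 0.8  # assumed cutoff for "worth paying for boxes"


def informativeness(image_id):
    """Placeholder score; the paper instead studies handcrafted optimization
    and learned selection against random/uncertainty/active-learning baselines."""
    return random.random()


def select_annotations(unlabeled, budget):
    """Greedily spend the budget: give the highest-scoring images strong
    (box) annotations, fall back to cheap weak labels for the rest."""
    scores = {img: informativeness(img) for img in unlabeled}
    plan = {}
    for img in sorted(unlabeled, key=scores.get, reverse=True):
        if scores[img] > STRONG_THRESHOLD and budget >= COST_STRONG:
            plan[img] = "strong"
            budget -= COST_STRONG
        elif budget >= COST_WEAK:
            plan[img] = "weak"
            budget -= COST_WEAK
        else:
            break  # budget exhausted
    return plan


if __name__ == "__main__":
    images = [f"img_{i:04d}" for i in range(100)]
    plan = select_annotations(images, budget=50.0)
    strong = sum(1 for v in plan.values() if v == "strong")
    print(f"{strong} strong, {len(plan) - strong} weak annotations selected")
```

A hybrid supervised detector, as described in the abstract, would then be trained on the resulting mix of box-level and image-level labels; that training stage is outside the scope of this sketch.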

URL

https://arxiv.org/abs/1904.05443

PDF

https://arxiv.org/pdf/1904.05443.pdf

