DOOBNet: Deep Object Occlusion Boundary Detection from an Image

2018-06-11 02:24:31
Guoxia Wang, Xiaohui Liang, Frederick W. B. Li

Abstract

Object occlusion boundary detection is a fundamental and crucial research problem in computer vision. It is challenging because training an object occlusion boundary detector involves an extreme boundary/non-boundary class imbalance. In this paper, we propose to address this class imbalance by up-weighting the loss contribution of false negative and false positive examples with our novel Attention Loss function. We also propose a unified end-to-end multi-task deep object occlusion boundary detection network (DOOBNet) that shares convolutional features to simultaneously predict object boundary and occlusion orientation. DOOBNet adopts an encoder-decoder structure with skip connections in order to automatically learn multi-scale and multi-level features. We significantly surpass the state of the art on the PIOD dataset (ODS F-score of .668) and the BSDS ownership dataset (ODS F-score of .555), and improve the detection speed to 0.037s per image.
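
To make the idea of up-weighting false negatives and false positives concrete, here is a minimal sketch of a class-balanced cross-entropy with a focal-style modulating factor. The abstract does not give the definition of the Attention Loss, so this is an illustrative stand-in and not the authors' formulation; the function name balanced_modulated_bce and the parameters gamma and eps are assumptions made for this example.

```python
import math


def balanced_modulated_bce(probs, labels, gamma=2.0, eps=1e-7):
    """Illustrative loss, NOT the paper's Attention Loss.

    probs  : predicted boundary probabilities, one per pixel
    labels : ground-truth labels, 1 = boundary, 0 = non-boundary
    gamma  : hypothetical exponent; larger values focus more on
             badly misclassified pixels, 0 disables the modulation
    """
    n = len(labels)
    n_pos = sum(labels)
    # Weight boundary pixels by the fraction of non-boundary pixels,
    # so the rare boundary class is not swamped by the background.
    alpha = 1.0 - n_pos / n
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        if y == 1:
            # A false negative has small p, so (1 - p) ** gamma is large.
            total += -alpha * (1.0 - p) ** gamma * math.log(p)
        else:
            # A false positive has large p, so p ** gamma is large.
            total += -(1.0 - alpha) * p ** gamma * math.log(1.0 - p)
    return total / n


if __name__ == "__main__":
    labels = [1, 0, 0, 0]
    good = [0.9, 0.1, 0.1, 0.1]  # boundary pixel found, background clean
    bad = [0.1, 0.9, 0.1, 0.1]   # one false negative, one false positive
    print(balanced_modulated_bce(good, labels))
    print(balanced_modulated_bce(bad, labels))
```

With gamma set to 0 the sketch reduces to plain class-balanced cross-entropy; raising gamma shrinks the contribution of pixels that are already classified well, which leaves the misclassified boundary and non-boundary pixels to dominate the loss.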

URL

https://arxiv.org/abs/1806.03772

PDF

https://arxiv.org/pdf/1806.03772.pdf

