Paper Reading AI Learner

SharpNet: Fast and Accurate Recovery of Occluding Contours in Monocular Depth Estimation

2019-05-21 13:08:52
Michaël Ramamonjisoa, Vincent Lepetit

Abstract

We introduce SharpNet, a method that predicts an accurate depth map for an input color image, with particular attention to the reconstruction of occluding contours. Occluding contours are an important cue for object recognition and for the realistic integration of virtual objects in Augmented Reality, but they are also notoriously difficult to reconstruct accurately. For example, they are a challenge for stereo-based reconstruction methods, as points around an occluding contour are visible in only one image. Inspired by recent methods that introduce normal estimation to improve depth prediction, we introduce a novel term that constrains the depth and occluding contour predictions to be consistent. Since ground truth depth is difficult to obtain with pixel-perfect accuracy along occluding contours, we use synthetic images for training, followed by fine-tuning on real data. We demonstrate our approach on the challenging NYUv2-Depth dataset, and show that our method outperforms the state of the art along occluding contours, while performing on par with the best recent methods on the rest of the image. Its accuracy along occluding contours is actually better than the 'ground truth' acquired by a depth camera based on structured light. We show this by introducing a new benchmark based on NYUv2-Depth for evaluating occluding contours in monocular reconstruction, which is our second contribution.
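
As a rough illustration of what such a depth/contour consistency term can look like, here is a minimal PyTorch sketch that penalizes depth gradients away from predicted occluding contours while encouraging sharp depth discontinuities on them. The function names (`contour_consensus_loss`, `spatial_gradients`) and the exact form of the penalty are assumptions made for illustration; the paper's actual formulation may differ.

```python
# Hypothetical sketch of a depth/occluding-contour consensus term,
# NOT the paper's exact loss.
import torch
import torch.nn.functional as F

def spatial_gradients(depth):
    """Finite-difference gradients of a (B, 1, H, W) depth map."""
    dx = depth[:, :, :, 1:] - depth[:, :, :, :-1]  # horizontal differences
    dy = depth[:, :, 1:, :] - depth[:, :, :-1, :]  # vertical differences
    # Pad back to the input resolution so both gradients match the input shape.
    dx = F.pad(dx, (0, 1, 0, 0))
    dy = F.pad(dy, (0, 0, 0, 1))
    return dx, dy

def contour_consensus_loss(depth_pred, contour_prob, eps=1e-6):
    """Encourage large depth discontinuities where the network predicts
    occluding contours, and smooth depth everywhere else.

    depth_pred:   (B, 1, H, W) predicted depth map
    contour_prob: (B, 1, H, W) predicted contour probability in [0, 1]
    """
    dx, dy = spatial_gradients(depth_pred)
    grad_mag = torch.sqrt(dx ** 2 + dy ** 2 + eps)
    # Away from contours, penalize depth gradients (smoothness).
    smooth_term = ((1.0 - contour_prob) * grad_mag).mean()
    # On contours, penalize small depth gradients (sharpness).
    sharp_term = (contour_prob * torch.exp(-grad_mag)).mean()
    return smooth_term + sharp_term
```

In practice, a term of this kind would be weighted and combined with the standard supervised losses on depth, normals, and contours during training.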

URL

https://arxiv.org/abs/1905.08598

PDF

https://arxiv.org/pdf/1905.08598.pdf

