
The Devil is in the Edges: Monocular Depth Estimation with Edge-aware Consistency Fusion

2024-03-30 13:58:19
Pengzhi Li, Yikang Ding, Haohan Wang, Chengshuai Tang, Zhiheng Li

Abstract

This paper presents a novel monocular depth estimation (MDE) method, named ECFNet, for estimating high-quality monocular depth with clear edges and a valid overall structure from a single RGB image. We conduct a thorough investigation into the key factor that affects edge depth estimation in MDE networks, and conclude that the edge information itself plays a critical role in predicting depth details. Driven by this analysis, we propose to explicitly employ the image edges as input to ECFNet and to fuse the initial depths from different sources to produce the final depth. Specifically, ECFNet first uses a hybrid edge detection strategy to obtain an edge map and an edge-highlighted image from the input image, and then leverages a pre-trained MDE network to infer the initial depths of the aforementioned three images. After that, ECFNet applies a layered fusion module (LFM) to fuse the initial depths, which are further updated by a depth consistency module (DCM) to form the final estimation. Extensive experimental results on public datasets and ablation studies indicate that our method achieves state-of-the-art performance. Project page: this https URL.
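The abstract describes a three-stage pipeline: extract edges, run a pre-trained MDE network on the original image, the edge map, and the edge-highlighted image, then fuse and refine the three initial depths. Below is a minimal PyTorch-style sketch of that data flow. It is an illustration under stated assumptions, not the authors' implementation: a Sobel filter stands in for the paper's hybrid edge detection strategy, and the LFM/DCM internals, whose architectures the abstract does not specify, are replaced by small convolutional stacks.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ECFNetSketch(nn.Module):
    """Illustrative sketch of the ECFNet pipeline described in the abstract.

    Stages: (1) edge detection -> edge map + edge-highlighted image,
    (2) a frozen pre-trained MDE network infers initial depths for all
    three images, (3) a layered fusion module (LFM) fuses the initial
    depths, (4) a depth consistency module (DCM) refines the fused depth.
    LFM/DCM internals are assumptions; simple conv stacks stand in here.
    """
    def __init__(self, mde_net: nn.Module):
        super().__init__()
        self.mde = mde_net  # pre-trained MDE backbone, kept frozen
        for p in self.mde.parameters():
            p.requires_grad = False
        # Placeholder LFM: fuses three single-channel initial depth maps.
        self.lfm = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, 3, padding=1),
        )
        # Placeholder DCM: refines fused depth conditioned on the edge map.
        self.dcm = nn.Sequential(
            nn.Conv2d(2, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    @staticmethod
    def sobel_edges(img: torch.Tensor) -> torch.Tensor:
        """Stand-in edge detector: Sobel gradient magnitude."""
        gray = img.mean(dim=1, keepdim=True)
        kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                          device=img.device).view(1, 1, 3, 3)
        ky = kx.transpose(2, 3)
        gx = F.conv2d(gray, kx, padding=1)
        gy = F.conv2d(gray, ky, padding=1)
        return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        edges = self.sobel_edges(image)            # edge map
        highlighted = (image + edges).clamp(0, 1)  # edge-highlighted image
        # Initial depths from the three sources (edge map tiled to 3 channels).
        d_img = self.mde(image)
        d_edge = self.mde(edges.repeat(1, 3, 1, 1))
        d_high = self.mde(highlighted)
        fused = self.lfm(torch.cat([d_img, d_edge, d_high], dim=1))
        return self.dcm(torch.cat([fused, edges], dim=1))  # final depth

# Hypothetical usage, assuming load_pretrained_mde() returns a network
# mapping a (B, 3, H, W) image to a (B, 1, H, W) depth map:
#   net = ECFNetSketch(mde_net=load_pretrained_mde())
#   depth = net(torch.rand(1, 3, 256, 256))
```

The structural point the sketch preserves is that the same frozen MDE backbone is queried once per input variant, so the edge cues enter the prediction explicitly, and fusion then happens entirely in depth space.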

Abstract (translated)

This paper presents a novel monocular depth estimation method named ECFNet, which estimates high-quality monocular depth with clear edges and a valid overall structure from a single RGB image. We conduct an in-depth investigation into the key factor affecting edge depth estimation in MDE networks, and conclude that the edge information itself plays a critical role in predicting depth details. Based on this analysis, we propose to take the image edges as input and to fuse the initial depths from different sources to produce the final depth. Specifically, ECFNet first uses a hybrid edge detection strategy to obtain an edge map and an edge-highlighted image from the input image, and then leverages a pre-trained MDE network to infer the initial depths of the aforementioned three images. After that, ECFNet employs a layered fusion module (LFM) to fuse the initial depths, which are further updated by a depth consistency module (DCM) to form the final estimation. Extensive experimental results on public datasets and ablation studies show that our method achieves state-of-the-art performance. Project page: this https URL.

URL

https://arxiv.org/abs/2404.00373

PDF

https://arxiv.org/pdf/2404.00373.pdf

