Abstract
This paper presents ECFNet, a novel monocular depth estimation (MDE) method that predicts high-quality depth with clear edges and a valid overall structure from a single RGB image. We thoroughly investigate the key factor affecting edge depth estimation in MDE networks and conclude that the edge information itself plays a critical role in predicting depth details. Driven by this analysis, we propose to explicitly feed image edges to ECFNet as input and to fuse initial depths from different sources to produce the final depth. Specifically, ECFNet first applies a hybrid edge detection strategy to obtain an edge map and an edge-highlighted image from the input image, and then leverages a pre-trained MDE network to infer initial depths for these three images. ECFNet then fuses the initial depths with a layered fusion module (LFM), and a depth consistency module (DCM) further refines the fused result into the final estimate. Extensive experiments on public datasets and ablation studies show that our method achieves state-of-the-art performance. Project page: this https URL.
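The abstract does not specify the hybrid edge detector, the overlay used for the edge-highlighted image, or the internals of the LFM and DCM. As a minimal sketch of the overall pipeline shape only, the following assumes a simple gradient-magnitude edge detector, an additive overlay, and an edge-weighted blend as hypothetical stand-ins for those components:

```python
import numpy as np

def edge_map(img):
    # Stand-in for the paper's hybrid edge detection strategy (unspecified
    # in the abstract): normalized gradient-magnitude edges.
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    return mag / (mag.max() + 1e-8)

def edge_highlight(img, edges, alpha=0.5):
    # Edge-highlighted image: overlay the edge map on the normalized input.
    # The actual highlighting scheme is an assumption.
    return np.clip(img.astype(float) / 255.0 + alpha * edges, 0.0, 1.0)

def fuse_depths(d_img, d_edge, d_hl, edges):
    # Hypothetical layered fusion: trust the edge-derived initial depths
    # near edges and the plain-image depth elsewhere. The real LFM (and
    # the DCM refinement that follows it) are learned modules.
    w = edges
    return w * 0.5 * (d_edge + d_hl) + (1.0 - w) * d_img
```

In the actual method, the three inputs (image, edge map, edge-highlighted image) are each passed through a pre-trained MDE network to produce the initial depths that `fuse_depths` stands in for here.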
URL
https://arxiv.org/abs/2404.00373