Abstract
This work presents Prior Depth Anything, a framework that combines the incomplete but precise metric information of depth measurements with the relative but complete geometric structure of depth predictions, generating accurate, dense, and detailed metric depth maps for any scene. To this end, we design a coarse-to-fine pipeline that progressively integrates the two complementary depth sources. First, we introduce pixel-level metric alignment and distance-aware weighting to pre-fill diverse metric priors by explicitly using the depth prediction. This pre-filling effectively narrows the domain gap between prior patterns, enhancing generalization across varying scenarios. Second, we develop a conditioned monocular depth estimation (MDE) model to refine depth priors by correcting their inherent noise. By conditioning on the normalized pre-filled prior and the prediction, the model further implicitly merges the two complementary depth sources. Our model shows impressive zero-shot generalization across depth completion, super-resolution, and inpainting on 7 real-world datasets, matching or even surpassing previous task-specific methods. More importantly, it performs well on challenging, unseen mixed priors and enables test-time improvement by switching prediction models, providing a flexible accuracy-efficiency trade-off while evolving with advances in MDE models.
URL
https://arxiv.org/abs/2505.10565
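The coarse pre-fill stage described in the abstract (pixel-level metric alignment plus distance-aware weighting) can be illustrated with a toy sketch. The abstract does not give the exact formulation, so this is only a minimal stand-in: a global least-squares scale-shift fit replaces the paper's pixel-level alignment, and an exponential distance-decay confidence map stands in for its distance-aware weighting. The function name `prefill` and the `decay` parameter are hypothetical, not from the paper.

```python
import numpy as np

def prefill(sparse_metric, relative_pred, decay=10.0):
    """Toy coarse pre-fill: align a dense relative depth prediction to
    sparse metric measurements, fill the missing pixels with the aligned
    prediction, and return a confidence map that decays with distance
    from the nearest measurement. (Illustrative only; the paper uses
    pixel-level alignment rather than this global fit.)"""
    mask = sparse_metric > 0  # pixels with a valid metric measurement
    # Global scale-shift alignment: solve metric ~= s * pred + t by least squares.
    A = np.stack([relative_pred[mask], np.ones(mask.sum())], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, sparse_metric[mask], rcond=None)
    aligned = s * relative_pred + t
    # Brute-force distance to the nearest measurement (fine for toy-sized maps).
    ys, xs = np.nonzero(mask)
    yy, xx = np.indices(mask.shape)
    dist = np.sqrt((yy[..., None] - ys) ** 2 + (xx[..., None] - xs) ** 2).min(-1)
    conf = np.exp(-dist / decay)  # hypothetical distance-aware weighting
    # Keep true measurements where available; use the aligned prediction elsewhere.
    filled = np.where(mask, sparse_metric, aligned)
    return filled, conf
```

With a synthetic relative prediction and three metric samples that lie exactly on a scale-shift line, the fit recovers the line and the pre-filled map is dense and metric everywhere, while confidence is highest at the measured pixels.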