Abstract
Recently, diffusion-based depth estimation methods have drawn widespread attention due to their elegant denoising patterns and promising performance. However, they are typically unreliable under the adverse conditions prevalent in real-world scenarios, such as rain and snow. In this paper, we propose a novel robust depth estimation method called D4RD, featuring a custom contrastive learning mode tailored for diffusion models to mitigate performance degradation in complex environments. Concretely, we integrate the strength of knowledge distillation into contrastive learning, building the 'trinity' contrastive scheme. This scheme uses the sampled noise of the forward diffusion process as a natural reference, guiding the predicted noise in diverse scenes toward a more stable and precise optimum. Moreover, we extend the noise-level trinity to the more generic feature and image levels, establishing a multi-level contrast that distributes the burden of robust perception across the whole network. Before addressing complex scenarios, we enhance the stability of the baseline diffusion model with three straightforward yet effective improvements, which facilitate convergence and remove depth outliers. Extensive experiments demonstrate that D4RD surpasses existing state-of-the-art solutions on synthetic corruption datasets and under real-world weather conditions. The code for D4RD will be made available for further exploration and adoption.
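The abstract gives no formulas, so the following is only an illustrative sketch of how a noise-level three-way contrast might be written, not the loss actually used by D4RD. It assumes the standard DDPM forward step, a clean-scene prediction and a corrupted-scene prediction that are both pulled toward the sampled noise, and a distillation term that pulls the corrupted prediction toward the clean one. The function names, the use of mean squared error, and the weight `w_distill` are all hypothetical.

```python
import math

def forward_diffuse(x0, noise, alpha_bar):
    """Standard DDPM forward step: x_t = sqrt(a) * x0 + sqrt(1 - a) * noise."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * x + b * n for x, n in zip(x0, noise)]

def mse(u, v):
    """Mean squared error between two equal-length sequences."""
    return sum((p - q) ** 2 for p, q in zip(u, v)) / len(u)

def trinity_noise_loss(eps_sampled, eps_clean, eps_corrupt, w_distill=1.0):
    """Hypothetical three-way noise-level loss (an assumption, not the paper's).

    eps_sampled : noise drawn in the forward process, used as the reference
    eps_clean   : noise predicted from the clean-scene input
    eps_corrupt : noise predicted from the corrupted-scene input
    """
    ref_clean = mse(eps_clean, eps_sampled)      # clean prediction vs. reference
    ref_corrupt = mse(eps_corrupt, eps_sampled)  # corrupted prediction vs. reference
    distill = mse(eps_corrupt, eps_clean)        # corrupted prediction vs. clean one
    return ref_clean + ref_corrupt + w_distill * distill

# Toy usage: the clean branch matches the reference, the corrupted branch does not.
eps = [1.0, -1.0]
loss = trinity_noise_loss(eps, eps_clean=[1.0, -1.0], eps_corrupt=[0.0, 0.0])
print(loss)
```

In a real training loop the clean prediction would normally be treated as a fixed teacher (no gradient) inside the distillation term; this sketch has no autograd, so that detail is only noted here.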
URL
https://arxiv.org/abs/2404.09831