Robust Depth Super-Resolution via Adaptive Diffusion Sampling

2026-02-10 08:10:02
Kun Wang, Yun Zhu, Pan Zhou, Na Zhao

Abstract

We propose AdaDS, a generalizable framework for depth super-resolution that robustly recovers high-resolution depth maps from arbitrarily degraded low-resolution inputs. Unlike conventional approaches that directly regress depth values and often exhibit artifacts under severe or unknown degradation, AdaDS capitalizes on the contraction property of Gaussian smoothing: as noise accumulates in the forward process, distributional discrepancies between degraded inputs and their pristine high-quality counterparts diminish, ultimately converging to an isotropic Gaussian prior. Leveraging this property, AdaDS adaptively selects a starting timestep in the reverse diffusion trajectory based on estimated refinement uncertainty, and subsequently injects tailored noise to position the intermediate sample within the high-probability region of the target posterior distribution. This strategy ensures inherent robustness, enabling the generative prior of a pre-trained diffusion model to dominate recovery even when upstream estimations are imperfect. Extensive experiments on real-world and synthetic benchmarks demonstrate AdaDS's superior zero-shot generalization and resilience to diverse degradation patterns compared to state-of-the-art methods.
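The contraction property mentioned in the abstract can be illustrated with a small sketch. It is not the authors' algorithm: it assumes a standard DDPM-style forward process with a linear beta schedule, and it stands in for "estimated refinement uncertainty" with a plain squared-error estimate `sq_dist`. The function names and the threshold rule `pick_start_step` are hypothetical. Under these assumptions, two inputs `a` and `b` noised to the same step are Gaussians with equal covariance, so their KL divergence has the closed form `alpha_bar * ||a - b||^2 / (2 * (1 - alpha_bar))`, which shrinks as the step grows.

```python
import math
import random


def alpha_bar_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed)."""
    alpha_bars = []
    prod = 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def kl_after_noising(sq_dist, alpha_bar):
    """KL between q(x_t | a) and q(x_t | b), with sq_dist = ||a - b||^2.

    Both are Gaussians with covariance (1 - alpha_bar) * I and means
    sqrt(alpha_bar) * a and sqrt(alpha_bar) * b, so the KL is closed-form.
    """
    return alpha_bar * sq_dist / (2.0 * (1.0 - alpha_bar))


def pick_start_step(sq_dist, alpha_bars, tol):
    """Hypothetical rule: earliest step where the KL drops to tol or below."""
    for t, ab in enumerate(alpha_bars):
        if kl_after_noising(sq_dist, ab) <= tol:
            return t
    return len(alpha_bars) - 1  # fall back to the noisiest step


def noise_to_step(x0, alpha_bar, rng):
    """Draw x_t ~ q(x_t | x0) = N(sqrt(alpha_bar) * x0, (1 - alpha_bar) * I)."""
    scale = math.sqrt(alpha_bar)
    sigma = math.sqrt(1.0 - alpha_bar)
    return [scale * v + sigma * rng.gauss(0.0, 1.0) for v in x0]


if __name__ == "__main__":
    alpha_bars = alpha_bar_schedule()
    rng = random.Random(0)
    coarse_depth = [1.0, 1.2, 0.9, 1.1]  # toy stand-in for an upstream estimate
    for err in (1.0, 100.0):  # small vs. large estimated squared error
        t = pick_start_step(err, alpha_bars, tol=0.5)
        x_t = noise_to_step(coarse_depth, alpha_bars[t], rng)
        print(f"sq_dist={err}: start at step {t}, x_t has {len(x_t)} values")
```

In this toy rule, a less reliable upstream estimate (larger `sq_dist`) yields a later starting step, so more of the recovery is left to the diffusion prior, which mirrors the intuition stated in the abstract.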

URL

https://arxiv.org/abs/2602.09510

PDF

https://arxiv.org/pdf/2602.09510.pdf

