Self-Supervised Enhancement for Depth from a Lightweight ToF Sensor with Monocular Images

2025-06-16 12:57:58
Laiyan Ding, Hualie Jiang, Jiwei Chen, Rui Huang

Abstract

Depth map enhancement using paired high-resolution RGB images offers a cost-effective solution for improving low-resolution depth data from lightweight ToF sensors. Nevertheless, naively adopting a depth estimation pipeline to fuse the two modalities requires ground-truth depth maps for supervision. To address this, we propose a self-supervised learning framework, SelfToF, which generates detailed and scale-aware depth maps. Starting from an image-based self-supervised depth estimation pipeline, we add low-resolution depth as an input, design a new depth consistency loss, and propose a scale-recovery module, which together yield a large performance boost. Furthermore, since the ToF signal sparsity varies in real-world applications, we upgrade SelfToF to SelfToF* with submanifold convolution and guided feature fusion. Consequently, SelfToF* maintains robust performance across varying sparsity levels in ToF data. Overall, our proposed method is both efficient and effective, as verified by extensive experiments on the NYU and ScanNet datasets. The code will be made public.
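
The abstract does not describe how the scale-recovery module works, so the following is only an illustrative sketch of the general idea of anchoring an up-to-scale depth prediction to sparse metric ToF readings. It uses plain median scaling, a common generic baseline, and is not the authors' module. The function name `recover_scale`, the block-average model of a ToF zone, and the use of `0.0` to mark missing zones are all assumptions made for this example.

```python
from statistics import median


def recover_scale(pred, tof, invalid=0.0):
    """Rescale a relative depth map to agree with low-resolution ToF depth.

    pred: H x W nested list of up-to-scale depth predictions.
    tof:  h x w nested list of metric depths; `invalid` marks missing zones.
    Assumes (hypothetically) that H and W are integer multiples of h and w,
    and that each ToF zone reports roughly the mean depth of its pixel block.
    """
    H, W = len(pred), len(pred[0])
    h, w = len(tof), len(tof[0])
    bh, bw = H // h, W // w  # pixels covered by one ToF zone
    ratios = []
    for i in range(h):
        for j in range(w):
            if tof[i][j] == invalid:
                continue  # skip zones with no ToF return
            block = [pred[y][x]
                     for y in range(i * bh, (i + 1) * bh)
                     for x in range(j * bw, (j + 1) * bw)]
            block_mean = sum(block) / len(block)
            ratios.append(tof[i][j] / block_mean)
    if not ratios:
        raise ValueError("no valid ToF zones to anchor the scale")
    # The median is less sensitive to outlier zones than the mean.
    scale = median(ratios)
    scaled = [[scale * d for d in row] for row in pred]
    return scaled, scale


# Toy usage: a 4x4 relative prediction and a 2x2 ToF grid with one missing zone.
pred = [[1.0, 1.0, 2.0, 2.0],
        [1.0, 1.0, 2.0, 2.0],
        [3.0, 3.0, 4.0, 4.0],
        [3.0, 3.0, 4.0, 4.0]]
tof = [[2.0, 4.0],
       [0.0, 8.0]]
scaled, scale = recover_scale(pred, tof)
```

Because only valid zones contribute a ratio, the same routine still works when the ToF signal is sparse, as long as at least one zone has a return.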

URL

https://arxiv.org/abs/2506.13444

PDF

https://arxiv.org/pdf/2506.13444.pdf

