
RaCalNet: Radar Calibration Network for Sparse-Supervised Metric Depth Estimation

2025-06-18 15:35:16
Xingrui Qin, Wentao Zhao, Chuan Cao, Yihe Niu, Houcheng Jiang, Jingchuan Wang

Abstract

Dense metric depth estimation using millimeter-wave radar typically requires dense LiDAR supervision, generated via multi-frame projection and interpolation, to guide the learning of accurate depth from sparse radar measurements and RGB images. However, this paradigm is both costly and data-intensive. To address this, we propose RaCalNet, a novel framework that eliminates the need for dense supervision by using sparse LiDAR to supervise the learning of refined radar measurements, with a supervision density of only about 1% of that used by dense-supervised methods. Unlike previous approaches that associate radar points with broad image regions and rely heavily on dense labels, RaCalNet first recalibrates and refines sparse radar points to construct accurate depth priors. These priors then serve as reliable anchors to guide monocular depth prediction, enabling metric-scale estimation without resorting to dense supervision. This design improves structural consistency and preserves fine details. Despite relying solely on sparse supervision, RaCalNet surpasses state-of-the-art dense-supervised methods, producing depth maps with clear object contours and fine-grained textures. Extensive experiments on the ZJU-4DRadarCam dataset and in real-world deployment scenarios demonstrate its effectiveness, reducing RMSE by 35.30% and 34.89%, respectively.
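The anchor-based design lends itself to a simple illustration: given a scale-ambiguous monocular depth map and a handful of metrically accurate anchor points (standing in here for the refined radar priors), a global scale and shift can be fit by least squares to recover metric depth. The sketch below is only a minimal approximation of that general idea, not RaCalNet's method; the paper's refinement and guidance modules are learned networks, and `align_to_metric`, its signature, and the closed-form fit are assumptions introduced for illustration.

```python
import numpy as np

def align_to_metric(rel_depth, anchor_uv, anchor_depth):
    """Fit a global scale/shift mapping a relative depth map onto
    sparse metric anchors (illustrative stand-in for radar priors)."""
    # Relative depth sampled at the anchor pixel coordinates (row, col).
    d_rel = rel_depth[anchor_uv[:, 0], anchor_uv[:, 1]]

    # Least squares: minimize || s * d_rel + t - anchor_depth ||^2.
    A = np.stack([d_rel, np.ones_like(d_rel)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, anchor_depth, rcond=None)

    # Apply the fitted scale and shift to the whole map.
    return s * rel_depth + t

# Toy usage: a 4x4 relative depth map and three hypothetical anchors.
rel = np.linspace(0.1, 1.0, 16).reshape(4, 4)
anchors_uv = np.array([[0, 0], [1, 2], [3, 3]])
anchors_depth = np.array([2.0, 7.5, 12.0])
print(align_to_metric(rel, anchors_uv, anchors_depth))
```

A single global fit like this is the crudest possible use of anchors; the per-point recalibration and pixel-wise guidance described in the abstract would be expected to preserve far more local structure than any global scale/shift can.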

URL

https://arxiv.org/abs/2506.15560

PDF

https://arxiv.org/pdf/2506.15560.pdf

