Abstract
Dense metric depth estimation using millimeter-wave radar typically requires dense LiDAR supervision, generated via multi-frame projection and interpolation, to guide the learning of accurate depth from sparse radar measurements and RGB images. However, this paradigm is both costly and data-intensive. To address this, we propose RaCalNet, a novel framework that eliminates the need for dense supervision by using sparse LiDAR to supervise the learning of refined radar measurements, requiring only about 1% of the supervision density used by dense-supervised methods. Unlike previous approaches that associate radar points with broad image regions and rely heavily on dense labels, RaCalNet first recalibrates and refines sparse radar points to construct accurate depth priors. These priors then serve as reliable anchors to guide monocular depth prediction, enabling metric-scale estimation without resorting to dense supervision. This design improves structural consistency and preserves fine details. Despite relying solely on sparse supervision, RaCalNet surpasses state-of-the-art dense-supervised methods, producing depth maps with clear object contours and fine-grained textures. Extensive experiments on the ZJU-4DRadarCam dataset and in real-world deployment scenarios demonstrate its effectiveness, reducing RMSE by 35.30% and 34.89%, respectively.
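The core idea of using sparse metric anchors to pin a monocular depth prediction to metric scale can be illustrated with a minimal sketch. This is not RaCalNet's actual architecture (the paper's refinement network is not reproduced here); it only shows, under simplified assumptions, how a handful of reliable metric depths, such as refined radar points, can recover scale and shift for a relative depth map via least squares. The function name `align_to_metric` and the linear scale/shift model are illustrative assumptions.

```python
import numpy as np

def align_to_metric(rel_depth, anchor_uv, anchor_depth):
    """Fit metric ≈ s * rel + t from sparse anchors, then apply globally.

    rel_depth:    (H, W) relative (scale-ambiguous) depth map
    anchor_uv:    (N, 2) integer pixel coordinates (u=col, v=row) of anchors
    anchor_depth: (N,)   metric depths at those pixels (e.g., refined radar)
    """
    # Sample the relative depth at each anchor pixel.
    r = rel_depth[anchor_uv[:, 1], anchor_uv[:, 0]]
    # Solve least-squares for scale s and shift t.
    A = np.stack([r, np.ones_like(r)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, anchor_depth, rcond=None)
    return s * rel_depth + t

# Toy example: the "relative" map is the true metric depth halved,
# so the fit should recover s = 2, t = 0.
rng = np.random.default_rng(0)
true_depth = rng.uniform(2.0, 40.0, size=(48, 64))
rel = 0.5 * true_depth
uv = np.stack([rng.integers(0, 64, 30), rng.integers(0, 48, 30)], axis=1)
metric = align_to_metric(rel, uv, true_depth[uv[:, 1], uv[:, 0]])
```

A real system would weight anchors by confidence and reject outliers (radar points are noisy), which is part of what a refinement stage like RaCalNet's addresses before the priors are trusted as anchors.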
URL
https://arxiv.org/abs/2506.15560