Abstract
Real-time multi-camera 3D reconstruction is crucial for 3D perception, immersive interaction, and robotics. Existing methods struggle with multi-view fusion, camera extrinsic uncertainty, and scalability for large camera setups. We propose SPARK, a self-calibrating real-time multi-camera point cloud reconstruction framework that jointly handles point cloud fusion and extrinsic uncertainty. SPARK consists of: (1) a geometry-aware online extrinsic estimation module leveraging multi-view priors and enforcing cross-view and temporal consistency for stable self-calibration, and (2) a confidence-driven point cloud fusion strategy modeling depth reliability and visibility at pixel and point levels to suppress noise and view-dependent inconsistencies. By performing frame-wise fusion without accumulation, SPARK produces stable point clouds in dynamic scenes while scaling linearly with the number of cameras. Extensive experiments on real-world multi-camera systems show that SPARK outperforms existing approaches in extrinsic accuracy, geometric consistency, temporal stability, and real-time performance, demonstrating its effectiveness and scalability for large-scale multi-camera 3D reconstruction.
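The frame-wise, confidence-driven fusion described above can be illustrated with a minimal sketch: back-project each camera's depth map to world coordinates, keep only points whose per-pixel depth confidence clears a threshold, and concatenate the per-camera clouds for the current frame without temporal accumulation. All function and parameter names here are hypothetical illustrations, not SPARK's actual interface.

```python
import numpy as np

def backproject(depth, K, T):
    """Back-project a depth map (H, W) to world-frame 3D points using
    intrinsics K (3x3) and camera-to-world extrinsics T (4x4)."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    rays = pix @ np.linalg.inv(K).T               # camera-frame rays
    pts_cam = rays * depth.reshape(-1, 1)         # scale rays by depth
    pts_h = np.concatenate([pts_cam, np.ones((pts_cam.shape[0], 1))], axis=1)
    return (pts_h @ T.T)[:, :3]                   # transform to world frame

def fuse_frame(depths, confs, Ks, Ts, conf_thresh=0.5):
    """Fuse one frame across all cameras, keeping only points whose
    per-pixel depth confidence exceeds conf_thresh. Processing each
    camera independently per frame keeps cost linear in camera count."""
    clouds = []
    for depth, conf, K, T in zip(depths, confs, Ks, Ts):
        pts = backproject(depth, K, T)
        clouds.append(pts[conf.reshape(-1) > conf_thresh])
    return np.concatenate(clouds, axis=0)
```

Because nothing is accumulated across frames, each output cloud depends only on the current frame's depth and confidence maps, which matches the abstract's claim of stability in dynamic scenes.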
URL
https://arxiv.org/abs/2601.08414