Abstract
In this paper, we study the problem of jointly estimating optical flow and scene flow from synchronized 2D and 3D data. Previous methods either employ a complex pipeline that splits the joint task into independent stages, or fuse 2D and 3D information in an "early-fusion" or "late-fusion" manner. Such one-size-fits-all approaches face a dilemma: they fail either to fully exploit the characteristics of each modality or to maximize the inter-modality complementarity. To address this problem, we propose a novel end-to-end framework consisting of 2D and 3D branches with multiple bidirectional fusion connections between them at specific layers. Different from previous work, we apply a point-based 3D branch to extract LiDAR features, as it preserves the geometric structure of point clouds. To fuse dense image features and sparse point features, we propose a learnable operator named the bidirectional camera-LiDAR fusion module (Bi-CLFM). We instantiate two variants of the bidirectional fusion pipeline, one based on the pyramidal coarse-to-fine architecture (dubbed CamLiPWC), and the other based on the recurrent all-pairs field transforms (dubbed CamLiRAFT). On FlyingThings3D, both CamLiPWC and CamLiRAFT surpass all existing methods and achieve up to a 47.9% reduction in 3D end-point error over the best published result. Our best-performing model, CamLiRAFT, achieves an error of 4.26% on the KITTI Scene Flow benchmark, ranking 1st among all submissions with far fewer parameters. Moreover, our methods generalize well and can handle non-rigid motion. Code is available at this https URL.
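The abstract does not spell out the internals of Bi-CLFM, but a minimal sketch helps illustrate the core idea of bidirectionally fusing dense image features with sparse point features. Everything below is an illustrative assumption, not the paper's actual design: the class name `BidirectionalFusionSketch`, the use of bilinear `grid_sample` for the camera-to-LiDAR direction, and nearest-pixel scattering for the LiDAR-to-camera direction stand in for whatever learnable interpolation the real module uses.

```python
# A hedged sketch of bidirectional camera-LiDAR feature fusion.
# The real Bi-CLFM (gating, learned interpolation, etc.) is not
# described in the abstract; this only shows the two fusion directions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BidirectionalFusionSketch(nn.Module):
    """Fuse a dense image feature map with sparse point features, both ways."""

    def __init__(self, img_channels: int, pt_channels: int):
        super().__init__()
        # Mix the sampled cross-modal features back into each branch.
        self.to_img = nn.Conv2d(img_channels + pt_channels, img_channels, 1)
        self.to_pts = nn.Linear(pt_channels + img_channels, pt_channels)

    def forward(self, img_feat, pt_feat, uv):
        # img_feat: (B, C_img, H, W) dense camera features
        # pt_feat:  (B, N, C_pt)     sparse LiDAR point features
        # uv:       (B, N, 2)        point projections, normalized to [-1, 1]
        B, _, H, W = img_feat.shape
        c_pt = pt_feat.shape[-1]

        # Camera -> LiDAR: bilinearly sample image features at each point.
        sampled = F.grid_sample(
            img_feat, uv.unsqueeze(2), align_corners=True
        ).squeeze(-1).transpose(1, 2)                  # (B, N, C_img)
        pt_out = self.to_pts(torch.cat([pt_feat, sampled], dim=-1))

        # LiDAR -> camera: splat point features onto the image grid
        # (nearest-pixel scatter; the real module likely learns this step).
        x = ((uv[..., 0] + 1) * 0.5 * (W - 1)).round().long().clamp(0, W - 1)
        y = ((uv[..., 1] + 1) * 0.5 * (H - 1)).round().long().clamp(0, H - 1)
        canvas = img_feat.new_zeros(B, c_pt, H, W)
        idx = y * W + x                                # (B, N) flat pixel indices
        canvas.view(B, c_pt, H * W).scatter_(
            2, idx.unsqueeze(1).expand(-1, c_pt, -1),
            pt_feat.transpose(1, 2),
        )
        img_out = self.to_img(torch.cat([img_feat, canvas], dim=1))
        return img_out, pt_out
```

Under these assumptions, a call would look like `img_out, pt_out = fuse(img_feat, pt_feat, uv)`, where `uv` holds each LiDAR point's projection into the image plane; both branches then continue with their fused features.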
URL
https://arxiv.org/abs/2303.12017