
Let It Flow: Simultaneous Optimization of 3D Flow and Object Clustering

2024-04-12 10:04:03
Patrik Vacek, David Hurych, Tomáš Svoboda, Karel Zimmermann

Abstract

We study the problem of self-supervised 3D scene flow estimation from real large-scale raw point cloud sequences, which is crucial to various tasks such as trajectory prediction and instance segmentation. In the absence of ground-truth scene flow labels, contemporary approaches concentrate on optimizing flow across sequential pairs of point clouds by incorporating structure-based regularization on flow and object rigidity. The rigid objects are estimated by a variety of 3D spatial clustering methods. While state-of-the-art methods successfully capture overall scene motion using the Neural Prior structure, they encounter challenges in discerning multi-object motions. We identify the structural constraints and the use of large, strict rigid clusters as the main pitfalls of current approaches, and we propose a novel clustering approach that allows for a combination of overlapping soft clusters and non-overlapping rigid clusters. Flow is then jointly estimated with progressively growing non-overlapping rigid clusters together with fixed-size overlapping soft clusters. We evaluate our method on multiple datasets with LiDAR point clouds, demonstrating superior performance over the self-supervised baselines and reaching new state-of-the-art results. Our method especially excels in resolving flow in complicated dynamic scenes with multiple independently moving objects close to each other, including pedestrians, cyclists and other vulnerable road users. Our code will be publicly available.
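
The abstract gives no formulas, so the following is only a rough sketch of the kinds of terms such a self-supervised objective might combine: a nearest-neighbor data term, a smoothness term over fixed-size overlapping neighborhoods (standing in for the soft clusters), and a rigidity residual over non-overlapping clusters. The function names, the k-NN neighborhoods, the parameter k, and the Kabsch rigid fit are all illustrative assumptions and not the authors' implementation. The progressive growth of the rigid clusters is not modeled here.

```python
import numpy as np


def nearest_neighbor_loss(src, dst, flow):
    """Mean distance from each warped source point to its nearest target point."""
    warped = src + flow                                   # (N, 3)
    dists = np.linalg.norm(warped[:, None, :] - dst[None, :, :], axis=-1)
    return dists.min(axis=1).mean()


def soft_cluster_smoothness(src, flow, k=4):
    """Mean flow difference between each point and its k nearest neighbors.

    Assumption: each fixed-size k-NN neighborhood plays the role of one
    overlapping soft cluster.
    """
    dists = np.linalg.norm(src[:, None, :] - src[None, :, :], axis=-1)
    neighbors = np.argsort(dists, axis=1)[:, 1:k + 1]     # column 0 is the point itself
    diff = flow[neighbors] - flow[:, None, :]             # (N, k, 3)
    return np.linalg.norm(diff, axis=-1).mean()


def rigid_cluster_residual(src, flow, labels):
    """Mean deviation of the flow from the best rigid motion of each cluster.

    Assumption: labels assigns every point to exactly one (non-overlapping)
    cluster; the rigid motion is fitted with the Kabsch algorithm.
    """
    residuals = np.zeros(len(src))
    for c in np.unique(labels):
        mask = labels == c
        p, q = src[mask], src[mask] + flow[mask]
        p_mean, q_mean = p.mean(axis=0), q.mean(axis=0)
        h = (p - p_mean).T @ (q - q_mean)                 # 3x3 cross-covariance
        u, _, vt = np.linalg.svd(h)
        sign = np.sign(np.linalg.det(vt.T @ u.T))         # guard against reflections
        rot = vt.T @ np.diag([1.0, 1.0, sign]) @ u.T
        fitted = (p - p_mean) @ rot.T + q_mean
        residuals[mask] = np.linalg.norm(fitted - q, axis=1)
    return residuals.mean()
```

In an optimization-based setting, a weighted sum of these three terms would be minimized with respect to the flow. The weights and the optimizer are left open here because the abstract does not specify them.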

URL

https://arxiv.org/abs/2404.08363

PDF

https://arxiv.org/pdf/2404.08363.pdf

