Instantaneous Perception of Moving Objects in 3D

2024-05-05 01:07:24
Di Liu, Bingbing Zhuang, Dimitris N. Metaxas, Manmohan Chandraker

Abstract

The perception of 3D motion of surrounding traffic participants is crucial for driving safety. While existing works primarily focus on general large motions, we contend that the instantaneous detection and quantification of subtle motions is equally important, as they indicate nuances in driving behavior that may be safety critical, such as behaviors near a stop sign or parking positions. We delve into this under-explored task, examining its unique challenges and developing our solution, accompanied by a carefully designed benchmark. Specifically, due to the lack of correspondences between consecutive frames of sparse Lidar point clouds, static objects might appear to be moving, the so-called swimming effect. This intertwines with the true object motion, thereby making accurate estimation ambiguous, especially for subtle motions. To address this, we propose to leverage local occupancy completion of object point clouds to densify the shape cue and mitigate the impact of swimming artifacts. The occupancy completion is learned in an end-to-end fashion together with the detection of moving objects and the estimation of their motion, instantaneously as soon as objects start to move. Extensive experiments demonstrate superior performance compared to standard 3D motion estimation approaches, particularly highlighting our method's specialized treatment of subtle motions.
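
The swimming effect described in the abstract can be illustrated with a toy example. The sketch below is not the authors' method: it assumes a static object reduced to a 1D surface, Lidar hits spaced evenly along it, and a hypothetical `phase` parameter standing in for where the beams happen to land in each frame. All names and numbers are made up for illustration, and lengths are integer millimetres to keep the arithmetic exact. It only shows why sparse sampling makes a static object appear to move, and why a denser shape cue shrinks that artifact.

```python
def sample_surface(length_mm, spacing_mm, phase_mm):
    """Positions (mm) where beams hit a static surface spanning [0, length_mm].

    `phase_mm` is a hypothetical stand-in for the beam alignment in one frame.
    """
    return list(range(phase_mm, length_mm + 1, spacing_mm))


def centroid(points):
    """Mean position of the observed points."""
    return sum(points) / len(points)


def apparent_shift(length_mm, spacing_mm, phase_a_mm, phase_b_mm):
    """Centroid displacement between two frames of the SAME static surface."""
    frame_a = sample_surface(length_mm, spacing_mm, phase_a_mm)
    frame_b = sample_surface(length_mm, spacing_mm, phase_b_mm)
    return centroid(frame_b) - centroid(frame_a)


if __name__ == "__main__":
    # Sparse sampling: 3 hits per frame on a 4 m surface.
    sparse = apparent_shift(4000, 1500, 0, 700)
    # Dense sampling: roughly 80 hits per frame on the same surface.
    dense = apparent_shift(4000, 50, 0, 20)
    print(f"apparent shift, sparse: {sparse} mm")
    print(f"apparent shift, dense:  {dense} mm")
```

With these assumed numbers, the sparse frames disagree by 700 mm even though nothing moved, while the dense frames disagree by only 5 mm. A shift of the sparse size would be indistinguishable from a genuine subtle motion, which is the ambiguity the abstract points to.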

Abstract (translated)

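Perceiving the 3D motion of surrounding traffic participants is critical to driving safety. Although existing work mainly focuses on large motions, we argue that the instantaneous detection and quantification of subtle motions is equally important. Such motions reveal nuances in driving behavior that may be safety critical, such as behavior near a stop sign. We study this under-explored task in depth, examine its unique challenges, and develop our solution together with a carefully designed benchmark. Specifically, because there are no correspondences between consecutive frames of sparse Lidar point clouds, static objects may appear to be moving, the so-called swimming effect. This effect is intertwined with the true object motion, which makes accurate estimation ambiguous, especially for subtle motions. To address this, we propose using local occupancy completion of object point clouds to densify the shape cue and mitigate the impact of swimming artifacts. The occupancy completion is learned jointly with the detection of moving objects and the estimation of their motion, as soon as objects start to move. Extensive experiments demonstrate superior performance over standard 3D motion estimation methods, particularly highlighting our method's specialized treatment of subtle motions.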

URL

https://arxiv.org/abs/2405.02781

PDF

https://arxiv.org/pdf/2405.02781.pdf

