Paper Reading AI Learner

Towards Consistent Object Detection via LiDAR-Camera Synergy

2024-05-02 13:04:26
Kai Luo, Hao Wu, Kefu Yi, Kailun Yang, Wei Hao, Rongdong Hu

Abstract

As human-machine interaction continues to evolve, the capacity for environmental perception is becoming increasingly crucial. Integrating the two most common types of sensory data, images, and point clouds, can enhance detection accuracy. However, currently, no model exists that can simultaneously detect an object's position in both point clouds and images and ascertain their corresponding relationship. This information is invaluable for human-machine interactions, offering new possibilities for their enhancement. In light of this, this paper introduces an end-to-end Consistency Object Detection (COD) algorithm framework that requires only a single forward inference to simultaneously obtain an object's position in both point clouds and images and establish their correlation. Furthermore, to assess the accuracy of the object correlation between point clouds and images, this paper proposes a new evaluation metric, Consistency Precision (CP). To verify the effectiveness of the proposed framework, an extensive set of experiments has been conducted on the KITTI and DAIR-V2X datasets. The study also explored how the proposed consistency detection method performs on images when the calibration parameters between images and point clouds are disturbed, compared to existing post-processing methods. The experimental results demonstrate that the proposed method exhibits excellent detection performance and robustness, achieving end-to-end consistency detection. The source code will be made publicly available at this https URL.

Abstract (translated)

随着人机交互的不断进化,环境感知的能力变得越来越重要。将两种最常见的感官数据,图像和点云,集成在一起可以提高检测精度。然而,目前尚无同时检测物体在点云和图像中的位置以及确定它们相应关系的产品。这些信息对于人机交互非常重要,为它们的增强提供了新的可能性。因此,本文介绍了一个端到端的一致性物体检测(COD)算法框架,只需要一个前向推理即可同时获得物体在点云和图像中的位置并确定它们之间的关联。此外,为了评估点云和图像之间物体关联的准确性,本文提出了一个新的评估指标,一致性精度(CP)。为了验证所提出的框架的有效性,在KITTI和DAIR-V2X数据集上进行了大量实验。研究还探讨了在图像和点云之间的校准参数受到干扰时,所提出的一致性检测方法在图片上的表现,与现有后处理方法进行了比较。实验结果表明,与现有方法相比,所提出的检测方法具有卓越的检测性能和鲁棒性,实现了端到端的一致性检测。源代码将公开在https://这个网址上。

URL

https://arxiv.org/abs/2405.01258

PDF

https://arxiv.org/pdf/2405.01258.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model LLM Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Robot Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot