Paper Reading AI Learner

Robust Collaborative Perception without External Localization and Clock Devices

2024-05-05 15:20:36
Zixing Lei, Zhenyang Ni, Ruize Han, Shuo Tang, Chen Feng, Siheng Chen, Yanfeng Wang

Abstract

Consistent spatial-temporal coordination across multiple agents is fundamental to collaborative perception, which seeks to improve perception abilities through information exchange among agents. To achieve this spatial-temporal alignment, traditional methods depend on external devices to provide localization and clock signals. However, hardware-generated signals can be vulnerable to noise and potentially malicious attacks, jeopardizing the precision of spatial-temporal alignment. Rather than relying on external hardware, this work proposes a novel approach: aligning agents by recognizing the inherent geometric patterns within their perceptual data. In this spirit, we propose a robust collaborative perception system that operates independently of external localization and clock devices. The key module of our system, \emph{FreeAlign}, constructs a salient object graph for each agent based on its detected boxes and uses a graph neural network to identify common subgraphs between agents, yielding accurate relative pose and time. We validate \emph{FreeAlign} on both real-world and simulated datasets. The results show that the \emph{FreeAlign}-empowered robust collaborative perception system performs comparably to systems relying on precise localization and clock devices.
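The core idea — aligning agents through geometric patterns in their detections rather than external localization — can be illustrated with a minimal sketch. The sketch below is not the paper's method: it replaces the GNN-based common-subgraph matcher with a simple pairwise-distance signature (rigid transforms preserve inter-object distances, so each object's sorted distances to its neighbors form a pose-invariant descriptor), assumes both agents detect the same objects as 2D box centers, and recovers the relative pose with the Kabsch algorithm. All function names here are illustrative.

```python
import numpy as np

def edge_signature(points):
    # Per-node signature: sorted distances to all other detected objects.
    # Invariant to the agent's unknown rotation and translation.
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    return np.sort(d, axis=1)[:, 1:]  # drop the zero self-distance

def match_nodes(pts_a, pts_b):
    # Greedy nearest-signature matching (a stand-in for FreeAlign's GNN
    # subgraph matcher); assumes both agents see the same object set.
    sig_a, sig_b = edge_signature(pts_a), edge_signature(pts_b)
    cost = np.linalg.norm(sig_a[:, None, :] - sig_b[None, :, :], axis=-1)
    return cost.argmin(axis=1)  # index into pts_b for each point of pts_a

def estimate_pose(pts_a, pts_b_matched):
    # Kabsch: rigid transform (R, t) mapping agent B's frame onto agent A's,
    # i.e. R @ b + t ~= a for matched pairs.
    ca, cb = pts_a.mean(axis=0), pts_b_matched.mean(axis=0)
    H = (pts_b_matched - cb).T @ (pts_a - ca)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:  # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = ca - R @ cb
    return R, t

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Agent A's detected object centers in its own frame.
    A = rng.uniform(-20, 20, size=(6, 2))
    # Agent B observes the same objects under an unknown rigid transform,
    # reported in a different (unknown) order.
    theta = 0.7
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    B = (A - np.array([3.0, -1.0])) @ rot
    B = B[rng.permutation(len(B))]
    idx = match_nodes(A, B)
    R, t = estimate_pose(A, B[idx])
    print(np.allclose(B[idx] @ R.T + t, A, atol=1e-6))
```

Because rigid motion preserves pairwise distances exactly, the signature cost is zero at the true correspondences, so the greedy matcher recovers the permutation and the Kabsch step recovers the exact relative pose; real detections would add noise, missed objects, and false positives, which is what the paper's learned matcher is designed to handle.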

URL

https://arxiv.org/abs/2405.02965

PDF

https://arxiv.org/pdf/2405.02965.pdf
