Have We Scene It All? Scene Graph-Aware Deep Point Cloud Compression

2025-10-09 17:45:09
Nikolaos Stathoulopoulos, Christoforos Kanellakis, George Nikolakopoulos

Abstract

Efficient transmission of 3D point cloud data is critical for advanced perception in centralized and decentralized multi-agent robotic systems, especially nowadays with the growing reliance on edge and cloud-based processing. However, the large and complex nature of point clouds creates challenges under bandwidth constraints and intermittent connectivity, often degrading system performance. We propose a deep compression framework based on semantic scene graphs. The method decomposes point clouds into semantically coherent patches and encodes them into compact latent representations with semantic-aware encoders conditioned by Feature-wise Linear Modulation (FiLM). A folding-based decoder, guided by latent features and graph node attributes, enables structurally accurate reconstruction. Experiments on the SemanticKITTI and nuScenes datasets show that the framework achieves state-of-the-art compression rates, reducing data size by up to 98% while preserving both structural and semantic fidelity. In addition, it supports downstream applications such as multi-robot pose graph optimization and map merging, achieving trajectory accuracy and map alignment comparable to those obtained with raw LiDAR scans.
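
To make the FiLM conditioning mentioned in the abstract concrete, here is a minimal sketch of the general technique: each feature channel is scaled by a factor gamma and shifted by an offset beta chosen according to a semantic label. This is not the authors' implementation. The names `film`, `encode_patch`, and `FILM_PARAMS`, the three-channel feature size, and the fixed per-class values are all assumptions made for illustration; in a trained model, gamma and beta would typically be predicted by a small learned network rather than looked up in a table.

```python
from typing import Dict, List, Sequence, Tuple


def film(features: Sequence[float],
         gamma: Sequence[float],
         beta: Sequence[float]) -> List[float]:
    """Apply Feature-wise Linear Modulation: gamma * x + beta per channel."""
    if not (len(features) == len(gamma) == len(beta)):
        raise ValueError("features, gamma and beta must have the same length")
    return [g * x + b for x, g, b in zip(features, gamma, beta)]


# Hypothetical per-class (gamma, beta) pairs, hard-coded for illustration only.
# A real model would learn to predict these from the semantic label.
FILM_PARAMS: Dict[str, Tuple[List[float], List[float]]] = {
    "car": ([1.0, 0.5, 2.0], [0.0, 0.25, -1.0]),
    "road": ([0.5, 1.0, 1.0], [0.5, 0.0, 0.0]),
}


def encode_patch(patch_features: Sequence[float], label: str) -> List[float]:
    """Condition a patch's feature vector on its semantic label via FiLM."""
    gamma, beta = FILM_PARAMS[label]
    return film(patch_features, gamma, beta)
```

Calling `encode_patch([1.0, 2.0, 3.0], "car")` and `encode_patch([1.0, 2.0, 3.0], "road")` on the same input yields different modulated features, which is the point of the conditioning: the same encoder weights can treat patches differently depending on their semantic class.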

URL

https://arxiv.org/abs/2510.08512

PDF

https://arxiv.org/pdf/2510.08512.pdf

