NeuSE: Neural SE(3)-Equivariant Embedding for Consistent Spatial Understanding with Objects

2023-03-13 17:30:43
Jiahui Fu, Yilun Du, Kurran Singh, Joshua B. Tenenbaum, John J. Leonard

Abstract

We present NeuSE, a novel Neural SE(3)-Equivariant Embedding for objects, and illustrate how it supports object SLAM for consistent spatial understanding with long-term scene changes. NeuSE is a set of latent object embeddings created from partial object observations. It serves as a compact point cloud surrogate for complete object models, encoding full shape information while transforming SE(3)-equivariantly in tandem with the object in the physical world. With NeuSE, relative frame transforms can be directly derived from inferred latent codes. Our proposed SLAM paradigm, using NeuSE for object shape and pose characterization, can operate independently or in conjunction with typical SLAM systems. It directly infers SE(3) camera pose constraints that are compatible with general SLAM pose graph optimization, while also maintaining a lightweight object-centric map that adapts to real-world changes. Our approach is evaluated on synthetic and real-world sequences featuring changed objects and shows improved localization accuracy and change-aware mapping capability, when working either standalone or jointly with a common SLAM pipeline.
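
To make the claim that relative frame transforms can be derived directly from latent codes more concrete, here is a minimal sketch under stated assumptions. The abstract does not specify the form of the latent code, so the sketch assumes it is an (N, 3) array of latent points that move rigidly with the object, in the spirit of the "compact point cloud surrogate" description. It recovers the transform with the standard Kabsch alignment, which is a stand-in and not necessarily the procedure used in the paper. The names `relative_transform` and `rotation_about_z` are hypothetical.

```python
import numpy as np


def rotation_about_z(theta: float) -> np.ndarray:
    """Return a 3x3 rotation matrix for an angle theta (radians) about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def relative_transform(z_a: np.ndarray, z_b: np.ndarray):
    """Estimate (R, t) such that z_b[i] ~= R @ z_a[i] + t, via Kabsch alignment.

    z_a, z_b: (N, 3) latent point sets for the same object seen from two frames.
    ASSUMPTION: the codes are SE(3)-equivariant and row i of z_a corresponds
    to row i of z_b.
    """
    mu_a, mu_b = z_a.mean(axis=0), z_b.mean(axis=0)
    # 3x3 cross-covariance of the centered point sets.
    H = (z_a - mu_a).T @ (z_b - mu_b)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_b - R @ mu_a
    return R, t


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z_a = rng.normal(size=(32, 3))        # hypothetical latent code in frame A
    R_true = rotation_about_z(0.5)
    t_true = np.array([0.1, -0.4, 1.5])
    z_b = z_a @ R_true.T + t_true         # equivariance: the code moves with the object
    R, t = relative_transform(z_a, z_b)
    print(np.allclose(R, R_true), np.allclose(t, t_true))
```

In an object SLAM setting, a transform estimated this way from two views of the same object could serve as a relative pose constraint between the two camera frames in a pose graph. How NeuSE actually forms and weights such constraints is described in the paper, not here.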

URL

https://arxiv.org/abs/2303.07308

PDF

https://arxiv.org/pdf/2303.07308.pdf

