
BEINGS: Bayesian Embodied Image-goal Navigation with Gaussian Splatting

2024-09-16 12:07:02
Wugang Meng, Tianfu Wu, Huan Yin, Fumin Zhang

Abstract

Image-goal navigation (ImageNav) enables a robot to reach the location where a target image was captured, using visual cues for guidance. However, current methods either rely on data-hungry, computationally expensive learning-based approaches or are inefficient in complex environments because their exploration strategies are inadequate. To address these limitations, we propose Bayesian Embodied Image-goal Navigation with Gaussian Splatting (BEINGS), a novel method that formulates ImageNav as an optimal control problem within a model predictive control framework. BEINGS leverages 3D Gaussian Splatting as a scene prior to predict future observations, enabling efficient, real-time navigation decisions grounded in the robot's sensory experiences. By integrating Bayesian updates, our method dynamically refines the robot's strategy without requiring extensive prior experience or data. Our algorithm is validated through extensive simulations and physical experiments, showcasing its potential for embodied robot systems in visually complex scenarios.
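
The abstract names two ingredients, Bayesian updates and a model predictive control loop driven by predicted observations. As a rough illustration only, the toy sketch below shows how those two pieces might fit together on a five-cell corridor. It is not the authors' algorithm: the constant-valued arrays standing in for 3D Gaussian Splatting renders, the mean-squared-error likelihood, the `temperature` parameter, the expected-distance cost, and all function names are assumptions made for this example.

```python
import numpy as np

def bayes_update(belief, likelihood):
    """Posterior over candidate goal cells: prior times likelihood, renormalized."""
    posterior = belief * likelihood
    total = posterior.sum()
    if total == 0.0:
        # Degenerate case: no cell explains the evidence, fall back to uniform.
        return np.full_like(belief, 1.0 / belief.size)
    return posterior / total

def similarity_likelihood(predicted_views, goal_image, temperature=0.1):
    """Assumed likelihood: larger when a predicted view resembles the goal image."""
    errors = np.array([np.mean((v - goal_image) ** 2) for v in predicted_views])
    return np.exp(-errors / temperature)

def mpc_step(position, belief, cell_positions, moves):
    """One receding-horizon step with horizon 1: choose the move that
    minimizes the expected distance to the goal under the current belief."""
    best_move, best_cost = None, np.inf
    for move in moves:
        next_position = position + move
        cost = float(np.sum(belief * np.abs(cell_positions - next_position)))
        if cost < best_cost:
            best_move, best_cost = move, cost
    return best_move

# Toy corridor: the "view" from cell c is a constant image with value c / 4.
# In a real system these would be rendered from the scene prior (assumption).
cells = np.arange(5)
views = [np.full((4, 4), c / 4.0) for c in cells]
goal_image = views[3]  # the goal image was "captured" at cell 3

prior = np.full(5, 0.2)  # uniform prior over goal cells
belief = bayes_update(prior, similarity_likelihood(views, goal_image))
move = mpc_step(0, belief, cells, moves=(-1, 0, 1))
print(int(np.argmax(belief)), move)
```

Starting from cell 0 with a uniform prior, the update concentrates the belief on the cell whose predicted view matches the goal image, and the controller then picks the move that heads toward it. A fuller loop would repeat the update with each new observation and plan over a longer horizon.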

URL

https://arxiv.org/abs/2409.10216

PDF

https://arxiv.org/pdf/2409.10216.pdf

