Paper Reading AI Learner

GISR: Geometric Initialization and Silhouette-based Refinement for Single-View Robot Pose and Configuration Estimation

2024-05-08 08:39:25
Ivan Bili\'c, Filip Mari\'c, Fabio Bonsignorio, Ivan Petrovi\'c

Abstract

For autonomous robotics applications, it is crucial that robots are able to accurately measure their potential state and perceive their environment, including other agents within it (e.g., cobots interacting with humans). The redundancy of these measurements is important, as it allows for planning and execution of recovery protocols in the event of sensor failure or external disturbances. Visual estimation can provide this redundancy through the use of low-cost sensors and serve as a standalone source of proprioception when no encoder-based sensing is available. Therefore, we estimate the configuration of the robot jointly with its pose, which provides a complete spatial understanding of the observed robot. We present GISR - a method for deep configuration and robot-to-camera pose estimation that prioritizes real-time execution. GISR comprises two modules: (i) a geometric initialization module that efficiently computes an approximate robot pose and configuration, and (ii) an iterative silhouette-based refinement module that refines the initial solution in only a few iterations. We evaluate our method on a publicly available dataset and show that GISR performs competitively with state-of-the-art approaches, while being significantly faster than existing methods of the same class. Our code is available at this https URL.
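The two-stage idea described above — a coarse initial estimate followed by a few iterations of silhouette-based refinement — can be sketched in miniature. The snippet below is an illustrative toy, not the authors' pipeline: the "renderer" just draws a square parameterized by a 2D offset, and the refinement is a greedy coordinate search that maximizes silhouette IoU against a target mask. GISR itself operates on full robot pose and joint configuration with a learned/geometric initializer, which this sketch only caricatures.

```python
import numpy as np

def render_silhouette(params, size=64):
    # Toy stand-in for a silhouette renderer: draws a 16x16 square
    # whose top-left corner is given by params = (x, y).
    x, y = int(params[0]), int(params[1])
    mask = np.zeros((size, size), dtype=bool)
    mask[max(0, y):y + 16, max(0, x):x + 16] = True
    return mask

def iou(a, b):
    # Intersection-over-union of two boolean silhouette masks.
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

def refine(init_params, target_mask, iters=5, step=4):
    # Greedy coordinate-descent refinement of a coarse initial estimate,
    # improving silhouette overlap (IoU) in a small number of iterations —
    # mirroring the "few-iteration refinement" idea, not its actual optimizer.
    params = np.array(init_params, dtype=float)
    for _ in range(iters):
        for dim in range(len(params)):
            for delta in (-step, step):
                cand = params.copy()
                cand[dim] += delta
                if iou(render_silhouette(cand), target_mask) > \
                   iou(render_silhouette(params), target_mask):
                    params = cand
    return params

# Hypothetical setup: the observed silhouette, and a coarse estimate that a
# geometric-initialization stage might produce.
target = render_silhouette((30, 30))
init = (18, 22)
refined = refine(init, target)
```

In this toy, the refined parameters converge to the target offset within five sweeps; the design point it illustrates is that a good initialization keeps the refinement loop short, which is what lets silhouette-based methods run in real time.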

URL

https://arxiv.org/abs/2405.04890

PDF

https://arxiv.org/pdf/2405.04890.pdf

