Paper Reading AI Learner

Fiducial Exoskeletons: Image-Centric Robot State Estimation

2026-01-12 22:04:25
Cameron Smith, Basile Van Hoorick, Vitor Guizilini, Yue Wang

Abstract

We introduce Fiducial Exoskeletons, an image-based reformulation of 3D robot state estimation that replaces cumbersome calibration procedures and motor-centric pipelines with single-image inference. Traditional approaches, especially for robot-camera extrinsic estimation, often rely on high-precision actuators and require time-consuming routines such as hand-eye calibration. In contrast, modern learning-based robot control is increasingly trained and deployed from RGB observations on lower-cost hardware. Our key insight is twofold. First, we cast robot state estimation as 6D pose estimation of each link from a single RGB image: the robot-camera base transform is obtained directly as the estimated base-link pose, and the joint state is recovered via a lightweight global optimization that enforces kinematic consistency with the observed link poses (optionally warm-started with encoder readings). Second, we make per-link 6D pose estimation robust and simple, even without learning, by introducing the fiducial exoskeleton: a lightweight 3D-printed mount that places a fiducial marker on each link with known marker-link geometry. From a single image, this design yields camera-robot extrinsics, per-link SE(3) poses, and joint angles, enabling robust state estimation even on unplugged robots. Demonstrated on a low-cost robot arm, fiducial exoskeletons substantially simplify setup while improving calibration, state accuracy, and downstream 3D control performance. We release code and printable hardware designs to enable further algorithm-hardware co-design.
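The two-step recipe above (per-link poses from markers with known mount geometry, then a kinematically consistent joint fit) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' released code: the 3-joint arm, its offsets, the marker-in-link transforms (`MARKER_IN_LINK`), and all helper names are hypothetical; marker poses are taken as given (in practice they would come from a fiducial detector plus PnP, not shown); and the "lightweight global optimization" is stood in for by a plain Gauss-Newton least-squares fit over all link poses at once, with an encoder-like reading used only as the initial guess.

```python
import numpy as np


def rot_z(theta):
    """Homogeneous 4x4 rotation about the local z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    T = np.eye(4)
    T[:2, :2] = [[c, -s], [s, c]]
    return T


def translate(x, y, z):
    """Homogeneous 4x4 pure translation."""
    T = np.eye(4)
    T[:3, 3] = [x, y, z]
    return T


# Hypothetical 3-joint arm: each joint is a fixed offset followed by a revolute
# joint about its local z-axis (illustrative values, not the paper's robot).
JOINT_OFFSETS = [translate(0.0, 0.0, 0.1), translate(0.2, 0.0, 0.0), translate(0.15, 0.0, 0.0)]
# Hypothetical known marker pose in each link frame (T_link_marker) for the base
# and 3 links, standing in for the 3D-printed mount geometry.
MARKER_IN_LINK = [translate(0.0, 0.04, 0.03) @ rot_z(0.2)] * 4


def link_pose_from_marker(T_cam_marker, T_link_marker):
    """Camera-frame link pose from an observed marker pose and the known mount."""
    return T_cam_marker @ np.linalg.inv(T_link_marker)


def forward_kinematics(T_cam_base, q):
    """Camera-frame poses of the base and every link for joint angles q."""
    poses = [T_cam_base]
    for offset, angle in zip(JOINT_OFFSETS, q):
        poses.append(poses[-1] @ offset @ rot_z(angle))
    return poses


def residuals(q, observed):
    """Stacked rotation and translation errors between FK and observed link poses."""
    predicted = forward_kinematics(observed[0], q)
    parts = []
    for T_pred, T_obs in zip(predicted[1:], observed[1:]):
        parts.append((T_pred[:3, :3] - T_obs[:3, :3]).ravel())
        parts.append(T_pred[:3, 3] - T_obs[:3, 3])
    return np.concatenate(parts)


def estimate_state(marker_poses, q_init, iters=50, eps=1e-6):
    """Return (T_cam_base, q): extrinsics plus joint angles fitted to all links jointly."""
    observed = [link_pose_from_marker(Tm, Tl) for Tm, Tl in zip(marker_poses, MARKER_IN_LINK)]
    q = np.array(q_init, dtype=float)
    for _ in range(iters):
        r = residuals(q, observed)
        # Forward-difference Jacobian of the residual w.r.t. each joint angle.
        J = np.empty((r.size, q.size))
        for j in range(q.size):
            dq = np.zeros_like(q)
            dq[j] = eps
            J[:, j] = (residuals(q + dq, observed) - r) / eps
        # Gauss-Newton step: least-squares solution of J @ step = -r.
        step = np.linalg.lstsq(J, -r, rcond=None)[0]
        q = q + step
        if np.linalg.norm(step) < 1e-12:
            break
    # The camera-robot extrinsics are simply the observed base-link pose.
    return observed[0], q


# Usage on synthetic data: simulate marker detections from a known state.
T_cam_base_true = translate(0.1, -0.2, 1.0) @ rot_z(0.7)
q_true = np.array([0.3, -0.5, 0.8])
links = forward_kinematics(T_cam_base_true, q_true)
markers = [T @ Tm for T, Tm in zip(links, MARKER_IN_LINK)]
# Warm start from a slightly wrong, encoder-like reading.
T_base, q = estimate_state(markers, q_init=[0.25, -0.45, 0.7])
print(np.round(q, 6))
```

Fitting every link pose jointly, instead of reading each joint angle off two neighbouring markers in isolation, tends to spread marker noise across the whole chain. The raw rotation-matrix difference is used as the rotation error purely for brevity; a geodesic SO(3) error would be a natural substitute.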

URL

https://arxiv.org/abs/2601.08034

PDF

https://arxiv.org/pdf/2601.08034.pdf