Paper Reading AI Learner

MoSS: Monocular Shape Sensing for Continuum Robots

2023-03-02 01:14:32
Chengnan Shentu, Enxu Li, Chaojun Chen, Puspita Triana Dewi, David B. Lindell, Jessica Burgner-Kahrs

Abstract

Continuum robots are promising candidates for interactive tasks in various applications due to their unique shape, compliance, and miniaturization capability. Accurate and real-time shape sensing is essential for such tasks yet remains a challenge. Embedded shape sensing has high hardware complexity and cost, while vision-based methods require a stereo setup and struggle to achieve real-time performance. This paper proposes the first eye-to-hand monocular approach to continuum robot shape sensing. Utilizing a deep encoder-decoder network, our method, MoSSNet, eliminates the computation cost of stereo matching and reduces requirements on sensing hardware. In particular, MoSSNet comprises an encoder and three parallel decoders to uncover spatial, length, and contour information from a single RGB image, and then obtains the 3D shape through curve fitting. A two-segment tendon-driven continuum robot is used for data collection and testing, demonstrating accurate (mean shape error of 0.91 mm, or 0.36% of robot length) and real-time (70 fps) shape sensing on real-world data. Additionally, the method is optimized end-to-end and does not require fiducial markers, manual segmentation, or camera calibration. Code and datasets will be made available at this https URL.
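The final stage described in the abstract recovers a 3D shape from per-point network outputs via curve fitting. As an illustrative sketch only (the paper's exact parameterization and curve model are not specified here), one common approach is to chord-length-parameterize ordered backbone points and fit a low-degree polynomial per coordinate:

```python
import numpy as np

def fit_centerline(points, degree=3, n_samples=100):
    """Fit a 3D parametric polynomial curve to ordered backbone points.

    Hypothetical stand-in for the curve-fitting stage: `points` is an
    (N, 3) array of ordered 3D points along the robot backbone; returns
    (n_samples, 3) points sampled along the fitted curve.
    """
    points = np.asarray(points, dtype=float)
    # Chord-length parameterization, normalized to [0, 1].
    seg = np.linalg.norm(np.diff(points, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])
    s /= s[-1]
    # Fit one polynomial per coordinate: x(s), y(s), z(s).
    coeffs = [np.polyfit(s, points[:, k], degree) for k in range(3)]
    t = np.linspace(0.0, 1.0, n_samples)
    return np.stack([np.polyval(c, t) for c in coeffs], axis=1)

# Example: noisy samples of a smooth synthetic backbone.
rng = np.random.default_rng(0)
s = np.linspace(0.0, 1.0, 40)
truth = np.stack([np.sin(s), np.cos(s), s], axis=1)  # constant-speed curve
noisy = truth + rng.normal(scale=1e-3, size=truth.shape)
curve = fit_centerline(noisy, degree=3, n_samples=40)
mean_err = np.mean(np.linalg.norm(curve - truth, axis=1))
```

Because the prediction and fitting are differentiable operations, a pipeline of this general form can be optimized end-to-end, as the abstract notes; a spline model could be substituted for the polynomial without changing the overall structure.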

URL

https://arxiv.org/abs/2303.00891

PDF

https://arxiv.org/pdf/2303.00891.pdf

