Abstract
Continuum robots are promising candidates for interactive tasks in various applications due to their unique shape, compliance, and miniaturization capability. Accurate and real-time shape sensing is essential for such tasks yet remains a challenge. Embedded shape sensing has high hardware complexity and cost, while vision-based methods require a stereo setup and struggle to achieve real-time performance. This paper proposes the first eye-to-hand monocular approach to continuum robot shape sensing. Utilizing a deep encoder-decoder network, our method, MoSSNet, eliminates the computation cost of stereo matching and reduces the requirements on sensing hardware. In particular, MoSSNet comprises an encoder and three parallel decoders that uncover spatial, length, and contour information from a single RGB image, and then obtains the 3D shape through curve fitting. A two-segment tendon-driven continuum robot is used for data collection and testing, demonstrating accurate (mean shape error of 0.91 mm, or 0.36% of robot length) and real-time (70 fps) shape sensing on real-world data. Additionally, the method is optimized end-to-end and does not require fiducial markers, manual segmentation, or camera calibration. Code and datasets will be made available at this https URL.
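The final stage described above, recovering a smooth 3D shape from the decoder outputs through curve fitting, can be sketched in a few lines. This is a minimal illustration only: it assumes the network yields discrete 3D centerline points parameterized by normalized arc length, and it uses a per-coordinate cubic polynomial fit, which is an assumption for illustration rather than the paper's exact curve model.

```python
import numpy as np

# Hypothetical network output: noisy 3D centerline points along the robot.
# The synthetic "true" curve and the cubic-polynomial model below are
# illustrative assumptions, not MoSSNet's actual decoder outputs or fit.
rng = np.random.default_rng(0)
s = np.linspace(0.0, 1.0, 50)                              # normalized arc length
true_pts = np.stack([np.sin(s), s**2, 0.1 * s], axis=1)    # synthetic ground truth
pred_pts = true_pts + rng.normal(scale=0.002, size=true_pts.shape)

# Fit one cubic polynomial per coordinate as a function of arc length,
# then resample a smooth shape estimate from the fitted coefficients.
coeffs = [np.polyfit(s, pred_pts[:, k], deg=3) for k in range(3)]
fitted = np.stack([np.polyval(c, s) for c in coeffs], axis=1)

# Mean Euclidean deviation of the fitted curve from the ground-truth curve,
# analogous in spirit to the mean shape error metric reported in the abstract.
shape_err = np.linalg.norm(fitted - true_pts, axis=1).mean()
print(f"mean shape error: {shape_err:.4f}")
```

The fitting step smooths per-point prediction noise and yields a continuous curve that can be evaluated at any arc length; a spline or robot-specific kinematic model could be substituted for the polynomial without changing the overall pipeline.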
URL
https://arxiv.org/abs/2303.00891