Dexterity from Touch: Self-Supervised Pre-Training of Tactile Representations with Robotic Play

Abstract

Teaching dexterity to multi-fingered robots has been a longstanding challenge in robotics. Most prominent work in this area focuses on learning controllers or policies that operate either on visual observations or on state estimates derived from vision. However, such methods perform poorly on fine-grained manipulation tasks that require reasoning about contact forces or about objects occluded by the hand itself. In this work, we present T-Dex, a new approach for tactile-based dexterity that operates in two phases. In the first phase, we collect 2.5 hours of play data, which is used to train self-supervised tactile encoders. This step is necessary to map high-dimensional tactile readings to a lower-dimensional embedding. In the second phase, given a handful of demonstrations for a dexterous task, we learn non-parametric policies that combine the tactile observations with visual ones. Across five challenging dexterous tasks, we show that our tactile-based dexterity models outperform purely vision-based and torque-based models by an average of 1.7X. Finally, we provide a detailed analysis of the factors critical to T-Dex, including the importance of play data, architectures, and representation learning.
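
To make the second phase more concrete, here is a minimal sketch of one possible non-parametric policy: nearest-neighbor retrieval over demonstration frames. The abstract says only that the policies are non-parametric and combine tactile and visual observations; the nearest-neighbor lookup, the per-modality normalization, and all names below (`joint_embedding`, `NearestNeighborPolicy`) are illustrative assumptions, and the short vectors stand in for the outputs of the pre-trained encoders.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def joint_embedding(tactile: Vector, visual: Vector) -> List[float]:
    """Concatenate the two modalities after normalizing each one.

    Assumption: normalizing per modality keeps either one from
    dominating the distance. The source does not specify this.
    """
    return l2_normalize(tactile) + l2_normalize(visual)


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


class NearestNeighborPolicy:
    """Hypothetical policy: replay the action of the closest demo frame."""

    def __init__(self, demos: List[Tuple[Vector, Vector, Vector]]):
        # Each demo frame is (tactile_embedding, visual_embedding, action).
        self.keys = [joint_embedding(t, v) for t, v, _ in demos]
        self.actions = [list(a) for _, _, a in demos]

    def act(self, tactile: Vector, visual: Vector) -> List[float]:
        query = joint_embedding(tactile, visual)
        best = min(
            range(len(self.keys)),
            key=lambda i: squared_distance(self.keys[i], query),
        )
        return self.actions[best]


# Toy demonstration frames with made-up 2-D embeddings and 2-D actions.
demos = [
    ([1.0, 0.0], [0.0, 1.0], [0.1, 0.0]),
    ([0.0, 1.0], [1.0, 0.0], [0.0, -0.2]),
]
policy = NearestNeighborPolicy(demos)
action = policy.act([0.9, 0.1], [0.1, 0.8])
```

In this toy setup the query lies closest to the first demonstration frame, so the policy returns that frame's action. Nothing is trained in this phase; the only learned components would be the encoders that produce the embeddings.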

URL

https://arxiv.org/abs/2303.12076

PDF

https://arxiv.org/pdf/2303.12076.pdf