Paper Reading AI Learner

Dexterity from Touch: Self-Supervised Pre-Training of Tactile Representations with Robotic Play

2023-03-21 17:59:20
Irmak Guzey, Ben Evans, Soumith Chintala, Lerrel Pinto

Abstract

Teaching dexterity to multi-fingered robots has been a longstanding challenge in robotics. Most prominent work in this area focuses on learning controllers or policies that operate either on visual observations or on state estimates derived from vision. However, such methods perform poorly on fine-grained manipulation tasks that require reasoning about contact forces or about objects occluded by the hand itself. In this work, we present T-Dex, a new approach for tactile-based dexterity that operates in two phases. In the first phase, we collect 2.5 hours of play data, which is used to train self-supervised tactile encoders. This is necessary to bring high-dimensional tactile readings into a lower-dimensional embedding. In the second phase, given a handful of demonstrations for a dexterous task, we learn non-parametric policies that combine the tactile observations with visual ones. Across five challenging dexterous tasks, we show that our tactile-based dexterity models outperform purely vision-based and torque-based models by an average of 1.7X. Finally, we provide a detailed analysis of factors critical to T-Dex, including the importance of play data, architectures, and representation learning.
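The abstract does not specify which self-supervised objective is used to pre-train the tactile encoders on play data. As one plausible instantiation, the sketch below trains a small encoder on flattened tactile readings with a BYOL-style objective (an online network chasing a slowly updated target network). The class names, network shapes, input dimensionality, augmentation, and EMA rate are all illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch: self-supervised pre-training of a tactile encoder.
# The abstract only says high-dimensional tactile readings are embedded into
# a lower-dimensional space; the BYOL-style objective here is an assumption.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class TactileEncoder(nn.Module):
    """Maps a flattened tactile reading (e.g., several 3-axis sensor pads
    concatenated into one vector; 720 dims is an arbitrary stand-in) to a
    low-dimensional embedding."""
    def __init__(self, in_dim=720, emb_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, emb_dim),
        )
    def forward(self, x):
        return self.net(x)

def byol_loss(online_pred, target_proj):
    # Negative cosine similarity between L2-normalized vectors.
    p = F.normalize(online_pred, dim=-1)
    z = F.normalize(target_proj, dim=-1)
    return (2 - 2 * (p * z).sum(dim=-1)).mean()

def augment(x, noise_scale=0.01):
    # Simple augmentation for tactile vectors: additive Gaussian noise.
    return x + noise_scale * torch.randn_like(x)

online = TactileEncoder()
predictor = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 64))
target = copy.deepcopy(online)          # frozen EMA copy of the online encoder
for p in target.parameters():
    p.requires_grad_(False)
opt = torch.optim.Adam(list(online.parameters()) + list(predictor.parameters()), lr=1e-3)

play_data = torch.randn(512, 720)       # stand-in for logged play tactile readings
for step in range(100):
    batch = play_data[torch.randint(0, 512, (64,))]
    v1, v2 = augment(batch), augment(batch)
    # Symmetric BYOL loss: each view predicts the target projection of the other.
    loss = byol_loss(predictor(online(v1)), target(v2)) \
         + byol_loss(predictor(online(v2)), target(v1))
    opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():               # EMA update of the target network
        for pt, po in zip(target.parameters(), online.parameters()):
            pt.mul_(0.99).add_(0.01 * po)
```

After pre-training, only the encoder would be kept; its embeddings stand in for the raw tactile readings in the downstream policy.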

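For the second phase, the abstract describes non-parametric policies that combine tactile and visual observations given a handful of demonstrations. A minimal nearest-neighbor sketch consistent with that description: embed the current observation, find the closest demonstration frames in a weighted joint embedding space, and replay their recorded action. The modality weighting, the value of k, and all dimensionalities below are assumptions for illustration.

```python
# Hypothetical sketch: a non-parametric (nearest-neighbor) policy over
# combined tactile and visual embeddings from demonstration frames.
import numpy as np

def nn_policy(query_tactile, query_visual, demo_tactile, demo_visual,
              demo_actions, k=3, tactile_weight=0.5):
    """Return an action derived from the demo frames closest to the current
    observation. demo_* arrays hold one row per demonstration frame."""
    # Per-frame distance in each modality's embedding space.
    d_tac = np.linalg.norm(demo_tactile - query_tactile, axis=1)
    d_vis = np.linalg.norm(demo_visual - query_visual, axis=1)
    # Mix the two modalities with an assumed fixed weight.
    dist = tactile_weight * d_tac + (1.0 - tactile_weight) * d_vis
    nearest = np.argsort(dist)[:k]
    # Average the recorded actions of the k nearest demonstration frames.
    return demo_actions[nearest].mean(axis=0)

# Toy usage with random stand-ins for encoder outputs and recorded actions.
rng = np.random.default_rng(0)
demo_tac = rng.normal(size=(200, 64))   # tactile embeddings of demo frames
demo_vis = rng.normal(size=(200, 128))  # visual embeddings of demo frames
demo_act = rng.normal(size=(200, 16))   # recorded hand actions per frame
action = nn_policy(rng.normal(size=64), rng.normal(size=128),
                   demo_tac, demo_vis, demo_act)
```

Because the policy is a lookup over demonstration frames rather than a trained network, it needs no task-specific training beyond the handful of demonstrations themselves.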

URL

https://arxiv.org/abs/2303.12076

PDF

https://arxiv.org/pdf/2303.12076.pdf

