Paper Reading AI Learner

KineDex: Learning Tactile-Informed Visuomotor Policies via Kinesthetic Teaching for Dexterous Manipulation

2025-05-04 02:54:13
Di Zhang, Chengbo Yuan, Chuan Wen, Hai Zhang, Junqiao Zhao, Yang Gao

Abstract

Collecting demonstrations enriched with fine-grained tactile information is critical for dexterous manipulation, particularly in contact-rich tasks that require precise force control and physical interaction. While prior works primarily focus on teleoperation or video-based retargeting, they often suffer from kinematic mismatches and the absence of real-time tactile feedback, hindering the acquisition of high-fidelity tactile data. To mitigate this issue, we propose KineDex, a hand-over-hand kinesthetic teaching paradigm in which the operator's motion is directly transferred to the dexterous hand, enabling the collection of physically grounded demonstrations enriched with accurate tactile feedback. To resolve occlusions caused by the human hand, we apply an inpainting technique to preprocess the visual observations. Based on these demonstrations, we then train a visuomotor policy using tactile-augmented inputs and apply force control during deployment for precise contact-rich manipulation. We evaluate KineDex on a suite of challenging contact-rich manipulation tasks, including particularly difficult scenarios such as squeezing toothpaste onto a toothbrush, which require precise multi-finger coordination and stable force regulation. Across these tasks, KineDex achieves an average success rate of 74.4%, a 57.7% improvement over the variant without force control. Comparative experiments with teleoperation and user studies further validate the advantages of KineDex in data collection efficiency and operability. Specifically, KineDex collects data over twice as fast as teleoperation across two tasks of varying difficulty, while maintaining a near-100% success rate, compared to under 50% for teleoperation.
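The abstract does not specify which inpainting method removes the operator's hand from the visual observations. As an illustrative sketch only (not the paper's implementation), hole filling over a hand mask can be approximated by diffusion: fix the known pixels and repeatedly relax the masked region toward the average of its neighbours. The function name `inpaint_masked` and all parameters here are hypothetical.

```python
import numpy as np

def inpaint_masked(image, mask, iters=200):
    """Fill occluded pixels by diffusion (Jacobi steps of the Laplace equation).

    image: (H, W) grayscale array; mask: (H, W) bool, True where the hand
    occludes the scene. Known pixels stay fixed; masked pixels relax toward
    the 4-neighbour average until the hole is smoothly filled.
    """
    img = image.astype(float).copy()
    img[mask] = img[~mask].mean()  # initialise the hole with the global mean
    for _ in range(iters):
        # 4-neighbour average; np.roll wraps at the border, which is
        # acceptable for a sketch with interior holes
        avg = (np.roll(img, 1, axis=0) + np.roll(img, -1, axis=0) +
               np.roll(img, 1, axis=1) + np.roll(img, -1, axis=1)) / 4.0
        img[mask] = avg[mask]  # update only the occluded pixels
    return img
```

Production systems would more likely use a learned video-inpainting model or OpenCV's `cv2.inpaint`; the diffusion sketch just makes the preprocessing idea concrete.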


URL

https://arxiv.org/abs/2505.01974

PDF

https://arxiv.org/pdf/2505.01974.pdf

