DexDiffuser: Interaction-aware Diffusion Planning for Adaptive Dexterous Manipulation

2024-11-27 18:03:26
Zhixuan Liang, Yao Mu, Yixiao Wang, Fei Ni, Tianxing Chen, Wenqi Shao, Wei Zhan, Masayoshi Tomizuka, Ping Luo, Mingyu Ding

Abstract

Dexterous manipulation with contact-rich interactions is crucial for advanced robotics. While recent diffusion-based planning approaches show promise for simpler manipulation tasks, they often produce unrealistic ghost states (e.g., the object moving on its own without hand contact) or lack adaptability when handling complex sequential interactions. In this work, we introduce DexDiffuser, an interaction-aware diffusion planning framework for adaptive dexterous manipulation. DexDiffuser models joint state-action dynamics through a dual-phase diffusion process consisting of pre-interaction contact alignment and post-contact goal-directed control, enabling goal-adaptive, generalizable dexterous manipulation. Additionally, we incorporate dynamics model-based dual guidance and leverage large language models for automated guidance function generation, enhancing generalizability for physical interactions and facilitating diverse goal adaptation through language cues. Experiments on physical interaction tasks such as door opening, pen and block re-orientation, and hammer striking demonstrate DexDiffuser's effectiveness on goals outside the training distribution, achieving more than twice the average success rate of existing methods (59.2% vs. 29.5%). Our framework achieves 70.0% success on 30-degree door opening, 40.0% and 36.7% on pen and block half-side re-orientation respectively, and 46.7% on driving a nail halfway with a hammer, highlighting its robustness and flexibility in contact-rich manipulation.
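
The abstract does not spell out how a guidance function steers the diffusion planner, so the following is only a toy sketch of the general idea of gradient-guided denoising, not the authors' implementation. It assumes a 1-D trajectory of door angles, an identity stand-in for the trained denoiser, and a quadratic goal cost; the names `goal_guidance_grad`, `guided_denoise_step`, and `run_guided_sampling` are hypothetical.

```python
import numpy as np


def goal_guidance_grad(traj, goal_angle):
    """Gradient of the assumed cost (traj[-1] - goal_angle)**2 w.r.t. traj."""
    grad = np.zeros_like(traj)
    grad[-1] = 2.0 * (traj[-1] - goal_angle)
    return grad


def guided_denoise_step(traj, denoise_fn, goal_angle, scale, noise_std, rng):
    """One reverse step: denoise, shift the mean against the cost gradient, add noise."""
    mean = denoise_fn(traj)
    mean = mean - scale * goal_guidance_grad(mean, goal_angle)
    return mean + noise_std * rng.standard_normal(traj.shape)


def run_guided_sampling(goal_angle, steps=50, horizon=16, seed=0):
    """Run the toy guided sampler and return the final trajectory."""
    rng = np.random.default_rng(seed)
    traj = rng.standard_normal(horizon)  # start from pure noise

    def denoise_fn(x):
        # Placeholder for a trained denoising network (assumption).
        return x

    # Noise level decays linearly to zero, so the last step is deterministic.
    for sigma in np.linspace(0.1, 0.0, steps):
        traj = guided_denoise_step(traj, denoise_fn, goal_angle,
                                   scale=0.25, noise_std=sigma, rng=rng)
    return traj


if __name__ == "__main__":
    goal = np.deg2rad(30.0)  # e.g. a 30-degree door-opening goal
    final = run_guided_sampling(goal)
    print("final angle (deg):", np.rad2deg(final[-1]))
```

Changing only `goal_angle` retargets the sampler without retraining the denoiser, which is the kind of goal adaptation the abstract attributes to guidance. In this toy, only the final trajectory element is steered; a real planner would rely on the learned denoiser to keep the rest of the trajectory consistent.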

URL

https://arxiv.org/abs/2411.18562

PDF

https://arxiv.org/pdf/2411.18562.pdf

