Paper Reading AI Learner

State- and context-dependent robotic manipulation and grasping via uncertainty-aware imitation learning

2024-10-31 15:32:32
Tim R. Winter, Ashok M. Sundaram, Werner Friedl, Maximo A. Roa, Freek Stulp, João Silvério

Abstract

Generating context-adaptive manipulation and grasping actions is a challenging problem in robotics. Classical planning and control algorithms tend to be inflexible with regard to parameterization by external variables such as object shapes. In contrast, Learning from Demonstration (LfD) approaches, due to their nature as function approximators, allow external variables to be introduced to modulate policies in response to the environment. In this paper, we exploit this property by introducing an LfD approach to acquire context-dependent grasping and manipulation strategies. We treat the problem as kernel-based function approximation, where the kernel inputs include generic context variables describing task-dependent parameters such as the object shape. We build on existing work on policy fusion with uncertainty quantification to propose a state-dependent approach that automatically returns to the demonstrations, avoiding unpredictable behavior while smoothly adapting to context changes. The approach is evaluated on the LASA handwriting dataset and on a real 7-DoF robot in two scenarios: adaptation to slippage while grasping, and manipulation of a deformable food item.
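To make the abstract's idea concrete, below is a minimal Python sketch of a kernel-based policy whose kernel inputs concatenate state and context, with a GP-style predictive variance driving a return-to-demonstration term. Everything here (the ContextualKernelPolicy class, the RBF kernel choice, the nearest-demonstration fallback) is an illustrative assumption, not the authors' implementation:

import numpy as np

def rbf_kernel(A, B, lengthscale):
    # Squared-exponential kernel over concatenated [state, context] inputs.
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dists / lengthscale ** 2)

class ContextualKernelPolicy:
    # Kernel regression from (state, context) to action, with a GP-style
    # variance that grows away from the demonstrations. High variance
    # shifts the output toward a fallback motion that returns to the data.
    # Assumes actions are velocities in state space, so the blend below
    # is dimensionally consistent.
    def __init__(self, states, contexts, actions, lengthscale=0.1, reg=1e-6):
        self.X = np.hstack([states, contexts])
        self.states = states
        self.lengthscale = lengthscale
        K = rbf_kernel(self.X, self.X, lengthscale) + reg * np.eye(len(states))
        self.K_inv = np.linalg.inv(K)
        self.alpha = self.K_inv @ actions   # regression weights

    def action(self, state, context):
        x = np.hstack([state, context])[None, :]
        k = rbf_kernel(x, self.X, self.lengthscale)      # (1, N)
        mean = (k @ self.alpha).ravel()                  # learned action
        var = float(1.0 - k @ self.K_inv @ k.T)          # epistemic proxy
        w = np.clip(var, 0.0, 1.0)                       # fusion weight
        nearest = self.states[((self.states - state) ** 2).sum(axis=-1).argmin()]
        fallback = nearest - state    # velocity pulling back toward the data
        return (1.0 - w) * mean + w * fallback

# Toy usage (illustrative data only):
rng = np.random.default_rng(0)
states = rng.uniform(-1, 1, (50, 2))     # e.g. 2-D end-effector position
contexts = rng.uniform(0, 1, (50, 1))    # e.g. a scalar shape parameter
actions = -states                        # demonstrations: move toward origin
policy = ContextualKernelPolicy(states, contexts, actions)
print(policy.action(np.array([0.2, -0.3]), np.array([0.5])))

The fusion weight comes directly from the predictive variance, so far from the demonstrated (state, context) pairs the fallback term dominates and the output is pulled back toward known data, which loosely mirrors the "automatically returns to the demonstrations" behavior described in the abstract.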

URL

https://arxiv.org/abs/2410.24035

PDF

https://arxiv.org/pdf/2410.24035.pdf

