Constrained Reinforcement Learning for Dexterous Manipulation

2023-01-24 00:31:28
Abhineet Jain, Jack Kolb, Harish Ravichandar

Abstract

Existing learning approaches to dexterous manipulation use demonstrations or interactions with the environment to train black-box neural networks that provide little control over how the robot learns the skills or how it would perform after training. These approaches pose significant challenges when implemented on physical platforms given that, during initial stages of training, the robot's behavior could be erratic and potentially harmful to its own hardware, the environment, or any humans in the vicinity. A potential way to address these limitations is to add constraints during learning that restrict and guide the robot's behavior during training as well as rollouts. Inspired by the success of constrained approaches in other domains, we investigate the effects of adding position-based constraints to a 24-DOF robot hand learning to perform object relocation using Constrained Policy Optimization. We find that a simple geometric constraint can ensure the robot learns to move towards the object sooner than without constraints. Further, training with this constraint requires a similar number of samples as its unconstrained counterpart to master the skill. These findings shed light on how simple constraints can help robots achieve sensible and safe behavior quickly and ease concerns surrounding hardware deployment. We also investigate the effects of the strictness of these constraints and report findings that provide insights into how different degrees of strictness affect learning outcomes. Our code is available at this https URL.
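
To make the idea of a position-based constraint more concrete, here is a minimal sketch of how such a constraint could be expressed as a per-step cost signal of the kind a constrained method such as Constrained Policy Optimization bounds in expectation. It is illustrative only and is not taken from the paper or its released code: the corridor shape, the function names (`distance_to_segment`, `constraint_cost`, `discounted_cost`), and the `radius` parameter are all assumptions made for this example.

```python
import numpy as np


def distance_to_segment(point, seg_start, seg_end):
    """Shortest Euclidean distance from `point` to the segment seg_start -> seg_end."""
    point, seg_start, seg_end = (
        np.asarray(v, dtype=float) for v in (point, seg_start, seg_end)
    )
    axis = seg_end - seg_start
    length_sq = float(axis @ axis)
    if length_sq == 0.0:
        # Degenerate segment: start and end coincide.
        return float(np.linalg.norm(point - seg_start))
    # Project onto the segment and clamp to its endpoints.
    t = np.clip((point - seg_start) @ axis / length_sq, 0.0, 1.0)
    return float(np.linalg.norm(point - (seg_start + t * axis)))


def constraint_cost(hand_pos, start_pos, object_pos, radius):
    """Hypothetical indicator cost: 1.0 when the hand leaves a corridor of the
    given radius around the straight line from its start to the object."""
    outside = distance_to_segment(hand_pos, start_pos, object_pos) > radius
    return 1.0 if outside else 0.0


def discounted_cost(costs, gamma=0.99):
    """Discounted sum of per-step costs for one trajectory."""
    total = 0.0
    for c in reversed(costs):
        total = c + gamma * total
    return total
```

Under these assumptions, a constrained learner would keep the expected `discounted_cost` of its trajectories below some chosen limit. Shrinking `radius` or lowering that limit is one way the strictness of such a constraint could be varied.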

URL

https://arxiv.org/abs/2301.09766

PDF

https://arxiv.org/pdf/2301.09766.pdf

