Paper Reading AI Learner

Geometric Fabrics: a Safe Guiding Medium for Policy Learning

2024-05-03 17:07:45
Karl Van Wyk, Ankur Handa, Viktor Makoviychuk, Yijie Guo, Arthur Allshire, Nathan D. Ratliff

Abstract

Robot policies are always subject to complex, second-order dynamics that entangle their actions with resulting states. In reinforcement learning (RL) contexts, policies bear the burden of deciphering these complicated interactions from massive amounts of experience and complex reward functions to learn how to accomplish tasks. Moreover, policies typically issue actions directly to controllers like Operational Space Control (OSC) or joint PD control, which induce straight-line motion towards these action targets in task or joint space. However, straight-line motion in these spaces for the most part does not capture the rich, nonlinear behavior our robots need to exhibit, shifting the burden of discovering these behaviors more completely onto the agent. Unlike these simpler controllers, geometric fabrics capture a much richer and more desirable set of behaviors via artificial, second-order dynamics grounded in nonlinear geometry. These artificial dynamics shift the uncontrolled dynamics of a robot via an appropriate control law to form behavioral dynamics. Behavioral dynamics unlock a new action space and safe, guiding behavior over which RL policies are trained. Behavioral dynamics enable bang-bang-like RL policy actions that are still safe for real robots, simplify reward engineering, and help sequence real-world, high-performance policies. We describe the framework more generally and create a specific instantiation for the problem of dexterous, in-hand reorientation of a cube by a highly actuated robot hand.
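The behavioral dynamics described in the abstract follow the geometric-fabrics construction, in which second-order task-space terms, each a task map with a metric M_i and forcing term f_i, are pulled back through their Jacobians J_i and resolved into a single joint-space acceleration. The sketch below is a minimal, hypothetical numpy illustration of that resolution, not the paper's implementation; resolve_fabric, the joint-limit barrier, and action_target are assumed names and values, chosen to show how an aggressive, bang-bang-like policy action is absorbed by a safety term rather than tracked in a straight line.

```python
# Minimal sketch (assumed, not the authors' implementation) of how a
# geometric fabric resolves several second-order behavioral terms into
# one joint acceleration, and where an RL action enters. All task maps,
# metrics, and forcing terms below are illustrative.
import numpy as np

def resolve_fabric(q, qd, terms):
    """Resolve task-space terms (J, M, f, Jdot_qd) into a joint
    acceleration: qdd = -(sum J^T M J)^{-1} sum J^T (f + M Jdot_qd)."""
    n = q.shape[0]
    A = np.zeros((n, n))
    b = np.zeros(n)
    for J, M, f, Jdot_qd in terms:
        A += J.T @ M @ J
        b += J.T @ (f + M @ Jdot_qd)
    return -np.linalg.solve(A + 1e-9 * np.eye(n), b)

# Toy 2-DOF state approaching an upper limit of 1.0 on the first joint.
q  = np.array([0.9, 0.0])
qd = np.array([1.0, 0.0])

# Term 1: joint-limit barrier. The Jacobian selects q[0]; the metric
# stiffens near the limit, and the force is quadratic in velocity
# (homogeneous of degree 2), so the resolved -f repels from the limit.
J1 = np.array([[1.0, 0.0]])
d  = 1.0 - q[0]                    # distance to the upper limit
M1 = np.array([[1.0 / d**2]])
f1 = np.array([qd[0]**2 / d])

# Term 2: damped attractor toward the policy's action target. Even a
# bang-bang target beyond the limit only bends the dynamics; it never
# commands straight-line motion through the barrier.
action_target = np.array([2.0, 1.0])   # hypothetical RL action
J2 = np.eye(2)
M2 = np.eye(2)
f2 = 10.0 * (q - action_target) + 2.0 * qd

qdd = resolve_fabric(q, qd,
                     [(J1, M1, f1, np.zeros(1)),
                      (J2, M2, f2, np.zeros(2))])
print(qdd)  # ~[-0.01, 10.0]: joint 0 halts at the limit; joint 1 tracks
```

For comparison, the attractor alone would command roughly +9 on joint 0, straight through the limit; the barrier's stiff metric and velocity-squared force absorb that aggressive action, which is what makes extreme policy outputs safe in this scheme.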

URL

https://arxiv.org/abs/2405.02250

PDF

https://arxiv.org/pdf/2405.02250.pdf

