Paper Reading AI Learner

Conditional Variational Auto Encoder Based Dynamic Motion for Multi-task Imitation Learning

2024-05-24 06:50:14
Binzhao Xu, Muhayy Ud Din, Irfan Hussain

Abstract

The dynamic motion primitive (DMP) method is an effective approach to learning from demonstrations. However, most current DMP-based methods learn one task per module. Some deep learning-based frameworks can learn multiple tasks at the same time, but they require large amounts of training data and generalize poorly to untrained states. In this paper, we propose a framework that combines the advantages of the traditional DMP-based method and the conditional variational auto-encoder (CVAE). The encoder and decoder are composed of a dynamic system and deep neural networks. The deep neural networks generate a torque conditioned on the task ID, and this torque drives the dynamic system to produce the desired trajectory toward the final state. In this way, the generated trajectory can adapt to a new goal position. We also propose a fine-tuning method to guarantee the via-point constraint. Our model is trained on a handwritten-digit dataset and can be applied directly to robotic reaching and pushing tasks. The proposed model is validated in a simulation environment: after training on the handwritten-digit dataset, it achieves a 100% success rate on the pushing and reaching tasks.
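The goal adaptation described in the abstract comes from the standard DMP transformation system: a learned forcing term (here, the torque produced by the paper's decoder network) is added to a critically damped spring toward the goal, so the same learned term transfers to new goal positions. A minimal sketch of that rollout, with the forcing term left as an arbitrary callable standing in for the network (function names and gains are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def dmp_rollout(f, x0, g, T=1.0, dt=0.01, alpha=25.0, beta=6.25):
    """Integrate a one-dimensional DMP transformation system:

        x'' = alpha * (beta * (g - x) - x') + f(s)

    f(s) stands in for the learned, task-conditioned forcing term;
    alpha = 4 * beta makes the unforced system critically damped.
    """
    n = int(T / dt)
    x, v = x0, 0.0
    traj = np.empty(n)
    for i in range(n):
        s = 1.0 - i / n                       # simple linearly decaying phase
        a = alpha * (beta * (g - x) - v) + f(s)
        v += a * dt                           # semi-implicit Euler step
        x += v * dt
        traj[i] = x
    return traj

# With a zero forcing term the state converges to the goal g; changing g
# re-targets the trajectory without retraining the forcing term.
traj = dmp_rollout(lambda s: 0.0, x0=0.0, g=1.0)
```

With these gains the unforced system settles essentially at the goal within one time unit, which is the property that lets the generated trajectory adjust to a new goal position.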

URL

https://arxiv.org/abs/2405.15266

PDF

https://arxiv.org/pdf/2405.15266.pdf

