PlaNet-Pick: Effective Cloth Flattening Based on Latent Dynamic Planning

2023-03-02 15:22:34
Halid Abdulrahim Kadi, Kasim Terzic

Abstract

Why do Recurrent State Space Models such as PlaNet fail at cloth manipulation tasks? Recent work has attributed this to the blurry reconstruction of the observation, which makes it difficult to plan directly in the latent space. This paper explores the reasons behind this by applying PlaNet in the pick-and-place cloth-flattening domain. We find that the sharp discontinuity of the transition function on the contour of the article makes it difficult to learn an accurate latent dynamic model. By adopting KL balancing and latent overshooting in the training loss and adjusting the planned picking position to the closest part of the cloth, we show that the updated PlaNet-Pick model can achieve state-of-the-art performance using latent MPC algorithms in simulation.
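The pick-position adjustment mentioned in the abstract can be illustrated with a short sketch. The abstract does not say how the closest part of the cloth is found, so everything here is an assumption made for illustration: a binary cloth mask in pixel coordinates, Euclidean distance, and the hypothetical function name `snap_pick_to_cloth`. It is not the authors' implementation.

```python
from typing import Optional, Sequence, Tuple


def snap_pick_to_cloth(
    pick: Tuple[float, float],
    mask: Sequence[Sequence[bool]],
) -> Optional[Tuple[int, int]]:
    """Return the cloth pixel (row, col) nearest to the planned pick point.

    Assumption: `mask[r][c]` is True where the cloth is visible. This is a
    hypothetical interface; the paper may use a different representation.
    Returns None when the mask contains no cloth pixel.
    """
    pick_row, pick_col = pick
    best_pixel = None
    best_dist_sq = None
    for r, row in enumerate(mask):
        for c, is_cloth in enumerate(row):
            if not is_cloth:
                continue
            # Squared Euclidean distance is enough for finding the minimum.
            dist_sq = (r - pick_row) ** 2 + (c - pick_col) ** 2
            if best_dist_sq is None or dist_sq < best_dist_sq:
                best_pixel, best_dist_sq = (r, c), dist_sq
    return best_pixel


# Toy 4x4 observation with a 2x2 cloth patch in the centre.
example_mask = [
    [False, False, False, False],
    [False, True,  True,  False],
    [False, True,  True,  False],
    [False, False, False, False],
]

# A planned pick that misses the cloth is moved onto the nearest cloth pixel.
adjusted = snap_pick_to_cloth((0, 3), example_mask)
```

A pick that already lies on the cloth is returned unchanged, so under these assumptions the adjustment only affects plans that would otherwise grasp empty space. That is the case where, according to the abstract, the transition function changes sharply at the cloth contour.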

URL

https://arxiv.org/abs/2303.01345

PDF

https://arxiv.org/pdf/2303.01345.pdf

