Winning Solution of Real Robot Challenge III

2023-01-30 15:55:02
Qiang Wang, Robert McCarthy, David Cordova Bulens, Stephen J. Redmond

Abstract

This report introduces our winning solution for the real-robot phase of the Real Robot Challenge (RRC) 2022. The goal of this year's challenge is to solve dexterous manipulation tasks with offline reinforcement learning (RL) or imitation learning. To this end, participants are provided with datasets containing dozens of hours of robotic data. For each task, an expert dataset and a mixed dataset are provided. In our experiments, when learning from the expert datasets, we find that standard Behavioral Cloning (BC) outperforms state-of-the-art offline RL algorithms. When learning from the mixed datasets, BC performs poorly, as expected, while, surprisingly, offline RL also performs suboptimally, failing to match the average performance of the baseline model used to collect the datasets. To remedy this, and motivated by the strong performance of BC on the expert datasets, we use a semi-supervised classification technique to filter the expert subset out of the mixed datasets, and subsequently perform BC on this extracted subset of data. To further improve results, in all settings we use a simple data augmentation method that exploits the geometric symmetry of the RRC physical robotic environment. Our submitted BC policies each surpass the mean return of their respective raw datasets, and the policies trained on the filtered mixed datasets come close to matching the performance of those trained on the expert datasets.
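The filter-then-clone pipeline described in the abstract can be illustrated with a short sketch. The PyTorch snippet below is a hypothetical illustration, not the authors' released code: the positive-vs-unlabeled labeling scheme, the `MLP` architecture, and the names `train_expert_classifier` and `filter_and_clone` are all assumptions standing in for the paper's semi-supervised classifier, which may be trained and thresholded differently.

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    """Small fully connected network, used here for both the classifier and the policy."""
    def __init__(self, in_dim, out_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, x):
        return self.net(x)

def train_expert_classifier(expert_obs, mixed_obs, epochs=200, lr=1e-3):
    """Positive-vs-unlabeled training (an assumed scheme): expert
    transitions are labeled 1, mixed transitions (mostly non-expert)
    are labeled 0."""
    clf = MLP(expert_obs.shape[1], 1)
    opt = torch.optim.Adam(clf.parameters(), lr=lr)
    x = torch.cat([expert_obs, mixed_obs])
    y = torch.cat([torch.ones(len(expert_obs), 1),
                   torch.zeros(len(mixed_obs), 1)])
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.binary_cross_entropy_with_logits(clf(x), y)
        loss.backward()
        opt.step()
    return clf

def filter_and_clone(clf, mixed_obs, mixed_acts, threshold=0.5, epochs=500):
    """Keep the mixed-dataset transitions the classifier scores as
    expert-like, then run standard behavioral cloning (MSE regression
    of actions onto observations) on that subset."""
    with torch.no_grad():
        keep = torch.sigmoid(clf(mixed_obs)).squeeze(1) > threshold
    obs, acts = mixed_obs[keep], mixed_acts[keep]
    policy = MLP(obs.shape[1], acts.shape[1])
    opt = torch.optim.Adam(policy.parameters(), lr=3e-4)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(policy(obs), acts)
        loss.backward()
        opt.step()
    return policy
```

Under this reading, the expert dataset supplies positive examples, the mixed dataset is treated as mostly-negative unlabeled data, and BC is then plain supervised regression on whatever transitions the classifier keeps.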
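The symmetry-based augmentation can likewise be sketched. The TriFinger platform used in the RRC has three identical fingers arranged at 120-degree intervals, so rotating a scene by 120 degrees about the vertical axis maps valid transitions onto other valid transitions. The data layout assumed below (per-finger observation/action blocks plus a world-frame object position) is purely illustrative; the actual RRC dataset format differs and the indices would need to match it.

```python
import numpy as np

def rotate_about_z(xyz, angle):
    """Rotate a 3D point about the vertical (z) axis."""
    c, s = np.cos(angle), np.sin(angle)
    R = np.array([[c,  -s,  0.0],
                  [s,   c,  0.0],
                  [0.0, 0.0, 1.0]])
    return R @ xyz

def augment_transition(finger_obs, object_pos, finger_acts):
    """Generate the two extra transitions implied by 120-degree symmetry:
    world-frame quantities (object position) are rotated, while joint-space
    quantities (per-finger observation/action blocks, shape (3, d)) are
    cyclically permuted, since the three fingers are identical."""
    augmented = []
    for k in (1, 2):
        angle = k * 2.0 * np.pi / 3.0
        augmented.append((
            np.roll(finger_obs, -k, axis=0),    # permute finger obs blocks
            rotate_about_z(object_pos, angle),  # rotate object position
            np.roll(finger_acts, -k, axis=0),   # permute finger actions
        ))
    return augmented
```

Each recorded transition then yields two additional synthetic transitions, tripling the effective dataset size at no extra data-collection cost.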

URL

https://arxiv.org/abs/2301.13019

PDF

https://arxiv.org/pdf/2301.13019.pdf

