SMART: Self-supervised Multi-task pretrAining with contRol Transformers

2023-01-24 05:01:23
Yanchao Sun, Shuang Ma, Ratnesh Madaan, Rogerio Bonatti, Furong Huang, Ashish Kapoor

Abstract

Self-supervised pretraining has been extensively studied in the language and vision domains, where a unified model can be easily adapted to various downstream tasks by pretraining representations without explicit labels. For sequential decision-making tasks, however, it is difficult to design a pretraining approach that can cope with both high-dimensional perceptual information and the complexity of sequential control over long interaction horizons. The challenge becomes combinatorially harder if we want to pretrain representations amenable to a large variety of tasks. To tackle this problem, we formulate a general pretraining-finetuning pipeline for sequential decision making, under which we propose a generic pretraining framework, Self-supervised Multi-task pretrAining with contRol Transformers (SMART). By systematically investigating pretraining regimes, we carefully design a Control Transformer (CT) coupled with a novel, self-supervised, control-centric pretraining objective. SMART encourages the representation to capture the information essential to both short-term and long-term control that is transferable across tasks. Extensive experiments on the DeepMind Control Suite show that SMART significantly improves learning efficiency on both seen and unseen downstream tasks and domains, under different learning scenarios including Imitation Learning (IL) and Reinforcement Learning (RL). Thanks to the proposed control-centric objective, SMART is resilient to distribution shift between pretraining and finetuning, and even works well with low-quality, randomly collected pretraining datasets.
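To make the abstract's idea more concrete, below is a minimal sketch of what control-centric self-supervised pretraining on a causal transformer over observation-action sequences might look like. All names, dimensions, and the specific losses shown (forward-dynamics and inverse-dynamics prediction as stand-ins for short-term and action-relevant control signals) are illustrative assumptions for this sketch, not the paper's published architecture or objective; see the PDF linked below for the actual method.

```python
# Minimal sketch (assumed PyTorch implementation): a causal transformer over
# interleaved (observation, action) tokens with two self-supervised heads.
# The losses below (forward and inverse dynamics) are illustrative stand-ins
# for a control-centric objective, not SMART's exact published losses.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ControlTransformerSketch(nn.Module):
    def __init__(self, obs_dim=64, act_dim=6, d_model=128,
                 n_heads=4, n_layers=4, max_len=128):
        super().__init__()
        self.obs_embed = nn.Linear(obs_dim, d_model)
        self.act_embed = nn.Linear(act_dim, d_model)
        self.pos_embed = nn.Parameter(torch.zeros(1, 2 * max_len, d_model))
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.forward_head = nn.Linear(d_model, obs_dim)      # predict o_{t+1}
        self.inverse_head = nn.Linear(2 * d_model, act_dim)  # recover a_t

    def forward(self, obs, act):
        # obs: (B, T, obs_dim), act: (B, T, act_dim).
        # Interleave tokens as o_1, a_1, o_2, a_2, ... (sequence length 2T).
        B, T, _ = obs.shape
        tokens = torch.stack([self.obs_embed(obs), self.act_embed(act)], dim=2)
        tokens = tokens.reshape(B, 2 * T, -1) + self.pos_embed[:, : 2 * T, :]
        # Causal mask: each token attends only to the past, as control requires.
        mask = torch.triu(torch.full((2 * T, 2 * T), float("-inf")), diagonal=1)
        h = self.encoder(tokens, mask=mask)
        # Hidden states at observation tokens and at action tokens, respectively.
        return h[:, 0::2, :], h[:, 1::2, :]


def pretraining_loss(model, obs, act):
    h_obs, h_act = model(obs, act)
    # Forward dynamics (short-term control signal): the hidden state at a_t,
    # having seen (o_1, a_1, ..., o_t, a_t), predicts o_{t+1}.
    fwd = F.mse_loss(model.forward_head(h_act[:, :-1, :]), obs[:, 1:, :])
    # Inverse dynamics (action-relevant features): recover a_t from the hidden
    # state at o_t plus the *uncontextualized* embedding of o_{t+1}, so the
    # target action cannot leak into the input through attention.
    nxt = model.obs_embed(obs[:, 1:, :])
    inv = F.mse_loss(
        model.inverse_head(torch.cat([h_obs[:, :-1, :], nxt], dim=-1)),
        act[:, :-1, :],
    )
    return fwd + inv
```

In this sketch, pretraining would run the loss over trajectory batches pooled from many tasks (even randomly collected ones, per the abstract's claim of robustness to low-quality data); for finetuning, the encoder would be kept and a small task-specific policy head trained with IL or RL.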

URL

https://arxiv.org/abs/2301.09816

PDF

https://arxiv.org/pdf/2301.09816.pdf

