Self-Predictive Representations for Combinatorial Generalization in Behavioral Cloning

2025-06-11 19:32:41
Daniel Lawson, Adriana Hugessen, Charlotte Cloutier, Glen Berseth, Khimya Khetarpal

Abstract

Behavioral cloning (BC) methods trained with supervised learning (SL) are an effective way to learn policies from human demonstrations in domains like robotics. Goal-conditioning these policies enables a single generalist policy to capture the diverse behaviors contained within an offline dataset. While goal-conditioned behavioral cloning (GCBC) methods can perform well on in-distribution training tasks, they do not necessarily generalize zero-shot to tasks that require conditioning on novel state-goal pairs, i.e., combinatorial generalization. In part, this limitation can be attributed to a lack of temporal consistency in the state representation learned by BC: if temporally related states were encoded to similar latent representations, the out-of-distribution gap for novel state-goal pairs would be reduced. Hence, encouraging this temporal consistency in the representation space should facilitate combinatorial generalization. Successor representations, which encode the distribution of future states visited from the current state, nicely encapsulate this property. However, previous methods for learning successor representations have relied on contrastive samples, temporal-difference (TD) learning, or both. In this work, we propose a simple yet effective representation learning objective, $\text{BYOL-}\gamma$ augmented GCBC, which can be shown theoretically to approximate the successor representation in the finite MDP case without contrastive samples or TD learning, and which also achieves competitive empirical performance across a suite of challenging tasks requiring combinatorial generalization.
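
To make the notion of a successor representation concrete, here is a minimal sketch for a finite Markov chain. It is not the paper's $\text{BYOL-}\gamma$ objective and does not learn anything: it assumes the transition matrix `P` is known and uses one common textbook definition, $M = \sum_{t \ge 0} \gamma^t P^t = (I - \gamma P)^{-1}$, which may differ from the paper's convention (for example, in whether the current state is counted). The function names and the three-state chain are hypothetical, chosen only for illustration.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def successor_representation(P, gamma, tol=1e-12, max_iter=10_000):
    """Ground-truth SR of a known chain, M = sum_t gamma^t P^t.

    Computed by iterating the fixed point M = I + gamma * P @ M,
    which converges for 0 <= gamma < 1. Assumes P is row-stochastic.
    """
    n = len(P)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    M = [row[:] for row in identity]
    for _ in range(max_iter):
        PM = matmul(P, M)
        new_M = [[identity[i][j] + gamma * PM[i][j] for j in range(n)]
                 for i in range(n)]
        delta = max(abs(new_M[i][j] - M[i][j])
                    for i in range(n) for j in range(n))
        M = new_M
        if delta < tol:
            break
    return M


# Hypothetical 3-state chain: 0 -> 1 -> 2, where state 2 is absorbing.
P = [
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
]
gamma = 0.9
M = successor_representation(P, gamma)

# Scaling by (1 - gamma) turns each row into a distribution over future states.
future_state_dist = [[(1.0 - gamma) * x for x in row] for row in M]
```

Each row of `future_state_dist` sums to 1 and gives the discounted probability of occupying each state in the future. Under this definition, states 0 and 1 both place most of their mass on state 2, which is the kind of temporal similarity the abstract argues a representation should capture.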

URL

https://arxiv.org/abs/2506.10137

PDF

https://arxiv.org/pdf/2506.10137.pdf

