Out-of-Distribution Adaptation in Offline RL: Counterfactual Reasoning via Causal Normalizing Flows

2024-05-06 22:44:32
Minjae Cho, Jonathan P. How, Chuangchuang Sun

Abstract

Despite the notable successes of Reinforcement Learning (RL), its reliance on an online learning paradigm prevents widespread adoption, especially in hazardous or costly scenarios. Offline RL has emerged as an alternative, learning from pre-collected static datasets. However, offline learning introduces a new challenge known as distributional shift: performance degrades when the policy is evaluated on scenarios that are Out-Of-Distribution (OOD) relative to the training dataset. Most existing offline RL methods resolve this issue by regularizing policy learning to stay within the support of the given dataset. Such regularization, however, overlooks high-reward regions that may exist beyond the dataset. This motivates novel offline learning techniques that can improve beyond the data support without compromising policy performance, potentially by learning causation (cause-and-effect) rather than correlation from the dataset. In this paper, we propose the MOOD-CRL (Model-based Offline OOD-Adapting Causal RL) algorithm, which addresses the challenge of extrapolation in offline policy training through causal inference instead of policy-regularizing methods. Specifically, a Causal Normalizing Flow (CNF) is developed to learn the transition and reward functions for data generation and augmentation in offline policy evaluation and training. Based on a data-invariant, physics-based qualitative causal graph and the observational data, we develop a novel learning scheme for the CNF to learn the quantitative structural causal model. As a result, the CNF gains predictive and counterfactual reasoning capabilities for sequential decision-making tasks, revealing high potential for OOD adaptation. Our CNF-based offline RL approach is validated through empirical evaluations, outperforming model-free and model-based methods by a significant margin.
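The counterfactual pipeline the abstract describes (fit a flow whose structure is masked by the known qualitative causal graph, then abduct noise, intervene, and regenerate) can be made concrete. Below is a minimal, hypothetical Python sketch, not the paper's implementation: the `CausalAffineFlow` class, the random linear conditioners `Ws`/`Wt`, and the toy state→action→next-state graph are illustrative stand-ins for the learned CNF.

```python
import numpy as np

class CausalAffineFlow:
    """Toy affine causal flow: variable x_i is an affine function of base
    noise z_i, with scale/shift conditioned only on x_i's causal parents.
    The adjacency mask encodes the qualitative causal graph."""

    def __init__(self, adjacency, rng=None):
        # adjacency[i, j] = 1 iff x_j is a direct cause of x_i; variables
        # are assumed topologically ordered, so the mask is lower-triangular.
        self.A = np.asarray(adjacency, dtype=float)
        rng = rng or np.random.default_rng(0)
        d = self.A.shape[0]
        # Random linear conditioners stand in for the paper's learned networks.
        self.Ws = rng.normal(scale=0.1, size=(d, d)) * self.A  # log-scale
        self.Wt = rng.normal(scale=0.1, size=(d, d)) * self.A  # shift

    def forward(self, z):
        """Generation: noise -> data, following the causal ordering."""
        x = np.zeros_like(z)
        for i in range(len(z)):
            s, t = self.Ws[i] @ x, self.Wt[i] @ x  # parents only, via mask
            x[i] = np.exp(s) * z[i] + t
        return x

    def inverse(self, x):
        """Abduction: data -> noise (step 1 of counterfactual reasoning)."""
        return (x - self.Wt @ x) * np.exp(-(self.Ws @ x))

    def counterfactual(self, x_obs, index, value):
        """Abduct noise from x_obs, apply do(x_index = value), regenerate."""
        z = self.inverse(x_obs)
        x_cf = np.zeros_like(x_obs)
        for i in range(len(x_obs)):
            if i == index:
                x_cf[i] = value  # the intervention overrides this mechanism
            else:
                s, t = self.Ws[i] @ x_cf, self.Wt[i] @ x_cf
                x_cf[i] = np.exp(s) * z[i] + t
        return x_cf

# Toy graph: state -> action, {state, action} -> next_state.
A = np.array([[0, 0, 0],
              [1, 0, 0],
              [1, 1, 0]])
flow = CausalAffineFlow(A)
x = flow.forward(np.random.default_rng(1).normal(size=3))
x_cf = flow.counterfactual(x, index=1, value=0.5)  # "what if the action were 0.5?"
```

In an offline RL loop of the kind the abstract outlines, such a `counterfactual` call would answer "what would the next state (and reward) have been under a different action?", and the regenerated transitions would augment the static dataset for policy evaluation and training.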

URL

https://arxiv.org/abs/2405.03892

PDF

https://arxiv.org/pdf/2405.03892.pdf

