Paper Reading AI Learner

Toward Explainable Offline RL: Analyzing Representations in Intrinsically Motivated Decision Transformers

2025-06-16 20:01:24
Leonardo Guiducci, Antonio Rizzo, Giovanna Maria Dimitri

Abstract

Elastic Decision Transformers (EDTs) have proved to be particularly successful in offline reinforcement learning, offering a flexible framework that unifies sequence modeling with decision-making under uncertainty. Recent research has shown that incorporating intrinsic motivation mechanisms into EDTs improves performance across exploration tasks, yet the representational mechanisms underlying these improvements remain unexplored. In this paper, we introduce a systematic post-hoc explainability framework to analyze how intrinsic motivation shapes learned embeddings in EDTs. Through statistical analysis of embedding properties (including covariance structure, vector magnitudes, and orthogonality), we reveal that different intrinsic motivation variants create fundamentally different representational structures. Our analysis demonstrates environment-specific correlation patterns between embedding metrics and performance that explain why intrinsic motivation improves policy learning. These findings show that intrinsic motivation operates beyond simple exploration bonuses, acting as a representational prior that shapes embedding geometry in biologically plausible ways, creating environment-specific organizational structures that facilitate better decision-making.
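The abstract describes statistical analysis of embedding properties such as covariance structure, vector magnitudes, and orthogonality. The paper does not spell out its exact formulations here, so the sketch below uses common illustrative choices for each property: effective rank of the embedding covariance (via the entropy of its eigenvalue spectrum), the distribution of embedding norms, and mean absolute pairwise cosine similarity as an (inverse) orthogonality measure. The function name and metric definitions are assumptions for illustration, not the authors' method.

```python
import numpy as np

def embedding_metrics(E):
    """Illustrative summary statistics for an (n_samples, d) embedding matrix.

    NOTE: these metric definitions are assumptions sketching the kind of
    analysis the abstract describes, not the paper's exact formulations.
    """
    # Covariance structure: effective rank of the embedding covariance,
    # computed as exp(entropy) of the normalized eigenvalue spectrum.
    C = np.cov(E, rowvar=False)
    eig = np.clip(np.linalg.eigvalsh(C), 0.0, None)
    p = eig / eig.sum()
    effective_rank = float(np.exp(-(p * np.log(p + 1e-12)).sum()))

    # Vector magnitudes: mean and spread of per-embedding L2 norms.
    norms = np.linalg.norm(E, axis=1)

    # Orthogonality: mean absolute pairwise cosine similarity between
    # embeddings (values near 0 indicate near-mutual orthogonality).
    U = E / (norms[:, None] + 1e-12)
    G = U @ U.T
    off_diag = G[~np.eye(len(E), dtype=bool)]

    return {
        "effective_rank": effective_rank,
        "mean_norm": float(norms.mean()),
        "norm_std": float(norms.std()),
        "mean_abs_cosine": float(np.abs(off_diag).mean()),
    }
```

Metrics like these can then be correlated with episode returns across training runs, which is the kind of embedding-metric-vs-performance analysis the abstract reports.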

URL

https://arxiv.org/abs/2506.13958

PDF

https://arxiv.org/pdf/2506.13958.pdf

