Abstract
Elastic Decision Transformers (EDTs) have proved particularly successful in offline reinforcement learning, offering a flexible framework that unifies sequence modeling with decision-making under uncertainty. Recent research has shown that incorporating intrinsic motivation mechanisms into EDTs improves performance across exploration tasks, yet the representational mechanisms underlying these improvements remain unexplored. In this paper, we introduce a systematic post-hoc explainability framework to analyze how intrinsic motivation shapes learned embeddings in EDTs. Through statistical analysis of embedding properties (including covariance structure, vector magnitudes, and orthogonality), we reveal that different intrinsic motivation variants create fundamentally different representational structures. Our analysis demonstrates environment-specific correlation patterns between embedding metrics and performance that explain why intrinsic motivation improves policy learning. These findings show that intrinsic motivation operates beyond simple exploration bonuses: it acts as a representational prior that shapes embedding geometry in biologically plausible ways and creates environment-specific organizational structures that facilitate better decision-making.
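The abstract names three families of embedding statistics (covariance structure, vector magnitudes, and orthogonality) and their correlation with performance, without giving formulas. The following is a minimal sketch of how such metrics might be computed, not the authors' implementation. The function names, the choice of covariance trace, mean L2 norm, and mean absolute cosine similarity as concrete metrics, and the use of Pearson correlation are all assumptions made for illustration.

```python
import numpy as np


def embedding_metrics(embeddings: np.ndarray) -> dict:
    """Summarize an (n_samples, dim) embedding matrix with three statistics.

    The specific metrics are illustrative assumptions, not taken from the paper.
    """
    # Covariance structure: total variance, i.e. the trace of the
    # feature covariance matrix (columns are treated as variables).
    cov = np.cov(embeddings, rowvar=False)
    total_variance = float(np.trace(cov))

    # Vector magnitudes: mean L2 norm of the embedding vectors.
    norms = np.linalg.norm(embeddings, axis=1)
    mean_norm = float(norms.mean())

    # Orthogonality: mean absolute cosine similarity between distinct
    # vectors (0 means mutually orthogonal, 1 means collinear).
    unit = embeddings / np.clip(norms[:, None], 1e-12, None)
    cosine = unit @ unit.T
    off_diagonal = cosine[~np.eye(len(embeddings), dtype=bool)]
    mean_abs_cosine = float(np.abs(off_diagonal).mean())

    return {
        "total_variance": total_variance,
        "mean_norm": mean_norm,
        "mean_abs_cosine": mean_abs_cosine,
    }


def metric_performance_correlation(metric_values, returns) -> float:
    """Pearson correlation between one metric and per-model returns."""
    return float(np.corrcoef(metric_values, returns)[0, 1])


if __name__ == "__main__":
    # Synthetic stand-ins for embeddings from several trained models;
    # the returns are placeholder numbers, not results from the paper.
    rng = np.random.default_rng(0)
    models = [rng.normal(scale=s, size=(64, 16)) for s in (0.5, 1.0, 1.5, 2.0)]
    placeholder_returns = [10.0, 20.0, 25.0, 40.0]

    norms = [embedding_metrics(m)["mean_norm"] for m in models]
    print(metric_performance_correlation(norms, placeholder_returns))
```

In a per-environment analysis of the kind the abstract describes, one would presumably compute such a correlation separately for each environment and each metric, since the reported patterns are environment-specific.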
URL
https://arxiv.org/abs/2506.13958