Paper Reading AI Learner

Recursive Deep Inverse Reinforcement Learning

2025-04-17 17:39:35
Paul Ghanem, Michael Potter, Owen Howell, Pau Closas, Alireza Ramezani, Deniz Erdogmus, Robert Platt, Tales Imbiriba

Abstract

Inferring an adversary's goals from exhibited behavior is crucial for counterplanning and non-cooperative multi-agent systems in domains like cybersecurity, military operations, and strategy games. Deep Inverse Reinforcement Learning (IRL) methods based on maximum entropy principles show promise in recovering adversaries' goals, but they are typically offline, require large batch sizes with gradient descent, and rely on first-order updates, limiting their applicability in real-time scenarios. We propose an online Recursive Deep Inverse Reinforcement Learning (RDIRL) approach to recover the cost function governing an adversary's actions and goals. Specifically, we minimize an upper bound on the standard Guided Cost Learning (GCL) objective using sequential second-order Newton updates, akin to the Extended Kalman Filter (EKF), yielding a fast-converging learning algorithm. We demonstrate that RDIRL recovers the cost and reward functions of expert agents in standard and adversarial benchmark tasks, and experiments on these benchmarks show that it outperforms several leading IRL algorithms.
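The "sequential second-order Newton updates, akin to the Extended Kalman Filter" can be sketched as a generic EKF-style recursive Newton step on a per-sample loss. This is an illustrative sketch only, not the paper's actual RDIRL algorithm: the function name, the forgetting factor `lam`, and the quadratic-loss demo below are all assumptions introduced for clarity.

```python
import numpy as np

def ekf_style_update(theta, P, grad, hess, lam=1.0):
    """One recursive second-order (EKF-like) parameter update.

    theta : current parameter estimate, shape (d,)
    P     : running inverse-curvature / covariance matrix, shape (d, d)
    grad  : gradient of the current per-sample loss at theta, shape (d,)
    hess  : (approximate) Hessian of that per-sample loss, shape (d, d)
    lam   : forgetting/damping factor (hypothetical knob, not from the paper)
    """
    # Fold the new curvature into the running estimate in information
    # (inverse-covariance) form, analogous to the EKF covariance update.
    P_new = np.linalg.inv(lam * np.linalg.inv(P) + hess)
    # Newton-style step preconditioned by the updated covariance:
    # each new sample refines theta without revisiting past data.
    theta_new = theta - P_new @ grad
    return theta_new, P_new
```

On a simple quadratic loss 0.5·‖θ − θ*‖², repeated calls contract the error toward θ* while `P` shrinks, mirroring how a recursive second-order method avoids the large-batch gradient-descent loop the abstract criticizes.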

Abstract (translated)

Inferring an adversary's goals from its behavior is crucial for counterplanning and non-cooperative multi-agent systems in cybersecurity, military, and strategy games. Deep Inverse Reinforcement Learning (IRL) methods based on the maximum entropy principle show promise in recovering an adversary's goals, but these methods typically operate offline, require large batch sizes with gradient descent, and rely on first-order updates, which limits their application in real-time scenarios. We propose an online Recursive Deep Inverse Reinforcement Learning (RDIRL) method to recover the cost function governing an adversary's actions and goals. Specifically, we minimize an upper bound on the standard Guided Cost Learning (GCL) objective using sequential second-order Newton updates similar to the Extended Kalman Filter (EKF), obtaining a fast-converging learning algorithm. We demonstrate on standard and adversarial benchmark tasks that RDIRL can recover the cost and reward functions of expert agents, and experiments on these benchmarks show that our proposed method outperforms several leading IRL algorithms. In summary, this work offers a new way to understand and predict an adversary's behavior in real time, with important applications in areas such as cybersecurity, military strategy, and multi-agent systems.

URL

https://arxiv.org/abs/2504.13241

PDF

https://arxiv.org/pdf/2504.13241.pdf

