POMDP inference and robust solution via deep reinforcement learning: An application to railway optimal maintenance

2023-07-16 15:44:58
Giacomo Arcieri, Cyprien Hoelzl, Oliver Schwery, Daniel Straub, Konstantinos G. Papakonstantinou, Eleni Chatzi

Abstract

Partially Observable Markov Decision Processes (POMDPs) can model complex sequential decision-making problems under stochastic and uncertain environments. A main reason hindering their broad adoption in real-world applications is the lack of availability of a suitable POMDP model or a simulator thereof. Available solution algorithms, such as Reinforcement Learning (RL), require the knowledge of the transition dynamics and the observation generating process, which are often unknown and non-trivial to infer. In this work, we propose a combined framework for inference and robust solution of POMDPs via deep RL. First, all transition and observation model parameters are jointly inferred via Markov Chain Monte Carlo sampling of a hidden Markov model, which is conditioned on actions, in order to recover full posterior distributions from the available data. The POMDP with uncertain parameters is then solved via deep RL techniques with the parameter distributions incorporated into the solution via domain randomization, in order to develop solutions that are robust to model uncertainty. As a further contribution, we compare the use of transformers and long short-term memory networks, which constitute model-free RL solutions, with a model-based/model-free hybrid approach. We apply these methods to the real-world problem of optimal maintenance planning for railway assets.
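
The domain-randomization step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a small discrete POMDP, and it replaces the MCMC posterior with randomly generated stand-in draws. All names (`fake_posterior_samples`, `RandomizedPOMDP`, `belief_update`), the problem sizes, and the initial state are hypothetical, and rewards are omitted. The environment redraws one set of transition and observation parameters at every reset, so a policy trained on it sees the spread of plausible models rather than a single point estimate.

```python
import random


def random_stochastic_row(rng, k):
    """Draw a length-k probability vector from a flat Dirichlet."""
    draws = [rng.gammavariate(1.0, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def fake_posterior_samples(rng, n_samples, n_states=3, n_actions=2, n_obs=3):
    """Hypothetical stand-in for MCMC output: a list of (T, O) parameter draws.

    T[a][s] is a distribution over next states; O[s] is one over observations.
    """
    samples = []
    for _ in range(n_samples):
        T = [[random_stochastic_row(rng, n_states) for _ in range(n_states)]
             for _ in range(n_actions)]
        O = [random_stochastic_row(rng, n_obs) for _ in range(n_states)]
        samples.append((T, O))
    return samples


class RandomizedPOMDP:
    """Environment that redraws its parameters at every reset."""

    def __init__(self, samples, seed=0):
        self.samples = samples
        self.rng = random.Random(seed)
        self.reset()

    def _draw(self, probs):
        # Sample an index according to the given probability vector.
        return self.rng.choices(range(len(probs)), weights=probs)[0]

    def reset(self):
        # Domain randomization: pick one posterior draw for this episode.
        self.T, self.O = self.rng.choice(self.samples)
        self.state = 0  # assumed initial state, for illustration only
        return self._draw(self.O[self.state])

    def step(self, action):
        # The hidden state evolves; the agent only sees a noisy observation.
        self.state = self._draw(self.T[action][self.state])
        return self._draw(self.O[self.state])


def belief_update(belief, action, obs, T, O):
    """One Bayes-filter step: predict with T, then weight by the likelihood of obs."""
    n = len(belief)
    predicted = [sum(belief[s] * T[action][s][s2] for s in range(n))
                 for s2 in range(n)]
    unnorm = [predicted[s2] * O[s2][obs] for s2 in range(n)]
    total = sum(unnorm)
    return [u / total for u in unnorm]
```

A recurrent or transformer policy would consume the raw observation sequence returned by `reset` and `step`, whereas a belief-based agent would first pass each observation through `belief_update`. Which parameters such an agent should use for the update is left open here.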


URL

https://arxiv.org/abs/2307.08082

PDF

https://arxiv.org/pdf/2307.08082.pdf
