Paper Reading AI Learner

Optimal Decision Tree Policies for Markov Decision Processes

2023-01-30 18:51:02
Daniël Vos, Sicco Verwer

Abstract

Interpretability of reinforcement learning policies is essential for many real-world tasks, but learning such interpretable policies is a hard problem. In particular, rule-based policies such as decision trees and rule lists are difficult to optimize due to their non-differentiability. While existing techniques can learn verifiable decision tree policies, there is no guarantee that the learners generate a decision tree that performs optimally. In this work, we study the optimization of size-limited decision trees for Markov Decision Processes (MDPs) and propose OMDTs: Optimal MDP Decision Trees. Given a user-defined size limit and an MDP formulation, OMDT directly maximizes the expected discounted return of the decision tree using Mixed-Integer Linear Programming. By training optimal decision tree policies for different MDPs, we empirically study the optimality gap of existing imitation learning techniques and find that they perform sub-optimally. We show that this is due to an inherent shortcoming of imitation learning: complex policies cannot be represented using size-limited trees. In such cases, it is better to directly optimize the tree for expected return. While there is generally a trade-off between the performance and interpretability of machine learning models, we find that OMDTs limited to a depth of 3 often perform close to the optimal limit.
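
To make concrete what "directly maximizes the expected discounted return ... using Mixed-Integer Linear Programming" refers to, here is a minimal sketch in standard MDP notation; the occupancy-measure formulation and the symbols below ($\mu_0$, $x_{s,a}$) are illustrative assumptions, not the paper's exact model. For an MDP with states $S$, actions $A$, transition probabilities $P$, rewards $R$, discount factor $\gamma$, and initial state distribution $\mu_0$, a policy $\pi$ is scored by its expected discounted return

    $J(\pi) = \mathbb{E}\big[\sum_{t=0}^{\infty} \gamma^t R(s_t, \pi(s_t)) \mid s_0 \sim \mu_0\big].$

A standard way to optimize this objective as a linear program is the dual (occupancy-measure) form

    $\max_{x \ge 0} \ \sum_{s,a} R(s,a)\, x_{s,a}$
    $\text{s.t.} \quad \sum_{a} x_{s,a} - \gamma \sum_{s',a} P(s \mid s',a)\, x_{s',a} = \mu_0(s) \quad \forall s \in S,$

where $x_{s,a}$ is the discounted visitation frequency of the state-action pair $(s,a)$. Constraining the policy to a size-limited decision tree additionally ties the action taken in each state to binary variables encoding the tree's splits and leaf actions, which is what turns the program into a mixed-integer linear program; the precise encoding is what the paper contributes.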

URL

https://arxiv.org/abs/2301.13185

PDF

https://arxiv.org/pdf/2301.13185.pdf

