Partial advantage estimator for proximal policy optimization

2023-01-26 03:42:39
Xiulei Song, Yizhao Jin, Greg Slabaugh, Simon Lucas

Abstract

Estimation of value in policy gradient methods is a fundamental problem. Generalized Advantage Estimation (GAE) is an exponentially-weighted estimator of an advantage function similar to $\lambda$-return. It substantially reduces the variance of policy gradient estimates at the expense of bias. In practical applications, a truncated GAE is used due to the incompleteness of the trajectory, which results in a large bias during estimation. To address this challenge, instead of using the entire truncated GAE, we propose to take a part of it when calculating updates, which significantly reduces the bias resulting from the incomplete trajectory. We perform experiments in MuJoCo and $\mu$RTS to investigate the effect of different partial coefficients and sampling lengths. We show that our partial GAE approach yields better empirical results in both environments.
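
The abstract does not spell out how the "part" of the truncated GAE is chosen, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It computes the standard truncated GAE over a rollout of length T by backward recursion, then keeps only the first `keep` estimates. The rationale assumed here is that early steps in the rollout have more TD residuals available before the cutoff, so they lose less to truncation. The names `truncated_gae`, `partial_gae`, and `keep` are hypothetical, and `keep` is not necessarily the paper's partial coefficient.

```python
def truncated_gae(rewards, values, bootstrap_value, gamma=0.99, lam=0.95):
    """Standard truncated GAE for one rollout with no terminal state inside it.

    rewards[t] and values[t] cover steps 0..T-1; bootstrap_value is V(s_T).
    """
    T = len(rewards)
    advantages = [0.0] * T
    next_value = bootstrap_value
    running = 0.0
    for t in reversed(range(T)):
        # One-step TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * next_value - values[t]
        # A_t = delta_t + gamma * lam * A_{t+1}, with A_T taken as 0
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages


def partial_gae(rewards, values, bootstrap_value, keep, gamma=0.99, lam=0.95):
    """ASSUMPTION: 'partial' is read as keeping the first `keep` estimates.

    The whole rollout is still used to compute the estimates; only the
    leading slice would be passed on to the policy update.
    """
    advantages = truncated_gae(rewards, values, bootstrap_value, gamma, lam)
    return advantages[:keep]
```

For example, with a rollout of 128 steps one might call `partial_gae(rewards, values, v_last, keep=64)` and discard the remaining transitions for that update. Both numbers are illustrative, not values taken from the paper.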

URL

https://arxiv.org/abs/2301.10920

PDF

https://arxiv.org/pdf/2301.10920.pdf

