Paper Reading AI Learner

Quality with Just Enough Diversity in Evolutionary Policy Search

2024-05-07 13:33:36
Paul Templier, Luca Grillotti, Emmanuel Rachelson, Dennis G. Wilson, Antoine Cully

Abstract

Evolution Strategies (ES) are effective gradient-free optimization methods that can be competitive with gradient-based approaches for policy search. ES rely only on the total episodic scores of the solutions in their population, from which they estimate fitness gradients for their updates without access to true gradient information. However, this makes them sensitive to deceptive fitness landscapes, and they tend to explore only one way of solving a problem. Quality-Diversity (QD) methods such as MAP-Elites introduce additional information through behavior descriptors (BDs) and return a population of diverse solutions, which helps exploration but means a large part of the evaluation budget is not focused on finding the best-performing solution. Here we show that behavior information can also be leveraged to find the best policy, by identifying promising search areas which can then be efficiently explored with ES. We introduce the framework of Quality with Just Enough Diversity (JEDi), which learns the relationship between behavior and fitness to focus evaluations on solutions that matter. When trying to reach higher fitness values, JEDi outperforms both QD and ES methods on hard exploration tasks such as mazes and on complex control problems with large policies.
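To illustrate the ES mechanism the abstract refers to — estimating a fitness gradient from episodic scores alone — here is a minimal rank-based ES sketch in NumPy. This is a generic textbook-style estimator, not the paper's JEDi algorithm; the function names and toy fitness are illustrative assumptions.

```python
import numpy as np

def es_gradient_estimate(fitness_fn, theta, sigma=0.1, pop_size=50, rng=None):
    """Rank-based ES fitness-gradient estimate from episodic scores only
    (a generic sketch, not the paper's JEDi method)."""
    rng = np.random.default_rng(rng)
    # Mirrored (antithetic) Gaussian perturbations of the policy parameters.
    half = rng.standard_normal((pop_size // 2, theta.size))
    eps = np.concatenate([half, -half])
    # Only total episodic scores are used -- no true gradient information.
    scores = np.array([fitness_fn(theta + sigma * e) for e in eps])
    # Rank-normalize scores so the update is invariant to the fitness scale.
    ranks = scores.argsort().argsort().astype(float)
    weights = ranks / (len(ranks) - 1) - 0.5
    # The score-weighted average of perturbations points uphill in fitness.
    return (weights @ eps) / (len(eps) * sigma)

# Usage: ascend a toy quadratic "fitness" whose optimum is the all-ones vector.
fitness = lambda t: -float(np.sum((t - 1.0) ** 2))
rng = np.random.default_rng(0)
theta = np.zeros(5)
for _ in range(200):
    theta += 0.1 * es_gradient_estimate(fitness, theta, rng=rng)
# theta now lies close to the all-ones optimum.
```

Because the update is built purely from the ranking of episodic scores, a deceptive fitness landscape (where better scores point away from the true optimum) misleads the whole population — the sensitivity that motivates adding behavior information.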

URL

https://arxiv.org/abs/2405.04308

PDF

https://arxiv.org/pdf/2405.04308.pdf
