
Breaching the Bottleneck: Evolutionary Transition from Reward-Driven Learning to Reward-Agnostic Domain-Adapted Learning in Neuromodulated Neural Nets

2024-04-19 05:14:47
Solvi Arnold, Reiji Suzuki, Takaya Arita, Kimitoshi Yamazaki

Abstract

Advanced biological intelligence learns efficiently from an information-rich stream of stimulus information, even when feedback on behaviour quality is sparse or absent. Such learning exploits implicit assumptions about task domains. We refer to such learning as Domain-Adapted Learning (DAL). In contrast, AI learning algorithms rely on explicit externally provided measures of behaviour quality to acquire fit behaviour. This imposes an information bottleneck that precludes learning from diverse non-reward stimulus information, limiting learning efficiency. We consider the question of how biological evolution circumvents this bottleneck to produce DAL. We propose that species first evolve the ability to learn from reward signals, providing inefficient (bottlenecked) but broad adaptivity. From there, integration of non-reward information into the learning process can proceed via gradual accumulation of biases induced by such information on specific task domains. This scenario provides a biologically plausible pathway towards bottleneck-free, domain-adapted learning. Focusing on the second phase of this scenario, we set up a population of NNs with reward-driven learning modelled as Reinforcement Learning (A2C), and allow evolution to improve learning efficiency by integrating non-reward information into the learning process using a neuromodulatory update mechanism. On a navigation task in continuous 2D space, evolved DAL agents show a 300-fold increase in learning speed compared to pure RL agents. Evolution is found to eliminate reliance on reward information altogether, allowing DAL agents to learn from non-reward information exclusively, using local neuromodulation-based connection weight updates only.
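To make the key mechanism concrete, below is a minimal sketch of a neuromodulation-based local weight update of the kind the abstract describes: each connection changes as a function of its local pre- and post-synaptic activity and a modulatory signal derived from non-reward stimulus information, with no reward term and no backpropagation. This is not the authors' exact rule; the names (eta, M, modulatory_signal) and the form of the modulatory pathway are illustrative assumptions. In the paper's setting, evolution would shape the modulatory pathway and plasticity parameters rather than relying on an external reward signal.

```python
# Hedged sketch of a reward-agnostic, neuromodulated local weight update.
# Assumptions (not from the paper): Hebbian-style plasticity gated by a
# self-generated modulatory scalar; per-connection gains M and rate eta
# stand in for evolved parameters.
import numpy as np

rng = np.random.default_rng(0)

n_in, n_out = 8, 4
W = rng.normal(scale=0.1, size=(n_out, n_in))   # plastic connection weights
M = rng.normal(scale=0.1, size=(n_out, n_in))   # evolved per-connection plasticity gains
eta = 0.01                                      # evolved global learning rate

def forward(x, W):
    """Plain feed-forward pass of the plastic layer."""
    return np.tanh(W @ x)

def modulatory_signal(x, y):
    """Hypothetical modulatory pathway: maps the current stimulus and the
    layer's activity to a gating scalar in [-1, 1]. In the paper this signal
    would be computed by an evolved part of the network, not hand-coded."""
    return np.tanh(np.mean(x) - np.mean(y))

def local_update(W, x, y, m, M, eta):
    """Local, reward-free update gated by the modulatory signal m:
    dW_ij = eta * m * M_ij * y_i * x_j."""
    return W + eta * m * M * np.outer(y, x)

# One learning step driven purely by (non-reward) stimulus information.
x = rng.normal(size=n_in)        # stimulus input
y = forward(x, W)                # post-synaptic activity
m = modulatory_signal(x, y)      # self-generated modulation
W = local_update(W, x, y, m, M, eta)
```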


URL

https://arxiv.org/abs/2404.12631

PDF

https://arxiv.org/pdf/2404.12631.pdf

