Towards Generalizable Agents in Text-Based Educational Environments: A Study of Integrating RL with LLMs

2024-04-29 14:53:48
Bahar Radmehr, Adish Singla, Tanja Käser

Abstract

There has been a growing interest in developing learner models to enhance learning and teaching experiences in educational environments. However, existing works have primarily focused on structured environments relying on meticulously crafted representations of tasks, thereby limiting agents' ability to generalize skills across tasks. In this paper, we aim to enhance the generalization capabilities of agents in open-ended text-based learning environments by integrating Reinforcement Learning (RL) with Large Language Models (LLMs). We investigate three types of agents: (i) RL-based agents that utilize natural language for state and action representations to find the best interaction strategy, (ii) LLM-based agents that leverage the model's general knowledge and reasoning through prompting, and (iii) hybrid LLM-assisted RL agents that combine these two strategies to improve agents' performance and generalization. To support the development and evaluation of these agents, we introduce PharmaSimText, a novel benchmark derived from the PharmaSim virtual pharmacy environment designed for practicing diagnostic conversations. Our results show that RL-based agents excel at task completion but are weaker at asking high-quality diagnostic questions. In contrast, LLM-based agents perform better at asking diagnostic questions but fall short of completing the task. Finally, hybrid LLM-assisted RL agents enable us to overcome these limitations, highlighting the potential of combining RL and LLMs to develop high-performing agents for open-ended learning environments.

URL

https://arxiv.org/abs/2404.18978

PDF

https://arxiv.org/pdf/2404.18978.pdf