
On the Effects of Fine-tuning Language Models for Text-Based Reinforcement Learning

2024-04-15 23:05:57
Mauricio Gruppi, Soham Dan, Keerthiram Murugesan, Subhajit Chaudhury

Abstract

Text-based reinforcement learning involves an agent interacting with a fictional environment using observed text and admissible actions expressed in natural language to complete a task. Previous work has shown that agents can succeed in text-based interactive environments even in the complete absence of semantic understanding or other linguistic capabilities; this success suggests that semantic understanding may not be important for the task, and it raises an important question about the benefits of language models (LMs) in guiding agents through game states. In this work, we show that rich semantic understanding leads to efficient training of text-based RL agents. Moreover, we describe the occurrence of semantic degeneration as a consequence of inappropriate fine-tuning of language models in text-based reinforcement learning (TBRL). Specifically, we describe the shift in the semantic representation of words in the LM, as well as how it affects the performance of the agent on tasks that are semantically similar to the training games. We believe these results may help in developing better strategies for fine-tuning agents in text-based RL scenarios.
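
To make the setup concrete, the sketch below shows the interaction loop the abstract describes: at each step the agent receives a textual observation and a list of admissible actions, scores each candidate action against the observation, and acts epsilon-greedily. Everything here is a hypothetical illustration, not the paper's implementation: ToyTextEnv is a toy stand-in for a TextWorld-style game, and lm_score uses simple word overlap in place of the fine-tuned language model whose representations the paper studies.

```python
import random


class ToyTextEnv:
    """Hypothetical toy text-based environment: observations and actions are strings."""

    def reset(self):
        obs = "You are in a kitchen. There is an apple on the table."
        actions = ["take apple", "open fridge", "go north"]
        return obs, actions

    def step(self, action):
        # Returns (observation, admissible_actions, reward, done).
        if action == "take apple":
            return "You pick up the apple. Task complete.", [], 1.0, True
        return "Nothing interesting happens.", ["take apple", "go north"], 0.0, False


def lm_score(observation, action):
    """Placeholder scorer: word overlap standing in for an LM-based score.

    In the paper's setting, a (possibly fine-tuned) language model would score
    each (observation, action) pair; this stand-in only illustrates the interface.
    """
    obs_words = set(observation.lower().split())
    act_words = set(action.lower().split())
    return len(obs_words & act_words)


def select_action(observation, admissible_actions, epsilon=0.1):
    """Epsilon-greedy choice over the scores of the admissible actions."""
    if random.random() < epsilon:
        return random.choice(admissible_actions)
    return max(admissible_actions, key=lambda a: lm_score(observation, a))


def run_episode(env, max_steps=10):
    obs, actions = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = select_action(obs, actions)
        obs, actions, reward, done = env.step(action)
        total_reward += reward
        if done or not actions:
            break
    return total_reward


if __name__ == "__main__":
    print("episode reward:", run_episode(ToyTextEnv()))
```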

URL

https://arxiv.org/abs/2404.10174

PDF

https://arxiv.org/pdf/2404.10174.pdf

