Paper Reading AI Learner

Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing

2024-04-18 15:21:34
Ye Tian, Baolin Peng, Linfeng Song, Lifeng Jin, Dian Yu, Haitao Mi, Dong Yu

Abstract

Despite the impressive capabilities of Large Language Models (LLMs) on various tasks, they still struggle with scenarios that involve complex reasoning and planning. Recent work has proposed advanced prompting techniques and argued for the necessity of fine-tuning with high-quality data to augment LLMs' reasoning abilities. However, these approaches are inherently constrained by data availability and quality. In light of this, self-correction and self-learning emerge as viable solutions, employing strategies that allow LLMs to refine their outputs and learn from self-assessed rewards. Yet the efficacy of LLMs in self-refining their responses, particularly on complex reasoning and planning tasks, remains dubious. In this paper, we introduce AlphaLLM for the self-improvement of LLMs, which integrates Monte Carlo Tree Search (MCTS) with LLMs to establish a self-improving loop, thereby enhancing the capabilities of LLMs without additional annotations. Drawing inspiration from the success of AlphaGo, AlphaLLM addresses the unique challenges of combining MCTS with LLMs for self-improvement, including data scarcity, the vast search spaces of language tasks, and the subjective nature of feedback in language tasks. AlphaLLM comprises a prompt synthesis component, an efficient MCTS approach tailored for language tasks, and a trio of critic models for precise feedback. Our experimental results on mathematical reasoning tasks demonstrate that AlphaLLM significantly enhances the performance of LLMs without additional annotations, showing the potential for self-improvement in LLMs.
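The core idea in the abstract — searching over candidate LLM responses with MCTS and scoring leaves with critic models instead of human annotations — can be sketched in a few dozen lines. The sketch below is illustrative only, not the paper's implementation: `llm_propose` and `critic_value` are hypothetical toy stand-ins for the LLM policy and the critic models AlphaLLM trains, and the state here is a plain string rather than a reasoning trace.

```python
import math
import random

# Hypothetical stubs for the components the abstract describes.
# A real system would call an LLM to propose next reasoning steps
# and a learned critic to score partial solutions.
def llm_propose(state):
    """Toy policy: propose two candidate continuations of the state."""
    return [state + "a", state + "b"]

def critic_value(state):
    """Toy critic: score a state by its fraction of 'a' steps."""
    return state.count("a") / max(len(state), 1)

class Node:
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value_sum = 0.0

    def ucb(self, c=1.4):
        """Upper-confidence bound used for selection."""
        if self.visits == 0:
            return float("inf")
        exploit = self.value_sum / self.visits
        explore = c * math.sqrt(math.log(self.parent.visits) / self.visits)
        return exploit + explore

def mcts(root_state, iterations=50, max_depth=4):
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend by UCB until reaching a leaf.
        while node.children:
            node = max(node.children, key=Node.ucb)
        # 2. Expansion: ask the (stub) LLM for candidate next steps.
        if len(node.state) - len(root_state) < max_depth:
            for s in llm_propose(node.state):
                node.children.append(Node(s, parent=node))
            node = random.choice(node.children)
        # 3. Evaluation: score the leaf with the (stub) critic,
        #    replacing the random rollout of classic MCTS.
        value = critic_value(node.state)
        # 4. Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.value_sum += value
            node = node.parent
    # Return the most-visited child, i.e. the preferred first step.
    return max(root.children, key=lambda n: n.visits).state
```

In a self-improvement loop of the kind the abstract describes, the trajectories favored by the search would then be fed back as fine-tuning data for the LLM, closing the loop without additional human annotation.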

URL

https://arxiv.org/abs/2404.12253

PDF

https://arxiv.org/pdf/2404.12253.pdf

