Paper Reading AI Learner

Weak-to-Strong Extrapolation Expedites Alignment

2024-04-25 17:39:50
Chujie Zheng, Ziqi Wang, Heng Ji, Minlie Huang, Nanyun Peng

Abstract

Although the capabilities of large language models (LLMs) ideally scale up with increasing data and compute, in practice they are inevitably constrained by limited resources. Given a moderately trained LLM (e.g., one trained to align with human preference), can we further exploit its potential and cheaply acquire a stronger model? In this paper, we propose a simple method called ExPO to boost LLMs' alignment with human preference. ExPO assumes that a medium-aligned model can be viewed as an interpolation between a less-aligned (weaker) model, e.g., the initial SFT model, and a better-aligned (stronger) one; the stronger model can therefore be obtained directly by extrapolating from the weights of the two relatively weaker models. On the AlpacaEval 2.0 benchmark, we show that ExPO pushes models trained with less preference data (e.g., 10% or 20%) to match and even surpass the fully trained one, without any additional training. Furthermore, ExPO also significantly improves off-the-shelf DPO/RLHF models and scales well across model sizes from 7B to 70B. Our work demonstrates the efficacy of model extrapolation in exploiting LLMs' capabilities, suggesting a promising direction that deserves future exploration.
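The extrapolation the abstract describes amounts to a single linear operation on model weights: move past the medium-aligned model along the direction from the weaker model to it. A minimal sketch of that idea, where the function name, the dict-of-floats weight layout, and the extrapolation coefficient value are illustrative assumptions rather than the paper's implementation:

```python
def extrapolate_weights(weak: dict, medium: dict, alpha: float) -> dict:
    """Extrapolate beyond the medium-aligned model along the
    weak -> medium direction:

        theta_expo = theta_medium + alpha * (theta_medium - theta_weak)

    `weak` and `medium` map parameter names to values (here plain floats
    for illustration; real model state dicts would hold tensors).
    `alpha` > 0 controls how far past the medium model to extrapolate
    (alpha = 0 recovers the medium model itself).
    """
    return {
        name: medium[name] + alpha * (medium[name] - weak[name])
        for name in medium
    }


# Toy usage with scalar "weights": alpha = 0.5 is a hypothetical value.
weak = {"layer.w": 1.0}
medium = {"layer.w": 2.0}
stronger = extrapolate_weights(weak, medium, alpha=0.5)
# stronger["layer.w"] == 2.0 + 0.5 * (2.0 - 1.0) == 2.5
```

In practice the same formula would be applied elementwise to every tensor in the two models' state dicts; no gradient computation or additional training is involved, which is what makes the method cheap.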

URL

https://arxiv.org/abs/2404.16792

PDF

https://arxiv.org/pdf/2404.16792.pdf

