OpenBezoar: Small, Cost-Effective and Open Models Trained on Mixes of Instruction Data

2024-04-18 13:57:18
Chandeepa Dissanayake, Lahiru Lowe, Sachith Gunasekara, Yasiru Ratnayake

Abstract

Instruction fine-tuning pretrained LLMs for diverse downstream tasks has demonstrated remarkable success and has captured the interest of both academics and practitioners. To ensure such fine-tuned LLMs align with human preferences, techniques such as RLHF and DPO have emerged. At the same time, there is increasing interest in models with smaller parameter counts. In this work, using OpenLLaMA 3Bv2 as the base model, we describe the recipe used to fine-tune the OpenBezoar family of models. In this recipe, we first generate synthetic instruction fine-tuning data using an open, commercially non-restrictive instruction-fine-tuned variant of the Falcon-40B model under three schemes based on LaMini-LM, WizardLM/Evol-Instruct (with databricks-dolly-15k as a seed dataset), and Orca (with the Flan Collection as a seed dataset), and then filter these generations using GPT-4 as a human proxy. We then perform cost-effective QLoRA-based supervised fine-tuning sequentially with each scheme. The resulting checkpoint is further fine-tuned on a subset of the HH-RLHF dataset to minimize distribution shift before applying the DPO loss to obtain the final checkpoint. Evaluation is done with the LM Eval Harness tasks/metrics as well as on MT-Bench using the "LLM-as-a-judge" framework with Claude 2.1. We find that the final checkpoint, "OpenBezoar-HH-RLHF-DPO", demonstrates superior performance over many models at the 3B parameter scale, even outperforming the top model in one of the categories on the Huggingface Open LLM Leaderboard. We release the "OpenBezoar-SFT", "OpenBezoar-HH-RLHF-SFT", and "OpenBezoar-HH-RLHF-DPO" checkpoints, alongside our generated datasets, on HuggingFace at this https URL and our codebase at this https URL.
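For a concrete picture of the two training stages the abstract describes (QLoRA-based supervised fine-tuning, then DPO on preference pairs), here is a minimal sketch using the Hugging Face peft/trl stack. It is not the authors' released code: the file names (instruction_mix.jsonl, hh_rlhf_prefs.jsonl), column names, and all hyperparameters are illustrative assumptions, and the keyword arguments reflect trl ~0.7-era APIs (newer trl versions move several of them into SFTConfig/DPOConfig).

```python
import torch
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from peft import LoraConfig
from trl import SFTTrainer, DPOTrainer

BASE = "openlm-research/open_llama_3b_v2"  # OpenLLaMA 3Bv2 base model

# 4-bit NF4 quantization of the frozen base weights: the "Q" in QLoRA.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    BASE, quantization_config=bnb, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(BASE)
tokenizer.pad_token = tokenizer.eos_token

# Trainable low-rank adapters on top of the quantized weights: the "LoRA" part.
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")

# Stage 1: supervised fine-tuning on the instruction mix (hypothetical local
# file with a pre-formatted "text" column).
sft_data = load_dataset("json", data_files="instruction_mix.jsonl", split="train")
sft_trainer = SFTTrainer(
    model=model,
    train_dataset=sft_data,
    peft_config=lora,
    dataset_text_field="text",
    max_seq_length=1024,
    tokenizer=tokenizer,
    args=TrainingArguments(output_dir="sft-ckpt", per_device_train_batch_size=4,
                           gradient_accumulation_steps=4, num_train_epochs=1),
)
sft_trainer.train()

# Stage 2: DPO on preference pairs, e.g. a subset of HH-RLHF preprocessed into
# "prompt"/"chosen"/"rejected" columns. With a PEFT model and ref_model=None,
# trl uses the adapter-disabled base model as the implicit reference policy.
pref_data = load_dataset("json", data_files="hh_rlhf_prefs.jsonl", split="train")
dpo_trainer = DPOTrainer(
    sft_trainer.model,
    ref_model=None,
    beta=0.1,  # strength of the implicit KL penalty in the DPO loss
    train_dataset=pref_data,
    tokenizer=tokenizer,
    max_length=1024,
    max_prompt_length=512,
    args=TrainingArguments(output_dir="dpo-ckpt", per_device_train_batch_size=2,
                           gradient_accumulation_steps=8, num_train_epochs=1),
)
dpo_trainer.train()
```

The second stage needs no separate reward model: DPO directly maximizes log sigma(beta * (log pi_theta(y_w|x)/pi_ref(y_w|x) - log pi_theta(y_l|x)/pi_ref(y_l|x))) over chosen/rejected pairs, which is why only the SFT checkpoint and a frozen reference copy of it are involved. This also motivates the intermediate HH-RLHF SFT step the abstract mentions: it keeps the policy close to the preference data's distribution before the DPO loss is applied.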

URL

https://arxiv.org/abs/2404.12195

PDF

https://arxiv.org/pdf/2404.12195.pdf
