Paper Reading AI Learner

OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework

2024-04-22 23:12:03
Sachin Mehta, Mohammad Hossein Sekhavat, Qingqing Cao, Maxwell Horton, Yanzi Jin, Chenfan Sun, Iman Mirzadeh, Mahyar Najibi, Dmitry Belenko, Peter Zatloukal, Mohammad Rastegari

Abstract

The reproducibility and transparency of large language models are crucial for advancing open research, ensuring the trustworthiness of results, and enabling investigations into data and model biases, as well as potential risks. To this end, we release OpenELM, a state-of-the-art open language model. OpenELM uses a layer-wise scaling strategy to efficiently allocate parameters within each layer of the transformer model, leading to enhanced accuracy. For example, with a parameter budget of approximately one billion parameters, OpenELM exhibits a 2.36% improvement in accuracy compared to OLMo while requiring $2\times$ fewer pre-training tokens. Diverging from prior practices that only provide model weights and inference code, and pre-train on private datasets, our release includes the complete framework for training and evaluation of the language model on publicly available datasets, including training logs, multiple checkpoints, and pre-training configurations. We also release code to convert models to MLX library for inference and fine-tuning on Apple devices. This comprehensive release aims to empower and strengthen the open research community, paving the way for future open research endeavors. Our source code along with pre-trained model weights and training recipes is available at \url{this https URL}. Additionally, OpenELM models can be found on HuggingFace at: \url{this https URL}.
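To make the layer-wise scaling idea concrete, here is a minimal Python sketch of one way a per-layer parameter allocation could be computed. This is an illustrative reading of the abstract, not the released OpenELM code; the linear interpolation scheme and the hyperparameter names (`alpha_min`, `alpha_max`, `beta_min`, `beta_max`) are assumptions for the sketch.

```python
# Illustrative sketch of layer-wise scaling (not the released OpenELM code).
# Assumption: the number of attention heads and the FFN width of each layer
# are linearly interpolated between a minimum and a maximum multiplier, so
# that a fixed parameter budget is spent non-uniformly across depth.

def layer_wise_scaling(num_layers: int,
                       d_model: int,
                       d_head: int,
                       alpha_min: float, alpha_max: float,
                       beta_min: float, beta_max: float):
    """Return (num_heads, ffn_dim) for each transformer layer."""
    configs = []
    for i in range(num_layers):
        t = i / (num_layers - 1)  # 0.0 at the first layer, 1.0 at the last
        alpha = alpha_min + (alpha_max - alpha_min) * t  # head multiplier
        beta = beta_min + (beta_max - beta_min) * t      # FFN multiplier
        num_heads = max(1, round(alpha * d_model / d_head))
        ffn_dim = int(beta * d_model)
        configs.append((num_heads, ffn_dim))
    return configs

# Example: a small 12-layer model whose width grows with depth.
for layer, (heads, ffn) in enumerate(layer_wise_scaling(
        num_layers=12, d_model=768, d_head=64,
        alpha_min=0.5, alpha_max=1.0, beta_min=2.0, beta_max=4.0)):
    print(f"layer {layer:2d}: heads={heads:2d}, ffn_dim={ffn}")
```

Running this prints a schedule with fewer heads and narrower FFNs near the input and wider layers near the output, which is one way to allocate a fixed parameter budget non-uniformly, as the abstract describes.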

Abstract (translated)

The reproducibility and transparency of large language models are crucial for advancing open research, ensuring the trustworthiness of results, and enabling investigations into data and model biases as well as potential risks. To this end, we release OpenELM, a state-of-the-art open language model. OpenELM uses a layer-wise scaling strategy to allocate parameters efficiently within each layer of the transformer model, leading to improved accuracy. For example, with a parameter budget of approximately one billion parameters, OpenELM improves accuracy by 2.36% over OLMo while requiring 2x fewer pre-training tokens. Diverging from prior practices that only provide model weights and inference code and pre-train on private datasets, our release includes the complete framework for training and evaluating the language model on publicly available datasets, including training logs, multiple checkpoints, and pre-training configurations. We also release code to convert models to the MLX library for inference and fine-tuning on Apple devices. This comprehensive release aims to empower and strengthen the open research community, paving the way for future open research endeavors. Our source code, along with pre-trained model weights and training recipes, is available at this https URL. Additionally, OpenELM models can be found on HuggingFace at this https URL.
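For readers who want to try the released checkpoints, a minimal inference sketch is below. It assumes the models are published on HuggingFace and load through the standard `transformers` API; the model id `apple/OpenELM-1_1B` and the pairing with a Llama tokenizer are assumptions not confirmed by this page, so adjust them to whatever the official release documents.

```python
# Minimal inference sketch. Assumptions: the checkpoint id
# "apple/OpenELM-1_1B" and the Llama tokenizer pairing are not stated
# on this page; consult the official release for the exact ids.
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "apple/OpenELM-1_1B",     # assumed HuggingFace model id
    trust_remote_code=True,   # may be required if the release ships custom modeling code
)
tokenizer = AutoTokenizer.from_pretrained(
    "meta-llama/Llama-2-7b-hf"  # assumed compatible tokenizer
)

inputs = tokenizer("Large language models are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```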

URL

https://arxiv.org/abs/2404.14619

PDF

https://arxiv.org/pdf/2404.14619.pdf

