Abstract
The reproducibility and transparency of large language models are crucial for advancing open research, ensuring the trustworthiness of results, and enabling investigations into data and model biases, as well as potential risks. To this end, we release OpenELM, a state-of-the-art open language model. OpenELM uses a layer-wise scaling strategy to efficiently allocate parameters within each layer of the transformer model, leading to enhanced accuracy. For example, with a parameter budget of approximately one billion parameters, OpenELM exhibits a 2.36% improvement in accuracy compared to OLMo while requiring $2\times$ fewer pre-training tokens. Diverging from prior practices that provide only model weights and inference code, and pre-train on private datasets, our release includes the complete framework for training and evaluating the language model on publicly available datasets, including training logs, multiple checkpoints, and pre-training configurations. We also release code to convert models to the MLX library for inference and fine-tuning on Apple devices. This comprehensive release aims to empower and strengthen the open research community, paving the way for future open research endeavors. Our source code, along with pre-trained model weights and training recipes, is available at \url{this https URL}. Additionally, OpenELM models can be found on HuggingFace at: \url{this https URL}.
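The core idea behind layer-wise scaling is to allocate parameters non-uniformly across transformer layers instead of giving every layer the same width. A minimal sketch of that idea follows, assuming a simple linear interpolation of an attention-width factor and an FFN-width multiplier from the first layer to the last; the function name, parameter names, and the specific interpolation ranges here are illustrative assumptions, not the paper's exact configuration.

```python
# Hypothetical sketch of layer-wise scaling: rather than a uniform
# transformer (identical width in every layer), interpolate per-layer
# hyperparameters so deeper layers receive a larger share of the
# parameter budget. The alpha/beta ranges below are made-up defaults
# for illustration only.

def layerwise_scaling(num_layers, d_model, head_dim,
                      alpha=(0.5, 1.0), beta=(2.0, 4.0)):
    """Return per-layer (num_heads, ffn_dim) tuples.

    alpha: (min, max) scaling factor for attention width.
    beta:  (min, max) multiplier for the FFN hidden dimension.
    Both are linearly interpolated from the first to the last layer.
    """
    configs = []
    for i in range(num_layers):
        # Interpolation coordinate t in [0, 1] across the depth.
        t = i / (num_layers - 1) if num_layers > 1 else 0.0
        a = alpha[0] + (alpha[1] - alpha[0]) * t   # attention width factor
        b = beta[0] + (beta[1] - beta[0]) * t      # FFN width multiplier
        num_heads = max(1, round(a * d_model / head_dim))
        ffn_dim = int(round(b * d_model))
        configs.append((num_heads, ffn_dim))
    return configs

# Example: a 4-layer model with d_model=1024 and 64-dim heads
# grows from narrow early layers to wide late layers.
cfg = layerwise_scaling(num_layers=4, d_model=1024, head_dim=64)
```

Under these assumed settings, the first layer gets 8 attention heads and a 2048-dim FFN while the last gets 16 heads and a 4096-dim FFN, so the total parameter count is redistributed across depth rather than increased.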
URL
https://arxiv.org/abs/2404.14619