Abstract
Large language models (LLMs) have demonstrated profound capabilities in language understanding and generation, enabling a wide array of applications. However, detailed, open-source methodologies for efficiently scaling LLMs beyond 50 billion parameters with minimal trial-and-error cost and computational resources remain scarce. In this report, we introduce Tele-FLM (aka FLM-2), a 52B open-source multilingual large language model that features a stable, efficient pre-training paradigm and enhanced factual-judgment capabilities. Tele-FLM demonstrates superior multilingual language-modeling ability, measured by bits-per-byte (BPB) on textual corpora. Moreover, in both English and Chinese foundation-model evaluations, it is comparable to strong open-source models trained with larger pre-training FLOPs, such as Llama2-70B and DeepSeek-67B. In addition to the model weights, we share the core designs, engineering practices, and training details, which we expect to benefit both the academic and industrial communities.
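The abstract's headline language-modeling metric is BPB, which normalizes cross-entropy loss by the UTF-8 byte length of the text rather than by token count, making scores comparable across tokenizers and languages. As a minimal sketch of that conversion (the function name and the sample numbers below are illustrative, not taken from the paper):

```python
import math

def bits_per_byte(total_nll_nats: float, num_utf8_bytes: int) -> float:
    """Convert a corpus-level negative log-likelihood (in nats, summed
    over all tokens) into bits-per-byte (BPB).

    Dividing by the byte count instead of the token count removes the
    tokenizer's vocabulary and segmentation from the comparison.
    """
    # nats -> bits via log(2), then average over the corpus's bytes.
    return total_nll_nats / (math.log(2) * num_utf8_bytes)

# Hypothetical example: 2.0e6 nats of total loss on a 1.5 MB corpus
# corresponds to roughly 1.92 BPB.
print(bits_per_byte(2.0e6, 1_500_000))
```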
URL
https://arxiv.org/abs/2404.16645