Abstract
This report details the development and key achievements of our latest language model, designed to serve as a foundation for custom large language models. Its advancements include a novel Online Data Scheduler that supports flexible adjustment of the training data mixture and curriculum learning. To improve training stability and performance, the architecture incorporates techniques such as Rotary Positional Embeddings, QK-LayerNorm, and a purpose-built multilingual tokenizer. Our training framework also provides advanced monitoring and rapid failure recovery to keep training efficient. Our Wonton 7B model has demonstrated competitive performance on a range of multilingual and English benchmarks. Future work will prioritize narrowing the performance gap with more extensively trained models, thereby improving the model's real-world effectiveness and adaptability.
GitHub: this https URL
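Two of the architectural components named above, Rotary Positional Embeddings (RoPE) and QK-LayerNorm, can be illustrated with a minimal pure-Python sketch of a single attention logit. This uses the standard textbook formulations and is not the Wonton 7B implementation. The following are all assumptions for illustration: the function names `layer_norm`, `rope`, and `attention_logit`, the RoPE base of 10000, a LayerNorm without learned gain or bias, and the choice to apply the normalization before the rotation.

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance.
    Assumption: no learned gain/bias, unlike a full LayerNorm module."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def rope(x, pos, base=10000.0):
    """Rotary positional embedding: rotate each pair (x[2i], x[2i+1])
    by the angle pos * base**(-2i/d). Assumption: base = 10000."""
    d = len(x)
    assert d % 2 == 0, "head dimension must be even"
    out = []
    for i in range(d // 2):
        theta = pos * base ** (-2 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        out.extend([a * c - b * s, a * s + b * c])
    return out

def attention_logit(q, k, q_pos, k_pos):
    """Scaled dot-product logit for one query/key pair.
    QK-LayerNorm normalizes q and k (bounding the logit scale) before
    RoPE encodes their positions; this ordering is an assumption."""
    q_rot = rope(layer_norm(q), q_pos)
    k_rot = rope(layer_norm(k), k_pos)
    return sum(a * b for a, b in zip(q_rot, k_rot)) / math.sqrt(len(q))

if __name__ == "__main__":
    q = [0.3, -1.2, 0.8, 2.0]
    k = [1.0, 0.5, -0.7, 0.1]
    # Same relative offset (3) at different absolute positions.
    print(attention_logit(q, k, 5, 2), attention_logit(q, k, 13, 10))
```

Running the example prints two numerically equal logits. Because RoPE rotates q and k by angles proportional to their positions, their dot product depends only on the offset `q_pos - k_pos`. Normalizing q and k beforehand keeps the magnitude of the logits bounded, which is the stability motivation usually given for QK-LayerNorm.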
URL
https://arxiv.org/abs/2404.15702