Paper Reading AI Learner

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

2024-05-07 15:56:43
DeepSeek-AI

Abstract

We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference by significantly compressing the Key-Value (KV) cache into a latent vector, while DeepSeekMoE enables training strong models at an economical cost through sparse computation. Compared with DeepSeek 67B, DeepSeek-V2 achieves significantly stronger performance while saving 42.5% of training costs, reducing the KV cache by 93.3%, and boosting the maximum generation throughput to 5.76 times that of DeepSeek 67B. We pretrain DeepSeek-V2 on a high-quality, multi-source corpus of 8.1T tokens, and further perform Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) to fully unlock its potential. Evaluation results show that, even with only 21B activated parameters, DeepSeek-V2 and its chat versions still achieve top-tier performance among open-source models. The model checkpoints are available at this https URL.
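The core idea behind MLA's KV-cache saving can be sketched in a few lines: instead of caching full per-head keys and values, cache one low-rank latent vector per token and reconstruct K and V by up-projection at attention time. This is a minimal illustration only; the dimensions, projection matrices (`W_down`, `W_uk`, `W_uv`), and layout below are hypothetical, not DeepSeek-V2's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: d_latent << n_heads * d_head is what yields the saving.
d_model, n_heads, d_head, d_latent = 512, 8, 64, 64

W_down = rng.standard_normal((d_model, d_latent)) * 0.02          # hidden -> latent
W_uk = rng.standard_normal((d_latent, n_heads * d_head)) * 0.02   # latent -> keys
W_uv = rng.standard_normal((d_latent, n_heads * d_head)) * 0.02   # latent -> values

seq_len = 16
h = rng.standard_normal((seq_len, d_model))  # hidden states of cached tokens

# Cache only the latent: seq_len * d_latent floats, versus
# seq_len * 2 * n_heads * d_head for a conventional per-head KV cache.
latent_cache = h @ W_down

full_kv_floats = seq_len * 2 * n_heads * d_head
latent_floats = seq_len * d_latent
print(f"latent cache / full KV cache: {latent_floats / full_kv_floats:.4f}")

# At decode time, reconstruct per-head K and V from the latent cache.
K = (latent_cache @ W_uk).reshape(seq_len, n_heads, d_head)
V = (latent_cache @ W_uv).reshape(seq_len, n_heads, d_head)
print(K.shape, V.shape)
```

With these toy dimensions the latent cache holds 1/16 of the floats of a full KV cache, which mirrors (but does not reproduce) the paper's reported 93.3% KV-cache reduction.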

URL

https://arxiv.org/abs/2405.04434

PDF

https://arxiv.org/pdf/2405.04434.pdf
