Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models

Abstract

We introduce Reka Core, Flash, and Edge, a series of powerful multimodal language models trained from scratch by Reka. Reka models are able to process and reason with text, image, video, and audio inputs. This technical report discusses details of training some of these models and provides comprehensive evaluation results. We show that Reka Edge and Reka Flash are not only state-of-the-art but also outperform many much larger models, delivering outsized value for their respective compute classes. Meanwhile, our largest and most capable model, Reka Core, approaches the best frontier models on both automatic evaluations and blind human evaluations. On image question answering benchmarks (e.g. MMMU, VQAv2), Core is competitive with GPT4-V. On multimodal chat, Core ranks as the second most preferred model under a blind third-party human evaluation setup, outperforming other models such as Claude 3 Opus. On text benchmarks, Core is not only competitive with other frontier models on a set of well-established benchmarks (e.g. MMLU, GSM8K) but also outperforms GPT4-0613 on human evaluation. On video question answering (Perception-Test), Core outperforms Gemini Ultra. Models are shipped in production at this http URL. A showcase of non-cherry-picked qualitative examples can also be found at this http URL.

Abstract (translated)

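We introduce Reka Core, Flash, and Edge, a series of powerful multimodal language models developed by Reka. These models are trained from scratch by Reka and can process and reason over text, image, video, and audio inputs. This technical report discusses the details of training these models and provides comprehensive evaluation results. We find that Reka Edge and Reka Flash are not only state-of-the-art but also perform strongly within their respective compute classes, delivering outsized value for those classes. Meanwhile, our largest and most capable model, Reka Core, approaches the best frontier models on both automatic evaluations and blind human evaluations. On image question answering benchmarks (e.g. MMMU, VQAv2), Core is on par with GPT4-V. On multimodal chat, Core ranks second under a blind third-party human evaluation setup, ahead of other models such as Claude 3 Opus. On text benchmarks, Core is not only competitive on a set of well-established benchmarks (e.g. MMLU, GSM8K) but also outperforms GPT4-0613 on human evaluation. On video question answering (Perception-Test), Core outperforms Gemini Ultra. The models are shipped in production at this link: http://www.reka.ai/. A showcase of non-cherry-picked qualitative examples can also be found at this link: http://www.reka.ai/quality-examples/.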

URL

https://arxiv.org/abs/2404.12387

PDF

https://arxiv.org/pdf/2404.12387.pdf
