Abstract
Language models have been effective in a wide range of applications, yet the most sophisticated models are often proprietary. For example, GPT-4 by OpenAI and various models by Anthropic are expensive and consume substantial energy. In contrast, the open-source community has produced competitive models, like Llama3. Furthermore, niche-specific smaller language models, such as those tailored for legal, medical, or financial tasks, have outperformed their proprietary counterparts. This paper introduces a novel approach that employs \textit{functional tokens} to integrate \textbf{multiple open-source models}, each optimized for particular tasks. Our newly developed Octopus v4 model leverages \textit{functional tokens} to intelligently direct user queries to the most appropriate vertical model and reformat the query to achieve the best performance. Octopus v4, an evolution of the Octopus v1, v2, and v3 models, excels in model selection, parameter understanding, and query reformatting. Additionally, we explore the use of a graph as a versatile data structure that effectively coordinates multiple open-source models by harnessing the capabilities of the Octopus model and \textit{functional tokens}. Use our open-sourced GitHub (\url{this https URL}) to try Octopus v4 models (\url{this https URL}), and contribute to a larger graph of language models. By activating models with fewer than 10B parameters, we achieved a SOTA MMLU score of 74.8 among models at the same level.
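The routing idea described above can be sketched in a few lines of Python. This is a hypothetical illustration, not the actual Octopus v4 API: the functional tokens (`<law_gpt>`, `<medical_gpt>`, `<general_gpt>`), the keyword-based coordinator, and the worker stubs are all invented for clarity, standing in for the learned model that emits a functional token and a reformatted query.

```python
# Hypothetical sketch of functional-token routing: a coordinator picks a
# functional token naming the best vertical model and reformats the query.
# All token names and worker models here are illustrative stand-ins.

from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class RoutedQuery:
    token: str   # functional token, e.g. "<law_gpt>" (hypothetical)
    query: str   # query reformatted for the selected worker model

def coordinator(user_query: str) -> RoutedQuery:
    # Stand-in for the Octopus v4 forward pass: here, simple keyword
    # matching replaces the model's learned routing decision.
    q = user_query.lower()
    if "contract" in q or "statute" in q:
        return RoutedQuery("<law_gpt>", f"Legal question: {user_query}")
    if "diagnosis" in q or "symptom" in q:
        return RoutedQuery("<medical_gpt>", f"Clinical question: {user_query}")
    return RoutedQuery("<general_gpt>", user_query)

# Registry mapping functional tokens to worker (vertical) models; real
# workers would be specialized sub-10B language models.
WORKERS: Dict[str, Callable[[str], str]] = {
    "<law_gpt>": lambda q: f"[law model] {q}",
    "<medical_gpt>": lambda q: f"[medical model] {q}",
    "<general_gpt>": lambda q: f"[general model] {q}",
}

def answer(user_query: str) -> str:
    routed = coordinator(user_query)
    return WORKERS[routed.token](routed.query)
```

In the paper's graph framing, each node would hold one such worker model, and the coordinator's functional token selects which edge of the graph a query travels along.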
URL
https://arxiv.org/abs/2404.19296