Abstract
In this preliminary study, we investigate a GPT-driven, intent-based reasoning approach that streamlines tool selection for large language models (LLMs) to improve system efficiency. By identifying the intent behind a user prompt at runtime, we narrow the API toolset required for task execution, reducing token consumption by up to 24.6%. Early results on a real-world, massively parallel Copilot platform with over 100 GPT-4-Turbo nodes show cost reductions and suggest potential for improving the efficiency of LLM-based systems.
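The mechanism described above — infer the user's intent, then expose only the matching subset of tools to the model — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the tool names, intent labels, and the keyword classifier are hypothetical stand-ins (the paper uses a GPT model to infer intent at runtime), and the token count is a crude whitespace estimate.

```python
# Hypothetical tool catalog: name -> short spec sent to the model.
TOOL_CATALOG = {
    "calendar.create_event": "Create a calendar event",
    "calendar.list_events": "List upcoming calendar events",
    "mail.send": "Send an email",
    "mail.search": "Search the mailbox",
    "files.search": "Search files",
    "files.summarize": "Summarize a document",
}

# Assumed intent -> relevant tool subset mapping, for illustration only.
INTENT_TOOLSETS = {
    "scheduling": ["calendar.create_event", "calendar.list_events"],
    "email": ["mail.send", "mail.search"],
    "documents": ["files.search", "files.summarize"],
}

def classify_intent(prompt: str) -> str:
    """Toy keyword classifier standing in for the runtime GPT intent call."""
    p = prompt.lower()
    if any(w in p for w in ("meeting", "schedule", "calendar")):
        return "scheduling"
    if any(w in p for w in ("email", "mail", "reply")):
        return "email"
    return "documents"

def select_tools(prompt: str) -> dict:
    """Return only the tool specs matching the inferred intent."""
    intent = classify_intent(prompt)
    return {name: TOOL_CATALOG[name] for name in INTENT_TOOLSETS[intent]}

def approx_tokens(specs: dict) -> int:
    """Crude token estimate: whitespace-split words across tool specs."""
    return sum(len(f"{k} {v}".split()) for k, v in specs.items())

prompt = "Schedule a meeting with the team tomorrow"
subset = select_tools(prompt)
full, narrowed = approx_tokens(TOOL_CATALOG), approx_tokens(subset)
print(f"tools: {len(subset)}/{len(TOOL_CATALOG)}, "
      f"token estimate: {narrowed}/{full}")
```

Because only the narrowed tool specs are serialized into the model's prompt, fewer tools directly translates into fewer prompt tokens per request — the source of the reported savings.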
URL
https://arxiv.org/abs/2404.15804