Abstract
While reasoning-augmented large language models (RLLMs) significantly enhance complex task performance through extended reasoning chains, they inevitably introduce substantial unnecessary token consumption, particularly for simpler problems where Short Chain-of-Thought (Short CoT) suffices. This overthinking phenomenon leads to inefficient resource usage without proportional accuracy gains. To address this issue, we propose Self-Route, a dynamic reasoning framework that automatically selects between general and reasoning modes based on model capability estimation. Our approach introduces a lightweight pre-inference stage to extract capability-aware embeddings from hidden layer representations, enabling real-time evaluation of the model's ability to solve problems. We further construct Gradient-10K, a model difficulty estimation-based dataset with dense complexity sampling, to train the router for precise capability boundary detection. Extensive experiments demonstrate that Self-Route achieves accuracy comparable to reasoning models while reducing token consumption by 30-55% across diverse benchmarks. The framework remains effective across models with different parameter scales and reasoning paradigms, highlighting its general applicability and practical value.
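To make the routing idea concrete, the sketch below illustrates one plausible reading of the pre-inference stage: a single forward pass over the prompt yields a pooled hidden-state embedding, and a lightweight classifier decides between general (Short CoT) and reasoning modes. The class names, the choice of hidden layer, and mean pooling are illustrative assumptions, not the authors' actual implementation.

```python
import torch
import torch.nn as nn

class CapabilityRouter(nn.Module):
    """Lightweight binary router: given a capability-aware embedding pooled
    from a hidden layer of the base model, predict whether the general mode
    (Short CoT) is likely sufficient for the query."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(hidden_size, 256),
            nn.ReLU(),
            nn.Linear(256, 2),  # 0 = general mode, 1 = reasoning mode
        )

    def forward(self, capability_embedding: torch.Tensor) -> torch.Tensor:
        return self.classifier(capability_embedding)


def route(model, tokenizer, router: CapabilityRouter, prompt: str) -> str:
    """Pre-inference stage: one forward pass over the prompt, pool a hidden
    layer into a capability embedding, and let the router pick the mode."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs, output_hidden_states=True)
    # Assumption: mean-pool the last hidden layer over the token dimension.
    embedding = outputs.hidden_states[-1].mean(dim=1)
    mode = router(embedding).argmax(dim=-1).item()
    return "reasoning" if mode == 1 else "general"
```

In such a setup, the router would be trained as an ordinary classifier on a difficulty-graded dataset (Gradient-10K in the paper), with labels indicating whether the base model solves each problem without extended reasoning.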
URL
https://arxiv.org/abs/2505.20664