Paper Reading AI Learner

Self-Route: Automatic Mode Switching via Capability Estimation for Efficient Reasoning

2025-05-27 03:18:31
Yang He, Xiao Ding, Bibo Cai, Yufei Zhang, Kai Xiong, Zhouhao Sun, Bing Qin, Ting Liu

Abstract

While reasoning-augmented large language models (RLLMs) significantly enhance complex task performance through extended reasoning chains, they inevitably introduce substantial unnecessary token consumption, particularly for simpler problems where Short Chain-of-Thought (Short CoT) suffices. This overthinking phenomenon leads to inefficient resource usage without proportional accuracy gains. To address this issue, we propose Self-Route, a dynamic reasoning framework that automatically selects between general and reasoning modes based on model capability estimation. Our approach introduces a lightweight pre-inference stage to extract capability-aware embeddings from hidden-layer representations, enabling real-time evaluation of the model's ability to solve a given problem. We further construct Gradient-10K, a dataset built via model-based difficulty estimation with dense sampling across complexity levels, to train the router for precise capability-boundary detection. Extensive experiments demonstrate that Self-Route achieves accuracy comparable to reasoning models while reducing token consumption by 30-55% across diverse benchmarks. The framework remains effective across models with different parameter scales and reasoning paradigms, highlighting its general applicability and practical value.

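The abstract gives only a high-level description of the routing mechanism, so the sketch below is an illustrative assumption rather than the authors' implementation: it runs a lightweight pre-inference pass with a HuggingFace causal LM, mean-pools one hidden layer of the prompt as a capability-aware embedding, and feeds it to a small router that chooses between a Short-CoT "general" mode and an extended "reasoning" mode. The model name, layer choice, linear probe, and threshold are hypothetical placeholders; in the paper the router is trained on Gradient-10K.

```python
# Illustrative sketch of capability-based mode routing (not the authors' code).
# Assumes a HuggingFace causal LM and a small router trained offline on a
# difficulty-graded dataset such as Gradient-10K; here the router is an
# untrained linear probe used purely as a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "Qwen/Qwen2.5-7B-Instruct"  # hypothetical choice of backbone
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()

# Placeholder router: maps the pooled hidden state to P(solvable with Short CoT).
# In practice its weights would be learned from (question, difficulty) labels.
router = torch.nn.Linear(model.config.hidden_size, 1)

def capability_embedding(question: str, layer: int = -1) -> torch.Tensor:
    """Lightweight pre-inference pass: mean-pool one hidden layer of the prompt."""
    inputs = tokenizer(question, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # outputs.hidden_states is a tuple of (batch, seq_len, hidden_size) tensors
    return outputs.hidden_states[layer].mean(dim=1)

def route(question: str, threshold: float = 0.5) -> str:
    """Return 'general' (Short CoT) if the router judges the problem solvable
    without extended reasoning, otherwise 'reasoning' (long CoT)."""
    p_solvable = torch.sigmoid(router(capability_embedding(question))).item()
    return "general" if p_solvable >= threshold else "reasoning"

# Example usage (meaningful only once the router has trained weights):
print(route("What is 17 + 25?"))
print(route("Prove that the square root of 2 is irrational."))
```

The point the abstract emphasizes is that this decision happens before any answer tokens are generated, so the only added cost is the forward pass used to read off the hidden-state embedding.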

URL

https://arxiv.org/abs/2505.20664

PDF

https://arxiv.org/pdf/2505.20664.pdf

