Paper Reading AI Learner

Fast Thinking for Large Language Models

2025-09-28 04:19:48
Haoyu Zheng, Zhuonan Wang, Yuqian Yuan, Tianwei Lin, Wenqiao Zhang, Zheqi Lv, Juncheng Li, Siliang Tang, Yueting Zhuang, Hongyang He

Abstract

Reasoning-oriented Large Language Models (LLMs) often rely on generating explicit tokens step by step, and their effectiveness typically hinges on large-scale supervised fine-tuning or reinforcement learning. While Chain-of-Thought (CoT) techniques substantially enhance performance on complex reasoning tasks, they remain inefficient, requiring long reasoning traces that increase latency and token usage. In this work, we introduce Latent Codebooks for Fast Thinking, a framework that uses concise CoT sketches only during training to learn a codebook of discrete strategy priors. At inference, the model conditions on a handful of continuous thinking vectors distilled from the codebook in a single pass, enabling strategy-level guidance without producing explicit reasoning tokens. To complement this design, we propose GainRouter, a lightweight routing mechanism that adaptively switches between fast codebook-guided inference and slow explicit reasoning, thereby suppressing overthinking and reducing unnecessary token generation. Experiments across multiple reasoning benchmarks show that our approach achieves competitive or superior accuracy while substantially lowering inference cost, offering a practical path toward efficient and controllable reasoning in large language models.
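The paper's code is not included here, so the following is a minimal PyTorch sketch of the two mechanisms the abstract describes. Every name and hyperparameter (LatentCodebook, GainRouter, num_codes=64, four thinking vectors, the 0.5 gating threshold) is an illustrative assumption, not the authors' implementation: a small learned codebook holds discrete strategy priors, a pooled prompt representation selects a few codes that are projected into continuous thinking vectors, and a lightweight gate routes each query to the fast or slow path.

```python
# Minimal sketch of the abstract's two ideas; all names and sizes are
# illustrative assumptions, not the paper's actual architecture.
import torch
import torch.nn as nn


class LatentCodebook(nn.Module):
    """Discrete strategy priors: a learned table of K code vectors.

    A pooled prompt representation scores the codes, the top-k matches
    are projected into continuous "thinking vectors", and those vectors
    would be prepended to the LLM's input embeddings in a single pass.
    """

    def __init__(self, num_codes: int = 64, dim: int = 768, num_thinking: int = 4):
        super().__init__()
        self.codes = nn.Embedding(num_codes, dim)  # strategy priors
        self.proj = nn.Linear(dim, dim)            # distillation head
        self.num_thinking = num_thinking

    def forward(self, prompt_repr: torch.Tensor) -> torch.Tensor:
        # prompt_repr: (batch, dim) pooled representation of the question.
        scores = prompt_repr @ self.codes.weight.T            # (batch, K)
        topk = scores.topk(self.num_thinking, dim=-1).indices  # (batch, k)
        selected = self.codes(topk)                            # (batch, k, dim)
        # Continuous thinking vectors distilled from the discrete codes.
        return self.proj(selected)


class GainRouter(nn.Module):
    """Lightweight gate choosing fast (codebook) vs. slow (explicit CoT)
    inference; modeled here as a single linear probe over the prompt state.
    """

    def __init__(self, dim: int = 768):
        super().__init__()
        self.gate = nn.Linear(dim, 1)

    def forward(self, prompt_repr: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
        p_slow = torch.sigmoid(self.gate(prompt_repr)).squeeze(-1)
        return p_slow > threshold  # True -> fall back to explicit reasoning


# Usage: random tensors stand in for a pooled encoder state of the prompt.
codebook, router = LatentCodebook(dim=768), GainRouter(dim=768)
prompt_repr = torch.randn(2, 768)
thinking = codebook(prompt_repr)  # (2, 4, 768): prepended at inference
use_slow = router(prompt_repr)    # (2,) bool: per-example fast/slow decision
```

In a full system the router would be trained to predict when explicit CoT actually improves the answer, so that easy queries take the fast path and only hard ones pay for long reasoning traces.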


URL

https://arxiv.org/abs/2509.23633

PDF

https://arxiv.org/pdf/2509.23633.pdf

