Paper Reading AI Learner

LLM as Effective Streaming Processor: Bridging Streaming-Batch Mismatches with Group Position Encoding

2025-05-22 17:53:28
Junlong Tong, Jinlan Fu, Zixuan Lin, Yingqi Fan, Anhao Zhao, Hui Su, Xiaoyu Shen

Abstract

Large Language Models (LLMs) are primarily designed for batch processing. Existing methods for adapting LLMs to streaming rely either on expensive re-encoding or on specialized architectures with limited scalability. This work identifies three key mismatches in adapting batch-oriented LLMs to streaming: (1) input-attention, (2) output-attention, and (3) position-ID mismatches. While it is commonly assumed that the latter two mismatches require frequent re-encoding, our analysis reveals that only the input-attention mismatch significantly impacts performance, indicating that re-encoding outputs is largely unnecessary. To better understand this divergence from the common assumption, we provide the first comprehensive analysis of the impact of position encoding on LLMs in streaming, showing that preserving relative positions within source and target contexts is more critical than maintaining absolute order. Motivated by this analysis, we introduce a group position encoding paradigm built on batch architectures to enhance consistency between streaming and batch modes. Extensive experiments on cross-lingual and cross-modal tasks demonstrate that our method outperforms existing approaches. Our method requires no architectural modifications and exhibits strong generalization in both streaming and batch modes. The code is available at this https URL.
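To make the group position encoding idea concrete, below is a minimal Python sketch (not the authors' implementation) of one plausible scheme: each group, such as the source context and the generated target, keeps its own position counter, so tokens that arrive interleaved in a stream still receive the same relative position IDs they would get under batch (source-then-target) processing. The `group_position_ids` helper and the `"src"`/`"tgt"` labels are illustrative assumptions, not names from the paper.

```python
# A minimal sketch (not the authors' code) of group position encoding:
# instead of numbering interleaved source/target tokens by arrival
# order, each group keeps its own position counter, so relative
# positions within the source context and within the target context
# match what batch mode would produce.

from typing import List, Tuple

# Hypothetical token representation: (group label, token string),
# e.g. chunks of source tokens ("src") interleaved with generated
# target tokens ("tgt"), as in simultaneous translation.
Token = Tuple[str, str]

def group_position_ids(stream: List[Token]) -> List[int]:
    """Assign each token a position ID counted within its own group."""
    counters = {}  # per-group running position counter
    ids = []
    for group, _ in stream:
        pos = counters.get(group, 0)
        ids.append(pos)
        counters[group] = pos + 1
    return ids

if __name__ == "__main__":
    # Interleaved streaming input: source chunks arrive between
    # target tokens.
    stream = [("src", "Das"), ("src", "ist"), ("tgt", "This"),
              ("src", "gut"), ("tgt", "is"), ("tgt", "good")]
    print(group_position_ids(stream))
    # Arrival-order IDs would be [0, 1, 2, 3, 4, 5]; group-wise IDs
    # are [0, 1, 0, 2, 1, 2], preserving relative order inside each
    # group exactly as in batch (source-then-target) processing.
```

In a real decoder these group-wise indices would presumably feed the model's position embedding (e.g. rotary) in place of arrival-order indices, so newly arrived source chunks can be appended without re-encoding previously generated outputs.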

URL

https://arxiv.org/abs/2505.16983

PDF

https://arxiv.org/pdf/2505.16983.pdf

