Paper Reading AI Learner

Sequential Enumeration in Large Language Models

2026-01-13 09:39:43
Kuinan Hou, Marco Zorzi, Alberto Testolin

Abstract

Reliably counting and generating sequences of items remains a significant challenge for neural networks, including Large Language Models (LLMs). Although this capability is readily handled by rule-based symbolic systems that rely on serial computation, neural models must acquire counting skills through learning, and they struggle to deploy counting procedures systematically. Previous research has demonstrated that recurrent architectures can only approximately track and enumerate sequences of events, and it remains unclear whether modern deep learning systems, including LLMs, can deploy systematic counting procedures over sequences of discrete symbols. This paper aims to fill this gap by investigating the sequential enumeration abilities of five state-of-the-art LLMs, including proprietary, open-source, and reasoning models. We probe LLMs in sequential naming and production tasks involving lists of letters and words, adopting a variety of prompting instructions to explore the role of chain-of-thought in the spontaneous emergence of counting strategies. We also evaluate open-source models with the same architecture but increasing size to test whether mastery of counting principles follows scaling laws, and we analyze the embedding dynamics during sequential enumeration to investigate the emergent encoding of numerosity. We find that some LLMs are indeed capable of deploying counting procedures when explicitly prompted to do so, but none of them spontaneously engage in counting when simply asked to report the number of items in a sequence. Our results suggest that, despite their impressive emergent abilities, LLMs cannot yet robustly and systematically deploy counting procedures, highlighting a persistent gap between neural and symbolic approaches to compositional generalization.
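The sequential naming task described above (present a list of items, ask the model how many there are, and score its reply against the ground truth) can be sketched in a few lines. This is a minimal illustration, not the paper's actual protocol: the `mock_model` function is a hypothetical stand-in for a real LLM API call, and the prompt wording is assumed.

```python
import random
import re
import string

def make_trial(length, rng):
    """Build one sequential-naming trial: a letter list and its true count."""
    seq = [rng.choice(string.ascii_lowercase) for _ in range(length)]
    prompt = ("Count the items in the following list and answer with a "
              f"single number: {' '.join(seq)}")
    return prompt, length

def score_answer(answer, target):
    """Extract the last integer in the model's reply; compare to target."""
    numbers = re.findall(r"\d+", answer)
    return bool(numbers) and int(numbers[-1]) == target

# Hypothetical stand-in for an LLM call; a real probe would query an API
# and vary the prompting instructions (e.g. with/without chain-of-thought).
def mock_model(prompt):
    items = prompt.split(": ", 1)[1].split()
    return f"There are {len(items)} items."

rng = random.Random(0)
trials = [make_trial(n, rng) for n in range(5, 15)]
accuracy = sum(score_answer(mock_model(p), t) for p, t in trials) / len(trials)
print(accuracy)  # the mock counts exactly, so 1.0
```

Swapping `mock_model` for a real model call and sweeping list lengths and prompt styles would reproduce the basic shape of the evaluation the abstract describes.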


URL

https://arxiv.org/abs/2512.04727

PDF

https://arxiv.org/pdf/2512.04727.pdf
