Abstract
Transformers have supplanted recurrent neural networks as the dominant architecture both for natural language processing tasks and, despite criticisms of cognitive implausibility, for modeling the effect of predictability on online human language comprehension. However, two recently developed recurrent neural network architectures, RWKV and Mamba, appear to perform natural language tasks comparably to or better than transformers of equivalent scale. In this paper, we show that contemporary recurrent models are now also able to match, and in some cases exceed, the performance of comparably sized transformers at modeling online human language comprehension. This suggests that transformer language models are not uniquely suited to this task, and opens new directions for debates about the extent to which architectural features of language models make them better or worse models of human language comprehension.
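To make the evaluation concrete, below is a minimal sketch (not the authors' code) of the standard surprisal pipeline the abstract alludes to: score each token by its negative log-probability under a causal language model, then, in the full analysis, regress human reading times on those surprisals. The model name `gpt2` is an illustrative stand-in; in recent versions of the `transformers` library, a Mamba or RWKV checkpoint from the Hugging Face hub could be substituted through the same `AutoModelForCausalLM` interface.

```python
# Minimal sketch, assuming a Hugging Face causal LM: compute per-token
# surprisal, -log2 P(token_t | tokens_<t), the predictability measure
# typically regressed against human reading times.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def token_surprisals(text: str, model_name: str = "gpt2"):
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.eval()

    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits  # shape: (1, seq_len, vocab)

    # Shift so each position's logits predict the *next* token.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = ids[0, 1:]
    nll = -log_probs[torch.arange(targets.size(0)), targets]
    surprisal_bits = nll / torch.log(torch.tensor(2.0))  # nats -> bits

    tokens = tokenizer.convert_ids_to_tokens(targets)
    return list(zip(tokens, surprisal_bits.tolist()))

for tok, s in token_surprisals("The cat sat on the mat."):
    print(f"{tok:>12s}  {s:6.2f} bits")
```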
URL
https://arxiv.org/abs/2404.19178