Abstract
Long-range sequence modeling is a crucial aspect of natural language processing and time series analysis. However, traditional models such as Recurrent Neural Networks (RNNs) and Transformers suffer from computational and memory inefficiencies, especially on long sequences. This paper introduces Logarithmic Memory Networks (LMNs), a novel architecture that leverages a hierarchical logarithmic tree structure to efficiently store and retrieve past information. LMNs dynamically summarize historical context, reducing the memory footprint and computational complexity of attention from O(n^2) to O(log(n)). The model accesses stored information through a single-vector, targeted attention mechanism, and its memory-block construction worker (summarizer) layer operates in two modes: a parallel execution mode during training, for efficient processing of the hierarchical tree structure, and a sequential execution mode during inference, in which it acts as a memory management system. The model also implicitly encodes positional information, eliminating the need for explicit positional encodings. These features make LMNs a robust and scalable solution for processing long-range sequences in resource-constrained environments. The code is publicly available under the MIT License on GitHub: this https URL.
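To make the logarithmic memory idea concrete, here is a minimal sketch of the sequential (inference-style) mode only. It is not the authors' code. The names `LogMemory`, `summarize`, and `attend` are hypothetical. It assumes a binary-counter merge rule, in which two summaries of equal size are merged into one at the next level. It also swaps an element-wise mean in for the paper's learned summarizer layer and uses plain scaled dot-product attention for a single query vector.

```python
import math
from typing import List, Optional

Vector = List[float]


def summarize(older: Vector, newer: Vector) -> Vector:
    """Hypothetical stand-in for the learned summarizer: element-wise mean."""
    return [(a + b) / 2.0 for a, b in zip(older, newer)]


class LogMemory:
    """Binary-counter memory (assumption): level k holds at most one summary of 2**k tokens."""

    def __init__(self) -> None:
        self.levels: List[Optional[Vector]] = []
        self.count = 0

    def append(self, token: Vector) -> None:
        carry = token
        level = 0
        # Like incrementing a binary counter: while the slot at this level is
        # occupied, merge it with the carry and move the result one level up.
        while level < len(self.levels) and self.levels[level] is not None:
            carry = summarize(self.levels[level], carry)
            self.levels[level] = None
            level += 1
        if level == len(self.levels):
            self.levels.append(None)
        self.levels[level] = carry
        self.count += 1

    def slots(self) -> List[Vector]:
        """Occupied summaries, coarsest (oldest context) first."""
        return [v for v in reversed(self.levels) if v is not None]


def attend(query: Vector, memory: List[Vector]) -> Vector:
    """Single-query scaled dot-product attention; summaries act as keys and values."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, m)) / math.sqrt(d) for m in memory]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [sum(w * m[i] for w, m in zip(weights, memory)) / total for i in range(d)]


mem = LogMemory()
for t in range(1000):
    mem.append([float(t), 1.0])
print(mem.count, len(mem.slots()))  # 1000 tokens stored as 6 summaries
```

After n appends, the occupied levels mirror the binary representation of n. The memory therefore holds at most floor(log2 n) + 1 summaries, so each query attends over O(log n) vectors rather than n, and merges cost amortized O(1) per token. For n = 1000 (binary 1111101000), six summaries remain. How LMNs actually build the tree, and how the parallel training mode works, may differ from this sketch.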
URL
https://arxiv.org/abs/2501.07905