Abstract
We present a framework for using transformer networks as universal computers by programming them with specific weights and placing them in a loop. Our input sequence acts as a punchcard, consisting of instructions and memory for data read/writes. We demonstrate that a constant number of encoder layers can emulate basic computing blocks, including embedding edit operations, non-linear functions, function calls, program counters, and conditional branches. Using these building blocks, we emulate a small instruction-set computer. This allows us to map iterative algorithms to programs that can be executed by a looped, 13-layer transformer. We show how this transformer, instructed by its input, can emulate a basic calculator, a basic linear algebra library, and in-context learning algorithms that employ backpropagation. Our work highlights the versatility of the attention mechanism, and demonstrates that even shallow transformers can execute full-fledged, general-purpose programs.
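The control structure the abstract describes can be sketched in a few lines: a fixed-weight network is applied repeatedly to a sequence that holds both the instructions and the working memory, with the output fed back as the next input. The sketch below is purely illustrative and assumes a toy `step()` function standing in for the paper's 13-layer transformer; the instruction names (`set`, `add`) and state layout are hypothetical, not from the paper.

```python
# Hedged sketch of "looped" execution: the paper programs a fixed-weight
# transformer so that one forward pass performs one machine step; here a
# toy step() plays that role, and the outer loop is the same feedback
# structure (output of one pass becomes the input of the next).

def step(state):
    """One 'forward pass': fetch the instruction at the program counter,
    apply it to memory, and advance the counter (names are illustrative)."""
    pc, program, memory = state
    op, addr, val = program[pc]
    if op == "set":
        memory[addr] = val
    elif op == "add":
        memory[addr] += val
    return (pc + 1, program, memory)

def run_looped(program, memory, n_steps):
    """Iterate the fixed step function: the 'loop' around the transformer."""
    state = (0, program, memory)
    for _ in range(n_steps):
        state = step(state)
    return state[2]

# A two-instruction "punchcard": set memory[0] to 2, then add 3.
program = [("set", 0, 2), ("add", 0, 3)]
result = run_looped(program, memory=[0], n_steps=2)
print(result)  # → [5]
```

The point of the sketch is only the separation of concerns the abstract highlights: the step function (the transformer's weights) is fixed once, while the behavior is determined entirely by the input sequence, so iterative algorithms become programs rather than new models.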
URL
https://arxiv.org/abs/2301.13196