Abstract
We present a framework for using transformer networks as universal computers by programming them with specific weights and placing them in a loop. Our input sequence acts as a punchcard, consisting of instructions and memory for data read/writes. We demonstrate that a constant number of encoder layers can emulate basic computing blocks, including embedding edit operations, non-linear functions, function calls, program counters, and conditional branches. Using these building blocks, we emulate a small instruction-set computer. This allows us to map iterative algorithms to programs that can be executed by a looped, 13-layer transformer. We show how this transformer, instructed by its input, can emulate a basic calculator, a basic linear algebra library, and in-context learning algorithms that employ backpropagation. Our work highlights the versatility of the attention mechanism, and demonstrates that even shallow transformers can execute full-fledged, general-purpose programs.
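The control structure the abstract describes can be sketched in a few lines: a fixed-weight network is applied repeatedly to a sequence that holds both the instructions and the working memory, with the output fed back as the next input. The sketch below is purely illustrative and assumes a toy `step()` function standing in for the paper's 13-layer transformer; the instruction names (`set`, `add`) and state layout are hypothetical, not from the paper.

```python
# Hedged sketch of "looped" execution: the paper programs a fixed-weight
# transformer so that one forward pass performs one machine step; here a
# toy step() plays that role, and the outer loop is the same feedback
# structure (output of one pass becomes the input of the next).

def step(state):
    """One 'forward pass': fetch the instruction at the program counter,
    apply it to memory, and advance the counter (names are illustrative)."""
    pc, program, memory = state
    op, addr, val = program[pc]
    if op == "set":
        memory[addr] = val
    elif op == "add":
        memory[addr] += val
    return (pc + 1, program, memory)

def run_looped(program, memory, n_steps):
    """Iterate the fixed step function: the 'loop' around the transformer."""
    state = (0, program, memory)
    for _ in range(n_steps):
        state = step(state)
    return state[2]

# A two-instruction "punchcard": set memory[0] to 2, then add 3.
program = [("set", 0, 2), ("add", 0, 3)]
result = run_looped(program, memory=[0], n_steps=2)
print(result)  # → [5]
```

The point of the sketch is only the separation of concerns the abstract highlights: the step function (the transformer's weights) is fixed once, while the behavior is determined entirely by the input sequence, so iterative algorithms become programs rather than new models.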
URL
https://arxiv.org/abs/2301.13196