UFO-ViT: High Performance Linear Vision Transformer without Softmax

Abstract

Vision transformers have become one of the most important models for computer vision tasks. While they outperform earlier convolutional networks, traditional self-attention has a major drawback: its complexity is quadratic in the number of input tokens $N$. Here we propose UFO-ViT (Unit Force Operated Vision Transformer), a novel method that reduces the computation of self-attention by eliminating some of its non-linearity. By modifying only a few lines of self-attention, UFO-ViT achieves linear complexity without degrading performance. The proposed models outperform most transformer-based models on image classification and dense prediction tasks across most capacity regimes.
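
To illustrate why removing the softmax can make attention linear in $N$, the following is a minimal sketch, not the paper's implementation. It assumes plain unnormalized attention and omits whatever normalization UFO-ViT applies in place of softmax. Without a non-linearity between the two matrix products, associativity allows $(QK^\top)V$ to be computed as $Q(K^\top V)$, which replaces the $N \times N$ intermediate with a $d \times d$ one. The helper names and the sizes `n_tokens` and `dim` are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def quadratic_attention(q, k, v):
    """Compute (Q K^T) V. The N x N intermediate makes the cost O(N^2 d)."""
    return matmul(matmul(q, transpose(k)), v)


def linear_attention(q, k, v):
    """Compute Q (K^T V). The d x d intermediate makes the cost O(N d^2)."""
    return matmul(q, matmul(transpose(k), v))


def random_matrix(rows, cols, rng):
    """Build a rows x cols matrix with entries drawn uniformly from [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
n_tokens, dim = 16, 4  # hypothetical sizes chosen only for illustration

q = random_matrix(n_tokens, dim, rng)
k = random_matrix(n_tokens, dim, rng)
v = random_matrix(n_tokens, dim, rng)

out_quadratic = quadratic_attention(q, k, v)
out_linear = linear_attention(q, k, v)

# Largest element-wise gap between the two orderings (floating-point noise only).
max_diff = max(
    abs(a - b)
    for row_a, row_b in zip(out_quadratic, out_linear)
    for a, b in zip(row_a, row_b)
)
```

Both orderings give the same output up to floating-point rounding. With a softmax applied to $QK^\top$ this reordering is no longer valid, which is why the quadratic cost is tied to that non-linearity.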

URL

https://arxiv.org/abs/2109.14382

PDF

https://arxiv.org/pdf/2109.14382.pdf