Abstract
We introduce DocPolarBERT, a layout-aware BERT model for document understanding that eliminates the need for absolute 2D positional embeddings. We extend self-attention to take text block positions into account in a relative polar coordinate system rather than a Cartesian one. Despite being pre-trained on a dataset more than six times smaller than the widely used IIT-CDIP corpus, DocPolarBERT achieves state-of-the-art results. These results demonstrate that a carefully designed attention mechanism can compensate for reduced pre-training data, offering an efficient and effective alternative for document understanding.
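To make the idea concrete, the sketch below shows one way relative polar coordinates can enter self-attention: pairwise (radius, angle) offsets between text block centers are bucketized and mapped to a learned per-head bias added to the attention logits. This is not the authors' implementation; the bucket counts, the log-scale radius bucketing, and the 0–1000 page normalization are illustrative assumptions.

    # Minimal sketch of self-attention with a relative polar-coordinate bias.
    import math
    import torch
    import torch.nn as nn

    class PolarRelativeAttention(nn.Module):
        def __init__(self, dim, num_heads=8, num_dist_buckets=32, num_angle_buckets=16):
            super().__init__()
            self.num_heads = num_heads
            self.head_dim = dim // num_heads
            self.qkv = nn.Linear(dim, 3 * dim)
            self.out = nn.Linear(dim, dim)
            self.num_dist_buckets = num_dist_buckets
            self.num_angle_buckets = num_angle_buckets
            # One learned bias per (distance bucket, angle bucket) pair and head.
            self.polar_bias = nn.Embedding(num_dist_buckets * num_angle_buckets, num_heads)

        def polar_buckets(self, centers):
            # centers: (B, N, 2) text block centers on a page normalized to [0, 1000].
            delta = centers[:, :, None, :] - centers[:, None, :, :]   # (B, N, N, 2)
            radius = delta.norm(dim=-1)                               # pairwise distances
            angle = torch.atan2(delta[..., 1], delta[..., 0])         # angles in (-pi, pi]
            # Log-scale distance buckets (assumption) and uniform angle buckets.
            dist_idx = torch.clamp(
                (torch.log1p(radius) / math.log(1001.0) * self.num_dist_buckets).long(),
                max=self.num_dist_buckets - 1,
            )
            angle_idx = torch.clamp(
                ((angle + math.pi) / (2 * math.pi) * self.num_angle_buckets).long(),
                max=self.num_angle_buckets - 1,
            )
            return dist_idx * self.num_angle_buckets + angle_idx      # (B, N, N)

        def forward(self, x, centers):
            B, N, D = x.shape
            q, k, v = self.qkv(x).chunk(3, dim=-1)
            q = q.view(B, N, self.num_heads, self.head_dim).transpose(1, 2)
            k = k.view(B, N, self.num_heads, self.head_dim).transpose(1, 2)
            v = v.view(B, N, self.num_heads, self.head_dim).transpose(1, 2)
            scores = q @ k.transpose(-2, -1) / math.sqrt(self.head_dim)   # (B, H, N, N)
            bias = self.polar_bias(self.polar_buckets(centers))          # (B, N, N, H)
            scores = scores + bias.permute(0, 3, 1, 2)                    # add polar bias
            attn = scores.softmax(dim=-1)
            out = (attn @ v).transpose(1, 2).reshape(B, N, D)
            return self.out(out)

    # Usage: x = torch.randn(2, 50, 768); centers = torch.rand(2, 50, 2) * 1000
    # out = PolarRelativeAttention(768)(x, centers)

Because the bias depends only on relative offsets between blocks, no absolute 2D positional embedding is needed, which is the property the abstract highlights.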
URL
https://arxiv.org/abs/2507.08606