Abstract
Tabular representation learning has recently gained significant attention. However, existing approaches learn a representation only from a single table, and thus ignore the potential to learn from the full structure of relational databases, including neighboring tables that can hold important information for a contextualized representation. Moreover, current models are significantly limited in scale, which prevents them from learning from large databases. In this paper, we therefore introduce our vision of relational representation learning, which can not only learn from the full relational structure but can also scale to the larger database sizes commonly found in the real world. We further discuss the opportunities and challenges we see along the way to realizing this vision, and present very promising initial results. Overall, we argue that this direction can lead to foundation models for relational databases, which today exist only for text and images.
URL
https://arxiv.org/abs/2305.15321