Abstract
Relational database management systems (RDBMS) are widely used for the storage and retrieval of structured data. To derive insights beyond statistical aggregation, we typically have to extract specific subdatasets from the database using conventional database operations, and then apply deep neural network (DNN) training and inference on these subdatasets in a separate machine learning system. The process can be prohibitively expensive, especially when a combinatorial number of subdatasets are extracted for different analytical purposes. This calls for efficient in-database support of advanced analytical methods. In this paper, we introduce LEADS, a novel SQL-aware dynamic model slicing technique to customize models for subdatasets specified by SQL queries. LEADS improves the predictive modeling of structured data via the mixture of experts (MoE) technique and maintains inference efficiency through a SQL-aware gating network. At the core of LEADS is the construction of a general model with multiple expert sub-models via MoE, trained over the entire database. This SQL-aware MoE technique scales up the modeling capacity, enhances effectiveness, and preserves efficiency by activating only the necessary experts via the gating network during inference. Additionally, we introduce two regularization terms during the training process of LEADS to strike a balance between effectiveness and efficiency. We also design and build an in-database inference system, called INDICES, to support end-to-end advanced structured data analytics by non-intrusively incorporating LEADS into PostgreSQL. Our extensive experiments on real-world datasets demonstrate that LEADS consistently outperforms baseline models, and INDICES delivers effective in-database analytics with a considerable reduction in inference latency compared to traditional solutions.
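The idea of a gating network that reads the SQL query and activates only a few experts can be illustrated with a minimal sketch. It is not the LEADS implementation: the abstract does not specify how queries are encoded or how the gate is parameterized, so the one-hot predicate encoding, the linear gate, the top-k selection, the linear experts, and all names (`encode_query`, `sparse_gate`, `moe_predict`, the toy schema) are assumptions made for illustration. The two regularization terms mentioned in the abstract are not modeled here.

```python
import math
import random

def encode_query(predicates, schema):
    """One-hot encode equality predicates (assumed encoding).
    Slot 0 of each column means 'no filter on this column'."""
    vec = []
    for col, values in schema.items():
        slots = [0.0] * (len(values) + 1)
        if col in predicates:
            slots[1 + values.index(predicates[col])] = 1.0
        else:
            slots[0] = 1.0
        vec.extend(slots)
    return vec

def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def sparse_gate(logits, k):
    """Softmax over the k largest logits; every other expert gets weight 0."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)
    exps = {i: math.exp(logits[i] - m) for i in top}
    total = sum(exps.values())
    return [exps[i] / total if i in exps else 0.0 for i in range(len(logits))]

def moe_predict(x, query_vec, gate_w, experts, k=2):
    """Weighted sum of the active experts; the gate sees only the query."""
    weights = sparse_gate(matvec(gate_w, query_vec), k)
    out = 0.0
    for w, expert in zip(weights, experts):
        if w > 0.0:  # experts with zero weight are never evaluated
            out += w * sum(e * xi for e, xi in zip(expert, x))
    return out, weights

# Toy usage with random (untrained) parameters.
rng = random.Random(0)
schema = {"region": ["EU", "US"], "segment": ["retail", "corporate"]}
q = encode_query({"region": "EU"}, schema)  # WHERE region = 'EU'
n_experts, n_features = 4, 3
gate_w = [[rng.gauss(0, 1) for _ in q] for _ in range(n_experts)]
experts = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_experts)]
pred, weights = moe_predict([0.5, -1.0, 2.0], q, gate_w, experts, k=2)
```

Because the gate in this sketch depends only on the query encoding, the expert selection can be computed once per SQL query and reused for every tuple the query returns, which is one plausible reading of how such a design keeps inference cost low.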
URL
https://arxiv.org/abs/2405.00568