Abstract
Large language models (LLMs) have marked a pivotal moment in machine learning and deep learning. Recently, their capability for query planning has been investigated for both single-modal and multi-modal queries. However, the query optimization capability of LLMs has not yet been studied. As a critical step (arguably the most important one) that significantly impacts the execution performance of a query plan, such analysis should not be missed. From another perspective, existing query optimizers are usually rule-based or combine rule-based and cost-based techniques, i.e., they depend on manually created rules to rewrite/transform the query plan. Given that modern optimizers include hundreds to thousands of rules, designing a multi-modal query optimizer in a similar way is prohibitively time-consuming, since we would have to enumerate as many multi-modal optimization rules as possible, a problem that has not been well addressed to date. In this paper, we investigate the query optimization ability of LLMs and use an LLM to design LaPuda, a novel LLM- and policy-based multi-modal query optimizer. Instead of enumerating specific and detailed rules, LaPuda needs only a few abstract policies to guide the LLM during optimization, saving considerable time and human effort. Furthermore, to prevent the LLM from making mistakes or performing negative optimization, we borrow the idea of gradient descent and propose a guided cost descent (GCD) algorithm, so that the optimization is kept moving in the correct direction. In our evaluation, our methods consistently outperform the baselines in most cases. For example, the optimized plans generated by our methods yield 1~3x higher execution speed than those generated by the baselines.
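The guided cost descent idea described above can be illustrated as a greedy accept-only-if-cheaper loop: rewrites proposed by the LLM are kept only when an external cost estimate decreases, so a bad suggestion cannot make the plan worse. This is a minimal sketch under that assumption; the names `propose_rewrites` and `cost` and the loop structure are illustrative, not the paper's actual algorithm:

```python
def guided_cost_descent(plan, propose_rewrites, cost, max_steps=10):
    """Greedy cost-descent loop: accept a candidate rewrite only when it
    lowers the estimated cost, preventing 'negative optimization'."""
    best_plan, best_cost = plan, cost(plan)
    for _ in range(max_steps):
        # Candidate rewrites, e.g. plan transformations suggested by the LLM.
        candidates = propose_rewrites(best_plan)
        if not candidates:
            break
        # Follow the steepest descent direction: the cheapest candidate.
        cand = min(candidates, key=cost)
        cand_cost = cost(cand)
        if cand_cost >= best_cost:
            break  # no candidate lowers the cost -> stop
        best_plan, best_cost = cand, cand_cost
    return best_plan, best_cost
```

With a toy cost model (plans as tuples of per-operator costs, rewrites that shave one unit off each operator), the loop converges to the zero-cost plan and stops as soon as no proposal improves on the current best.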
URL
https://arxiv.org/abs/2403.13597