Abstract
The PLAID (Performance-optimized Late Interaction Driver) algorithm for ColBERTv2 uses clustered term representations to retrieve and progressively prune documents for final (exact) document scoring. In this paper, we reproduce PLAID and fill in gaps left by the original work. By studying the parameters PLAID introduces, we find that its Pareto frontier is formed by a careful balance among its three parameters; deviations beyond the suggested settings can substantially increase latency without necessarily improving its effectiveness. We then compare PLAID with an important baseline missing from the paper: re-ranking a lexical system. We find that applying ColBERTv2 as a re-ranker atop an initial pool of BM25 results provides better efficiency-effectiveness trade-offs in low-latency settings. However, re-ranking cannot reach peak effectiveness at higher latency settings due to the limited recall of lexical matching, and it provides a poor approximation of an exhaustive ColBERTv2 search. We find that recently proposed modifications to re-ranking that pull in the neighbors of top-scoring documents overcome this limitation, providing a Pareto frontier across all operational points for ColBERTv2 when evaluated using a well-annotated dataset. Curious about why re-ranking methods are highly competitive with PLAID, we analyze the token representation clusters PLAID uses for retrieval and find that most clusters are predominantly aligned with a single token and vice versa. Given the competitive trade-offs that re-ranking baselines exhibit, this work highlights the importance of carefully selecting pertinent baselines when evaluating the efficiency of retrieval engines.
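The two re-ranking baselines compared against PLAID can be sketched in a few lines. What follows is a minimal, illustrative sketch, not the paper's or the ColBERT library's implementation. It assumes toy token embeddings stored as NumPy arrays keyed by document id, a first-stage candidate list standing in for BM25 results, and a hypothetical precomputed `neighbours` map (a corpus graph linking each document to similar documents). `maxsim` is ColBERT-style late-interaction scoring; `rerank` scores only the first-stage pool; `adaptive_rerank` is a simplified take on the neighbour-expanding idea, alternating batches from the pool and from unscored neighbours of the best-scored documents until a scoring budget is spent. All function names, the batch size, and the alternation schedule are illustrative assumptions.

```python
import numpy as np


def maxsim(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    """Late interaction: for each query token take the best dot product
    over the document's tokens, then sum over query tokens."""
    return float((query_vecs @ doc_vecs.T).max(axis=1).sum())


def rank_by_score(scores: dict) -> list:
    """Order doc ids by descending score, breaking ties by id."""
    return sorted(scores, key=lambda d: (-scores[d], d))


def rerank(query_vecs, pool, doc_embs, k):
    """Plain re-ranking: score only the first-stage pool (e.g. BM25 top-k)."""
    scores = {d: maxsim(query_vecs, doc_embs[d]) for d in pool}
    return rank_by_score(scores)[:k]


def adaptive_rerank(query_vecs, pool, doc_embs, neighbours, budget, batch_size=2):
    """Hypothetical simplified neighbour-expanding re-ranker: alternate
    scoring batches from the first-stage pool and from unscored neighbours
    of the best-scored documents, until `budget` documents are scored."""
    scores = {}
    pool = list(pool)   # first-stage ranking, best first (copied, consumed below)
    frontier = []       # unscored neighbours, ordered by their source's score
    use_graph = False
    while len(scores) < budget and (pool or frontier):
        source = frontier if (use_graph and frontier) or not pool else pool
        batch = []
        while source and len(batch) < batch_size and len(scores) + len(batch) < budget:
            d = source.pop(0)
            if d not in scores and d not in batch:
                batch.append(d)
        for d in batch:
            scores[d] = maxsim(query_vecs, doc_embs[d])
        # Rebuild the frontier from neighbours of the current best documents.
        frontier = [n for d in rank_by_score(scores)
                    for n in neighbours.get(d, []) if n not in scores]
        use_graph = not use_graph
    return rank_by_score(scores)


# Toy data (hypothetical): 2-d token embeddings per document.
query = np.array([[1.0, 0.0], [0.0, 1.0]])
docs = {
    0: np.array([[1.0, 0.0]]),
    1: np.array([[1.0, 0.0], [0.0, 1.0]]),
    2: np.array([[0.5, 0.5]]),
    3: np.array([[2.0, 0.0], [0.0, 2.0]]),
    4: np.array([[0.0, 0.0]]),
}
bm25_pool = [0, 2, 4]             # pretend lexical results; misses docs 1 and 3
graph = {0: [3], 2: [1], 3: [1]}  # hypothetical corpus graph

print(rerank(query, bm25_pool, docs, k=3))                     # -> [0, 2, 4]
print(adaptive_rerank(query, bm25_pool, docs, graph, budget=4))  # -> [3, 1, 0, 2]
```

In this toy setup the lexical pool never contains documents 1 and 3, so plain re-ranking cannot surface them no matter how the pool is re-scored; the adaptive variant reaches both through the graph while still scoring only four documents. This mirrors the recall limitation described in the abstract, although the real systems operate over learned embeddings and far larger pools.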
URL
https://arxiv.org/abs/2404.14989