Abstract
Pretrained language models like BERT and T5 serve as crucial backbone encoders for dense retrieval. However, these models often exhibit limited generalization capabilities and face challenges in improving in-domain accuracy. Recent research has explored using large language models (LLMs) as retrievers, achieving state-of-the-art (SOTA) performance across various tasks. Despite these advancements, the specific benefits of LLMs over traditional retrievers, and the impact of different LLM configurations (such as parameter size, pretraining duration, and alignment processes) on retrieval tasks, remain unclear. In this work, we conduct a comprehensive empirical study on a wide range of retrieval tasks, including in-domain accuracy, data efficiency, zero-shot generalization, lengthy retrieval, instruction-based retrieval, and multi-task learning. We evaluate over 15 different backbone LLMs and non-LLMs. Our findings reveal that larger models and more extensive pretraining consistently enhance in-domain accuracy and data efficiency. Additionally, larger models demonstrate significant potential in zero-shot generalization, lengthy retrieval, instruction-based retrieval, and multi-task learning. These results underscore the advantages of LLMs as versatile and effective backbone encoders in dense retrieval, providing valuable insights for future research and development in this field.
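The abstract does not include code, but the general setup it studies is an LLM backbone producing dense embeddings that are compared by similarity. As a rough, hedged illustration of that setup (not the authors' configuration), the sketch below encodes a query and passages with a decoder-only model via mean pooling and scores them with cosine similarity; the checkpoint name, pooling choice, and scoring function are all assumptions for illustration.

```python
# Minimal sketch of dense retrieval with an LLM backbone (illustrative only).
# The checkpoint name, mean pooling, and cosine scoring are assumptions,
# not the configuration evaluated in the paper.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "Qwen/Qwen2-0.5B"  # placeholder decoder-only LLM backbone

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
if tokenizer.pad_token is None:           # some LLM tokenizers lack a pad token
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()

def encode(texts):
    """Embed texts by mean-pooling the model's final hidden states."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state           # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1).float()    # (B, T, 1)
    pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)   # masked mean pool
    return F.normalize(pooled, dim=-1)                      # unit-length vectors

query = encode(["what is dense retrieval?"])
passages = encode([
    "Dense retrieval embeds queries and documents into a shared vector space.",
    "BM25 is a classical sparse lexical retrieval method.",
])
scores = query @ passages.T   # cosine similarity (vectors are normalized)
print(scores)
```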
URL
https://arxiv.org/abs/2408.12194