Abstract
With the rapid advancement of large language models (LLMs), information retrieval (IR) systems, such as search engines and recommender systems, have undergone a significant paradigm shift. This evolution, while heralding new opportunities, introduces emerging challenges, particularly bias and unfairness, which may threaten the information ecosystem. In this paper, we present a comprehensive survey of existing work on the emerging and pressing bias and unfairness issues that arise when LLMs are integrated into IR systems. We first unify bias and unfairness issues as distribution mismatch problems, providing a foundation for categorizing various mitigation strategies through distribution alignment. Subsequently, we systematically delve into the specific bias and unfairness issues arising from three critical stages of integrating LLMs into IR systems: data collection, model development, and result evaluation. In doing so, we meticulously review and analyze recent literature, focusing on the definitions, characteristics, and corresponding mitigation strategies associated with these issues. Finally, we identify and highlight open problems and challenges for future work, aiming to inspire researchers and stakeholders in the IR field and beyond to better understand and mitigate bias and unfairness issues in IR in the LLM era. We also maintain a GitHub repository of relevant papers and resources in this rising direction at this https URL.
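To make the distribution-mismatch framing concrete, one illustrative formalization (a sketch, not necessarily the authors' exact formulation) treats the system's output distribution $P_\theta$ (e.g., over retrieved documents or generated text) as biased or unfair when it deviates from a desired target distribution $P_{\text{target}}$ (e.g., the factual data distribution, or a fairness-constrained exposure distribution over item or user groups), and casts mitigation as distribution alignment:

$$\min_\theta \; D\big(P_\theta \,\|\, P_{\text{target}}\big),$$

where $D$ is a divergence measure such as the KL divergence. Under this reading, the surveyed mitigation strategies differ mainly in how $P_{\text{target}}$ is specified and at which stage of the pipeline (data collection, model development, or result evaluation) the alignment is enforced.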
URL
https://arxiv.org/abs/2404.11457