Abstract
The recent literature on online learning to rank (LTR) has established the utility of prior knowledge in Bayesian ranking bandit algorithms. However, a major limitation of existing work is the requirement that the prior used by the algorithm match the true prior. In this paper, we propose and analyze adaptive algorithms that address this issue, and we extend these results to the linear and generalized linear models. We also consider scalar relevance feedback on top of click feedback. Finally, we demonstrate the efficacy of our algorithms in both synthetic and real-world experiments.
URL
https://arxiv.org/abs/2301.10651