Abstract
Conversational recommender systems have emerged as a potent solution for efficiently eliciting user preferences. These systems interactively present queries associated with "key terms" to users and leverage user feedback to estimate user preferences more efficiently. Nonetheless, most existing algorithms adopt a centralized approach. In this paper, we introduce FedConPE, a phase elimination-based federated conversational bandit algorithm, where $M$ agents collaboratively solve a global contextual linear bandit problem with the help of a central server while ensuring secure data management. To effectively coordinate all the clients and aggregate their collected data, FedConPE uses an adaptive approach to construct key terms that minimize uncertainty across all dimensions in the feature space. Furthermore, compared with existing federated linear bandit algorithms, FedConPE offers improved computational and communication efficiency as well as enhanced privacy protections. Our theoretical analysis shows that FedConPE is minimax near-optimal in terms of cumulative regret. We also establish upper bounds for communication costs and conversation frequency. Comprehensive evaluations demonstrate that FedConPE outperforms existing conversational bandit algorithms while using fewer conversations.
Abstract (translated)
对话推荐系统已成为有效发掘用户偏好的强大解决方案。这些系统通过交互地向用户呈现与“关键词”相关的查询,并利用用户反馈来更有效地估计用户偏好。然而,大多数现有算法采用集中方法。在本文中,我们引入了FedConPE,一种基于阶段消除的联邦对话博弈算法,其中M个代理商通过中央服务器协作解决全球上下文线性博弈问题,同时确保安全数据管理。为了有效地协调所有客户端并汇总他们收集的数据,FedConPE采用一种自适应方法构造关键词,以最小化所有维度特征空间中的不确定性。此外,与现有联邦线性博弈算法相比,FedConPE在计算和通信效率以及隐私保护方面提供了改进。我们的理论分析表明,FedConPE在累积后悔度方面是最小最大最优的。我们还建立了通信成本和对话频率的上界。全面评估表明,尽管使用较少对话,FedConPE在现有对话博弈算法中表现优异。
URL
https://arxiv.org/abs/2405.02881