Abstract
With the growing demand for personalized assortment recommendations, concerns over data privacy have intensified, highlighting the urgent need for effective privacy-preserving strategies. This paper presents a novel framework for privacy-preserving dynamic assortment selection using the multinomial logit (MNL) bandit model. Our approach employs a perturbed upper confidence bound method, integrating calibrated noise into user utility estimates to balance exploration and exploitation while ensuring robust privacy protection. We rigorously prove that our policy satisfies Joint Differential Privacy (JDP), which better suits dynamic environments than traditional differential privacy, effectively mitigating inference attack risks. This analysis is built upon a novel objective perturbation technique tailored for MNL bandits, which is also of independent interest. Theoretically, we derive a near-optimal regret bound of $\tilde{O}(\sqrt{T})$ for our policy and explicitly quantify how privacy protection impacts regret. Through extensive simulations and an application to the Expedia hotel dataset, we demonstrate substantial performance enhancements over the benchmark method.
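To make the ingredients named above concrete, the following is a minimal sketch of an MNL choice model combined with a perturbed optimistic assortment choice. It is not the paper's algorithm: the function names, the additive `bonus` term, the Gaussian noise, and the brute-force search over assortments are all assumptions made for illustration, and `noise_scale` is a free parameter that is not calibrated to any privacy guarantee, so the sketch by itself does not provide JDP.

```python
import itertools

import numpy as np


def mnl_choice_probs(utilities):
    """Return MNL purchase probabilities for the offered items and the
    probability of no purchase (the outside option has utility 0)."""
    weights = np.exp(np.asarray(utilities, dtype=float))
    denom = 1.0 + weights.sum()
    return weights / denom, 1.0 / denom


def expected_revenue(utilities, revenues):
    """Expected revenue of an assortment under the MNL model."""
    probs, _ = mnl_choice_probs(utilities)
    return float(probs @ np.asarray(revenues, dtype=float))


def perturbed_optimistic_assortment(est_utils, bonus, revenues, max_size,
                                    noise_scale, rng):
    """Hypothetical helper: inflate utility estimates by an exploration
    bonus plus Gaussian noise, then pick the best assortment of at most
    `max_size` items by exhaustive search (only feasible for few items)."""
    est_utils = np.asarray(est_utils, dtype=float)
    revenues = np.asarray(revenues, dtype=float)
    noise = rng.normal(0.0, noise_scale, size=est_utils.shape)
    noisy_utils = est_utils + bonus + noise  # assumption: additive perturbation

    best_set, best_value = None, -np.inf
    for size in range(1, max_size + 1):
        for subset in itertools.combinations(range(len(noisy_utils)), size):
            idx = list(subset)
            value = expected_revenue(noisy_utils[idx], revenues[idx])
            if value > best_value:
                best_set, best_value = subset, value
    return best_set, best_value


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    chosen, value = perturbed_optimistic_assortment(
        est_utils=[0.2, -0.1, 0.4, 0.0],
        bonus=0.1,
        revenues=[1.0, 2.0, 1.5, 3.0],
        max_size=2,
        noise_scale=0.5,
        rng=rng,
    )
    print(chosen, round(value, 3))
```

A larger `noise_scale` hides individual feedback more strongly but makes the selected assortment less reliable, which is the privacy and regret trade-off the abstract says the paper quantifies.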
URL
https://arxiv.org/abs/2410.22488