Generating Synthetic Data for Conversational Music Recommendation Using Random Walks and Language Models

2023-01-27 01:54:16

Megan Leszczynski, Ravi Ganti, Shu Zhang, Krisztian Balog, Filip Radlinski, Fernando Pereira, Arun Tejasvi Chaganty

arXiv_CL Recommendation Language_Model

Abstract

Conversational recommendation systems (CRSs) enable users to use natural language feedback to control their recommendations, overcoming many of the challenges of traditional recommendation systems. However, the practical adoption of CRSs remains limited due to a lack of rich and diverse conversational training data that pairs user utterances with recommendations. To address this problem, we introduce a new method to generate synthetic training data by transforming curated item collections, such as playlists or movie watch lists, into item-seeking conversations. First, we use a biased random walk to generate a sequence of slates, or sets of item recommendations; then, we use a language model to generate corresponding user utterances. We demonstrate our approach by generating a conversational music recommendation dataset with over one million conversations, which were found to be consistent with relevant recommendations by a crowdsourced evaluation. Using the synthetic data to train a CRS, we significantly outperform standard retrieval baselines in offline and online evaluations.
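
The two-step pipeline described above (a biased random walk that yields a sequence of slates, followed by utterance generation) can be illustrated with a minimal sketch. The abstract does not specify the graph, the form of the bias, or how the language model is prompted, so everything here is an assumption made for illustration: the toy `PLAYLISTS` data, the Jaccard-overlap weighting, the `bias` exponent, and the `placeholder_utterance` template that stands in for a real language model. It is not the authors' implementation.

```python
import random

# Hypothetical toy data: playlist title -> set of track ids.
PLAYLISTS = {
    "morning jazz": {"t1", "t2", "t3", "t4"},
    "late night jazz": {"t3", "t4", "t5", "t6"},
    "jazz for studying": {"t2", "t4", "t6", "t7"},
    "workout rock": {"t8", "t9", "t10", "t11"},
    "road trip rock": {"t7", "t9", "t10", "t12"},
}


def jaccard(a, b):
    """Overlap between two item sets, in [0, 1]."""
    return len(a & b) / len(a | b)


def biased_walk(playlists, steps, slate_size, rng, bias=2.0):
    """Walk over playlists, preferring unvisited ones that share items.

    Assumed bias: transition weight = 0.01 + jaccard ** bias, so playlists
    with more overlap are more likely, but no move is impossible.
    Returns a list of (playlist_title, slate) pairs, one per turn.
    """
    current = rng.choice(sorted(playlists))
    visited = [current]
    turns = []
    for _ in range(steps):
        # Emit a slate: a small sample of items from the current playlist.
        items = sorted(playlists[current])
        slate = rng.sample(items, min(slate_size, len(items)))
        turns.append((current, slate))

        # Pick the next playlist among those not yet visited.
        candidates = [p for p in sorted(playlists) if p not in visited]
        if not candidates:
            break
        weights = [
            0.01 + jaccard(playlists[current], playlists[p]) ** bias
            for p in candidates
        ]
        current = rng.choices(candidates, weights=weights, k=1)[0]
        visited.append(current)
    return turns


def placeholder_utterance(title, turn_index):
    """Stand-in for the language model: a fixed template, not a real LM call."""
    if turn_index == 0:
        return f"Can you play some {title}?"
    return f"How about something more like {title}?"


def make_conversation(playlists, steps=3, slate_size=2, seed=0):
    """Pair each slate from the walk with a (templated) user utterance."""
    rng = random.Random(seed)
    turns = biased_walk(playlists, steps, slate_size, rng)
    return [
        {"user": placeholder_utterance(title, i), "slate": slate}
        for i, (title, slate) in enumerate(turns)
    ]


if __name__ == "__main__":
    for turn in make_conversation(PLAYLISTS):
        print(turn)
```

Seeding a private `random.Random` instance keeps each generated conversation reproducible, and the small additive floor on the weights means the walk can still jump between unrelated playlists. In a real system the template function would be replaced by a language model call conditioned on the slate and the conversation history.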

URL

https://arxiv.org/abs/2301.11489

PDF

https://arxiv.org/pdf/2301.11489.pdf