Abstract
Social media platforms surface millions of events every day. To give an early prediction of the mainstream public reaction to these events, we study trendy response prediction: automatically generating top-liked user replies to social media events. While previous work generates responses without factoring in popularity, we propose Popularity-Aligned Language Models (PopALM), which use reinforcement learning to distinguish responses liked by a larger audience. Because user "likes" are noisy labels, we tailor curriculum learning within proximal policy optimization (PPO) to help models capture the essential samples through easy-to-hard training. For experiments, we build a large-scale Weibo dataset for trendy response prediction, and the results show that PopALM helps boost the performance of advanced language models.
URL
https://arxiv.org/abs/2402.18950