Abstract
Millions of users are active on social media. To help users better showcase themselves and network with others, we explore the automatic generation of social media self-introductions, each a short sentence outlining a user's personal interests. While most prior work profiles users with tags (e.g., age), we investigate sentence-level self-introductions, which offer a more natural and engaging way for users to get to know each other. Here we exploit a user's tweeting history to generate their self-introduction. The task is non-trivial because the history may be lengthy, noisy, and reflect varied personal interests. To address this challenge, we propose a novel unified topic-guided encoder-decoder (UTGED) framework. It models latent topics to reflect salient user interests; the resulting topic mixture guides the encoding of a user's history, and topic words control the decoding of their self-introduction. For experiments, we collect a large-scale Twitter dataset, and extensive results show that UTGED outperforms advanced encoder-decoder models without topic modeling.
URL
https://arxiv.org/abs/2305.15138