Abstract
In the rapidly evolving landscape of social media, the introduction of new emojis in Unicode release versions presents a structured opportunity to explore digital language evolution. Analyzing a large dataset of sampled English tweets, we examine how newly released emojis gain traction and evolve in meaning. We find that the size of the early-adopter community and an emoji's semantics are crucial in determining its popularity. Certain emojis underwent notable shifts in meaning and sentiment association during the diffusion process. Additionally, we propose a novel framework that uses language models to extract words and pre-existing emojis appearing in semantically similar contexts, which aids the interpretation of new emojis. Substituting unknown new emojis with familiar ones through this framework improves sentiment classification performance, demonstrating its effectiveness. This study offers a new perspective on how new language units are adopted, adapted, and integrated into the fabric of online communication.
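The substitution idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract says the framework uses language models, whereas this sketch swaps in plain bag-of-words co-occurrence counts with cosine similarity as a stand-in. The function names (`context_vector`, `nearest_known_emoji`, `substitute`) and the toy corpus are hypothetical and invented for illustration only.

```python
from collections import Counter
import math

def context_vector(token, corpus):
    """Count the tokens that co-occur with `token` across tokenized posts."""
    counts = Counter()
    for post in corpus:
        if token in post:
            counts.update(w for w in post if w != token)
    return counts

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def nearest_known_emoji(new_emoji, known_emojis, corpus):
    """Pick the known emoji whose contexts most resemble the new emoji's.

    Assumption: co-occurrence counts stand in for the language-model
    representations mentioned in the abstract.
    """
    target = context_vector(new_emoji, corpus)
    return max(known_emojis,
               key=lambda e: cosine(target, context_vector(e, corpus)))

def substitute(post, new_emoji, replacement):
    """Replace every occurrence of the new emoji with a familiar one."""
    return [replacement if tok == new_emoji else tok for tok in post]

# Toy tokenized posts, invented for illustration (not from the paper's data).
corpus = [
    ["so", "proud", "of", "you", "🥹"],
    ["so", "proud", "of", "you", "😭"],
    ["this", "is", "so", "funny", "😂"],
    ["that", "joke", "was", "funny", "😂"],
]

familiar = nearest_known_emoji("🥹", ["😭", "😂"], corpus)
rewritten = substitute(["proud", "of", "you", "🥹"], "🥹", familiar)
```

A downstream sentiment classifier that has never seen the new emoji would then receive `rewritten` instead of the original post; how the replacement is chosen and evaluated in the actual study is described in the paper itself.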
URL
https://arxiv.org/abs/2402.14187