Abstract
With the widespread use of social networks, detecting the topics discussed on them has become a significant challenge. Existing approaches rely mainly on frequent pattern mining or semantic relations and do not consider language structure. Structure-based language methods aim to discover the relationships between words and how humans understand them. This paper therefore draws on the concept of imitating the mental ability of word association to propose a topic detection framework for social networks. The framework is based on the Human Word Association method. Its performance is evaluated on FA-CUP, a benchmark dataset in the field of topic detection. The results show that the proposed method improves on other methods in terms of topic recall and keyword F1. In addition, most previous work on topic detection is limited to English, while Persian (Farsi), especially as written in microblogs, remains a low-resource language. A dataset of Persian-language Telegram posts was therefore collected. Applied to this dataset, the proposed method also outperforms other topic detection methods.
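The abstract names topic recall and keyword F1 as the evaluation measures without defining them. The sketch below shows one common way such measures are computed; it is not taken from the paper. The function names, the `min_overlap` matching rule, and the toy keyword sets are all assumptions made for illustration.

```python
from typing import Iterable, List, Set, Tuple


def keyword_prf(detected: Iterable[str], truth: Iterable[str]) -> Tuple[float, float, float]:
    """Keyword precision, recall, and F1 for one detected topic against one reference topic."""
    detected_set, truth_set = set(detected), set(truth)
    hits = len(detected_set & truth_set)
    precision = hits / len(detected_set) if detected_set else 0.0
    recall = hits / len(truth_set) if truth_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def topic_recall(detected_topics: List[Set[str]],
                 truth_topics: List[Set[str]],
                 min_overlap: int = 2) -> float:
    """Fraction of reference topics matched by at least one detected topic.

    Assumed matching rule: a reference topic counts as found when some
    detected topic shares at least `min_overlap` keywords with it.
    """
    if not truth_topics:
        return 0.0
    found = sum(
        1 for truth in truth_topics
        if any(len(truth & det) >= min_overlap for det in detected_topics)
    )
    return found / len(truth_topics)


# Toy data, invented for illustration only (not from FA-CUP or the Telegram set).
truth_topics = [{"goal", "drogba", "chelsea"}, {"red", "card", "foul"}]
detected_topics = [{"goal", "drogba", "header"}]

recall_value = topic_recall(detected_topics, truth_topics, min_overlap=2)
precision, recall, f1 = keyword_prf(detected_topics[0], truth_topics[0])
print(recall_value, precision, recall, f1)
```

In this toy case one of the two reference topics is matched, and the matched topic shares two of its three keywords with the detected one. The paper's own matching criterion may differ from the overlap threshold assumed here.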
URL
https://arxiv.org/abs/2301.13066