Abstract
The lack of suitable tools for analyzing conversational Persian text has made many analyses of such text, including sentiment analysis, difficult. In this research, we aim to make these texts easier for machines to understand by providing PSC (Persian Slang Converter), a tool for converting conversational text into formal text, and we combine PSC with state-of-the-art deep learning methods to improve sentiment learning on short Persian texts. More than 10 million unlabeled texts from various social networks and movie subtitles (as conversational texts) and about 10 million news texts (as formal texts) were used to train the unsupervised models and to implement the formalization tool. 60,000 texts from Instagram user comments, labeled positive, negative, or neutral, serve as supervised data for training the sentiment classification model for short texts. Using the formalization tool, 57% of the words in the conversational corpus were converted. Finally, using the formalizer, a FastText model, and a deep LSTM network, an accuracy of 81.91% was obtained on the test data.
URL
https://arxiv.org/abs/2403.06023