
Constructing Colloquial Dataset for Persian Sentiment Analysis of Social Microblogs

2024-03-07 04:25:50
Mojtaba Mazoochi (ICT Research Institute, Tehran, Iran), Leila Rabiei (Iran Telecommunication Research Center), Farzaneh Rahmani (Computer Department, Mehralborz University, Tehran, Iran), Zeinab Rajabi (Computer Department, Hazrat-e Masoumeh University, Qom, Iran)

Abstract

Introduction: Microblogging websites have amassed rich data sources for sentiment analysis and opinion mining. Sentiment classification on such data often performs poorly, however, because microblog posts are short and typically lack syntactically consistent terms and representative features, as users of these networks tend not to write lengthy statements. Low-resource languages pose additional limitations. Persian has distinctive characteristics that differ from the text features of English, and it therefore demands its own annotated data and models for the sentiment analysis task. Method: This paper first constructs a user opinion dataset called ITRC-Opinion in a collaborative environment and in an insourced manner. The dataset contains 60,000 informal and colloquial Persian texts from social microblogs such as Twitter and Instagram. Second, the study proposes a new architecture based on the convolutional neural network (CNN) model for more effective sentiment analysis of colloquial text in social microblog posts. The constructed dataset is used to evaluate the proposed architecture. Furthermore, several other models, such as LSTM, CNN-RNN, BiLSTM, and BiGRU, combined with different word embeddings, including fastText, GloVe, and Word2vec, were applied to the dataset and their results evaluated. Results: The results demonstrate the benefit of the dataset and the proposed model (72% accuracy), showing a meaningful improvement in sentiment classification performance.
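The abstract does not describe the proposed architecture in detail, so the following is only a generic illustration of the core operation that CNN-based text classifiers commonly rely on: a filter slides over consecutive word embeddings, and max-over-time pooling keeps the strongest response. The function names, the toy 2-dimensional embeddings, and the filter weights are all hypothetical and are not taken from the paper.

```python
from typing import List

Vector = List[float]


def conv1d_over_tokens(embeddings: List[Vector],
                       kernel: List[Vector],
                       bias: float = 0.0) -> List[float]:
    """Slide one filter spanning len(kernel) consecutive tokens over the text.

    Each window of token embeddings is multiplied element-wise with the
    filter weights, summed, and passed through a ReLU.
    """
    width = len(kernel)
    feature_map = []
    for start in range(len(embeddings) - width + 1):
        window = embeddings[start:start + width]
        activation = bias + sum(
            w * x
            for token_vec, kernel_vec in zip(window, kernel)
            for x, w in zip(token_vec, kernel_vec)
        )
        feature_map.append(max(0.0, activation))  # ReLU non-linearity
    return feature_map


def max_over_time(feature_map: List[float]) -> float:
    """Keep the strongest filter response; 0.0 if the text is too short."""
    return max(feature_map) if feature_map else 0.0


# Toy input: four tokens with made-up 2-dimensional embeddings.
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
# One hypothetical filter covering two consecutive tokens.
toy_kernel = [[1.0, -1.0], [0.5, 0.5]]

toy_feature_map = conv1d_over_tokens(toy_embeddings, toy_kernel)
toy_pooled = max_over_time(toy_feature_map)
```

In a full model, many such filters of different widths would typically be applied to pretrained embeddings (for example fastText, GloVe, or Word2vec vectors), and the pooled values would feed a classification layer that outputs the sentiment label.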

URL

https://arxiv.org/abs/2306.12679

PDF

https://arxiv.org/pdf/2306.12679.pdf

