Imbalanced Sentiment Classification Enhanced with Discourse Marker

2019-03-28 12:38:58
Tao Zhang, Xing Wu, Meng Lin, Jizhong Han, Songlin Hu

Abstract

Imbalanced data commonly exists in the real world, especially in sentiment-related corpora, making it difficult to train a classifier to distinguish latent sentiment in text data. We observe that humans often express transitional emotion between two adjacent discourses with discourse markers like "but", "though", "while", etc., and that the head discourse and the tail discourse usually indicate opposite emotional tendencies. Based on this observation, we propose a novel plug-and-play method, which first samples discourses according to transitional discourse markers and then validates sentimental polarities with the help of a pretrained attention-based model. Our method increases sample diversity in the first place and can serve as an upstream preprocessing step in data augmentation. We conduct experiments on three public sentiment datasets with several frequently used algorithms. Results show that our method is consistently effective, even in highly imbalanced scenarios, and can easily be integrated with oversampling methods to boost performance on imbalanced sentiment classification.
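
The two-step idea in the abstract (split on a transitional discourse marker, then validate polarities) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the marker list contains just the three examples the abstract names, `predict` is a hypothetical stand-in for the pretrained attention-based model, and the rule "keep a pair only if the predicted polarities differ" is one possible reading of the validation step.

```python
import re
from typing import Callable, List, Optional, Tuple

# Assumption: only the three markers quoted in the abstract are used here.
MARKERS = ("but", "though", "while")
_MARKER_RE = re.compile(r"\b(" + "|".join(MARKERS) + r")\b", re.IGNORECASE)


def split_on_marker(sentence: str) -> Optional[Tuple[str, str]]:
    """Return (head, tail) around the first marker, or None if no usable split."""
    match = _MARKER_RE.search(sentence)
    if match is None:
        return None
    head = sentence[:match.start()].strip(" ,;")
    tail = sentence[match.end():].strip(" ,;.")
    if not head or not tail:
        # The marker opens or closes the sentence, so there is no pair.
        return None
    return head, tail


def sample_discourses(
    sentence: str, predict: Callable[[str], int]
) -> List[Tuple[str, int]]:
    """Split a sentence and keep both halves only if their polarities differ.

    `predict` is a hypothetical polarity model returning 1 (positive) or
    0 (negative); in the paper this role is played by a pretrained
    attention-based model.
    """
    parts = split_on_marker(sentence)
    if parts is None:
        return []
    head, tail = parts
    head_pol, tail_pol = predict(head), predict(tail)
    if head_pol == tail_pol:
        # Validation failed: the halves do not show opposite tendencies.
        return []
    return [(head, head_pol), (tail, tail_pol)]


def toy_predict(text: str) -> int:
    """Keyword-based placeholder so the sketch runs without a trained model."""
    return 1 if any(w in text.lower() for w in ("great", "good", "love")) else 0


samples = sample_discourses(
    "The food was great, but the service was slow.", toy_predict
)
```

With the toy predictor, the example sentence yields one positive sample ("The food was great") and one negative sample ("the service was slow"). Such samples could then be passed to an oversampling step, as the abstract suggests.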

URL

https://arxiv.org/abs/1903.11919

PDF

https://arxiv.org/pdf/1903.11919.pdf

