Abstract
Text classification is a fundamental task in natural language processing (NLP). Several recent studies have demonstrated the success of deep learning in text processing. The convolutional neural network (CNN), a popular deep learning model, has been remarkably successful at text classification. This paper studies new CNN-based baseline models for text classification. In these models, documents are fed to the network as a three-dimensional tensor representation to enable sentence-level analysis. This representation lets the models exploit the positional information of the sentences in the text, and analysing adjacent sentences allows additional features to be extracted. The proposed models are compared with state-of-the-art models on several datasets. The results show that the proposed models perform better, particularly on longer documents.
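To make the three-dimensional document representation concrete, the sketch below builds one plausible layout: sentences by words by embedding dimensions, zero-padded to a fixed size. The abstract does not specify the axis order, the sentence splitter, the tokenizer, or the embeddings, so everything here is an assumption for illustration: the function name `document_to_tensor`, the regex-based splitting, the random embedding table, and the padding sizes are all hypothetical. Nested lists are used only to keep the example dependency-free; a real pipeline would likely use an array library.

```python
import random
import re


def document_to_tensor(text, embeddings, max_sentences, max_words, dim):
    """Return nested lists shaped (max_sentences, max_words, dim), zero-padded.

    Assumed layout: axis 0 = sentence position, axis 1 = word position,
    axis 2 = embedding dimension. The paper may use a different layout.
    """
    tensor = [[[0.0] * dim for _ in range(max_words)] for _ in range(max_sentences)]
    # Naive sentence split after '.', '!' or '?' (an assumption, not the paper's method).
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    for i, sentence in enumerate(sentences[:max_sentences]):
        # Naive lowercase word tokenization (also an assumption).
        words = re.findall(r"\w+", sentence.lower())
        for j, word in enumerate(words[:max_words]):
            if word in embeddings:
                tensor[i][j] = list(embeddings[word])
            # Unknown words keep their zero vector.
    return tensor


# Toy embedding table with random vectors, standing in for pretrained embeddings.
random.seed(0)
doc = "The film starts slowly. The ending is great!"
vocab = sorted(set(re.findall(r"\w+", doc.lower())))
embeddings = {w: [random.gauss(0.0, 1.0) for _ in range(4)] for w in vocab}

doc_tensor = document_to_tensor(doc, embeddings, max_sentences=3, max_words=5, dim=4)
shape = (len(doc_tensor), len(doc_tensor[0]), len(doc_tensor[0][0]))
print(shape)  # (3, 5, 4)
```

Because sentence position is its own axis, a convolution kernel spanning more than one index along that axis would see adjacent sentences together, which is one way to read the abstract's remark about extracting additional features from neighbouring sentences.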
URL
https://arxiv.org/abs/2301.11696