Abstract
ChatGPT has shown the potential of emerging general artificial intelligence capabilities, as it has demonstrated competent performance across many natural language processing tasks. In this work, we evaluate the capabilities of ChatGPT to perform text classification on three affective computing problems, namely, big-five personality prediction, sentiment analysis, and suicide tendency detection. We utilise three baselines: a robust language model (RoBERTa-base), a legacy word model with pretrained embeddings (Word2Vec), and a simple bag-of-words baseline (BoW). Results show that RoBERTa trained for a specific downstream task generally achieves superior performance. ChatGPT, on the other hand, provides decent results that are roughly comparable to the Word2Vec and BoW baselines. ChatGPT further shows robustness against noisy data, on which the Word2Vec models degrade. Results indicate that ChatGPT is a good generalist model, capable of achieving good results across various problems without any specialised training; however, it does not match a model specialised for a downstream task.
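To make the baseline setup concrete, the following is a minimal sketch of a bag-of-words (BoW) text classifier of the kind used as the simplest baseline. It is purely illustrative: the toy sentiment data, vocabulary, and nearest-centroid decision rule are assumptions for the sketch, not the paper's actual datasets or classifier.

```python
from collections import Counter

def bow_vector(text, vocab):
    """Map a text to bag-of-words counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts.get(w, 0) for w in vocab]

# Hypothetical toy corpus standing in for a labelled sentiment dataset
# (1 = positive, 0 = negative); not from the paper.
train = [
    ("great movie loved it", 1),
    ("loved the acting great fun", 1),
    ("terrible plot hated it", 0),
    ("hated the boring terrible film", 0),
]
vocab = sorted({w for text, _ in train for w in text.lower().split()})

def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

# One centroid per class: the average BoW vector of its training texts.
centroids = {
    label: centroid([bow_vector(t, vocab) for t, y in train if y == label])
    for label in (0, 1)
}

def predict(text):
    """Assign the class whose centroid is closest in squared distance."""
    v = bow_vector(text, vocab)
    dists = {y: sum((a - b) ** 2 for a, b in zip(v, c))
             for y, c in centroids.items()}
    return min(dists, key=dists.get)
```

In the paper's setting, such a BoW representation would feed a trained classifier per task; the nearest-centroid rule here simply keeps the sketch self-contained.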
URL
https://arxiv.org/abs/2303.03186