Abstract
Artificial intelligence and machine learning have driven major advances in technology. This paper explores the potential of transfer learning in natural language processing (NLP), focusing on sentiment analysis. Models pre-trained on large datasets can be reused for tasks where labelled data are scarce. The hypothesis is that transfer learning with a pre-trained BERT model yields higher sentiment classification accuracy than training a model from scratch. The experiments use the IMDb dataset of sentiment-labelled movie reviews. Pre-processing consists of tokenizing and encoding the text so that it is suitable as input to NLP models. A BERT-based model is trained on the dataset and its performance is measured by accuracy. The model reaches 100 per cent accuracy. Although perfect accuracy may appear impressive, it may instead be a result of overfitting or a lack of generalization. Further analysis is required to confirm that the model can handle diverse and unseen data. The findings underscore the effectiveness of transfer learning in NLP and its potential in sentiment analysis tasks. However, the perfect accuracy calls for cautious interpretation, and additional measures are needed to validate the model's generalization.
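The caution about perfect accuracy can be made concrete with a small sanity check. The sketch below is not the paper's evaluation code. It assumes predictions are already available as integer labels, and the helper names (`accuracy`, `generalization_gap`, `overlapping_texts`) and the toy data are hypothetical. It compares training and held-out accuracy and looks for reviews that occur in both splits, two checks one might run before trusting a 100 per cent score.

```python
from typing import Sequence, Set


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the gold labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy on an empty set")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def generalization_gap(train_acc: float, test_acc: float) -> float:
    """Training accuracy minus held-out accuracy.

    A large positive gap is one common symptom of overfitting.
    """
    return train_acc - test_acc


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so near-identical reviews compare equal."""
    return " ".join(text.lower().split())


def overlapping_texts(train_texts: Sequence[str], test_texts: Sequence[str]) -> Set[str]:
    """Normalized reviews present in both splits (a possible source of leakage)."""
    return {_normalize(t) for t in train_texts} & {_normalize(t) for t in test_texts}


if __name__ == "__main__":
    # Hypothetical toy data standing in for IMDb reviews and model predictions.
    train_texts = ["A wonderful film.", "Dull and far too long."]
    test_texts = ["a  wonderful film.", "Sharp writing, weak ending."]
    train_acc = accuracy([1, 0], [1, 0])
    test_acc = accuracy([1, 0], [1, 1])
    print("train accuracy:", train_acc)
    print("held-out accuracy:", test_acc)
    print("gap:", generalization_gap(train_acc, test_acc))
    print("shared reviews:", overlapping_texts(train_texts, test_texts))
```

A held-out score that matches the training score and an empty overlap set do not prove generalization, but a large gap or shared reviews would be an early warning that the reported accuracy overstates performance on unseen data.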
URL
https://arxiv.org/abs/2311.16965