An Experimental Investigation of Part-Of-Speech Taggers for Vietnamese

2022-06-14 17:07:28

Tuan-Phong Nguyen, Quoc-Tuan Truong, Xuan-Nam Nguyen, Anh-Cuong Le

arXiv_CL

arXiv_CL Recognition Speech

Abstract

Part-of-speech (POS) tagging plays an important role in Natural Language Processing (NLP). Its applications can be found in many NLP tasks such as named entity recognition, syntactic parsing, dependency parsing and text chunking. In the investigation conducted in this paper, we use the technologies of two widely used toolkits, ClearNLP and Stanford POS Tagger, to develop two new POS taggers for Vietnamese, then compare them to three well-known Vietnamese taggers, namely JVnTagger, vnTagger and RDRPOSTagger. We make a systematic comparison to find the tagger with the best performance. We also design a new feature set to measure the performance of the statistical taggers. Our new taggers, built from Stanford Tagger and ClearNLP with the new feature set, outperform all other current Vietnamese taggers in terms of tagging accuracy. Moreover, we analyze the effect of some features on the performance of the statistical taggers. Lastly, the experimental results reveal that the transformation-based tagger, RDRPOSTagger, can run significantly faster than any other statistical tagger.
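
The comparison above rests on tagging accuracy, which is commonly computed as the fraction of tokens whose predicted tag matches the gold tag. The following is a minimal sketch of that computation, not the paper's evaluation code. It assumes the gold and predicted data share the same word segmentation. The function name, the toy sentence and the tag labels are illustrative assumptions.

```python
from typing import List, Tuple

# A sentence is a list of (token, tag) pairs.
TaggedSentence = List[Tuple[str, str]]


def tagging_accuracy(gold: List[TaggedSentence],
                     predicted: List[TaggedSentence]) -> float:
    """Return the fraction of tokens whose predicted tag equals the gold tag."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must contain the same number of sentences")
    correct = 0
    total = 0
    for gold_sent, pred_sent in zip(gold, predicted):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("sentences must share the same word segmentation")
        for (_, gold_tag), (_, pred_tag) in zip(gold_sent, pred_sent):
            correct += int(gold_tag == pred_tag)
            total += 1
    if total == 0:
        raise ValueError("no tokens to evaluate")
    return correct / total


# Toy example (hypothetical tags): one of three tokens is mis-tagged.
gold_data = [[("Tôi", "P"), ("học", "V"), ("tiếng_Việt", "N")]]
pred_data = [[("Tôi", "P"), ("học", "N"), ("tiếng_Việt", "N")]]
accuracy = tagging_accuracy(gold_data, pred_data)
```

Running each tagger on the same held-out data and comparing the resulting values gives the kind of accuracy comparison the abstract describes. Timing the tagging calls in the same way would give a speed comparison.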

URL

https://arxiv.org/abs/2206.06992

PDF

https://arxiv.org/pdf/2206.06992.pdf