Do 'English' Named Entity Recognizers Work Well on Global Englishes?

2024-04-20 20:48:42
Alexander Shan, John Bauer, Riley Carlson, Christopher Manning

Abstract

The vast majority of popular English named entity recognition (NER) datasets contain American or British English data, despite the existence of many global varieties of English. As such, it is unclear whether models trained on them generalize to English as it is used globally. To test this, we build a newswire dataset, the Worldwide English NER Dataset, to analyze NER model performance on low-resource English variants from around the world. We test widely used NER toolkits and transformer models, including models using the pre-trained contextual models RoBERTa and ELECTRA, on three datasets: CoNLL 2003, a commonly used British English newswire dataset; OntoNotes, a more American-focused dataset; and our global dataset. All models trained on the CoNLL or OntoNotes datasets experienced significant performance drops (over 10 F1 in some cases) when tested on the Worldwide English dataset. Upon examination of region-specific errors, we observe the greatest performance drops for Oceania and Africa, while Asia and the Middle East had comparatively strong performance. Lastly, we find that a combined model trained on the Worldwide dataset and either CoNLL or OntoNotes lost only 1-2 F1 on both test sets.
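The abstract reports results as F1 differences but does not say how the score is computed. As a rough illustration only, the sketch below assumes the conventional entity-level, exact-match F1 over BIO-tagged sentences; the function names (`bio_to_spans`, `entity_f1`) and the toy sentence are hypothetical and are not taken from the paper or its code.

```python
from typing import List, Set, Tuple

# (start token index, end token index exclusive, entity type)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert one sentence of BIO tags into a set of entity spans."""
    spans: Set[Span] = set()
    start, label = None, None
    # A trailing "O" sentinel closes any entity still open at sentence end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if label is not None and not continues:
            spans.add((start, i, label))
            start, label = None, None
        # "B-" always opens an entity; a stray "I-" is treated leniently
        # as opening one too (an assumption, scorers differ on this).
        if tag.startswith("B-") or (tag.startswith("I-") and label is None):
            start, label = i, tag[2:]
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged exact-match entity F1, on a 0-100 scale."""
    tp = fp = fn = 0
    for gold_tags, pred_tags in zip(gold, pred):
        gold_spans = bio_to_spans(gold_tags)
        pred_spans = bio_to_spans(pred_tags)
        tp += len(gold_spans & pred_spans)   # boundaries and type both match
        fp += len(pred_spans - gold_spans)   # predicted but not in gold
        fn += len(gold_spans - pred_spans)   # in gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 100.0 * 2 * precision * recall / (precision + recall)


# Toy example (hypothetical data): the model finds the person, misses the location.
gold = [["B-PER", "I-PER", "O", "B-LOC"]]
pred = [["B-PER", "I-PER", "O", "O"]]
score = entity_f1(gold, pred)
```

Under this definition, a "drop of 10 F1" means the score computed on one test set is 10 points lower, on the 0-100 scale, than the score the same model obtains on another.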


URL

https://arxiv.org/abs/2404.13465

PDF

https://arxiv.org/pdf/2404.13465.pdf

