Comparing the Performance of NLP Toolkits and Evaluation measures in Legal Tech

Abstract
Abstract (translated)
URL
PDF

Abstract

Recent developments in Natural Language Processing have led to the introduction of state-of-the-art Neural Language Models, enabled with unsupervised transferable learning, using different pretraining objectives. While these models achieve excellent results on the downstream NLP tasks, various domain adaptation techniques can improve their performance on domain-specific tasks. We compare and analyze the pretrained Neural Language Models, XLNet (autoregressive), and BERT (autoencoder) on the Legal Tasks. Results show that XLNet Model performs better on our Sequence Classification task of Legal Opinions Classification, whereas BERT produces better results on the NER task. We use domain-specific pretraining and additional legal vocabulary to adapt BERT Model further to the Legal Domain. We prepared multiple variants of the BERT Model, using both methods and their combination. Comparing our variants of the BERT Model, specializing in the Legal Domain, we conclude that both additional pretraining and vocabulary techniques enhance the BERT model's performance on the Legal Opinions Classification task. Additional legal vocabulary improves BERT's performance on the NER task. Combining the pretraining and vocabulary techniques further improves the final results. Our Legal-Vocab-BERT Model gives the best results on the Legal Opinions Task, outperforming the larger pretrained general Language Models, i.e., BERT-Base and XLNet-Base.

Abstract (translated)

URL

https://arxiv.org/abs/2103.11792

PDF

https://arxiv.org/pdf/2103.11792.pdf