Abstract
The classification of statements provided by individuals during police interviews is a complex and significant task within natural language processing (NLP) and legal informatics. The lack of extensive domain-specific datasets poses a challenge to the advancement of NLP methods in this field. This paper addresses some of these challenges by introducing a novel dataset tailored to the classification of statements made during police interviews, prior to court proceedings. Utilising the curated dataset for training and evaluation, we introduce a fine-tuned DistilBERT model that achieves state-of-the-art performance in distinguishing truthful from deceptive statements. To enhance interpretability, we employ explainable artificial intelligence (XAI) methods to produce saliency maps that interpret the model's decision-making process. Lastly, we present an XAI interface that enables both legal professionals and non-specialists to interact with and benefit from our system. Our model achieves an accuracy of 86% and is shown to outperform a custom transformer architecture in a comparative study. This holistic approach advances the accessibility, transparency, and effectiveness of statement analysis, with promising implications for both legal practice and research.
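As a rough illustration of the pipeline the abstract describes, the sketch below loads a DistilBERT sequence classifier and computes a simple gradient-based token saliency map. It is a minimal sketch under assumptions, not the authors' code: the label scheme (0 = truthful, 1 = deceptive), the example statement, and the choice of plain input-gradient saliency are all assumptions, since the abstract does not specify the exact XAI method or data format.

from transformers import DistilBertTokenizerFast, DistilBertForSequenceClassification

# Assumed label scheme: 0 = truthful, 1 = deceptive.
tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert-base-uncased")
model = DistilBertForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)
model.eval()

# Hypothetical statement; in practice this would come from the curated interview dataset.
statement = "I was at home all evening and never left the house."
inputs = tokenizer(statement, return_tensors="pt", truncation=True)

# Capture the token embeddings produced during the forward pass so their
# gradients can be read afterwards.
captured = {}
def keep_embeddings(module, module_inputs, output):
    output.retain_grad()
    captured["embeddings"] = output

hook = model.distilbert.embeddings.register_forward_hook(keep_embeddings)
outputs = model(**inputs)
hook.remove()

# Saliency: gradient of the predicted-class logit with respect to each token
# embedding, reduced to one importance score per token via the L2 norm.
predicted_class = int(outputs.logits.argmax(dim=-1))
outputs.logits[0, predicted_class].backward()
saliency = captured["embeddings"].grad[0].norm(dim=-1)

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, score in zip(tokens, saliency.tolist()):
    print(f"{token:>12}  {score:.4f}")

The printed per-token scores can then be rendered as a saliency map over the statement, which is the kind of explanation the abstract's XAI interface exposes to legal professionals and non-specialists.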
URL
https://arxiv.org/abs/2405.10702