Abstract
Limited public understanding of legal processes and inconsistent verdicts in the Indonesian court system have led to widespread dissatisfaction and increased stress on judges. This study addresses these issues by developing a deep learning-based system that predicts court sentence lengths. Our hybrid model, which combines a CNN and a BiLSTM with an attention mechanism, achieved an R-squared score of 0.5893, capturing both local patterns and long-term dependencies in legal texts. Document summarization proved ineffective, but restricting the input to the top 30% most frequent tokens improved prediction performance, suggesting that focusing on core legal terminology balances information retention and computational efficiency. We also implemented a modified text normalization process that addresses common errors such as misspellings and incorrectly merged words, which significantly improved the model's performance. These findings have important implications for automating legal document processing and can help both professionals and the public understand court judgments. By applying advanced NLP techniques, this research contributes to greater transparency and accessibility in the Indonesian legal system, paving the way for more consistent and comprehensible legal decisions.
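The frequency-based token selection mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how "top 30%" is computed, so this example assumes it means keeping the 30% of distinct vocabulary entries with the highest corpus-wide counts. The function names `top_fraction_vocab` and `filter_doc` and the toy sentences are hypothetical and are not taken from the paper.

```python
from collections import Counter
import math


def top_fraction_vocab(tokenized_docs, fraction=0.3):
    """Return the set of the most frequent token types in the corpus.

    Assumption: "top 30%" refers to a share of the distinct vocabulary,
    ranked by total count across all documents.
    """
    counts = Counter(tok for doc in tokenized_docs for tok in doc)
    keep_n = max(1, math.ceil(len(counts) * fraction))
    return {tok for tok, _ in counts.most_common(keep_n)}


def filter_doc(doc, vocab):
    """Drop every token that is not in the retained vocabulary."""
    return [tok for tok in doc if tok in vocab]


# Toy, already-tokenized court-decision fragments (illustrative only).
docs = [
    ["terdakwa", "dijatuhi", "pidana", "penjara", "selama", "dua", "tahun"],
    ["terdakwa", "terbukti", "bersalah", "pidana", "penjara"],
    ["majelis", "hakim", "menjatuhkan", "pidana", "penjara", "terdakwa"],
]

vocab = top_fraction_vocab(docs, fraction=0.3)
filtered = [filter_doc(d, vocab) for d in docs]
```

In this sketch the filtered token lists, not the full documents, would be passed on to whatever embedding and model pipeline follows. Which tokens survive among equally frequent entries depends on tie-breaking, so a real pipeline may want an explicit rule for ties.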
URL
https://arxiv.org/abs/2410.20104