Abstract
Automating the processing of high-volume unstructured data is essential for operational efficiency. Optical Character Recognition (OCR) is critical to this work, but it often struggles with accuracy and efficiency on complex layouts and ambiguous text. These challenges are especially pronounced in large-scale tasks that require both speed and precision. This paper introduces LMV-RPA, a Large Model Voting-based Robotic Process Automation system designed to enhance OCR workflows. LMV-RPA integrates outputs from OCR engines such as Paddle OCR, Tesseract OCR, Easy OCR, and DocTR with Large Language Models (LLMs) such as LLaMA 3 and Gemini-1.5-pro. Using a majority voting mechanism, it converts OCR outputs into structured JSON, improving accuracy, particularly on complex layouts. The multi-phase pipeline passes the text extracted by the OCR engines through the LLMs and combines the results to select the most accurate output. LMV-RPA achieves 99 percent accuracy on OCR tasks, compared with 94 percent for baseline models, while reducing processing time by 80 percent. Benchmark evaluations confirm its scalability and show that LMV-RPA offers a faster, more reliable, and more efficient solution for automating large-scale document processing.
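The majority voting step described in the abstract can be illustrated with a minimal sketch. The abstract does not say at what granularity the votes are taken, so this example assumes field-level voting over JSON objects that have already been produced from each OCR/LLM combination. The function name `majority_vote`, the field names, and the sample values are all hypothetical and are not taken from the paper.

```python
import json
from collections import Counter


def majority_vote(candidates):
    """Field-wise majority vote over a list of JSON-like dicts.

    Assumption: each candidate is one structured output produced from a
    single OCR engine / LLM combination. Ties go to the value seen first.
    """
    # Collect every field name, preserving first-seen order.
    keys = []
    for cand in candidates:
        for key in cand:
            if key not in keys:
                keys.append(key)

    result = {}
    for key in keys:
        # Serialize values so unhashable ones (lists, dicts) can be counted.
        votes = Counter(
            json.dumps(cand[key], sort_keys=True, ensure_ascii=False)
            for cand in candidates
            if key in cand
        )
        winner, _count = votes.most_common(1)[0]
        result[key] = json.loads(winner)
    return result


# Hypothetical outputs for one document; two contain typical OCR
# confusions between the letter "O" and the digit "0".
candidates = [
    {"invoice_no": "INV-1001", "total": "250.00"},
    {"invoice_no": "INV-1001", "total": "250.00"},
    {"invoice_no": "INV-1O01", "total": "250.00"},
    {"invoice_no": "INV-1001", "total": "25O.00"},
]

merged = majority_vote(candidates)
print(json.dumps(merged, indent=2))
```

Voting per field rather than per document lets a single misread character in one candidate be outvoted without discarding the fields that candidate got right. Whether LMV-RPA votes this way is an assumption of the sketch.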
URL
https://arxiv.org/abs/2412.17965