Segmentation-free Connectionist Temporal Classification loss based OCR Model for Text Captcha Classification

Abstract

Captchas are widely used to secure systems against automated responses by distinguishing computer responses from human ones. Captchas are created from text, audio, video, and picture-based Optical Character Recognition (OCR) challenges. Text-based OCR captchas are the most commonly used, and they raise particular difficulties, namely complex and distorted content. There have been attempts to build captcha detection and classification systems using machine learning and neural networks, but these need to be tuned for accuracy. Existing systems face challenges in recognizing distorted characters, handling variable-length captchas, and capturing sequential dependencies within a captcha. In this work, we propose a segmentation-free OCR model for text captcha classification based on the connectionist temporal classification (CTC) loss. The proposed model is trained and tested on a publicly available captcha dataset. It achieves 99.80% character-level accuracy and 95% word-level accuracy. Compared with state-of-the-art models, the proposed model proves to be effective. Variable-length, complex captchas with sequential dependencies, which are expected to be widely used in securing software systems, can thus be processed with the segmentation-free CTC loss technique.
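
To make the segmentation-free idea and the two reported metrics concrete, the following is a minimal sketch and not the authors' implementation. It assumes greedy (best-path) decoding of a connectionist temporal classification (CTC) output, a hypothetical alphabet whose index 0 is the blank symbol, and a position-wise definition of character-level accuracy; the abstract does not specify any of these. All function names are illustrative.

```python
from itertools import groupby

# Assumption: index 0 is the CTC blank; "-" is only a placeholder glyph for it.
ALPHABET = "-abcdefghijklmnopqrstuvwxyz0123456789"
BLANK = 0


def ctc_greedy_decode(frame_scores):
    """Decode per-frame class scores without segmenting characters.

    frame_scores is a list of score lists, one per time step (image column).
    Best-path decoding: take the argmax per frame, merge consecutive
    repeats, then drop blanks.
    """
    best_path = [max(range(len(f)), key=f.__getitem__) for f in frame_scores]
    collapsed = [k for k, _ in groupby(best_path)]
    return "".join(ALPHABET[k] for k in collapsed if k != BLANK)


def char_accuracy(preds, targets):
    """Fraction of target characters matched at the same position (assumed definition)."""
    correct = total = 0
    for p, t in zip(preds, targets):
        total += len(t)
        correct += sum(a == b for a, b in zip(p, t))
    return correct / total


def word_accuracy(preds, targets):
    """Fraction of captchas whose whole string is predicted exactly."""
    return sum(p == t for p, t in zip(preds, targets)) / len(targets)


def one_hot(ch):
    """Helper for the demo: a score row that puts all mass on one symbol."""
    row = [0.0] * len(ALPHABET)
    row[ALPHABET.index(ch)] = 1.0
    return row


# Eight frames; the blank between the two runs of "b" keeps them distinct.
frames = [one_hot(c) for c in "aa-b-bb7"]
print(ctc_greedy_decode(frames))  # abb7
```

Because the blank separates repeated characters and repeated frames are merged, the output length is not tied to the number of frames, which is how a CTC-trained model can handle variable-length captchas. Word-level accuracy is stricter than character-level accuracy since a single wrong character makes the whole captcha count as wrong, which is consistent with the gap between the two figures reported above.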

URL

https://arxiv.org/abs/2402.05417

PDF

https://arxiv.org/pdf/2402.05417.pdf
