Abstract
In digital forensics and document authentication, writer identification (writer-id) determines the author of a document from its handwriting style. The primary challenge in writer-id is the "open-set scenario", in which the model must accurately recognize writers it never saw during training. Representation learning is the key to meeting this challenge: a learned representation can capture distinctive handwriting features, which lets the model recognize styles it did not encounter during training. Building on this idea, this paper introduces Contrastive Masked Auto-Encoders (CMAE) for character-level open-set writer identification. CMAE merges Masked Auto-Encoders (MAE) with Contrastive Learning (CL), so that the model simultaneously captures sequential information (via MAE) and distinguishes diverse handwriting styles (via CL). The model achieves state-of-the-art (SOTA) results on the CASIA online handwriting dataset, reaching a precision of 89.7%. The study advances universal writer-id through representation learning and contributes to digital handwriting analysis.
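The idea of pairing a masked-reconstruction objective with a contrastive objective can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the paper's implementation: pen points are taken to be small lists of floats, masking zeroes out whole time steps, the contrastive term is a supervised InfoNCE-style loss that treats samples from the same writer as positives, and the names `mask_points`, `reconstruction_loss`, `contrastive_loss`, `combined_loss`, and the `weight` parameter are hypothetical.

```python
import math
import random


def mask_points(seq, mask_ratio, rng):
    """Zero out a random subset of time steps; return the masked copy and the masked indices."""
    n = len(seq)
    n_mask = round(mask_ratio * n)
    masked_idx = set(rng.sample(range(n), n_mask))
    masked = [[0.0] * len(p) if i in masked_idx else list(p) for i, p in enumerate(seq)]
    return masked, masked_idx


def reconstruction_loss(pred, target, masked_idx):
    """Mean squared error computed on the masked time steps only (MAE-style)."""
    errs = [(a - b) ** 2 for i in masked_idx for a, b in zip(pred[i], target[i])]
    return sum(errs) / len(errs) if errs else 0.0


def _normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def contrastive_loss(embeddings, writer_ids, temperature=0.1):
    """Assumed supervised contrastive loss: same-writer embeddings are positives."""
    z = [_normalize(e) for e in embeddings]
    n = len(z)
    losses = []
    for i in range(n):
        # Temperature-scaled cosine similarity to every other sample.
        sims = {j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
                for j in range(n) if j != i}
        positives = [j for j in sims if writer_ids[j] == writer_ids[i]]
        if not positives:
            continue  # anchors without a same-writer partner contribute nothing
        # Numerically stable log-sum-exp over all other samples.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        losses.append(-sum(sims[j] - log_denom for j in positives) / len(positives))
    return sum(losses) / len(losses) if losses else 0.0


def combined_loss(pred, target, masked_idx, embeddings, writer_ids, weight=1.0):
    """Sum of the two objectives; `weight` is a hypothetical balancing factor."""
    return (reconstruction_loss(pred, target, masked_idx)
            + weight * contrastive_loss(embeddings, writer_ids))
```

In this sketch the reconstruction term pushes an encoder to model the pen trajectory, while the contrastive term pulls same-writer embeddings together and pushes different writers apart. Embeddings from an unseen writer could then be compared by similarity rather than classified into a fixed label set, which is what an open-set setting requires.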
URL
https://arxiv.org/abs/2501.11895