Paper Reading AI Learner

Contrastive Masked Autoencoders for Character-Level Open-Set Writer Identification

2025-01-21 05:15:10
Xiaowei Jiang, Wenhao Ma, Yiqun Duan, Thomas Do, Chin-Teng Lin

Abstract

In digital forensics and document authentication, writer identification (writer-id) plays a crucial role in determining the author of a document from its handwriting style. The primary challenge in writer-id is the "open-set scenario", in which the goal is to accurately recognize writers who were not seen during model training. Representation learning is key to overcoming this challenge: a learned representation can capture the distinctive features of a person's handwriting, allowing the model to recognize styles it did not encounter during training. Building on this idea, this paper introduces Contrastive Masked Auto-Encoders (CMAE) for character-level open-set writer identification. We combine Masked Auto-Encoders (MAE) with Contrastive Learning (CL): the MAE component captures sequential information, while the CL component distinguishes between diverse handwriting styles. Our model achieves state-of-the-art (SOTA) results on the CASIA online handwriting dataset, reaching a precision of 89.7%. Our study advances universal writer-id with a sophisticated representation learning approach, contributing to the ongoing development of digital handwriting analysis and to the demands of an increasingly interconnected world.
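The abstract states only that CMAE combines a masked-autoencoder objective with contrastive learning. As a rough illustration, here is a minimal NumPy sketch of such a combined objective, plus an embedding-based matching step for open-set identification. It assumes an MSE reconstruction loss computed on masked pen-trajectory points and an InfoNCE-style contrastive loss. The paper may use a different contrastive formulation (for example, a supervised contrastive loss over writer labels). The function names, the weight `lam`, and the temperature are all hypothetical and are not taken from the paper.

```python
import numpy as np

def masked_reconstruction_loss(target, pred, mask):
    """MAE-style MSE computed only on masked time steps (assumed formulation).

    target, pred: (T, D) arrays of pen-trajectory features; mask: (T,) bool, True = masked.
    """
    diff = (pred[mask] - target[mask]) ** 2
    return float(diff.mean())

def info_nce_loss(z1, z2, temperature=0.1):
    """InfoNCE over a batch: row i of z1 and row i of z2 are two views of the same
    sample (positive pair); every other row of z2 serves as a negative."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature                      # (B, B) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))            # positives lie on the diagonal

def combined_loss(target, pred, mask, z1, z2, lam=1.0, temperature=0.1):
    """Hypothetical CMAE-style objective: reconstruction term + lam * contrastive term."""
    return masked_reconstruction_loss(target, pred, mask) + lam * info_nce_loss(z1, z2, temperature)

def identify_open_set(query, gallery, labels):
    """Open-set matching (assumed): writers unseen in training are enrolled by reference
    embeddings, and the query is assigned to the most cosine-similar reference."""
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    sims = g @ q
    best = int(np.argmax(sims))
    return labels[best], float(sims[best])

# Toy usage with synthetic data (no real handwriting involved).
rng = np.random.default_rng(0)
traj = rng.normal(size=(20, 3))          # e.g. (x, y, pen-state) per point, hypothetical features
mask = np.zeros(20, dtype=bool)
mask[::2] = True                         # mask every other point
views = np.eye(4)                        # four samples, two identical "views" each
total = combined_loss(traj, traj, mask, views, views)
writer, score = identify_open_set(np.array([0.1, 0.9]),
                                  np.array([[1.0, 0.0], [0.0, 1.0]]),
                                  ["writer_A", "writer_B"])
```

Because the encoder is trained only to produce discriminative embeddings, not to classify a fixed set of writers, new writers can be handled at test time by enrolling reference samples rather than retraining. Under the assumptions above, this is what makes the setup open-set.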

URL

https://arxiv.org/abs/2501.11895

PDF

https://arxiv.org/pdf/2501.11895.pdf
