Paper Reading AI Learner

SLOGAN: Handwriting Style Synthesis for Arbitrary-Length and Out-of-Vocabulary Text

2022-02-23 12:13:27
Canjie Luo, Yuanzhi Zhu, Lianwen Jin, Zhe Li, Dezhi Peng

Abstract

Large amounts of labeled data are urgently required for the training of robust text recognizers. However, collecting handwriting data of diverse styles, along with an immense lexicon, is considerably expensive. Although data synthesis is a promising way to relieve data hunger, two key issues of handwriting synthesis, namely, style representation and content embedding, remain unsolved. To this end, we propose a novel method that can synthesize parameterized and controllable handwriting Styles for arbitrary-Length and Out-of-vocabulary text based on a Generative Adversarial Network (GAN), termed SLOGAN. Specifically, we propose a style bank to parameterize the specific handwriting styles as latent vectors, which are input to a generator as style priors to achieve the corresponding handwritten styles. The training of the style bank requires only the writer identification of the source images, rather than attribute annotations. Moreover, we embed the text content by providing an easily obtainable printed style image, so that the diversity of the content can be flexibly achieved by changing the input printed image. Finally, the generator is guided by dual discriminators to handle both the handwriting characteristics that appear as separated characters and in a series of cursive joins. Our method can synthesize words that are not included in the training vocabulary and with various new styles. Extensive experiments have shown that high-quality text images with great style diversity and rich vocabulary can be synthesized using our method, thereby enhancing the robustness of the recognizer.

Abstract (translated)

URL

https://arxiv.org/abs/2202.11456

PDF

https://arxiv.org/pdf/2202.11456.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot