Paper Reading AI Learner

Zero-shot Generation of Training Data with Denoising Diffusion Probabilistic Model for Handwritten Chinese Character Recognition

2023-05-25 02:13:37
Dongnan Gui, Kai Chen, Haisong Ding, Qiang Huo

Abstract

There are more than 80,000 character categories in Chinese, most of which are rarely used. Building a high-performance handwritten Chinese character recognition (HCCR) system that supports the full character set with a traditional approach requires collecting many training samples for each character category, which is both time-consuming and expensive. In this paper, we propose a novel approach that transforms Chinese character glyph images rendered from font libraries into handwritten ones with a denoising diffusion probabilistic model (DDPM). Trained on handwritten samples of a small character set, the DDPM learns to map printed strokes to handwritten ones, which makes it possible to generate photo-realistic handwritten samples in diverse styles for unseen character categories. By combining DDPM-synthesized samples of unseen categories with real samples of the other categories, we can build an HCCR system that supports the full character set. Experimental results on the CASIA-HWDB dataset with 3,755 character categories show that HCCR systems trained with synthetic samples achieve recognition accuracy similar to that of a system trained with real samples. The proposed method has the potential to address HCCR with a larger vocabulary.
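To make the mechanics concrete, the following is a minimal NumPy sketch of glyph-conditioned DDPM sampling and of the real/synthetic data mixing described in the abstract. It assumes the standard DDPM formulation of Ho et al. (2020), with a linear beta schedule (1e-4 to 0.02) and a noise-prediction network that receives the printed glyph as a condition. The schedule values, `eps_model`, and `build_training_set` are hypothetical illustrations, not the authors' architecture or training recipe.

```python
import numpy as np


def make_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule (assumed; the paper's actual schedule may differ)."""
    betas = np.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)  # \bar{alpha}_t = prod_{s<=t} alpha_s
    return betas, alphas, alpha_bars


def q_sample(x0, t, alpha_bars, noise):
    """Forward (noising) process in closed form:
    x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise


def p_sample_loop(eps_model, glyph, betas, alphas, alpha_bars, rng):
    """Ancestral DDPM sampling conditioned on a printed glyph image.

    eps_model(x_t, t, glyph) is a hypothetical trained network that
    predicts the noise in x_t given the printed glyph as a condition.
    Uses the sigma_t^2 = beta_t variance choice from Ho et al. (2020).
    """
    x = rng.standard_normal(glyph.shape)  # start from pure Gaussian noise
    for t in range(len(betas) - 1, -1, -1):
        eps = eps_model(x, t, glyph)
        # Posterior mean of x_{t-1} given the predicted noise.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(x.shape)
        else:
            x = mean  # no noise is added at the final step
    return x


def build_training_set(real_samples, full_charset, synthesize):
    """Use real samples for categories that have them and synthesized
    samples (e.g. from a glyph-conditioned DDPM) for unseen categories.

    real_samples: dict mapping character -> list of images (hypothetical).
    synthesize:   callable returning a list of images for a character.
    """
    data = []
    for ch in full_charset:
        if real_samples.get(ch):
            data.extend((img, ch) for img in real_samples[ch])
        else:
            data.extend((img, ch) for img in synthesize(ch))
    return data
```

In practice, `synthesize` would render the character from a font library and pass the glyph to `p_sample_loop` with different random seeds to obtain varied handwriting styles; that wiring is assumed here rather than taken from the paper.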

URL

https://arxiv.org/abs/2305.15660

PDF

https://arxiv.org/pdf/2305.15660.pdf
