HWD: A Novel Evaluation Score for Styled Handwritten Text Generation

2023-10-31 09:44:27
Vittorio Pippi, Fabio Quattrini, Silvia Cascianelli, Rita Cucchiara

Abstract

Styled Handwritten Text Generation (Styled HTG) is an important task in document analysis, aiming to generate text images with the handwriting of given reference images. In recent years, there has been significant progress in the development of deep learning models for tackling this task. Being able to measure the performance of HTG models via a meaningful and representative criterion is key for fostering the development of this research topic. However, despite the current adoption of scores designed for evaluating natural image generation, assessing the quality of generated handwriting remains challenging. In light of this, we devise the Handwriting Distance (HWD), tailored for HTG evaluation. In particular, it works in the feature space of a network specifically trained to extract handwriting style features from the variable-length input images and exploits a perceptual distance to compare the subtle geometric features of handwriting. Through extensive experimental evaluation on different word-level and line-level datasets of handwritten text images, we demonstrate the suitability of the proposed HWD as a score for Styled HTG. The pretrained model used as backbone will be released to ease the adoption of the score, aiming to provide a valuable tool for evaluating HTG models and thus contributing to advancing this important research area.
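The abstract states only that HWD compares features from a handwriting-specific backbone using a perceptual distance. The snippet below is a minimal sketch of how a set-level score of that kind could be computed. It assumes the backbone (not shown) has already produced one feature vector per horizontal position of each image. It handles variable-length images by averaging over the width, then takes the Euclidean distance between the mean features of the real and generated sets. The function names, the width-averaging step, and the mean-Euclidean aggregation are hypothetical illustrations, not the paper's confirmed implementation.

```python
import math
from typing import List, Sequence

Vector = List[float]


def pool_columns(feature_map: Sequence[Sequence[float]]) -> Vector:
    """Average the per-column feature vectors of one image into a fixed-size vector.

    Assumption (hypothetical): the backbone yields one D-dimensional vector per
    horizontal position, so wider images simply produce more columns.
    """
    if not feature_map:
        raise ValueError("feature_map must contain at least one column")
    dim = len(feature_map[0])
    if any(len(col) != dim for col in feature_map):
        raise ValueError("all columns must have the same dimensionality")
    return [sum(col[d] for col in feature_map) / len(feature_map) for d in range(dim)]


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    """Component-wise mean of a non-empty set of equally sized vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def style_distance(real_maps, generated_maps) -> float:
    """Euclidean distance between the mean pooled features of two image sets.

    Hypothetical aggregation for illustration only; lower means the generated
    set is closer in style to the real set.
    """
    real = mean_vector([pool_columns(m) for m in real_maps])
    gen = mean_vector([pool_columns(m) for m in generated_maps])
    return math.sqrt(sum((r - g) ** 2 for r, g in zip(real, gen)))


# Toy feature maps with D = 2: two "real" images that are 2 and 3 columns wide.
real_maps = [
    [[1.0, 0.0], [1.0, 2.0]],
    [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]],
]
print(style_distance(real_maps, [[[0.5, 1.0]]]))  # 0.0
print(style_distance(real_maps, [[[3.5, 5.0]]]))  # 5.0
```

Averaging over the width is one simple way to make images of different lengths comparable. A real implementation would apply the same idea to the actual backbone's feature maps, and could use a different pooling or distance.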

URL

https://arxiv.org/abs/2310.20316

PDF

https://arxiv.org/pdf/2310.20316.pdf
