The use of artificial intelligence in education is growing rapidly, and researchers are paying increasing attention to handwritten mathematical expression recognition (HMER). However, many existing HMER methods may fail to read formulas with complex structures accurately, because their attention results can be inaccurate when handwriting is illegible or writing styles vary widely. Our proposed Intelligent-Detection Network (IDN) for HMER differs from traditional encoder-decoder methods by using object detection techniques. Specifically, we developed an enhanced YOLOv7 network that accurately detects both digit and symbol objects. The detection results are then fed into a bidirectional gated recurrent unit (BiGRU) and a baseline symbol relationship tree (BSRT) to determine the relationships between symbols and numbers. Experiments show that the proposed method outperforms encoder-decoder networks in recognizing complex handwritten mathematical expressions, owing to its precise detection of symbols and numbers. Our research has the potential to make valuable contributions to the field of HMER and could be applied in practical scenarios such as grading assignments in schools and entering information from paper documents.
https://arxiv.org/abs/2311.15273
Background and objectives: Because it is non-invasive and readily accessible, dynamic handwriting analysis has recently emerged as a vital adjunctive method for the early diagnosis of Parkinson's disease. In this study, we design a compact and efficient network architecture to analyse the distinctive patterns in patients' dynamic handwriting signals, thereby providing an objective basis for diagnosing Parkinson's disease. Methods: The proposed network is based on a hybrid deep learning approach that fully leverages the advantages of both long short-term memory (LSTM) and convolutional neural networks (CNNs). Specifically, the LSTM block extracts time-varying features, while the CNN-based block uses one-dimensional convolution to keep the computational cost low. The hybrid architecture is further refined through ablation studies for superior performance. Finally, we evaluate the generalization of the proposed method with five-fold cross-validation, which validates its efficiency and robustness. Results: The proposed network achieves high classification accuracies on both our new DraWritePD dataset ($96.2\%$) and the well-established PaHaW dataset ($90.7\%$). The architecture is also notably lightweight, with only $0.084$M parameters and a total of $0.59$M floating-point operations. It also achieves near real-time CPU inference, with inference times ranging from $0.106$ to $0.220$ s. Conclusions: A series of experiments with extensive analysis systematically demonstrates the effectiveness and efficiency of the proposed hybrid neural network in extracting distinctive handwriting patterns for precise diagnosis of Parkinson's disease.
https://arxiv.org/abs/2311.11756
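The Parkinson's handwriting study above reports results under five-fold cross-validation. Below is a minimal sketch of that protocol. It splits sample indices into five disjoint test folds and pairs each fold with the remaining indices for training. The name `k_fold_indices` is ours, and the sketch assumes unshuffled, sample-level splits. The abstract does not say whether the authors shuffled or grouped recordings by subject, which matters when one writer contributes several samples.

```python
def k_fold_indices(n_samples, k=5):
    """Split sample indices 0..n_samples-1 into k contiguous, near-equal folds.

    Returns a list of (train_indices, test_indices) pairs; every sample lands
    in exactly one test fold. Shuffling and subject-wise grouping are omitted.
    """
    if not 1 < k <= n_samples:
        raise ValueError("k must satisfy 1 < k <= n_samples")
    # The first (n_samples % k) folds receive one extra sample.
    base, extra = divmod(n_samples, k)
    splits, start = [], 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        splits.append((train, test))
        start += size
    return splits


for train, test in k_fold_indices(12, k=5):
    print(len(train), len(test))
```

For 12 samples this yields test folds of sizes 3, 3, 2, 2 and 2. Each sample is tested exactly once, and reported metrics are usually averaged over the five folds.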
Writing assistance is an application closely related to everyday life and a fundamental research area in Natural Language Processing (NLP). Its aim is to improve the correctness and quality of input text, and character checking is crucial for detecting and correcting wrong characters. In the real world, where handwriting accounts for the vast majority of writing, the characters that people get wrong include faked characters (i.e., non-existent characters created by writing errors) and misspelled characters (i.e., real characters used incorrectly because of spelling errors). However, existing datasets and related studies focus only on misspelled characters, mainly those caused by phonological or visual confusion, and ignore faked characters, which are more common and more difficult. To address this gap, we present Visual-C$^3$, a human-annotated Visual Chinese Character Checking dataset with faked and misspelled Chinese characters. To the best of our knowledge, Visual-C$^3$ is the first real-world visual dataset and the largest human-crafted dataset for Chinese character checking. We also propose and evaluate novel baseline methods on Visual-C$^3$. Extensive empirical results and analyses show that Visual-C$^3$ is high-quality yet challenging. The Visual-C$^3$ dataset and the baseline methods will be made publicly available to facilitate further research in the community.
https://arxiv.org/abs/2311.11268
Styled Handwritten Text Generation (Styled HTG) is an important task in document analysis that aims to generate text images in the handwriting style of given reference images. In recent years, there has been significant progress in developing deep learning models for this task. Being able to measure the performance of HTG models with a meaningful and representative criterion is key to fostering research on this topic. However, although scores designed for evaluating natural image generation are currently adopted, assessing the quality of generated handwriting remains challenging. In light of this, we devise the Handwriting Distance (HWD), tailored for HTG evaluation. In particular, it operates in the feature space of a network specifically trained to extract handwriting style features from variable-length input images, and it exploits a perceptual distance to compare the subtle geometric features of handwriting. Through extensive experimental evaluation on different word-level and line-level datasets of handwritten text images, we demonstrate the suitability of HWD as a score for Styled HTG. The pretrained model used as the backbone will be released to ease adoption of the score, providing a valuable tool for evaluating HTG models and thus helping to advance this important research area.
https://arxiv.org/abs/2310.20316
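HWD compares handwriting in the feature space of a pretrained style network. The sketch below only illustrates the general idea of a feature-space distance between a set of reference images and a set of generated images. It assumes each image has already been mapped to a fixed-length feature vector by some encoder, which is not included. It then uses the Euclidean distance between the two sets' mean vectors. This is a simplifying assumption, not the exact HWD definition, and `feature_set_distance` is a hypothetical name.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of equally sized feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def feature_set_distance(real_features, generated_features):
    """Euclidean distance between the mean style features of two image sets.

    Each argument is a list of per-image feature vectors, assumed to come from
    a pretrained handwriting-style encoder (not included here).
    """
    mu_real = mean_vector(real_features)
    mu_gen = mean_vector(generated_features)
    return math.dist(mu_real, mu_gen)


real = [[0.0, 1.0], [2.0, 1.0]]        # mean = [1.0, 1.0]
generated = [[4.0, 5.0], [4.0, 5.0]]   # mean = [4.0, 5.0]
print(feature_set_distance(real, generated))  # 5.0
```

A lower value means the generated set sits closer to the reference style in feature space. Any real score also depends heavily on the encoder that produces the features.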
Human demonstrations of trajectories are an important source of training data for many machine learning problems. However, collecting human demonstration data for complex tasks is difficult, which makes learning efficient representations of those trajectories challenging. For many problems, such as handwriting or quasistatic dexterous manipulation, the exact timing of a trajectory should be factored out from its spatial path characteristics. In this work, we propose TimewarpVAE, a fully differentiable manifold-learning algorithm that incorporates Dynamic Time Warping (DTW) to simultaneously learn both timing variations and latent factors of spatial variation. We show that TimewarpVAE learns appropriate time alignments and meaningful representations of spatial variation on small handwriting and fork-manipulation datasets. It achieves lower spatial reconstruction test error than baseline approaches, and the learned low-dimensional representations can be used to efficiently generate semantically meaningful novel trajectories.
https://arxiv.org/abs/2310.16027
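TimewarpVAE builds on Dynamic Time Warping. For reference, the classic DTW recursion below aligns two 1-D sequences that trace the same path at different speeds. It is the plain, non-differentiable dynamic program, not the differentiable formulation used inside TimewarpVAE. The absolute-difference local cost is also an assumption; 2-D pen coordinates would use a Euclidean cost instead.

```python
import math


def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Classic dynamic time warping cost between sequences a and b.

    cost[i][j] = dist(a[i-1], b[j-1]) + min(cost of the three predecessor
    cells), so repeated or stretched samples can be aligned at no extra cost.
    """
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = dist(a[i - 1], b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # a advances, b waits
                                    cost[i][j - 1],      # b advances, a waits
                                    cost[i - 1][j - 1])  # both advance
    return cost[n][m]


print(dtw_distance([0, 1, 2, 3], [0, 0, 1, 1, 2, 3]))  # 0.0 (same path, slower)
print(dtw_distance([0, 1, 2], [0, 2, 2]))              # 1.0
```

For 2-D pen trajectories, pass point tuples together with `dist=math.dist`.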
Self-supervised learning offers an efficient way to extract rich representations from various types of unlabeled data while avoiding the cost of annotating large-scale datasets. This is achieved by designing a pretext task that forms pseudo labels suited to the modality and domain of the data. Given the growing range of applications of online handwritten text, in this study we propose Part of Stroke Masking (POSM), a novel pretext task for pretraining models to extract informative representations from the online handwriting of individuals writing in English and Chinese, along with two suggested pipelines for fine-tuning the pretrained models. To evaluate the quality of the extracted representations, we use both intrinsic and extrinsic evaluation methods. After fine-tuning, the pretrained models achieve state-of-the-art results on tasks such as writer identification, gender classification, and handedness classification, which also highlights the advantage of using pretrained models over models trained from scratch.
https://arxiv.org/abs/2310.06645
The growing global elderly population is expected to increase the prevalence of frailty, posing significant challenges to healthcare systems. Frailty, a syndrome associated with ageing, is characterised by progressive health decline, increased vulnerability to stressors and increased risk of mortality. It is a significant public health burden and reduces the quality of life of those affected. The lack of both a universally accepted method for assessing frailty and a standardised definition highlights a critical research gap. Given this gap and the importance of early prevention, this study presents an innovative approach that uses an instrumented ink pen to assess handwriting ecologically for age group classification. Content-free handwriting data from 80 healthy participants in different age groups (20-40, 41-60, 61-70 and 70+) were analysed. Fourteen gesture- and tremor-related indicators were computed from the raw data and used in five classification tasks, which included discriminating between adjacent and non-adjacent age groups using CatBoost and Logistic Regression classifiers. The classifiers performed very well, with accuracy ranging from 82.5% to 97.5%, precision from 81.8% to 100%, recall from 75% to 100% and ROC-AUC from 92.2% to 100%. SHAP-based interpretability analysis revealed an age-dependent sensitivity of temporal and tremor-related handwriting features. Importantly, this classification method offers potential for early detection of abnormal signs of ageing in uncontrolled settings such as remote home monitoring, thereby addressing the critical issue of frailty detection and contributing to improved care for older adults.
https://arxiv.org/abs/2309.17156
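The age-group study reports accuracy, precision, recall and ROC-AUC. The helpers below show how these metrics are conventionally computed for one binary task, for example one age group against another. ROC-AUC is computed as the probability that a randomly chosen positive sample is scored higher than a randomly chosen negative one, with ties counted as half. The function names are ours, and the CatBoost and logistic regression models themselves are not reproduced.

```python
def roc_auc(scores, labels):
    """ROC-AUC as P(random positive scores above random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


def precision_recall_accuracy(predicted, labels):
    """Binary precision, recall and accuracy from hard 0/1 predictions."""
    pairs = list(zip(predicted, labels))
    tp = sum(p == 1 and y == 1 for p, y in pairs)
    fp = sum(p == 1 and y == 0 for p, y in pairs)
    fn = sum(p == 0 and y == 1 for p, y in pairs)
    correct = sum(p == y for p, y in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, correct / len(pairs)


print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
print(precision_recall_accuracy([1, 0, 1, 1], [1, 0, 0, 1]))
```

ROC-AUC uses the classifier's continuous scores, while precision, recall and accuracy depend on the decision threshold used to produce the hard predictions.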
Teaching physical skills to humans requires one-on-one interaction between teacher and learner. With a shortage of human teachers, this teaching mode is difficult to scale up. Robots, with their replicable nature and physical capabilities, offer a solution. In this work, we present TeachingBot, a robotic system designed to teach handwriting to human learners. We tackle two primary challenges in this teaching task: adapting to each learner's unique style and creating an engaging learning experience. TeachingBot captures the learner's style with a probabilistic learning approach applied to the learner's handwriting. Based on the learned style, it then provides physical guidance with variable impedance to make the learning experience engaging. Human-subject experiments with 15 participants support the effectiveness of TeachingBot, showing improved learning outcomes compared to baseline methods. We also illustrate how TeachingBot customizes its teaching approach for individual learners, enhancing overall engagement and effectiveness.
https://arxiv.org/abs/2309.11848
Offline handwriting recognition (HWR) has improved significantly in recent years with the advent of deep learning architectures. Nevertheless, it remains a challenging problem, and practical applications often rely on post-processing techniques that restrict the predicted words using lexicons or language models. Despite their enhanced performance, such systems are less usable in contexts where out-of-vocabulary words are expected, e.g., for detecting misspelled words in school assessments. To that end, we introduce the task of comparing a handwriting image to text. To solve it, we propose an unrestricted binary classifier consisting of an HWR feature extractor and a multimodal classification head that convolves the feature extractor output with a vector representation of the input text. The classification head is trained entirely on synthetic data created with a state-of-the-art generative adversarial network. We demonstrate that, while maintaining high recall, the classifier can be calibrated to achieve an average precision increase of 19.5% compared with addressing the task directly with state-of-the-art HWR models. Such large performance gains can lead to significant productivity increases in applications that use human-in-the-loop automation.
https://arxiv.org/abs/2309.10158
On-line handwritten character segmentation is often associated with handwriting recognition. Although recognition models include mechanisms to locate relevant positions during recognition, these are typically insufficient to produce a precise segmentation. Decoupling segmentation from recognition unlocks the potential to make further use of the recognition result. We specifically focus on the scenario where the transcription is known beforehand. In that case, character segmentation becomes an assignment problem between sampling points of the stylus trajectory and characters in the text. Inspired by the $k$-means clustering algorithm, we view it from the perspective of cluster assignment and present a Transformer-based architecture in which each cluster is formed from a learned character query in the Transformer decoder block. To assess the quality of our approach, we create character segmentation ground truths for two popular on-line handwriting datasets, IAM-OnDB and HANDS-VNOnDB, and evaluate multiple methods on them, demonstrating that our approach achieves the best overall results.
https://arxiv.org/abs/2309.03072
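The segmentation approach above treats each character as a cluster and assigns stylus sample points to characters, in the spirit of the $k$-means assignment step. The sketch below shows only that hard-assignment step. Each point goes to the character whose query vector gives the highest dot-product score, and the points are then grouped into per-character segments. The vectors are hand-made placeholders for what a learned Transformer decoder would produce, and the function names are hypothetical.

```python
def assign_points_to_characters(point_features, character_queries):
    """Assign each stylus sample point to the character whose query scores highest.

    Mirrors the hard-assignment step of k-means, with dot-product similarity
    in place of (negative) squared distance. Both inputs are lists of equally
    sized vectors; producing them is the job of a learned model (not shown).
    """
    assignments = []
    for point in point_features:
        scores = [sum(p * q for p, q in zip(point, query)) for query in character_queries]
        assignments.append(max(range(len(scores)), key=scores.__getitem__))
    return assignments


def group_by_character(assignments, n_characters):
    """Collect the indices of the sample points assigned to each character."""
    segments = [[] for _ in range(n_characters)]
    for point_index, char_index in enumerate(assignments):
        segments[char_index].append(point_index)
    return segments


points = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.0, 0.8]]
queries = [[1.0, 0.0], [0.0, 1.0]]  # one hypothetical query per character
labels = assign_points_to_characters(points, queries)
print(labels)                                   # [0, 0, 1, 1]
print(group_by_character(labels, len(queries)))  # [[0, 1], [2, 3]]
```

Because the transcription is known, the number of clusters equals the number of characters. This is what turns segmentation into an assignment problem rather than open-ended recognition.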
Recognizing text lines from images is a challenging problem, especially for handwritten documents, because writing styles vary widely. Although text line recognition models are generally trained on large corpora of real and synthetic data, they can still make frequent mistakes if the handwriting is inscrutable or the image acquisition process adds corruptions such as noise, blur, or compression. An individual's writing style is generally quite consistent, and this consistency can be leveraged to correct the mistakes such models make. Motivated by this, we introduce the problem of adapting text line recognition models at test time. We focus on a challenging and realistic setting: given only a single unlabeled test image containing multiple text lines, the task is to adapt the model so that it performs better on that image. We propose an iterative self-training approach that uses feedback from the language model to update the optical model, using confident self-labels in each iteration. The confidence measure is based on an augmentation mechanism that evaluates the divergence of the model's predictions in a local region. We rigorously evaluate our method on several benchmark datasets and their corrupted versions. Experiments on multiple datasets spanning multiple scripts show that the proposed adaptation method yields an absolute improvement of up to 8% in character error rate with just a few iterations of self-training at test time.
https://arxiv.org/abs/2308.15037
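The test-time adaptation method above keeps only confident self-labels, where confidence comes from how much the model's predictions diverge under augmentation. The sketch below is a deliberately simplified stand-in, not the authors' local-region measure. For each line it takes the transcriptions predicted for several augmented copies and measures agreement as the share of copies matching the most common transcription. It keeps that transcription as a pseudo-label only if agreement reaches a threshold. The data layout, threshold and function name are assumptions.

```python
from collections import Counter


def select_confident_pseudo_labels(predictions_per_line, threshold=0.8):
    """Keep a line's majority prediction only if augmented views mostly agree.

    predictions_per_line maps a line id to the transcriptions the model
    produced for several augmented copies of that line image. Agreement is
    the fraction of views matching the most common transcription.
    """
    selected = {}
    for line_id, predictions in predictions_per_line.items():
        label, count = Counter(predictions).most_common(1)[0]
        if count / len(predictions) >= threshold:
            selected[line_id] = label
    return selected


views = {
    "line-1": ["cat", "cat", "cat", "cat", "cot"],  # 4/5 agree
    "line-2": ["dog", "dig", "dog", "bog", "dug"],  # 2/5 agree
}
print(select_confident_pseudo_labels(views))  # {'line-1': 'cat'}
```

In a self-training loop, the selected pairs would serve as training targets for the next update of the optical model. Lines whose predictions disagree across augmentations are left out.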
In this paper, we tackle the challenge of white-box false positive adversarial attacks on contrastive loss-based offline handwritten signature verification models. We propose a novel attack method that treats the attack as a style transfer between closely related but distinct writing styles. To guide the generation of deceptive images, we introduce two new loss functions that enhance the attack success rate by perturbing the Euclidean distance between the embedding vectors of the original and synthesized samples, while ensuring minimal perturbations by reducing the difference between the generated image and the original image. Our method demonstrates state-of-the-art performance in white-box attacks on contrastive loss-based offline handwritten signature verification models, as evidenced by our experiments. The key contributions of this paper include a novel false positive attack method, two new loss functions, effective style transfer in handwriting styles, and superior performance in white-box false positive attacks compared to other white-box attack methods.
https://arxiv.org/abs/2308.08925
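The attacked verifiers are trained with a contrastive loss on Euclidean distances between embeddings. The sketch below uses the classic margin-based contrastive loss together with a distance-threshold decision rule. That exact loss form is our assumption, since the abstract does not give it. In these terms, a false positive attack aims to make `is_accepted` return True for a pair from different writers by shrinking their embedding distance, while keeping the image perturbation small. Both function names are hypothetical.

```python
import math


def contrastive_loss(emb_a, emb_b, same_writer, margin=1.0):
    """Margin-based contrastive loss on the Euclidean distance of two embeddings.

    Genuine pairs (same_writer=True) are pulled together via d**2; pairs from
    different writers are pushed apart until their distance reaches the margin.
    """
    d = math.dist(emb_a, emb_b)
    if same_writer:
        return d ** 2
    return max(0.0, margin - d) ** 2


def is_accepted(emb_a, emb_b, threshold):
    """Verification decision: accept the pair as same-writer if the distance is small."""
    return math.dist(emb_a, emb_b) < threshold


reference, query = [0.0, 0.0], [3.0, 4.0]  # embedding distance 5.0
print(contrastive_loss(reference, query, same_writer=True))               # 25.0
print(contrastive_loss(reference, query, same_writer=False, margin=10.0))  # 25.0
print(is_accepted(reference, query, threshold=6.0))                       # True
```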
A learning-based modular motion planning pipeline is presented that is compliant, safe, and reactive to perturbations during task execution. A nominal motion plan, defined as a nonlinear autonomous dynamical system (DS), is learned offline from kinesthetic demonstrations using a Neural Ordinary Differential Equation (NODE) model. To ensure both stability and safety during inference, a novel approach is proposed that selects a target point at each time step for the robot to follow, using a time-varying target trajectory generated by the learned NODE. A correction term to the NODE model is computed online by solving a Quadratic Program that guarantees stability and safety using Control Lyapunov Functions and Control Barrier Functions, respectively. Our approach outperforms baseline DS learning techniques on the LASA handwriting dataset and is validated in real-robot experiments, where it produces stable motions, such as wiping and stirring, while remaining robust to physical perturbations and safe around humans and obstacles.
https://arxiv.org/abs/2308.00186
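The pipeline above computes an online correction to the NODE velocity by solving a quadratic program with Control Lyapunov Function (CLF) and Control Barrier Function (CBF) constraints. As a minimal sketch, keep only one CLF constraint with the quadratic Lyapunov function $V(x) = \lVert x - x^\ast \rVert^2$ around the current target point $x^\ast$. The QP $\min_u \lVert u \rVert^2$ subject to $\nabla V^\top (f + u) \le -\gamma V$ then has a closed-form solution, a projection onto a half-space. The choice of $V$, the rate $\gamma$ and the omission of CBF constraints are our assumptions; CBF constraints would in general require a numerical QP solver. `clf_correction` is a hypothetical name.

```python
def clf_correction(x, x_target, f_nominal, gamma=1.0):
    """Minimal-norm correction u so that V decreases at rate gamma along f + u.

    Solves   min ||u||^2   s.t.   grad_V . (f + u) <= -gamma * V
    for V(x) = ||x - x_target||^2, whose gradient is 2 (x - x_target).
    With a single affine constraint the QP reduces to a half-space projection.
    """
    e = [xi - ti for xi, ti in zip(x, x_target)]
    V = sum(ei * ei for ei in e)
    grad_V = [2.0 * ei for ei in e]
    # Constraint residual a = grad_V . f + gamma * V; the nominal f is fine if a <= 0.
    a = sum(g * fi for g, fi in zip(grad_V, f_nominal)) + gamma * V
    norm_sq = sum(g * g for g in grad_V)
    if a <= 0.0 or norm_sq == 0.0:
        return [0.0] * len(x)
    # Smallest u that moves f onto the constraint boundary.
    return [-a * g / norm_sq for g in grad_V]


x, target = [1.0, 0.0], [0.0, 0.0]
f = [1.0, 0.0]  # nominal velocity points away from the target
u = clf_correction(x, target, f)
print(u)        # [-1.5, -0.0]
print([fi + ui for fi, ui in zip(f, u)])  # corrected velocity [-0.5, 0.0]
```

When the nominal velocity already satisfies the decrease condition, the correction is zero and the learned NODE motion is left untouched. This is why such a filter is minimally invasive.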
Handwriting recognition is a challenging and critical problem in the fields of pattern recognition and machine learning, with applications spanning a wide range of domains. In this paper, we focus on the specific issue of recognizing offline Arabic handwritten text. Existing approaches typically utilize a combination of convolutional neural networks for image feature extraction and recurrent neural networks for temporal modeling, with connectionist temporal classification used for text generation. However, these methods suffer from a lack of parallelization due to the sequential nature of recurrent neural networks. Furthermore, these models cannot account for linguistic rules, necessitating the use of an external language model in the post-processing stage to boost accuracy. To overcome these issues, we introduce two alternative architectures, namely the Transformer Transducer and the standard sequence-to-sequence Transformer, and compare their performance in terms of accuracy and speed. Our approach can model language dependencies and relies only on the attention mechanism, thereby making it more parallelizable and less complex. We employ pre-trained Transformers for both image understanding and language modeling. Our evaluation on the Arabic KHATT dataset demonstrates that our proposed method outperforms the current state-of-the-art approaches for recognizing offline Arabic handwritten text.
https://arxiv.org/abs/2307.15045
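The existing pipelines described above decode CNN-RNN outputs with connectionist temporal classification (CTC). For contrast with the attention-based Transformer decoders, the sketch below shows best-path (greedy) CTC decoding. It takes the most likely class per frame, merges consecutive repeats and drops the blank symbol. The three-symbol alphabet and the frame probabilities are made-up illustrations. Practical systems often use beam search instead, optionally with an external language model.

```python
def ctc_greedy_decode(frame_probs, alphabet, blank=0):
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks.

    frame_probs: list of per-frame probability lists over len(alphabet)
    classes, where index `blank` is the CTC blank symbol.
    """
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    out, previous = [], None
    for k in best:
        # A label is emitted only when it differs from the previous frame's
        # label; a blank between two identical labels keeps both.
        if k != previous and k != blank:
            out.append(alphabet[k])
        previous = k
    return "".join(out)


alphabet = ["-", "a", "b"]  # index 0 is the blank
frames = [
    [0.1, 0.8, 0.1], [0.2, 0.7, 0.1],    # a a
    [0.9, 0.05, 0.05],                   # blank
    [0.1, 0.1, 0.8], [0.2, 0.2, 0.6],    # b b
    [0.7, 0.2, 0.1],                     # blank
    [0.1, 0.1, 0.8],                     # b
]
print(ctc_greedy_decode(frames, alphabet))  # abb
```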
Handwriting authentication is a valuable tool used in various fields, such as fraud prevention and cultural heritage protection. However, it remains a challenging task due to the complex features, severe damage, and lack of supervision. In this paper, we propose a novel Contrastive Self-Supervised Learning framework for Robust Handwriting Authentication (CSSL-RHA) to address these issues. It can dynamically learn complex yet important features and accurately predict writer identities. Specifically, to remove the negative effects of imperfections and redundancy, we design an information-theoretic filter for pre-processing and propose a novel adaptive matching scheme to represent images as patches of local regions dominated by more important features. Through online optimization at inference time, the most informative patch embeddings are identified as the "most important" elements. Furthermore, we employ contrastive self-supervised training with a momentum-based paradigm to learn more general statistical structures of handwritten data without supervision. We conduct extensive experiments on five benchmark datasets and our manually annotated dataset EN-HA, which demonstrate the superiority of our CSSL-RHA compared to baselines. Additionally, we show that our proposed model can still effectively achieve authentication even under abnormal circumstances, such as data falsification and corruption.
https://arxiv.org/abs/2307.11100
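CSSL-RHA uses contrastive self-supervised training with a momentum-based paradigm. In the widely used MoCo-style formulation, a key encoder's parameters track the query encoder's parameters as an exponential moving average instead of being updated by backpropagation. We assume, without confirmation from the abstract, that CSSL-RHA realizes the paradigm this way. The sketch treats parameters as flat lists of floats, and `momentum_update` is a hypothetical name.

```python
def momentum_update(key_params, query_params, m=0.999):
    """Exponential moving average: key <- m * key + (1 - m) * query.

    The key (momentum) encoder is not trained by backpropagation; it slowly
    tracks the query encoder, which keeps contrastive targets consistent.
    """
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


key = [0.0, 2.0]
query = [4.0, 2.0]
print(momentum_update(key, query, m=0.75))  # [1.0, 2.0]
```

A momentum close to 1 makes the key encoder change slowly between training steps. This is the usual motivation for the design, since embeddings computed at different steps remain comparable.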
Handwriting recognition has seen significant success with the use of deep learning. However, a persistent shortcoming of neural networks is that they are not well-equipped to deal with shifting data distributions. In the field of handwritten text recognition (HTR), this shows itself in poor recognition accuracy for writers that are not similar to those seen during training. An ideal HTR model should be adaptive to new writing styles in order to handle the vast amount of possible writing styles. In this paper, we explore how HTR models can be made writer adaptive by using only a handful of examples from a new writer (e.g., 16 examples) for adaptation. Two HTR architectures are used as base models, using a ResNet backbone along with either an LSTM or Transformer sequence decoder. Using these base models, two methods are considered to make them writer adaptive: 1) model-agnostic meta-learning (MAML), an algorithm commonly used for tasks such as few-shot classification, and 2) writer codes, an idea originating from automatic speech recognition. Results show that an HTR-specific version of MAML known as MetaHTR improves performance compared to the baseline with a 1.4 to 2.0 improvement in word error rate (WER). The improvement due to writer adaptation is between 0.2 and 0.7 WER, where a deeper model seems to lend itself better to adaptation using MetaHTR than a shallower model. However, applying MetaHTR to larger HTR models or sentence-level HTR may become prohibitive due to its high computational and memory requirements. Lastly, writer codes based on learned features or Hinge statistical features did not lead to improved recognition performance.
https://arxiv.org/abs/2307.15071
Stroke extraction of Chinese characters plays an important role in character recognition and generation. Most existing stroke extraction methods focus on image morphological features. Because they rarely use stroke semantics or prior information, these methods often make errors when extracting crossing strokes and when matching strokes. In this paper, we propose a deep learning-based stroke extraction method that takes the semantic features and prior information of strokes into account. The method consists of three parts. The first is stroke registration based on image registration, which establishes a rough registration between the reference strokes and the target as prior information. The second is stroke segmentation based on image semantic segmentation, which preliminarily separates the target strokes into seven categories. The third is high-precision extraction of single strokes. For stroke registration, we propose a structure-deformable image registration network that achieves structure-deformable transformation while maintaining the stable morphology of single strokes in character images with complex structures. To verify the effectiveness of the method, we construct two datasets, one of calligraphy characters and one of regular handwritten characters. Experimental results show that our method strongly outperforms the baselines. Code is available at this https URL.
https://arxiv.org/abs/2307.04341
Using the kinematic properties of handwriting to support the diagnosis of neurodegenerative diseases is a real challenge: non-invasive detection techniques combined with machine learning approaches promise big steps forward in this research field. In the literature, the proposed tasks focus on different cognitive skills to elicit handwriting movements. In particular, the meaning and phonology of the words to be copied can compromise writing fluency. In this paper, we investigated how word semantics and phonology affect the handwriting of people with Alzheimer's disease. To this end, we used data from six handwriting tasks, each requiring the participant to copy a word from one of the following categories: regular (with a predictable phoneme-grapheme correspondence, e.g., cat), non-regular (with an atypical phoneme-grapheme correspondence, e.g., laugh), and non-word (meaningless but pronounceable letter strings that conform to phoneme-grapheme conversion rules). We analyzed the data with a machine learning approach, using four well-known and widely used classifiers together with feature selection. The experimental results showed that feature selection allowed us to derive a different set of highly distinctive features for each word type. Furthermore, non-regular words needed, on average, more features but achieved excellent classification performance: the best result, an accuracy close to 90%, was obtained on a non-regular word.
https://arxiv.org/abs/2307.04762
This paper presents an end-to-end methodology for collecting datasets for recognizing handwritten letters of the English alphabet using Inertial Measurement Units (IMUs) and leveraging the diversity of writing styles in India. The IMUs capture the dynamic movement patterns of handwriting, enabling more accurate recognition of letters. The Indian context introduces various challenges because writing styles are heterogeneous across different regions and languages. By leveraging this diversity, the collected dataset and the collection system aim to achieve higher recognition accuracy. Preliminary experimental results demonstrate the effectiveness of the dataset for accurately recognizing handwritten English letters in the Indian context. This research can be extended further; it contributes to the field of pattern recognition and offers valuable insights for developing improved handwriting recognition systems, particularly in diverse linguistic and cultural contexts.
https://arxiv.org/abs/2307.02480
We present a CNN-BiLSTM system for offline English handwriting recognition, with extensive evaluations on the public IAM dataset, including the effects of model size, data augmentation, and the lexicon. Our best model, a CNN-BiLSTM network with a CTC layer, achieves 3.59\% CER and 9.44\% WER. We propose test-time augmentation that applies rotation and shear transformations to the input image to improve recognition of difficult cases, and find that it reduces the word error rate by 2.5 percentage points. We also conduct an error analysis of the proposed method on the IAM dataset, show difficult handwriting images, and examine samples with erroneous labels. We release our source code into the public domain to foster further research and encourage scientific reproducibility.
https://arxiv.org/abs/2307.00664
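The CER and WER figures above are edit-distance rates. As a minimal sketch, compute the Levenshtein distance between reference and hypothesis and divide it by the reference length. Length is counted in characters for CER and in whitespace-separated words for WER. This per-line version is illustrative. Benchmark scripts commonly aggregate over a whole test set by summing edits and reference lengths, and details such as punctuation tokenization vary between evaluation setups.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit-cost edits)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,             # deletion
                          curr[j - 1] + 1,         # insertion
                          prev[j - 1] + (r != h))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: character edits divided by reference length."""
    return edit_distance(ref, hyp) / len(ref)


def wer(ref, hyp):
    """Word error rate: word edits divided by the number of reference words."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)


print(edit_distance("kitten", "sitting"))  # 3
print(cer("the cat sat", "the cat sit"))   # 1 edit over 11 characters
print(wer("the cat sat", "the cat sit"))   # 1 edit over 3 words
```

A single wrong character costs little in CER but a whole word in WER. This is why the WER figures above are considerably larger than the CER figures.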