Progress in automated handwriting recognition has been hampered by the lack of large training datasets. Nearly all research uses a set of small datasets that often cause models to overfit. We present CENSUS-HWR, a new dataset consisting of full English handwritten words in 1,812,014 grayscale images. A total of 1,865,134 handwritten texts from an English vocabulary of 10,711 words are present in this collection. The dataset is intended to serve as a benchmark for deep learning handwriting-recognition models. It has been extracted from the US 1930 and 1940 censuses, taken by approximately 70,000 enumerators each year. The dataset and the trained model with its weights are freely available to download at this https URL.
https://arxiv.org/abs/2305.16275
The Transformer architecture is shown to provide a powerful machine transduction framework for online handwritten gestures corresponding to glyph strokes of natural language sentences. The attention mechanism is successfully used to create latent representations of an end-to-end encoder-decoder model, solving multi-level segmentation while also learning some language features and syntax rules. The additional use of a large decoding space with a learned Byte-Pair Encoding (BPE) is shown to provide robustness to ablated inputs and syntax rules. The encoder stack was directly fed with spatio-temporal data tokens potentially forming an infinitely large input vocabulary, an approach that finds applications beyond this work. Encoder transfer-learning capabilities are also demonstrated on several languages, resulting in faster optimisation and shared parameters. A new supervised dataset of online handwriting gestures suitable for generic handwriting recognition tasks was used to successfully train a small Transformer model to an average normalised Levenshtein accuracy of 96% on English and German sentences and 94% on French sentences.
https://arxiv.org/abs/2305.03407
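The headline metric above, normalised Levenshtein accuracy, can be sketched in a few lines. The abstract does not spell out the exact normalisation, so the definition below (one minus edit distance over reference length, clipped to [0, 1]) is a common convention rather than the authors' precise formula:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def normalised_accuracy(reference: str, hypothesis: str) -> float:
    """1 - (edit distance / reference length), clipped to [0, 1]."""
    if not reference:
        return 1.0 if not hypothesis else 0.0
    return max(0.0, 1.0 - levenshtein(reference, hypothesis) / len(reference))
```

For example, a hypothesis that differs from a four-character reference by one substitution scores 0.75 under this definition.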
Recent advancements in Deep Learning-based Handwritten Text Recognition (HTR) have led to models with remarkable performance on both modern and historical manuscripts in large benchmark datasets. Nonetheless, those models struggle to obtain the same performance when applied to manuscripts with peculiar characteristics, such as language, paper support, ink, and author handwriting. This issue is very relevant for valuable but small collections of documents preserved in historical archives, for which obtaining sufficient annotated training data is costly or, in some cases, unfeasible. To overcome this challenge, a possible solution is to pretrain HTR models on large datasets and then fine-tune them on small single-author collections. In this paper, we take into account large, real benchmark datasets and synthetic ones obtained with a styled Handwritten Text Generation model. Through extensive experimental analysis, also considering the amount of fine-tuning lines, we give a quantitative indication of the most relevant characteristics of such data for obtaining an HTR model able to effectively transcribe manuscripts in small collections with as little as five real fine-tuning lines.
https://arxiv.org/abs/2305.02593
We propose a Transformer-based approach for information extraction from digitized handwritten documents. Our approach combines, in a single model, the different steps that were so far performed by separate models: feature extraction, handwriting recognition, and named entity recognition. We compare this integrated approach with traditional two-stage methods that perform handwriting recognition before named entity recognition, and present results at different levels: line, paragraph, and page. Our experiments show that attention-based models are especially interesting when applied to full pages, as they do not require any prior segmentation step. Finally, we show that they are able to learn from key-value annotations: a list of important words with their corresponding named entities. We compare our models to state-of-the-art methods on three public databases (IAM, ESPOSALLES, and POPP) and outperform previous results on all three datasets.
https://arxiv.org/abs/2304.13530
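The "key-value annotations" the model learns from can be pictured as a mapping from important words to entity tags. The annotation format below is purely hypothetical (the real IAM/ESPOSALLES/POPP schemas differ); it only illustrates how such pairs could be serialised into a target sequence for an attention-based decoder:

```python
# Hypothetical key-value annotation for one record: important words -> entity tags.
page_annotation = {
    "Maria": "first_name",
    "Vilalta": "surname",
    "farmer": "occupation",
}

def to_training_target(annotation: dict) -> str:
    """Serialise word/entity pairs into a flat string a seq2seq decoder could emit."""
    return " ".join(f"<{tag}> {word}" for word, tag in annotation.items())
```

Serialising this example yields `<first_name> Maria <surname> Vilalta <occupation> farmer`, one possible flat target for training.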
Generative modelling over continuous-time geometric constructs such as handwriting, sketches, and drawings has so far been accomplished through autoregressive distributions. Such a strictly ordered discrete factorization, however, falls short of capturing key properties of chirographic data -- it fails to build a holistic understanding of the temporal concept due to one-way visibility (causality). Consequently, temporal data has been modelled as discrete token sequences of fixed sampling rate instead of capturing the true underlying concept. In this paper, we introduce a powerful model class, "Denoising Diffusion Probabilistic Models" (DDPMs), for chirographic data that specifically addresses these flaws. Our model, "ChiroDiff", being non-autoregressive, learns to capture holistic concepts and therefore remains largely resilient to higher temporal sampling rates. Moreover, we show that many important downstream utilities (e.g. conditional sampling, creative mixing) can be flexibly implemented using ChiroDiff. We further show that some unique use-cases like stochastic vectorization, de-noising/healing, and abstraction are also possible with this model class. We perform quantitative and qualitative evaluation of our framework on relevant datasets and find it to be better than or on par with competing approaches.
https://arxiv.org/abs/2304.03785
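ChiroDiff's model class rests on the standard DDPM forward process, which gradually corrupts a stroke sequence toward Gaussian noise. A minimal sketch of that closed-form forward step, under an assumed linear beta schedule (the paper's actual schedule and parameterisation may differ):

```python
import math
import random

def noised_stroke(points, t, T=1000, beta_min=1e-4, beta_max=0.02, rng=random):
    """Sample q(x_t | x_0) for a 2-D point sequence under a linear beta schedule:
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps,  eps ~ N(0, I)."""
    alpha_bar = 1.0
    for s in range(1, t + 1):
        beta = beta_min + (beta_max - beta_min) * (s - 1) / (T - 1)
        alpha_bar *= 1.0 - beta
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    noisy = [(a * x + b * rng.gauss(0, 1), a * y + b * rng.gauss(0, 1))
             for x, y in points]
    return noisy, alpha_bar
```

At t = 0 the stroke is returned unchanged; as t approaches T, alpha_bar shrinks toward zero and the points become nearly pure noise, which is what the reverse (denoising) model learns to invert.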
Planning from demonstrations has shown promising results with the advances of deep neural networks. One of the most popular real-world applications is automated handwriting using a robotic manipulator. Classically it is simplified as a two-dimensional problem. This representation is suitable for elementary drawings, but it is not sufficient for Japanese calligraphy or complex works of art where the orientation of the pen is part of the user's expression. In this study, we focus on automated planning of Japanese calligraphy using a three-dimensional representation of the trajectory as well as the rotation of the pen tip, and propose a novel deep imitation learning neural network that learns from expert demonstrations through a combination of images and pose data. The network consists of a combination of a variational auto-encoder, a bi-directional LSTM, and a Multi-Layer Perceptron (MLP). Experiments are conducted in a progressive way, and results demonstrate that the proposed approach successfully completes tasks on real-world robots, overcoming the distribution-shift problem in imitation learning. The source code and dataset will be made public.
https://arxiv.org/abs/2304.02801
In this work, we explore massive pre-training on synthetic word images for enhancing the performance on four benchmark downstream handwriting analysis tasks. To this end, we build a large synthetic dataset of word images rendered in several handwriting fonts, which offers a complete supervision signal. We use it to train a simple convolutional neural network (ConvNet) with a fully supervised objective. The vector representations of the images obtained from the pre-trained ConvNet can then be considered as encodings of the handwriting style. We exploit such representations for Writer Retrieval, Writer Identification, Writer Verification, and Writer Classification and demonstrate that our pre-training strategy allows extracting rich representations of the writers' style that enable the aforementioned tasks with competitive results with respect to task-specific State-of-the-Art approaches.
https://arxiv.org/abs/2304.01842
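Once the pre-trained ConvNet maps each word image to a style vector, Writer Retrieval reduces to nearest-neighbour search in that embedding space. A dependency-free sketch using cosine similarity (the paper's actual retrieval protocol and distance may differ):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def retrieve(query, gallery):
    """Rank gallery entries (writer_id -> style vector) by similarity to the query."""
    return sorted(gallery, key=lambda w: cosine(query, gallery[w]), reverse=True)
```

Writer Identification then amounts to taking the top-ranked writer, and Writer Verification to thresholding the similarity of a single pair.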
Training machines to synthesize diverse handwritings is an intriguing task. Recently, RNN-based methods have been proposed to generate stylized online Chinese characters. However, these methods mainly focus on capturing a person's overall writing style, neglecting subtle style inconsistencies between characters written by the same person. For example, while a person's handwriting typically exhibits general uniformity (e.g., glyph slant and aspect ratios), there are still small style variations in finer details (e.g., stroke length and curvature) of characters. In light of this, we propose to disentangle the style representations at both writer and character levels from individual handwritings to synthesize realistic stylized online handwritten characters. Specifically, we present the style-disentangled Transformer (SDT), which employs two complementary contrastive objectives to extract the style commonalities of reference samples and capture the detailed style patterns of each sample, respectively. Extensive experiments on various language scripts demonstrate the effectiveness of SDT. Notably, our empirical findings reveal that the two learned style representations provide information at different frequency magnitudes, underscoring the importance of separate style extraction. Our source code is public at: this https URL.
https://arxiv.org/abs/2303.14736
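The "two complementary contrastive objectives" belong to the InfoNCE family: one pulls together samples from the same writer, the other contrasts fine-grained patterns. A schematic single-anchor InfoNCE loss in plain Python; SDT's exact objectives, similarity functions, and batching are not specified by the abstract, so this is only the generic template:

```python
import math

def info_nce(anchor, positive, negatives, tau=0.1):
    """Generic InfoNCE loss: pull the positive toward the anchor, push negatives away."""
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv)
    # Temperature-scaled similarity logits; index 0 is the positive pair.
    logits = [cos(anchor, positive) / tau] + [cos(anchor, n) / tau for n in negatives]
    m = max(logits)  # log-sum-exp stabilisation
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return -(logits[0] - log_denom)
```

The loss is near zero when the positive is far more similar to the anchor than any negative, and grows when a negative outranks the positive.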
The Transformer has quickly become the dominant architecture for various pattern recognition tasks due to its capacity for long-range representation. However, transformers are data-hungry models and need large datasets for training. In Handwritten Text Recognition (HTR), collecting a massive amount of labeled data is a complicated and expensive task. In this paper, we propose a lite transformer architecture for full-page multi-script handwriting recognition. The proposed model comes with three advantages: First, to solve the common problem of data scarcity, we propose a lite transformer model that can be trained on a reasonable amount of data, which is the case of most HTR public datasets, without the need for external data. Second, it can learn the reading order at page-level thanks to a curriculum learning strategy, allowing it to avoid line segmentation errors, exploit a larger context and reduce the need for costly segmentation annotations. Third, it can be easily adapted to other scripts by applying a simple transfer-learning process using only page-level labeled images. Extensive experiments on different datasets with different scripts (French, English, Spanish, and Arabic) show the effectiveness of the proposed model.
https://arxiv.org/abs/2303.13931
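The curriculum learning strategy can be pictured as a staged schedule that grows the training target from single lines to full pages. The stage boundaries below are invented for illustration; the paper's actual schedule is not given in the abstract:

```python
def curriculum_target(epoch, stage_epochs=(10, 20)):
    """Toy curriculum: train on lines first, then paragraphs, then full pages.
    The epoch thresholds are illustrative placeholders, not the paper's values."""
    if epoch < stage_epochs[0]:
        return "line"
    if epoch < stage_epochs[1]:
        return "paragraph"
    return "page"
```

The point of such a schedule is that the model has already learned character shapes on easy, short samples before it must also learn the page-level reading order.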
This study investigates the effect of haptic control strategies on a subject's mental engagement during a fine motor handwriting rehabilitation task. The considered control strategies include error reduction (ER) and error augmentation (EA), which are tested on both the dominant and non-dominant hand. A non-invasive brain-computer interface is used to monitor the electroencephalogram (EEG) activities of the subjects and evaluate their mental engagement using the power of multiple frequency bands (theta, alpha, and beta). Statistical analysis revealed that the choice of haptic control strategy has a significant effect (p < 0.001) on mental engagement depending on the hand used (dominant or non-dominant). Among the evaluated strategies, EA is shown to be more mentally engaging than ER for the non-dominant hand.
https://arxiv.org/abs/2303.09686
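Band power for the theta, alpha, and beta ranges is computed from the EEG spectrum. The sketch below uses a naive O(N²) DFT to stay dependency-free; a real pipeline would use an FFT with Welch averaging, and the band edges used here are the conventional ones rather than necessarily those of the study:

```python
import math

def band_power(signal, fs, band):
    """Mean squared DFT magnitude over the bins whose frequency falls in `band` (Hz).
    Naive O(N^2) DFT for illustration only; use an FFT in practice."""
    n = len(signal)
    lo, hi = band
    power, count = 0.0, 0
    for k in range(1, n // 2):
        f = k * fs / n
        if lo <= f < hi:
            re = sum(signal[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
            im = -sum(signal[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
            power += (re * re + im * im) / (n * n)
            count += 1
    return power / count if count else 0.0

THETA, ALPHA, BETA = (4, 8), (8, 13), (13, 30)
```

A pure 10 Hz oscillation, for instance, contributes power only to the alpha band, which is how per-band engagement indices can be separated.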
One of the factors limiting the performance of handwritten text recognition (HTR) for stenography is the small amount of annotated training data. To alleviate the problem of data scarcity, modern HTR methods often employ data augmentation. However, due to specifics of the stenographic script, such settings may not be directly applicable for stenography recognition. In this work, we study 22 classical augmentation techniques, most of which are commonly used for HTR of other scripts, such as Latin handwriting. Through extensive experiments, we identify a group of augmentations, including for example contained ranges of random rotation, shifts and scaling, that are beneficial to the use case of stenography recognition. Furthermore, a number of augmentation approaches, leading to a decrease in recognition performance, are identified. Our results are supported by statistical hypothesis testing. Links to the publicly available dataset and codebase are provided.
https://arxiv.org/abs/2303.02761
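The beneficial group of augmentations identified above, contained random rotation, shifts, and scaling, can be sketched as one transform over stroke coordinates. The parameter ranges below are placeholders; the paper's experiments are precisely about finding which ranges help stenography:

```python
import math
import random

def augment(points, max_deg=5.0, max_shift=2.0, scale_range=(0.9, 1.1), rng=random):
    """Apply a random rotation, translation, and isotropic scaling to 2-D points.
    All ranges are illustrative placeholders, not the paper's tuned values."""
    theta = math.radians(rng.uniform(-max_deg, max_deg))
    dx = rng.uniform(-max_shift, max_shift)
    dy = rng.uniform(-max_shift, max_shift)
    s = rng.uniform(*scale_range)
    c, si = math.cos(theta), math.sin(theta)
    # Rotate, scale, then shift each point.
    return [(s * (c * x - si * y) + dx, s * (si * x + c * y) + dy) for x, y in points]
```

Because rotation and translation are rigid, only the scale factor changes pairwise distances, which keeps the augmented sample geometrically plausible.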
Deformable linear objects (DLOs), such as rods, cables, and ropes, play important roles in daily life. However, manipulation of DLOs is challenging, as large geometrically nonlinear deformations may occur during the manipulation process. This problem is made even more difficult because the different deformation modes (e.g., stretching, bending, and twisting) may result in elastic instabilities during manipulation. In this paper, we formulate a physics-guided, data-driven method to solve a challenging manipulation task -- accurately deploying a DLO (an elastic rod) onto a rigid substrate along various prescribed patterns. Our framework combines machine learning, scaling analysis, and physics-based simulations to develop a physically informed neural controller for deployment. We explore the complex interplay between the gravitational and elastic energies of the manipulated DLO and obtain a control method for DLO deployment that is robust against friction and material properties. Out of the numerous geometrical and material properties of the rod and substrate, we show that only three non-dimensional parameters are needed to describe the deployment process with physical analysis. Therefore, the essence of the control law for the manipulation task can be constructed with a low-dimensional model, drastically increasing the computation speed. The effectiveness of our optimal control scheme is shown through a comprehensive robotic case study comparing against a heuristic control method for deploying rods along a wide variety of patterns. In addition, we showcase the practicality of our control scheme by having a robot accomplish challenging high-level tasks such as mimicking human handwriting and tying knots.
https://arxiv.org/abs/2303.02574
In this paper, we present AR3n (pronounced as Aaron), an assist-as-needed (AAN) controller that utilizes reinforcement learning to supply adaptive assistance during a robot assisted handwriting rehabilitation task. Unlike previous AAN controllers, our method does not rely on patient specific controller parameters or physical models. We propose the use of a virtual patient model to generalize AR3n across multiple subjects. The system modulates robotic assistance in realtime based on a subject's tracking error, while minimizing the amount of robotic assistance. The controller is experimentally validated through a set of simulations and human subject experiments. Finally, a comparative study with a traditional rule-based controller is conducted to analyze differences in assistance mechanisms of the two controllers.
https://arxiv.org/abs/2303.00085
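For contrast with AR3n's learned policy, the traditional rule-based controller it is compared against can be caricatured as a saturated proportional law on tracking error. Everything here (deadband, gain, saturation limit) is an illustrative assumption, not the paper's actual controller:

```python
def aan_assistance(error, gain=1.0, deadband=0.5, max_force=5.0):
    """Toy assist-as-needed rule: no help inside a deadband, then assistance
    proportional to tracking error, saturated at max_force. AR3n instead *learns*
    this modulation with RL; this fixed rule is only a rule-based point of reference."""
    magnitude = abs(error)
    if magnitude <= deadband:
        return 0.0
    force = min(gain * (magnitude - deadband), max_force)
    return force if error > 0 else -force
```

The deadband is what makes the rule "as needed": small errors receive no assistance at all, preserving the subject's own effort.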
One of the challenges of handwriting recognition is to transcribe a large number of vastly different writing styles. State-of-the-art approaches do not explicitly use information about the writer's style, which may be limiting overall accuracy due to various ambiguities. We explore models with writer-dependent parameters which take the writer's identity as an additional input. The proposed models can be trained on datasets with partitions likely written by a single author (e.g. single letter, diary, or chronicle). We propose a Writer Style Block (WSB), an adaptive instance normalization layer conditioned on learned embeddings of the partitions. We experimented with various placements and settings of WSB and contrastively pre-trained embeddings. We show that our approach outperforms a baseline with no WSB in a writer-dependent scenario and that it is possible to estimate embeddings for new writers. However, domain adaptation using simple finetuning in a writer-independent setting provides superior accuracy at a similar computational cost. The proposed approach should be further investigated in terms of training stability and embedding regularization to overcome such a baseline.
https://arxiv.org/abs/2302.06318
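The Writer Style Block is described as an adaptive instance normalisation layer conditioned on a learned partition embedding. A scalar-feature sketch of AdaIN, where the embedding is taken directly as the (gamma, beta) pair; in the actual model both would come from trained projections of the embedding:

```python
import math

def writer_style_block(features, writer_embedding):
    """Adaptive instance normalisation: normalise the feature vector, then scale
    and shift it with (gamma, beta) derived from the writer embedding. Taking the
    embedding directly as (gamma, beta) is a simplification for illustration."""
    gamma, beta = writer_embedding
    mean = sum(features) / len(features)
    var = sum((f - mean) ** 2 for f in features) / len(features)
    std = math.sqrt(var + 1e-5)  # epsilon guards against zero variance
    return [gamma * (f - mean) / std + beta for f in features]
```

After the transform, the feature statistics are set by the writer's embedding rather than by the input, which is how writer identity conditions the recognizer.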
In many machine learning tasks, a large general dataset and a small specialized dataset are available. In such situations, various domain adaptation methods can be used to adapt a general model to the target dataset. We show that in the case of neural networks trained for handwriting recognition using CTC, simple finetuning with data augmentation works surprisingly well in such scenarios and that it is resistant to overfitting even for very small target domain datasets. We evaluated the behavior of finetuning with respect to augmentation, training data size, and quality of the pre-trained network, both in writer-dependent and writer-independent settings. On a large real-world dataset, finetuning provided an average relative CER improvement of 25% with 16 text lines for new writers and 50% with 256 text lines.
https://arxiv.org/abs/2302.06308
Text-writing robots have been used in assistive writing and drawing applications. However, robots do not convey emotional tones in the writing process due to the lack of behaviors humans typically adopt. To examine how people interpret designed robotic expressions of emotion through both movements and textual output, we used a pen-plotting robot to generate texts by performing human-like behaviors like stop-and-go, speed, and pressure variation. We examined how people convey emotion in the writing process by observing how they wrote in different emotional contexts. We then mapped these human expressions during writing to the handwriting robot and measured how well other participants understood the robot's affective expression. We found that textual output was the strongest determinant of participants' ability to perceive the robot's emotions, whereas parameters of gestural movements of the robots like speed, fluency, pressure, size, and acceleration could be useful for understanding the context of the writing expression.
https://arxiv.org/abs/2302.05959
The events of the past two years related to the pandemic have shown that it is increasingly important to find new tools to help mental health experts diagnose mood disorders. Leaving aside the long-COVID cognitive (e.g., difficulty in concentration) and bodily (e.g., loss of smell) effects, the short-term COVID effects on mental health were a significant increase in anxiety and depressive symptoms. The aim of this study is to use a new tool, online handwriting and drawing analysis, to discriminate between healthy individuals and depressed patients. To this aim, patients with clinical depression (n = 14), individuals with high sub-clinical depressive traits (diagnosed by a test rather than a doctor; n = 15), and healthy individuals (n = 20) were recruited and asked to perform four online drawing/handwriting tasks using a digitizing tablet and a special writing device. From the raw collected online data, seventeen drawing/writing features (categorized into five categories) were extracted and compared among the three groups of participants through repeated-measures ANOVA. Results show that Time features are more effective in discriminating between healthy participants and those with sub-clinical depressive characteristics. On the other hand, Ductus and Pressure features are more effective in discriminating between clinically depressed and healthy participants.
https://arxiv.org/abs/2302.02499
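The Time feature category, which the study finds most discriminative for sub-clinical traits, can be illustrated with on-paper versus in-air durations recovered from the tablet stream. The (t, pressure) input format and the three derived values are simplifications of the seventeen features actually extracted:

```python
def time_features(samples):
    """Split an online-handwriting stream into on-paper and in-air time.
    `samples` are (timestamp, pressure) pairs; pressure 0 means the pen is lifted.
    A simplified stand-in for the study's richer Time feature set."""
    on_paper = in_air = 0.0
    for (t0, p0), (t1, _) in zip(samples, samples[1:]):
        dt = t1 - t0
        if p0 > 0:
            on_paper += dt
        else:
            in_air += dt
    total = on_paper + in_air
    return {"on_paper": on_paper, "in_air": in_air,
            "in_air_ratio": in_air / total if total else 0.0}
```

In-air time (hesitation between strokes) is exactly the kind of temporal signal such analyses compare across groups.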
We present a generative document-specific approach to character analysis and recognition in text lines. Our main idea is to build on unsupervised multi-object segmentation methods and in particular those that reconstruct images based on a limited amount of visual elements, called sprites. Our approach can learn a large number of different characters and leverage line-level annotations when available. Our contribution is twofold. First, we provide the first adaptation and evaluation of a deep unsupervised multi-object segmentation approach for text line analysis. Since these methods have mainly been evaluated on synthetic data in a completely unsupervised setting, demonstrating that they can be adapted and quantitatively evaluated on real text images and that they can be trained using weak supervision are significant progresses. Second, we demonstrate the potential of our method for new applications, more specifically in the field of paleography, which studies the history and variations of handwriting, and for cipher analysis. We evaluate our approach on three very different datasets: a printed volume of the Google1000 dataset, the Copiale cipher and historical handwritten charters from the 12th and early 13th century.
https://arxiv.org/abs/2302.01660
This paper presents the design and implementation of WhisperWand, a comprehensive voice and motion tracking interface for voice assistants. Distinct from prior works, WhisperWand is a precise tracking interface that can co-exist with the voice interface on low sampling rate voice assistants. Taking handwriting as a specific application, it can also capture natural strokes and the individualized style of writing while occupying only a single frequency. The core technique includes an accurate acoustic ranging method called Cross Frequency Continuous Wave (CFCW) sonar, enabling voice assistants to use ultrasound as a ranging signal while using the regular microphone system of voice assistants as a receiver. We also design a new optimization algorithm that only requires a single frequency for time difference of arrival. The WhisperWand prototype achieves 73 µm of median error for 1D ranging and 1.4 mm of median error in 3D tracking of an acoustic beacon using the microphone array used in voice assistants. Our implementation of an in-air handwriting interface achieves 94.1% accuracy with automatic handwriting-to-text software, similar to writing on paper (96.6%). At the same time, the error rate of voice-based user authentication only increases from 6.26% to 8.28%.
https://arxiv.org/abs/2301.10314
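CFCW sonar improves on the classic way of estimating a time difference of arrival, which is to maximise the cross-correlation between two microphone channels over integer sample lags. The sketch below shows that classic baseline estimator, not CFCW itself:

```python
def tdoa_samples(mic_a, mic_b, max_lag):
    """Estimate the time difference of arrival (in samples) between two recordings
    by maximising their cross-correlation over integer lags in [-max_lag, max_lag]."""
    best_lag, best_score = 0, float("-inf")
    n = len(mic_a)
    for lag in range(-max_lag, max_lag + 1):
        score = sum(mic_a[i] * mic_b[i + lag]
                    for i in range(n) if 0 <= i + lag < n)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

Integer-lag resolution is exactly the limitation that sub-sample methods like CFCW address; dividing the lag by the sampling rate converts it to a time difference.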
To this date, studies focusing on the prodromal diagnosis of Lewy body diseases (LBDs) based on quantitative analysis of graphomotor and handwriting difficulties are missing. In this work, we enrolled 18 subjects diagnosed with possible or probable mild cognitive impairment with Lewy bodies (MCI-LB), 7 subjects having more than 50% probability of developing Parkinson's disease (PD), 21 subjects with both possible/probable MCI-LB and probability of PD > 50%, and 37 age- and gender-matched healthy controls (HC). Each participant performed three tasks: Archimedean spiral drawing (to quantify graphomotor difficulties), a sentence writing task (to quantify handwriting difficulties), and a pentagon copying test (to quantify cognitive decline). Next, we parameterized the acquired data by various temporal, kinematic, dynamic, spatial, and task-specific features. Finally, we trained classification models for each task separately, as well as a model for their combination, to estimate the predictive power of the features for the identification of LBDs. Using this approach we were able to identify prodromal LBDs with 74% accuracy and showed the promising potential of computerized, objective, and non-invasive diagnosis of LBDs based on the assessment of graphomotor and handwriting difficulties.
https://arxiv.org/abs/2301.08534
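The kinematic feature family mentioned in the parameterisation step can be illustrated by deriving speed from recorded (t, x, y) samples with finite differences. The summary statistics below are generic stand-ins, not the study's exact feature definitions:

```python
import math

def kinematic_features(samples):
    """Per-sample speed from (t, x, y) points via finite differences, plus simple
    summary statistics of the kind used in graphomotor analysis."""
    speeds = []
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt > 0:
            speeds.append(math.hypot(x1 - x0, y1 - y0) / dt)
    if not speeds:
        return {"mean_speed": 0.0, "max_speed": 0.0}
    return {"mean_speed": sum(speeds) / len(speeds), "max_speed": max(speeds)}
```

Slower and less regular speed profiles on the Archimedean spiral are typical of the graphomotor impairments such classifiers look for.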