Abstract
Human movement studies and analyses have been fundamental in many scientific domains, ranging from neuroscience to education, pattern recognition to robotics, and health care to sports. Previous speech motor models were proposed to understand how speech movement is produced and how the resulting speech varies when some parameters are changed. However, the inverse approach, in which the muscular response parameters and the subject's age are derived from real continuous speech, is not possible with such models. In the handwriting field, by contrast, the kinematic theory of rapid human movements and its associated Sigma-lognormal model have been applied successfully to obtain the muscular response parameters. This work presents a speech-kinematics-based model that can be used to study, analyze, and reconstruct complex speech kinematics in a simplified manner. A method based on the kinematic theory of rapid human movements and its associated Sigma-lognormal model is applied to describe and parameterize the asymptotic impulse response of the neuromuscular networks involved in speech as a response to a neuromotor command. The method used to transform formants into a movement observation is also presented. Experiments carried out with the (English) VTR TIMIT database and the (German) Saarbrucken Voice Database, including people of different ages, with and without laryngeal pathologies, corroborate the link between the extracted parameters and aging, on the one hand, and the proportion between the first and second formants required in applying the kinematic theory of rapid human movements, on the other. The results should drive innovative developments in the modeling and understanding of speech kinematics.
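As a rough illustration of the lognormal impulse response that the Sigma-lognormal model is built on, the sketch below evaluates the speed profile of a single stroke and sums several strokes as vectors. The parameter names (`D`, `t0`, `mu`, `sigma`, `theta`) follow the notation commonly used for the kinematic theory and are assumptions here, since the abstract gives no notation. The numeric values are purely illustrative, each stroke is given a fixed direction as a simplification, and the formant-to-movement transformation described in the paper is not reproduced.

```python
import math

def lognormal_speed(t, D, t0, mu, sigma):
    """Speed of one lognormal stroke at time t.

    D     : stroke amplitude (area under the speed curve)
    t0    : time at which the neuromotor command is issued
    mu    : log-time delay of the neuromuscular response
    sigma : log-response time (spread) of the response
    """
    if t <= t0:
        return 0.0  # no response before the command is issued
    x = t - t0
    gauss = math.exp(-((math.log(x) - mu) ** 2) / (2.0 * sigma ** 2))
    return D * gauss / (sigma * math.sqrt(2.0 * math.pi) * x)

def sigma_lognormal_velocity(t, strokes):
    """Vector sum of strokes, each given as (D, t0, mu, sigma, theta).

    Simplifying assumption: every stroke keeps one fixed direction theta
    (in radians) instead of a direction that changes along the stroke.
    """
    vx = sum(lognormal_speed(t, D, t0, mu, s) * math.cos(th)
             for D, t0, mu, s, th in strokes)
    vy = sum(lognormal_speed(t, D, t0, mu, s) * math.sin(th)
             for D, t0, mu, s, th in strokes)
    return vx, vy

if __name__ == "__main__":
    # Two overlapping strokes with made-up parameters.
    strokes = [
        (1.0, 0.00, -1.5, 0.30, 0.0),
        (0.8, 0.15, -1.4, 0.25, math.pi / 2),
    ]
    for k in range(0, 8):
        t = 0.1 * k
        vx, vy = sigma_lognormal_velocity(t, strokes)
        print(f"t={t:.1f}s  vx={vx:.3f}  vy={vy:.3f}")
```

Fitting works in the opposite direction: given an observed velocity trace, the parameters of each stroke are estimated so that the summed lognormals reproduce it. That estimation step is outside the scope of this sketch.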
URL
https://arxiv.org/abs/2401.17320