Paper Reading AI Learner

Sigma-lognormal modeling of speech

2024-01-27 18:00:20
C. Carmona-Duarte, M. A. Ferrer, R. Plamondon, A. Gomez-Rodellar, P. Gomez-Vilda

Abstract

Human movement studies and analyses have been fundamental in many scientific domains, ranging from neuroscience to education, pattern recognition to robotics, health care to sports, and beyond. Previous speech motor models were proposed to understand how speech movement is produced and how the resulting speech varies when some parameters are changed. However, such models do not support the inverse approach, in which the muscular response parameters and the subject's age are derived from real continuous speech. In the handwriting field, by contrast, the kinematic theory of rapid human movements and its associated Sigma-lognormal model have been applied successfully to obtain the muscular response parameters. This work presents a speech-kinematics-based model that can be used to study, analyze, and reconstruct complex speech kinematics in a simplified manner. A method based on the kinematic theory of rapid human movements and its associated Sigma-lognormal model is applied to describe and parameterize the asymptotic impulse response of the neuromuscular networks involved in speech as a response to a neuromotor command. The method used to transform formants into a movement observation is also presented. Experiments carried out with the (English) VTR TIMIT database and the (German) Saarbrücken Voice Database, including people of different ages with and without laryngeal pathologies, corroborate the link between the extracted parameters and aging, on the one hand, and the proportion between the first and second formants required in applying the kinematic theory of rapid human movements, on the other. The results should drive innovative developments in the modeling and understanding of speech kinematics.
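
For readers unfamiliar with the handwriting-side model, here is a minimal sketch of the standard Sigma-lognormal velocity model from the kinematic theory of rapid human movements, using only the Python standard library. Each stroke i has speed |v_i(t)| = D_i / (σ_i √(2π) (t − t0_i)) · exp(−(ln(t − t0_i) − μ_i)² / (2σ_i²)) for t > t0_i. Its direction sweeps from θs_i to θe_i following the lognormal's cumulative distribution. The movement velocity is the vector sum of all strokes. This is not the authors' speech implementation. The formant-to-movement transformation is not specified in this abstract and is not reproduced here. The function names, dict keys, and parameter values are hypothetical.

```python
import math


def lognormal_speed(t, D, t0, mu, sigma):
    """Speed of one lognormal stroke at time t (zero for t <= t0).

    |v(t)| = D / (sigma * sqrt(2*pi) * (t - t0))
             * exp(-(ln(t - t0) - mu)**2 / (2 * sigma**2))
    D: amplitude, t0: command onset, mu: log-time delay, sigma: log-response time.
    """
    if t <= t0:
        return 0.0
    x = t - t0
    return D / (sigma * math.sqrt(2 * math.pi) * x) * math.exp(
        -((math.log(x) - mu) ** 2) / (2 * sigma ** 2)
    )


def sigma_lognormal_velocity(t, strokes):
    """Vector sum of lognormal strokes, returning (vx, vy) at time t.

    Each stroke is a dict with hypothetical keys D, t0, mu, sigma,
    theta_s, theta_e. The stroke direction moves from theta_s to theta_e
    in proportion to the fraction of D already covered (the lognormal CDF).
    """
    vx = vy = 0.0
    for s in strokes:
        speed = lognormal_speed(t, s["D"], s["t0"], s["mu"], s["sigma"])
        if speed == 0.0:
            continue  # stroke not started yet (or speed underflowed to zero)
        x = t - s["t0"]
        cdf = 0.5 * (1.0 + math.erf((math.log(x) - s["mu"]) / (s["sigma"] * math.sqrt(2))))
        phi = s["theta_s"] + (s["theta_e"] - s["theta_s"]) * cdf
        vx += speed * math.cos(phi)
        vy += speed * math.sin(phi)
    return vx, vy


if __name__ == "__main__":
    # Two overlapping strokes with hypothetical, illustrative parameters.
    strokes = [
        {"D": 1.0, "t0": 0.00, "mu": -1.5, "sigma": 0.30, "theta_s": 0.0, "theta_e": 0.5},
        {"D": 0.6, "t0": 0.15, "mu": -1.7, "sigma": 0.25, "theta_s": 1.5, "theta_e": 2.0},
    ]
    for k in range(8):
        t = 0.05 * k
        vx, vy = sigma_lognormal_velocity(t, strokes)
        print(f"t={t:.2f}s  vx={vx:+.3f}  vy={vy:+.3f}")
```

Each stroke's speed is a lognormal density scaled by D_i. Integrating |v_i| over time therefore recovers D_i, and the speed peaks at t0_i + exp(μ_i − σ_i²). Recovering these parameters from an observed velocity profile is the inverse problem the abstract refers to. That requires a separate extraction step, which is not shown in this sketch.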

URL

https://arxiv.org/abs/2401.17320

PDF

https://arxiv.org/pdf/2401.17320.pdf
