Paper Reading AI Learner

Evolution of Voices in French Audiovisual Media Across Genders and Age in a Diachronic Perspective

2024-04-24 18:00:06
Albert Rilliard, David Doukhan, Rémi Uro, Simon Devauchelle

Abstract

We present a diachronic acoustic analysis of the voice of 1023 speakers from French media archives. The speakers are spread across 32 categories based on four periods (years 1955/56, 1975/76, 1995/96, 2015/16), four age groups (20-35; 36-50; 51-65, >65), and two genders. The fundamental frequency ($F_0$) and the first four formants (F1-4) were estimated. Procedures used to ensure the quality of these estimations on heterogeneous data are described. From each speaker's $F_0$ distribution, the base-$F_0$ value was calculated to estimate the register. Average vocal tract length was estimated from formant frequencies. Base-$F_0$ and vocal tract length were fit by linear mixed models to evaluate how they may have changed across time periods and genders, corrected for age effects. Results show an effect of the period with a tendency to lower voices, independently of gender. A lowering of pitch is observed with age for female but not male speakers.

Abstract (translated)

我们对来自法国媒体档案的1023个发言者的声音进行了一个历时声学分析。发言者跨越了基于四个时期(1955/56年, 1975/76年, 1995/96年, 2015/16年)的32个类别,以及两个性别(男性和女性)。我们估计了基本频率($F_0$)和第一个前四个形态(F1-4)。描述了保证这些估计质量的程序。从每个发言者的$F_0$分布中,计算了基点-$F_0$值以估计调高。从形态频率中估计了平均声道长度。通过线性混合模型,将基点-$F_0$和声道长度对年龄效应进行校正,以评估它们在不同时间和性别下的变化。结果表明,随着周期性的降低声音。对于女性,随着年龄的增长,声音的音调降低。而对于男性,没有观察到这种现象。

URL

https://arxiv.org/abs/2404.16104

PDF

https://arxiv.org/pdf/2404.16104.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model LLM Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Robot Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot