
CoPaSul Manual - Contour-based parametric and superpositional intonation stylization

2018-07-31 11:12:42
Uwe D. Reichel

Abstract

The CoPaSul toolkit serves two purposes: (1) automatic prosodic annotation and (2) prosodic feature extraction from the syllable level to the utterance level. CoPaSul stands for contour-based, parametric, superpositional intonation stylization. In this framework, intonation is represented as a superposition of global and local contours that are described parametrically in terms of polynomial coefficients. On the global level (usually associated with, but not necessarily restricted to, intonation phrases), the stylization represents register in terms of time-varying F0 level and range. On the local level (e.g. accent groups), local contour shapes are described. From this parameterization, several features related to prosodic boundaries and prominence can be derived. Furthermore, clustering the coefficients yields prosodic contour classes in a bottom-up way. In addition to the stylization-based feature extraction, standard F0 and energy measures (e.g. mean and variance) as well as rhythmic aspects can be calculated. Currently, automatic annotation comprises segmentation into interpausal chunks, syllable nucleus extraction, and unsupervised localization of prosodic phrase boundaries and prominent syllables. F0 feature sets, and in part also energy feature sets, can be derived for: standard measurements (such as median and IQR); register in terms of F0 level and range; prosodic boundaries; local contour shapes; bottom-up derived contour classes; the Gestalt of accent groups in terms of their deviation from higher-level prosodic units; and rhythmic aspects quantifying the relation between F0 and energy contours and prosodic event rates.
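
The superposition idea in the abstract can be illustrated with a minimal sketch. This is not the CoPaSul API: the function name `stylize`, the choice of a straight line for the global component, the third-order local polynomials, the time normalization to [-1, 1], and the synthetic contour are all assumptions made for illustration only.

```python
import numpy as np

def stylize(t, f0, local_spans, local_order=3):
    """Hypothetical sketch: global line plus local polynomial residuals."""
    # Global component (assumed linear): one line fitted to the whole contour.
    global_coefs = np.polyfit(t, f0, 1)
    # Remove the global component; what remains is described locally.
    residual = f0 - np.polyval(global_coefs, t)

    local_coefs = []
    for start, end in local_spans:
        mask = (t >= start) & (t <= end)
        # Normalize time to [-1, 1] so coefficients are comparable
        # across segments of different duration.
        tn = 2.0 * (t[mask] - start) / (end - start) - 1.0
        # Coefficients are returned highest order first.
        local_coefs.append(np.polyfit(tn, residual[mask], local_order))
    return global_coefs, local_coefs


# Synthetic contour (made-up values): declining baseline plus one
# accent-like bump centered at 0.5 s.
t = np.linspace(0.0, 2.0, 201)
f0 = 200.0 - 20.0 * t + 15.0 * np.exp(-((t - 0.5) ** 2) / 0.01)

# Two hypothetical local segments, given as (start, end) in seconds.
global_coefs, local_coefs = stylize(t, f0, [(0.3, 0.7), (1.2, 1.6)])
```

Under these assumptions, `global_coefs` holds the slope and intercept of the global trend, and each entry of `local_coefs` holds the four coefficients of one local contour. Coefficient vectors of this kind are what could then be clustered to obtain contour classes.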

URL

https://arxiv.org/abs/1612.04765

PDF

https://arxiv.org/pdf/1612.04765.pdf

