Abstract
We present the JNV (Japanese Nonverbal Vocalizations) corpus, a corpus of Japanese nonverbal vocalizations (NVs) with diverse phrases and emotions. Existing Japanese NV corpora lack phrase or emotion diversity, which makes it difficult to analyze NVs and to support downstream tasks such as emotion recognition. We first propose a corpus-design method comprising two phases: (1) collecting NV phrases through crowdsourcing; (2) recording NVs by stimulating speakers with emotional scenarios. Using this method, we then collect $420$ audio clips from $4$ speakers, covering $6$ emotions. Results of comprehensive objective and subjective experiments demonstrate that the collected NVs have high emotion recognizability and authenticity, comparable to those of previous English NV corpora. Additionally, we analyze the distribution of vowel types in Japanese NVs. To the best of our knowledge, JNV is currently the largest Japanese NV corpus in terms of phrase and emotion diversity.
URL
https://arxiv.org/abs/2305.12445