Abstract
Recognizing emotions from speech is a daunting task because emotional expressions are subtle and ambiguous. Traditional speech emotion recognition (SER) systems, which typically rely on a single, precise emotion label, struggle with this complexity. Modeling the inherent ambiguity of emotions is therefore an urgent problem. In this paper, we propose an iterative prototype refinement framework (IPR) for ambiguous SER. IPR comprises two interlinked components: contrastive learning and class prototypes. The former provides an efficient way to obtain high-quality representations of ambiguous samples. The latter are dynamically updated based on ambiguous labels, that is, the similarity of the ambiguous data to all prototypes. These refined embeddings yield precise pseudo labels, which in turn reinforce representation quality. Experimental evaluations on the IEMOCAP dataset show that IPR outperforms state-of-the-art methods, demonstrating the effectiveness of the proposed method.
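The loop described above, where similarity to prototypes produces pseudo labels and those labels drive the prototype update, can be illustrated with a minimal sketch. The abstract does not give the actual update rules, so everything concrete here is an assumption made for illustration: the cosine similarity, the softmax temperature, the moving-average (momentum) update, the candidate mask, and the function names `normalize`, `pseudo_label`, and `update_prototypes`. It is not the authors' implementation, and it omits the contrastive learning component entirely.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def pseudo_label(embedding, prototypes, candidate_mask, temperature=0.1):
    """Soft pseudo label (assumed form): softmax over cosine similarity to each
    prototype, restricted to the classes the ambiguous label allows."""
    if not any(candidate_mask):
        raise ValueError("candidate_mask must allow at least one class")
    z = normalize(embedding)
    sims = [sum(a * b for a, b in zip(z, normalize(p))) for p in prototypes]
    weights = [math.exp(s / temperature) * m for s, m in zip(sims, candidate_mask)]
    total = sum(weights)
    return [w / total for w in weights]


def update_prototypes(prototypes, embedding, label, momentum=0.9):
    """Assumed moving-average update: each prototype moves toward the embedding
    in proportion to its pseudo-label weight, then is re-normalized."""
    z = normalize(embedding)
    updated = []
    for p, w in zip(prototypes, label):
        step = (1.0 - momentum) * w
        moved = [(1.0 - step) * pi + step * zi for pi, zi in zip(p, z)]
        updated.append(normalize(moved))
    return updated


# Hypothetical toy data: three emotion classes in a 2-D embedding space.
prototypes = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
embedding = [0.9, 0.3]
mask = [1, 1, 0]  # the ambiguous label allows only the first two classes

label = pseudo_label(embedding, prototypes, mask)
new_prototypes = update_prototypes(prototypes, embedding, label)
```

In this toy run the sample lies closest to the first prototype, so the pseudo label concentrates on that class and only the allowed prototypes shift; the masked-out third prototype stays where it was. Repeating the two calls over batches of samples gives the iterative refinement, under the assumptions stated above.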
URL
https://arxiv.org/abs/2408.00325