Abstract
Advances in technology have led to the use of multimodal systems in a variety of real-world applications. Audio-visual systems are one widely used class of such systems. In recent years, associating a person's face with their voice has gained attention because of the unique correlation between the two. The Face-voice Association in Multilingual Environments (FAME) Challenge 2024 explores face-voice association under the specific condition of a multilingual scenario. This condition is motivated by the fact that half of the world's population is bilingual and that people often communicate in multilingual settings. The challenge uses the Multilingual Audio-Visual (MAV-Celeb) dataset to explore face-voice association in multilingual environments. This report describes the challenge, the dataset, the baselines, and the tasks of the FAME Challenge.
URL
https://arxiv.org/abs/2404.09342