Abstract
Large-scale face recognition datasets are collected by crawling the Internet without individuals' consent, raising legal, ethical, and privacy concerns. With recent advances in generative models, several works have proposed generating synthetic face recognition datasets to mitigate the concerns associated with web-crawled face recognition datasets. This paper presents a summary of the Synthetic Data for Face Recognition (SDFR) Competition, which was held in conjunction with the 18th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2024) and established to investigate the use of synthetic data for training face recognition models. The SDFR competition was split into two tasks, allowing participants to train face recognition systems using new synthetic datasets and/or existing ones. In the first task, the face recognition backbone was fixed and the dataset size was limited, while the second task provided almost complete freedom in the choice of model backbone, dataset, and training pipeline. The submitted models were trained on both existing and new synthetic datasets and used clever methods to improve training with synthetic data. The submissions were evaluated and ranked on a diverse set of seven benchmarking datasets. The paper gives an overview of the submitted face recognition models and reports their performance compared to baseline models trained on real and synthetic datasets. Furthermore, the evaluation of submissions is extended to bias assessment across different demographic groups. Lastly, an outlook on the current state of research on training face recognition models using synthetic data is presented, and existing problems as well as potential future directions are discussed.
URL
https://arxiv.org/abs/2404.04580