Abstract
Deep learning models have recently achieved impressive performance on image generation. Many works in the literature address face generation and editing, producing results that both humans and automatic systems struggle to distinguish from real images. While most systems reach excellent visual quality, they still have difficulty preserving the identity of the input subject. Among the explored techniques, Semantic Image Synthesis (SIS) methods, which generate an image conditioned on a semantic segmentation mask, are the most promising, even though preserving the perceived identity of the input subject is not their main concern. In this paper, we therefore investigate the problem of identity preservation in face image generation and present an SIS architecture that exploits a cross-attention mechanism to merge identity, style, and semantic features, generating faces whose identities are as similar as possible to the input ones. Experimental results show that the proposed method is not only suitable for preserving identity but is also effective for adversarial attacks against face recognition, i.e., hiding a second identity in the generated faces.
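The cross-attention fusion mentioned above can be sketched as scaled dot-product attention where one feature stream queries another. This is a minimal illustrative sketch only: the feature names, shapes, and which stream acts as queries versus keys/values are assumptions, not the paper's actual architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention: each query attends over
    key/value pairs coming from a different feature stream."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)   # (Nq, Nk) similarities
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ values                  # (Nq, d) fused features

# Hypothetical feature streams (shapes are illustrative):
rng = np.random.default_rng(0)
semantic = rng.normal(size=(16, 64))   # per-region semantic features (queries)
identity = rng.normal(size=(8, 64))    # identity embedding tokens (keys/values)

fused = cross_attention(semantic, identity, identity)
print(fused.shape)  # (16, 64): one fused vector per semantic query
```

In this sketch the semantic stream drives the queries so that identity information is injected per region; swapping the roles of the streams, or adding a third stream for style, follows the same pattern.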
URL
https://arxiv.org/abs/2404.10408