The widespread adoption of face recognition has led to increasing privacy concerns, as unauthorized access to face images can expose sensitive personal information. This paper explores face image protection against viewing and recovery attacks. Inspired by image compression, we propose creating a visually uninformative face image through feature subtraction between an original face and its model-produced regeneration. Recognizable identity features within the image are encouraged by co-training a recognition model on its high-dimensional feature representation. To enhance privacy, the high-dimensional representation is crafted through random channel shuffling, resulting in randomized recognizable images devoid of attacker-leverageable texture details. We distill our methodologies into a novel privacy-preserving face recognition method, MinusFace. Experiments demonstrate its high recognition accuracy and effective privacy protection. Its code is available at this https URL.
https://arxiv.org/abs/2403.12457
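The core recipe — subtract a model-produced regeneration from the original face, then randomly shuffle the channels of a high-dimensional lift of the residue — can be sketched as below. This is a toy illustration, not MinusFace itself: the box-blur "regeneration" and the rolled-copy channel lift are stand-ins for the paper's learned models.

```python
import numpy as np

rng = np.random.default_rng(0)

def regenerate(face):
    # Stand-in for the paper's learned regeneration model: a 5x5 box blur.
    padded = np.pad(face, 2, mode="edge")
    out = np.empty_like(face)
    for i in range(face.shape[0]):
        for j in range(face.shape[1]):
            out[i, j] = padded[i:i + 5, j:j + 5].mean()
    return out

face = rng.random((8, 8))              # toy grayscale face
residue = face - regenerate(face)      # feature subtraction: visually flat

# Lift the residue to a multi-channel representation and apply a random
# per-sample channel permutation to destroy exploitable texture layout.
channels = np.stack([np.roll(residue, s, axis=0) for s in range(4)])
protected = channels[rng.permutation(4)]
```

The recognition model would then be co-trained on `protected` so that identity features survive while visual content does not.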
Face inpainting, the technique of restoring missing or damaged regions in facial images, is pivotal for applications like face recognition in occluded scenarios and image analysis with poor-quality captures. This process not only needs to produce realistic visuals but also preserve individual identity characteristics. The aim of this paper is to inpaint a face given its periocular region (eyes-to-face) through a proposed new Generative Adversarial Network (GAN)-based model called the Eyes-to-Face Network (E2F-Net). The proposed approach extracts identity and non-identity features from the periocular region using two dedicated encoders. The extracted features are then mapped to the latent space of a pre-trained StyleGAN generator to benefit from its state-of-the-art performance and its rich, diverse and expressive latent space without any additional training. We further improve the StyleGAN output by finding the optimal code in the latent space using a new optimization technique for GAN inversion. Our E2F-Net requires minimal training, reducing computational complexity as a secondary benefit. Through extensive experiments, we show that our method successfully reconstructs the whole face with high quality, surpassing current techniques, despite significantly less training and supervision effort. We have generated seven eyes-to-face datasets based on well-known public face datasets for training and verifying our proposed method. The code and datasets are publicly available.
https://arxiv.org/abs/2403.12197
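The optimization-based GAN inversion step can be illustrated with a toy linear generator in place of StyleGAN; the `invert` routine, step count, and learning rate are illustrative assumptions, not the paper's optimizer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear "generator" standing in for StyleGAN; normalized so plain
# gradient descent on the latent code is stable.
G = rng.normal(size=(12, 6))
G /= np.linalg.norm(G, 2)

def invert(G, target, steps=2000, lr=1.0):
    """GAN-inversion sketch: optimize a latent code so the generator's
    output matches the target observation."""
    z = np.zeros(G.shape[1])
    for _ in range(steps):
        z -= lr * G.T @ (G @ z - target)   # gradient of 0.5*||G z - target||^2
    return z

target = G @ rng.normal(size=6)   # a face lying in the generator's range
z_star = invert(G, target)
```

In E2F-Net the target would be the encoder features of the periocular region and the generator a frozen StyleGAN; only the latent code is optimized.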
This paper presents Arc2Face, an identity-conditioned face foundation model which, given the ArcFace embedding of a person, can generate diverse photo-realistic images with a greater degree of face similarity than existing models. Despite previous attempts to decode face recognition features into detailed images, we find that common high-resolution datasets (e.g. FFHQ) lack sufficient identities to reconstruct any subject. To that end, we meticulously upsample a significant portion of the WebFace42M database, the largest public dataset for face recognition (FR). Arc2Face builds upon a pretrained Stable Diffusion model, yet adapts it to the task of ID-to-face generation, conditioned solely on ID vectors. Deviating from recent works that combine ID with text embeddings for zero-shot personalization of text-to-image models, we emphasize the compactness of FR features, which can fully capture the essence of the human face, as opposed to hand-crafted prompts. Crucially, text-augmented models struggle to decouple identity and text, usually necessitating some description of the given face to achieve satisfactory similarity. Arc2Face, however, only needs the discriminative features of ArcFace to guide the generation, offering a robust prior for a plethora of tasks where ID consistency is of paramount importance. As an example, we train an FR model on synthetic images from our model and achieve superior performance to existing synthetic datasets.
https://arxiv.org/abs/2403.11641
For privacy and security concerns, the need to erase unwanted information from pre-trained vision models is becoming evident nowadays. In real-world scenarios, erasure requests originate at any time from both users and model owners. These requests usually form a sequence. Therefore, under such a setting, selective information is expected to be continuously removed from a pre-trained model while maintaining the rest. We define this problem as continual forgetting and identify two key challenges. (i) For unwanted knowledge, efficient and effective deletion is crucial. (ii) For remaining knowledge, the impact brought by the forgetting procedure should be minimal. To address them, we propose Group Sparse LoRA (GS-LoRA). Specifically, towards (i), we use LoRA modules to fine-tune the FFN layers in Transformer blocks for each forgetting task independently, and towards (ii), a simple group sparse regularization is adopted, enabling automatic selection of specific LoRA groups and zeroing out the others. GS-LoRA is effective, parameter-efficient, data-efficient, and easy to implement. We conduct extensive experiments on face recognition, object detection and image classification and demonstrate that GS-LoRA manages to forget specific classes with minimal impact on other classes. Code will be released on \url{this https URL}.
https://arxiv.org/abs/2403.11530
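The group sparse regularization at the heart of GS-LoRA is a group-lasso penalty: the sum of L2 norms over each LoRA group, which drives whole groups (not individual weights) to zero and thereby deselects entire adapters. A minimal sketch; the grouping and `lam` value are illustrative:

```python
import numpy as np

def group_sparse_penalty(lora_groups, lam=0.01):
    """Group-lasso penalty: sum of L2 norms of each LoRA group.

    Driving a whole group's norm to zero deselects that LoRA module
    entirely, which is how the regularizer picks the adapters needed
    for a forgetting task and zeroes out the others."""
    return lam * sum(np.linalg.norm(np.concatenate([p.ravel() for p in g]))
                     for g in lora_groups)

rng = np.random.default_rng(0)
# Two LoRA groups (A and B low-rank factors) for two FFN layers, toy sizes.
groups = [[rng.normal(size=(16, 4)), rng.normal(size=(4, 16))],
          [np.zeros((16, 4)), np.zeros((4, 16))]]   # second group zeroed out
penalty = group_sparse_penalty(groups)
```

During training this penalty is added to the forgetting loss; the second (all-zero) group contributes nothing, illustrating the automatic selection.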
Face morphing attacks circumvent face recognition systems (FRSs) by creating a morphed image that contains multiple identities. However, existing face morphing attack methods either sacrifice image quality or compromise the identity preservation capability. Consequently, these attacks cannot simultaneously bypass FRS verification and deceive human observers. These methods typically rely on global information from the contributing images, ignoring the detailed information from informative facial regions. To address the above issues, we propose a novel morphing attack method to improve the quality of morphed images and better preserve the contributing identities. Our proposed method leverages a hierarchical generative network to capture both local detail and global consistency information. Additionally, a mask-guided image blending module is dedicated to removing artifacts from areas outside the face, improving the image's visual quality. The proposed attack method is compared to state-of-the-art methods on three public datasets in terms of FRSs' vulnerability, attack detectability, and image quality. The results show our method's potential threat of deceiving FRSs while being capable of passing multiple morphing attack detection (MAD) scenarios.
https://arxiv.org/abs/2403.11101
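The mask-guided blending module can be approximated as a convex combination of the morph and a clean reference image under a face mask; this sketch assumes a precomputed binary mask rather than the paper's learned one.

```python
import numpy as np

def mask_guided_blend(morph, reference, face_mask):
    """Keep morphed pixels inside the face region; fall back to a clean
    reference image elsewhere, suppressing background artifacts."""
    return face_mask * morph + (1.0 - face_mask) * reference

rng = np.random.default_rng(0)
morph = rng.random((4, 4))
reference = np.zeros((4, 4))
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0                     # toy binary face region
blended = mask_guided_blend(morph, reference, mask)
```

A soft (feathered) mask would blend the seam smoothly; the hard mask here keeps the arithmetic easy to check.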
With the comprehensive research conducted on various face analysis tasks, there is growing interest among researchers in developing a unified approach to face perception. Existing methods mainly discuss unified representation and training, which lack task extensibility and application efficiency. To tackle this issue, we focus on the unified model structure, exploring a face generalist model. As an intuitive design, Naive Faceptor enables tasks with the same output shape and granularity to share the structural design of a standardized output head, improving task extensibility. Furthermore, Faceptor is proposed, adopting a well-designed single-encoder dual-decoder architecture that allows task-specific queries to represent newly introduced semantics. This design enhances the unification of the model structure while improving application efficiency in terms of storage overhead. Additionally, we introduce Layer-Attention into Faceptor, enabling the model to adaptively select features from the optimal layers to perform the desired tasks. Through joint training on 13 face perception datasets, Faceptor achieves exceptional performance in facial landmark localization, face parsing, age estimation, expression recognition, binary attribute classification, and face recognition, matching or surpassing specialized methods on most tasks. Our training framework can also be applied to auxiliary supervised learning, significantly improving performance in data-sparse tasks such as age estimation and expression recognition. The code and models will be made publicly available at this https URL.
https://arxiv.org/abs/2403.09500
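The Layer-Attention mechanism can be sketched as a per-task softmax over encoder layers: each task owns a weight vector that selects which layers' features to combine. The logits below are illustrative stand-ins for the learned per-task parameters.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def layer_attention(layer_feats, task_logits):
    """Adaptively combine encoder features from all layers: a per-task
    softmax over learned logits weights the layers most useful for
    that task."""
    weights = softmax(task_logits)
    return sum(w * f for w, f in zip(weights, layer_feats))

rng = np.random.default_rng(0)
layer_feats = [rng.random((4, 8)) for _ in range(6)]      # 6 encoder layers
age_logits = np.array([2.0, 1.0, 0.0, -1.0, -2.0, -3.0])  # one task's logits
age_feat = layer_attention(layer_feats, age_logits)
```

A different task (e.g. face parsing) would hold its own logit vector, so the shared encoder is reused while each decoder reads from its preferred depths.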
Due to advancements in digital cameras, it is easy to gather multiple images (or videos) of an object under different conditions. Therefore, image-set classification has attracted more attention, and different solutions have been proposed to model such sets. A popular way to model image sets is with subspaces, which form a manifold called the Grassmann manifold. In this contribution, we extend Generalized Relevance Learning Vector Quantization to the Grassmann manifold. The proposed model returns a set of prototype subspaces and a relevance vector. While prototypes model typical behaviours within classes, the relevance factors specify the most discriminative principal vectors (or images) for the classification task. Both provide insight into the model's decisions by highlighting influential images and pixels for predictions. Moreover, because it learns prototypes, the new method's model complexity during inference is independent of dataset size, unlike previous works. We applied it to several recognition tasks including handwritten digit recognition, face recognition, activity recognition, and object recognition. Experiments demonstrate that it outperforms previous works with lower complexity and can successfully model variation such as handwriting style or lighting conditions. Moreover, the presence of relevances makes the model robust to the choice of subspace dimensionality.
https://arxiv.org/abs/2403.09183
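The distance underlying such a model can be sketched as a relevance-weighted chordal distance between subspaces: the singular values of U^T V are the cosines of the principal angles, and the relevance vector weights each principal direction. Only the distance computation is shown; the prototype-update rule is omitted, and the relevance values are illustrative.

```python
import numpy as np

def orthonormal_basis(X):
    # Columns of X span the subspace; QR gives an orthonormal basis.
    Q, _ = np.linalg.qr(X)
    return Q

def relevance_distance(U, V, relevances):
    """Relevance-weighted chordal distance between two subspaces.

    Singular values of U^T V are the cosines of the principal angles;
    the relevance vector weights each principal direction."""
    cosines = np.clip(np.linalg.svd(U.T @ V, compute_uv=False), 0.0, 1.0)
    return float(np.sum(relevances * (1.0 - cosines ** 2)))

rng = np.random.default_rng(0)
U = orthonormal_basis(rng.normal(size=(10, 3)))   # prototype subspace
V = orthonormal_basis(rng.normal(size=(10, 3)))   # query image-set subspace
rel = np.array([0.5, 0.3, 0.2])                   # learned relevances
d_other = relevance_distance(U, V, rel)
d_self = relevance_distance(U, U, rel)            # distance to itself is ~0
```

Training would move the winning prototype's subspace toward (or away from) the query and adapt `rel` by gradient descent on a GLVQ-style cost.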
The utilization of personal sensitive data in training face recognition (FR) models poses significant privacy concerns, as adversaries can employ model inversion attacks (MIA) to infer the original training data. Existing defense methods, such as data augmentation and differential privacy, have been employed to mitigate this issue. However, these methods often fail to strike an optimal balance between privacy and accuracy. To address this limitation, this paper introduces an adaptive hybrid masking algorithm against MIA. Specifically, face images are masked in the frequency domain using an adaptive MixUp strategy. Unlike the traditional MixUp algorithm, which is predominantly used for data augmentation, our modified approach incorporates frequency domain mixing. Previous studies have shown that increasing the number of images mixed in MixUp can enhance privacy preservation, but at the expense of reduced face recognition accuracy. To overcome this trade-off, we develop an enhanced adaptive MixUp strategy based on reinforcement learning, which enables us to mix a larger number of images while maintaining satisfactory recognition accuracy. To optimize privacy protection, we propose maximizing the reward function (i.e., the loss function of the FR system) during the training of the strategy network, while the loss function of the FR network is minimized during the FR network's own training phase. The strategy network and the face recognition network can thus be viewed as antagonistic entities in the training process, ultimately reaching a more balanced trade-off. Experimental results demonstrate that our proposed hybrid masking scheme outperforms existing defense algorithms in terms of privacy preservation and recognition accuracy against MIA.
https://arxiv.org/abs/2403.10558
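Frequency-domain MixUp can be sketched as mixing the 2D FFTs of several faces. Note that with a single weight per image this reduces to pixel-space MixUp by linearity of the FFT, so the paper's adaptive strategy presumably varies the mixing per frequency band; the sketch below keeps only the basic mechanism.

```python
import numpy as np

def frequency_mixup(images, weights):
    """Mix several face images in the frequency domain.

    Each image's 2D FFT is combined with convex weights and transformed
    back; mixing more images hides more of any single identity."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    spectrum = sum(w * np.fft.fft2(img) for w, img in zip(weights, images))
    return np.real(np.fft.ifft2(spectrum))

rng = np.random.default_rng(0)
faces = [rng.random((8, 8)) for _ in range(4)]
mixed = frequency_mixup(faces, [0.4, 0.3, 0.2, 0.1])
```

In the paper, a reinforcement-learned strategy network would choose how many images to mix and with what weights, trading privacy against accuracy.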
Deep learning-based face recognition continues to face challenges due to its reliance on huge datasets obtained from web crawling, which can be costly to gather and raise significant real-world privacy concerns. To address this issue, we propose VIGFace, a novel framework capable of generating synthetic facial images. Initially, we train the face recognition model using a real face dataset and create a feature space for both real and virtual IDs where virtual prototypes are orthogonal to other prototypes. Subsequently, we generate synthetic images by using the diffusion model based on the feature space. Our proposed framework provides two significant benefits. Firstly, it allows for creating virtual facial images without concerns about portrait rights, guaranteeing that the generated virtual face images are clearly differentiated from existing individuals. Secondly, it serves as an effective augmentation method by incorporating real existing images. Further experiments demonstrate the efficacy of our framework, achieving state-of-the-art results from both perspectives without any external data.
https://arxiv.org/abs/2403.08277
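Making virtual-ID prototypes orthogonal to the real ones can be sketched with Gram-Schmidt against an orthonormal basis of the real prototypes' span. This is a geometric sketch, not VIGFace's training procedure, and the sizes are toy values.

```python
import numpy as np

def add_virtual_prototypes(real_protos, n_virtual, seed=0):
    """Create virtual-ID prototypes orthogonal to the real ones.

    Random vectors are projected off the span of all prototypes built so
    far and normalized, so every virtual identity is clearly separated
    from every real one in the feature space."""
    rng = np.random.default_rng(seed)
    dim = real_protos.shape[1]
    basis, _ = np.linalg.qr(real_protos.T)            # orthonormal real span
    basis = [basis[:, i] for i in range(basis.shape[1])]
    virtual = []
    while len(virtual) < n_virtual:
        v = rng.normal(size=dim)
        for b in basis:                                # Gram-Schmidt step
            v -= (v @ b) * b
        norm = np.linalg.norm(v)
        if norm > 1e-8:
            v /= norm
            basis.append(v)
            virtual.append(v)
    return np.stack(virtual)

rng = np.random.default_rng(1)
real = rng.normal(size=(5, 32))        # 5 real-ID prototypes, 32-dim features
virtual = add_virtual_prototypes(real, n_virtual=3)
```

The diffusion model would then be conditioned on these virtual prototypes to synthesize faces of identities that provably do not coincide with any real one.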
In this work we focus on learning facial representations that can be adapted to train effective face recognition models, particularly in the absence of labels. Firstly, compared with existing labelled face datasets, a vastly larger number of unlabeled faces exists in the real world. We explore a learning strategy for these unlabeled facial images through self-supervised pretraining to transfer generalized face recognition performance. Moreover, motivated by one recent finding, namely that the face saliency area is critical for face recognition, in contrast to utilizing randomly cropped blocks of images for constructing augmentations in pretraining, we utilize patches localized by extracted facial landmarks. This enables our method - namely LAndmark-based Facial Self-supervised learning (LAFS) - to learn key representations that are more critical for face recognition. We also incorporate two landmark-specific augmentations which introduce more diversity of landmark information to further regularize the learning. With the learned landmark-based facial representations, we further adapt the representation for face recognition with regularization mitigating variations in landmark positions. Our method achieves significant improvement over the state-of-the-art on multiple face recognition benchmarks, especially on more challenging few-shot scenarios.
https://arxiv.org/abs/2403.08161
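Landmark-localized patch construction — cropping around detected landmarks instead of taking random crops — can be sketched as follows; the landmark coordinates and patch size are illustrative, and a real pipeline would get landmarks from a detector.

```python
import numpy as np

def landmark_patches(image, landmarks, size=4):
    """Crop fixed-size patches centred on facial landmarks, in place of
    the random crops used in generic self-supervised pretraining."""
    half = size // 2
    h, w = image.shape
    patches = []
    for (y, x) in landmarks:
        y0 = int(np.clip(y - half, 0, h - size))   # keep crop inside image
        x0 = int(np.clip(x - half, 0, w - size))
        patches.append(image[y0:y0 + size, x0:x0 + size])
    return np.stack(patches)

rng = np.random.default_rng(0)
face = rng.random((16, 16))
landmarks = [(4, 4), (4, 11), (10, 8)]   # toy left-eye/right-eye/nose points
patches = landmark_patches(face, landmarks)
```

These patches would feed the self-supervised objective, so the model attends to identity-bearing regions rather than arbitrary background crops.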
Facial attribute editing using generative models can impair automated face recognition. This degradation persists even with recent identity-preserving models such as InstantID. To mitigate this issue, we propose two techniques that perform local and global attribute editing. Local editing operates on the finer details via a regularization-free method based on ControlNet conditioned on depth maps and auxiliary semantic segmentation masks. Global editing operates on coarser details via a regularization-based method guided by a custom loss and regularization set. In this work, we empirically ablate twenty-six facial semantic, demographic and expression-based attributes edited using state-of-the-art generative models, and evaluate them using the ArcFace and AdaFace matchers on the CelebA, CelebAMaskHQ and LFW datasets. Finally, we use LLaVA, a vision-language framework, for attribute prediction to validate our editing techniques. Our methods outperform SoTA (BLIP, InstantID) at facial editing while retaining identity.
https://arxiv.org/abs/2403.08092
This paper introduces the concept of uniform classification, which employs a unified threshold to classify all samples rather than an adaptive threshold classifying each individual sample. We also propose uniform classification accuracy as a metric to measure the model's performance in uniform classification. Furthermore, beginning with a naive loss, we mathematically derive a loss function suitable for uniform classification, namely the BCE function integrated with a unified bias. We demonstrate that the unified threshold can be learned via this bias. Extensive experiments on six classification datasets and three feature extraction models show that, compared to the SoftMax loss, models trained with the BCE loss exhibit not only higher uniform classification accuracy but also higher sample-wise classification accuracy. In addition, the bias learned from the BCE loss is very close to the unified threshold used in uniform classification. The features extracted by models trained with the BCE loss not only possess uniformity but also demonstrate better intra-class compactness and inter-class distinctiveness, yielding superior performance on open-set tasks such as face recognition.
https://arxiv.org/abs/2403.07289
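The derived loss — BCE on scaled cosine similarity with one shared bias — can be sketched directly; the `scale` and `bias` values are illustrative, and the implied unified threshold on cosine similarity is `bias / scale`.

```python
import numpy as np

def bce_with_unified_bias(cos_sim, labels, scale=16.0, bias=4.0):
    """Binary cross-entropy on scaled cosine similarities with one shared
    bias; the learned bias/scale ratio acts as the unified decision
    threshold on cosine similarity."""
    logits = scale * cos_sim - bias
    probs = 1.0 / (1.0 + np.exp(-logits))
    eps = 1e-12
    return float(-np.mean(labels * np.log(probs + eps)
                          + (1 - labels) * np.log(1 - probs + eps)))

cos_sim = np.array([0.9, 0.8, 0.1, -0.2])   # genuine pairs first
labels = np.array([1, 1, 0, 0])
loss = bce_with_unified_bias(cos_sim, labels)
threshold = 4.0 / 16.0   # unified decision boundary: cos_sim > bias / scale
```

In training, `bias` would be a learnable scalar, so the decision threshold is discovered jointly with the features rather than tuned per dataset.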
2D face recognition encounters challenges in unconstrained environments due to varying illumination, occlusion, and pose. Recent studies focus on RGB-D face recognition to improve robustness by incorporating depth information. However, collecting sufficient paired RGB-D training data is expensive and time-consuming, hindering wide deployment. In this work, we first construct a diverse depth dataset generated by 3D Morphable Models for depth model pre-training. Then, we propose a domain-independent pre-training framework that utilizes readily available pre-trained RGB and depth models to separately perform face recognition without needing additional paired data for retraining. To seamlessly integrate the two distinct networks and harness the complementary benefits of RGB and depth information for improved accuracy, we propose an innovative Adaptive Confidence Weighting (ACW). This mechanism is designed to learn confidence estimates for each modality to achieve modality fusion at the score level. Our method is simple and lightweight, only requiring ACW training beyond the backbone models. Experiments on multiple public RGB-D face recognition benchmarks demonstrate state-of-the-art performance surpassing previous methods based on depth estimation and feature fusion, validating the efficacy of our approach.
https://arxiv.org/abs/2403.06529
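ACW's score-level fusion can be sketched as a softmax over per-modality confidence estimates that weights each matcher's score; the confidence values here are illustrative inputs, whereas the paper learns them.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def acw_fuse(rgb_score, depth_score, rgb_conf, depth_conf):
    """Score-level fusion: softmax over per-modality confidence estimates
    yields the fusion weights, so an unreliable modality is down-weighted."""
    w_rgb, w_depth = softmax(np.array([rgb_conf, depth_conf]))
    return w_rgb * rgb_score + w_depth * depth_score

# The depth branch is confident here, so its score dominates the fusion.
fused = acw_fuse(rgb_score=0.40, depth_score=0.90,
                 rgb_conf=0.2, depth_conf=2.0)
```

Only the small confidence head needs training on top of the frozen RGB and depth backbones, which is what keeps the method lightweight.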
The state-of-the-art face recognition systems are typically trained on a single computer, utilizing extensive image datasets collected from a large number of users. However, these datasets often contain sensitive personal information that users may hesitate to disclose. To address potential privacy concerns, we explore the application of federated learning, both with and without secure aggregators, in the context of both supervised and unsupervised face recognition systems. Federated learning facilitates the training of a shared model without necessitating the sharing of individual private data, achieving this by training models on decentralized edge devices housing the data. In our proposed system, each edge device independently trains its own model, which is subsequently transmitted either to a secure aggregator or directly to the central server. To introduce diverse data without the need for data transmission, we employ generative adversarial networks to generate imposter data at the edge. Following this, the secure aggregator or central server combines these individual models to construct a global model, which is then relayed back to the edge devices. Experimental findings based on the CelebA dataset reveal that employing federated learning in both supervised and unsupervised face recognition systems offers dual benefits. Firstly, it safeguards privacy since the original data remains on the edge devices. Secondly, the experimental results demonstrate that the aggregated model yields nearly identical performance compared to the individual models, particularly when the federated model does not utilize a secure aggregator. Hence, our results shed light on the practical challenges associated with privacy-preserving face image training, particularly regarding the balance between privacy and accuracy.
https://arxiv.org/abs/2403.05344
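The aggregation step — combining the independently trained edge models into a global model — can be sketched as federated averaging over per-layer parameters (the layer names and sizes below are toy values):

```python
import numpy as np

def federated_average(edge_models, weights=None):
    """Aggregator step: average per-layer parameters of the models
    trained independently on each edge device to form the global model,
    which is then relayed back to the devices."""
    n = len(edge_models)
    if weights is None:
        weights = [1.0 / n] * n
    return {name: sum(w * m[name] for w, m in zip(weights, edge_models))
            for name in edge_models[0]}

rng = np.random.default_rng(0)
# Three edge devices, each holding the same two-layer toy model.
edge_models = [{"fc1": rng.normal(size=(4, 4)), "fc2": rng.normal(size=(4, 2))}
               for _ in range(3)]
global_model = federated_average(edge_models)
```

With a secure aggregator, the server would see only the (masked) sum of updates; without one, it receives the individual models directly, which is the trade-off the paper measures.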
Recent years have witnessed significant advancement in face recognition (FR) techniques, with their applications widely spread in people's lives and security-sensitive areas. There is a growing need for reliable interpretations of the decisions of such systems. Existing studies relying on various mechanisms have investigated the usage of saliency maps as an explanation approach, but suffer from different limitations. This paper first explores the spatial relationship between a face image and its deep representation via gradient backpropagation. Then a new explanation approach, FGGB, is proposed, which provides precise and insightful similarity and dissimilarity saliency maps to explain the "Accept" and "Reject" decisions of an FR system. Extensive visual presentations and quantitative measurements show that FGGB achieves superior performance in both similarity and dissimilarity maps compared to current state-of-the-art explainable face verification approaches.
https://arxiv.org/abs/2403.04549
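The gradient-backpropagation idea can be illustrated with a finite-difference stand-in: perturb each input pixel and measure how the similarity to a reference embedding changes. A toy linear encoder replaces the real FR network, and the whole routine is an illustration of the principle, not FGGB itself.

```python
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def similarity_saliency(x, ref_emb, W, eps=1e-4):
    """Finite-difference stand-in for gradient backpropagation: how much
    each input pixel moves the similarity to the reference embedding."""
    base = cosine(W @ x, ref_emb)
    sal = np.zeros_like(x)
    for i in range(x.size):
        xp = x.copy()
        xp[i] += eps
        sal[i] = (cosine(W @ xp, ref_emb) - base) / eps
    return np.abs(sal)

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))        # toy linear "face encoder"
probe = rng.random(16)
ref_emb = W @ rng.random(16)        # embedding of the reference face
saliency = similarity_saliency(probe, ref_emb, W)
```

With a real network one would use autograd to backpropagate the similarity score to the input in a single pass instead of perturbing pixels one by one.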
This paper explores the application of large language models (LLMs), like ChatGPT, for biometric tasks. We specifically examine the capabilities of ChatGPT in performing biometric-related tasks, with an emphasis on face recognition, gender detection, and age estimation. Since biometrics are considered as sensitive information, ChatGPT avoids answering direct prompts, and thus we crafted a prompting strategy to bypass its safeguard and evaluate the capabilities for biometrics tasks. Our study reveals that ChatGPT recognizes facial identities and differentiates between two facial images with considerable accuracy. Additionally, experimental results demonstrate remarkable performance in gender detection and reasonable accuracy for the age estimation tasks. Our findings shed light on the promising potentials in the application of LLMs and foundation models for biometrics.
https://arxiv.org/abs/2403.02965
Face Anti-Spoofing (FAS) is crucial for securing face recognition systems against presentation attacks. With advancements in sensor manufacture and multi-modal learning techniques, many multi-modal FAS approaches have emerged. However, they face challenges in generalizing to unseen attacks and deployment conditions. These challenges arise from (1) modality unreliability, where some modality sensors, such as depth and infrared, undergo significant domain shifts in varying environments, leading to the spread of unreliable information during cross-modal feature fusion, and (2) modality imbalance, where training that overly relies on a dominant modality hinders the convergence of others, reducing effectiveness against attack types that are indistinguishable using the dominant modality alone. To address modality unreliability, we propose the Uncertainty-Guided Cross-Adapter (U-Adapter) to recognize unreliably detected regions within each modality and suppress the impact of unreliable regions on other modalities. For modality imbalance, we propose a Rebalanced Modality Gradient Modulation (ReGrad) strategy to rebalance the convergence speed of all modalities by adaptively adjusting their gradients. Besides, we provide the first large-scale benchmark for evaluating multi-modal FAS performance under domain generalization scenarios. Extensive experiments demonstrate that our method outperforms state-of-the-art methods. Source code and protocols will be released on this https URL.
https://arxiv.org/abs/2402.19298
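ReGrad's idea — boost the gradients of lagging modalities and damp the influence of the dominant one — can be sketched as scaling each modality's gradient by its relative loss. The ratio rule below is only illustrative; the paper's exact modulation differs.

```python
import numpy as np

def rebalance_gradients(grads, losses):
    """Scale each modality's gradient by how far it lags the fastest
    learner: modalities with higher remaining loss get boosted, so the
    dominant (lowest-loss) modality no longer starves the others."""
    losses = np.asarray(losses, dtype=float)
    ratios = losses / losses.min()          # >= 1, largest for slow modality
    return [g * r for g, r in zip(grads, ratios)]

grads = [np.ones(3), np.ones(3), np.ones(3)]   # RGB, depth, IR gradients
losses = [0.2, 0.8, 0.4]                       # RGB currently dominates
balanced = rebalance_gradients(grads, losses)
```

After rebalancing, the depth branch (highest loss) receives the largest update, equalizing convergence speed across modalities.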
In recent years, model quantization for face recognition has gained prominence. Traditionally, compressing models has involved vast datasets, like the 5.8 million-image MS1M dataset, as well as extensive training times, raising the question of whether such data enormity is essential. This paper addresses this by introducing an efficiency-driven approach, fine-tuning the model with at most 14,000 images, 440 times fewer than MS1M. We demonstrate that effective quantization is achievable with a smaller dataset, presenting a new paradigm. Moreover, we incorporate an evaluation-based metric loss and achieve an outstanding 96.15% accuracy on the IJB-C dataset, establishing a new state of the art in compressed-model training for face recognition. The subsequent analysis delves into potential applications, emphasizing the transformative power of this approach. This paper advances model quantization by demonstrating efficient, near-optimal results with little data and training time.
https://arxiv.org/abs/2402.18163
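The kind of compression being fine-tuned can be illustrated with a symmetric per-tensor int8 weight quantizer; this is a generic scheme, not necessarily the paper's exact quantization method.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization of a weight matrix."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float tensor from the int8 codes."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8)).astype(np.float32)   # toy FR layer weights
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
max_err = np.abs(w - w_hat).max()                # bounded by scale / 2
```

The paper's contribution is that repairing the accuracy lost to such rounding needs only a small fine-tuning set, not the full training corpus.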
JPEG compression can significantly impair the performance of adversarial face examples, which previous adversarial attacks on face recognition (FR) have not adequately addressed. Considering this challenge, we propose a novel adversarial attack on FR that aims to improve the resistance of adversarial examples against JPEG compression. Specifically, during the iterative process of generating adversarial face examples, we interpolate the adversarial face examples to a smaller size. Then we utilize these interpolated adversarial face examples to create the adversarial examples for the next iteration. Subsequently, we restore the adversarial face examples to their original size by interpolation. Throughout the entire process, our proposed method smooths the adversarial perturbations, effectively reducing the high-frequency signals in the crafted adversarial face examples that JPEG compression would typically eliminate. Our experimental results demonstrate the effectiveness of our proposed method in improving the JPEG resistance of adversarial face examples.
https://arxiv.org/abs/2402.16586
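The shrink-then-restore smoothing can be sketched with average-pool downsampling and nearest-neighbour upsampling. The paper interpolates the adversarial examples themselves inside the iterative attack; this one-shot sketch applies the operation to the perturbation only, which is a simplification.

```python
import numpy as np

def downsample2x(img):
    """2x2 average pooling (a crude stand-in for bilinear shrinking)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample2x(img):
    """Nearest-neighbour upsampling back to the original size."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

rng = np.random.default_rng(0)
clean = rng.random((8, 8))
adv = clean + 0.1 * rng.normal(size=(8, 8))   # toy adversarial example

# Shrink-then-restore the perturbation so it loses the high-frequency
# content that JPEG compression would otherwise strip away.
perturbation = adv - clean
smoothed_adv = clean + upsample2x(downsample2x(perturbation))
```

After the round trip, the perturbation is constant within each 2x2 block, i.e. its highest-frequency components are gone.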
Recent advancements in deep learning have revolutionized technology and security measures, necessitating robust identification methods. Biometric approaches, leveraging personalized characteristics, offer a promising solution. However, Face Recognition Systems are vulnerable to sophisticated attacks, notably face morphing techniques, enabling the creation of fraudulent documents. In this study, we introduce a novel quadruplet loss function for increasing the robustness of face recognition systems against morphing attacks. Our approach involves specific sampling of face image quadruplets, combined with face morphs, for network training. Experimental results demonstrate the efficiency of our strategy in improving the robustness of face recognition networks against morphing attacks.
https://arxiv.org/abs/2402.14665
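A generic quadruplet loss with the morph treated as an additional negative can be sketched as follows; the margins, distance, and sampling strategy are illustrative, not the paper's exact formulation.

```python
import numpy as np

def quadruplet_loss(anchor, positive, negative, morph, m1=0.5, m2=0.3):
    """Quadruplet loss sketch: pull genuine pairs together, push true
    negatives away, and additionally push the morph away from the anchor
    so morphed identities are not matched by the FR network."""
    d = lambda a, b: float(np.sum((a - b) ** 2))   # squared Euclidean distance
    standard = max(0.0, d(anchor, positive) - d(anchor, negative) + m1)
    morph_term = max(0.0, d(anchor, positive) - d(anchor, morph) + m2)
    return standard + morph_term

rng = np.random.default_rng(0)
anchor, positive = rng.normal(size=8), rng.normal(size=8)   # same identity
negative, morph = rng.normal(size=8), rng.normal(size=8)    # other / morphed
loss = quadruplet_loss(anchor, positive, negative, morph)
```

Sampling quadruplets that include face morphs is what teaches the embedding to place morphed images outside every contributing identity's acceptance region.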