Face recognition and verification are two computer vision tasks whose performance has advanced with the introduction of deep representations. However, ethical, legal, and technical challenges due to the sensitive nature of face data and biases in real-world training datasets hinder their development. Generative AI addresses privacy by creating fictitious identities, but fairness problems remain. Building on DCFace, an existing state-of-the-art (SOTA) framework, we introduce a new controlled generation pipeline that improves fairness. Through classical fairness metrics and a proposed in-depth statistical analysis based on logit models and ANOVA, we show that our generation pipeline improves fairness more than other bias mitigation approaches while slightly improving raw performance.
https://arxiv.org/abs/2412.03349
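As a rough illustration of the kind of logit-based analysis described above, the sketch below fits a logistic regression of per-trial verification correctness on a demographic covariate and tests the group effect with a likelihood-ratio test. The column names, toy data, and use of statsmodels are assumptions for illustration, not the paper's actual protocol.

```python
# Minimal sketch of a logit-based fairness analysis on verification outcomes.
# Column names ("correct", "demographic", "score") and the toy data are
# illustrative assumptions.
import pandas as pd
import statsmodels.formula.api as smf

# One row per verification trial: 1 if the decision was correct, plus covariates.
df = pd.DataFrame({
    "correct":     [1, 1, 0, 1, 0, 1, 1, 0],
    "demographic": ["A", "A", "B", "B", "A", "B", "A", "B"],
    "score":       [0.91, 0.62, 0.47, 0.55, 0.80, 0.83, 0.58, 0.41],
})

# Logit model: does demographic group predict correctness beyond the match score?
model = smf.logit("correct ~ C(demographic) + score", data=df).fit(disp=0)
print(model.summary())

# A likelihood-ratio test against the score-only model plays the role of an
# ANOVA-style group-effect test in the logit setting.
reduced = smf.logit("correct ~ score", data=df).fit(disp=0)
lr_stat = 2 * (model.llf - reduced.llf)
print("LR statistic for the demographic effect:", lr_stat)
```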
Facial expression manipulation aims to change human facial expressions without affecting face recognition. To transform facial expressions to target expressions, previous methods relied on expression labels to guide the manipulation process. However, these methods failed to preserve the details of facial features, causing the weakening or loss of identity information in the output image. In our work, we propose WEM-GAN, short for wavelet-based expression manipulation GAN, which puts more effort into preserving the details of the original image during editing. Firstly, we take advantage of the wavelet transform technique and combine it with a generator built on a U-Net autoencoder backbone, improving the generator's ability to preserve details of facial features. Secondly, we implement a high-frequency component discriminator and use a high-frequency domain adversarial loss to further constrain the optimization of our model, providing the generated face image with more abundant details. Additionally, to narrow the gap between generated facial expressions and target expressions, we use residual connections between the encoder and decoder and apply relative action units (AUs) several times. Extensive qualitative and quantitative experiments demonstrate that our model performs better in preserving identity features, editing capability, and image generation quality on the AffectNet dataset. It also shows superior performance in metrics such as Average Content Distance (ACD) and Expression Distance (ED).
https://arxiv.org/abs/2412.02530
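To make the wavelet component concrete, here is a minimal sketch of a 2D discrete wavelet decomposition that separates a face image into coarse and detail sub-bands, the kind of high-frequency signal a discriminator like WEM-GAN's could consume. PyWavelets and the Haar wavelet are assumptions; the paper's exact wavelet and network wiring are not specified here.

```python
# Minimal sketch: decompose a face image into low- and high-frequency sub-bands.
# PyWavelets is an assumed dependency.
import numpy as np
import pywt

face = np.random.rand(128, 128)  # stand-in for a grayscale face image

# Single-level 2D DWT: cA holds coarse structure, (cH, cV, cD) hold details.
cA, (cH, cV, cD) = pywt.dwt2(face, "haar")

# A high-frequency component (e.g., input to a high-frequency discriminator)
# can be formed from the detail sub-bands alone.
high_freq = np.stack([cH, cV, cD], axis=0)
print(cA.shape, high_freq.shape)  # (64, 64) (3, 64, 64)
```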
With the rise of deep learning, facial recognition technology has seen extensive research and rapid development. Although facial recognition is considered a mature technology, we find that existing open-source models and commercial algorithms lack robustness in certain real-world Out-of-Distribution (OOD) scenarios, raising concerns about the reliability of these systems. In this paper, we introduce OODFace, which explores the OOD challenges faced by facial recognition models from two perspectives: common corruptions and appearance variations. We systematically design 30 OOD scenarios across 9 major categories tailored for facial recognition. By simulating these challenges on public datasets, we establish three robustness benchmarks: LFW-C/V, CFP-FP-C/V, and YTF-C/V. We then conduct extensive experiments on 19 different facial recognition models and 3 commercial APIs, along with extended experiments on face masks, Vision-Language Models (VLMs), and defense strategies to assess their robustness. Based on the results, we draw several key insights, highlighting the vulnerability of facial recognition systems to OOD data and suggesting possible solutions. Additionally, we offer a unified toolkit that includes all corruption and variation types, easily extendable to other datasets. We hope that our benchmarks and findings can provide guidance for future improvements in facial recognition model robustness.
https://arxiv.org/abs/2412.02479
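In the spirit of the corruption-style "-C" benchmarks above, a minimal sketch of applying severity-graded corruptions to face crops might look like the following; the corruption set and severity scales are illustrative assumptions, not the paper's 30 scenarios.

```python
# Minimal sketch of building corruption-style OOD variants of face crops.
# Severity tables below are illustrative assumptions.
import numpy as np
from PIL import Image, ImageFilter

def gaussian_noise(img: np.ndarray, severity: int) -> np.ndarray:
    sigma = [0.04, 0.08, 0.12, 0.18, 0.26][severity - 1] * 255
    noisy = img.astype(np.float32) + np.random.normal(0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def gaussian_blur(img: Image.Image, severity: int) -> Image.Image:
    radius = [1, 2, 3, 4, 6][severity - 1]
    return img.filter(ImageFilter.GaussianBlur(radius))

img = Image.new("RGB", (112, 112), color=(128, 128, 128))  # stand-in face crop
corrupted = gaussian_blur(Image.fromarray(gaussian_noise(np.array(img), 3)), 3)
```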
In deep learning, the loss function plays a crucial role in optimizing the network. Many recent innovations in loss design have been made, and various margin-based angular loss functions (metric losses) have been designed specifically for face recognition. The concept of transformers is already well researched and applied in many facets of machine vision. This paper presents a technique for loss evaluation that uses a transformer network as an additive loss in the face recognition domain. The standard metric loss function typically takes the final embedding of the main CNN backbone as its input. Here, we employ a transformer-metric loss, a combined approach that integrates both transformer loss and metric loss. This research analyzes the transformer's behavior on the convolution output when the CNN outcome is arranged as a sequential vector. The transformer encoder takes as input the contextual vectors obtained from the final convolution layer of the network. With this technique, we use the transformer loss with various base metric loss functions to evaluate the effect of the combined loss. We observe that such a configuration allows the network to achieve SoTA results on various validation datasets, with some limitations. This research expands the role of transformers in the machine vision domain and opens new possibilities for exploring transformers as a loss function.
https://arxiv.org/abs/2412.02198
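A minimal sketch of the idea, under assumptions about dimensions and the margin formulation: the final CNN feature map is flattened into a token sequence, encoded by a transformer, and the pooled embedding feeds an ArcFace-style margin loss.

```python
# Minimal sketch of a transformer-metric head. Dimensions and the ArcFace-style
# margin are illustrative assumptions, not the paper's exact configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TransformerMetricHead(nn.Module):
    def __init__(self, channels=512, num_classes=1000, m=0.5, s=64.0):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=channels, nhead=8,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.weight = nn.Parameter(torch.randn(num_classes, channels))
        self.m, self.s = m, s

    def forward(self, feat_map, labels):
        # feat_map: (B, C, H, W) -> sequence of H*W tokens of dim C.
        tokens = feat_map.flatten(2).transpose(1, 2)        # (B, H*W, C)
        emb = self.encoder(tokens).mean(dim=1)              # pooled embedding
        # ArcFace-style margin on the cosine between embedding and class weight.
        cos = F.linear(F.normalize(emb), F.normalize(self.weight))
        theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
        target = F.one_hot(labels, cos.size(1)).bool()
        logits = torch.where(target, torch.cos(theta + self.m), cos) * self.s
        return F.cross_entropy(logits, labels)

head = TransformerMetricHead()
loss = head(torch.randn(4, 512, 7, 7), torch.randint(0, 1000, (4,)))
```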
Recent advancements in deep learning-based compression techniques have surpassed traditional methods. However, deep neural networks remain vulnerable to backdoor attacks, where pre-defined triggers induce malicious behaviors. This paper introduces a novel frequency-based trigger injection model for launching backdoor attacks with multiple triggers on learned image compression models. Inspired by the discrete cosine transform (DCT) widely used in compression codecs, triggers are embedded in the DCT domain. We design attack objectives tailored to diverse scenarios, including: 1) degrading compression quality in terms of bit-rate and reconstruction accuracy; 2) targeting task-driven measures like face recognition and semantic segmentation. To improve training efficiency, we propose a dynamic loss function that balances loss terms with fewer hyper-parameters, optimizing attack objectives effectively. For advanced scenarios, we evaluate the attack's resistance to defensive preprocessing and propose a two-stage training schedule with robust frequency selection to enhance resilience. To improve cross-model and cross-domain transferability for downstream tasks, we adjust the classification boundary in the attack loss during training. Experiments show that our trigger injection models, combined with minor modifications to encoder parameters, successfully inject multiple backdoors and their triggers into a single compression model, demonstrating strong performance and versatility. (*Due to arXiv's notification that "The Abstract field cannot be longer than 1,920 characters", the abstract shown here is shortened. For the full abstract, please download the article.)
https://arxiv.org/abs/2412.01646
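As a toy illustration of frequency-domain trigger injection (not the paper's actual trigger model), one could perturb a mid-frequency block of DCT coefficients; the band and amplitude below are arbitrary assumptions.

```python
# Toy sketch of embedding a trigger in the DCT domain; the chosen mid-frequency
# block and amplitude are illustrative assumptions.
import numpy as np
from scipy.fft import dctn, idctn

def inject_dct_trigger(img: np.ndarray, amplitude: float = 8.0) -> np.ndarray:
    # img: (H, W) grayscale in [0, 255]; orthonormal 2D DCT.
    coeffs = dctn(img.astype(np.float32), norm="ortho")
    # Perturb a small mid-frequency block, where changes are hard to see.
    coeffs[16:20, 16:20] += amplitude
    poisoned = idctn(coeffs, norm="ortho")
    return np.clip(poisoned, 0, 255).astype(np.uint8)

clean = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
poisoned = inject_dct_trigger(clean)
```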
Synthetic data is gaining increasing popularity for face recognition technologies, mainly due to privacy concerns and the challenges of obtaining real data that covers diverse scenarios, quality levels, and demographic groups, among other factors. It also offers some advantages over real data, such as the large amount of data that can be generated or the ability to customize it for specific problem-solving needs. To use such data effectively, face recognition models should also be specifically designed to exploit synthetic data to its fullest potential. To promote the proposal of novel Generative AI methods and synthetic data, and to investigate the application of synthetic data to better train face recognition systems, we introduce the 2nd FRCSyn-onGoing challenge, based on the 2nd Face Recognition Challenge in the Era of Synthetic Data (FRCSyn), originally launched at CVPR 2024. This ongoing challenge provides researchers with an accessible platform to benchmark i) novel Generative AI methods and synthetic data, and ii) novel face recognition systems specifically designed to take advantage of synthetic data. We focus on exploring the use of synthetic data both individually and in combination with real data to solve current challenges in face recognition, such as demographic bias, domain adaptation, and performance constraints in demanding situations such as age disparities between training and testing, changes in pose, or occlusions. Very interesting findings are obtained in this second edition, including a direct comparison with the first one, in which synthetic databases were restricted to DCFace and GANDiffFace.
https://arxiv.org/abs/2412.01383
Diffusion models have demonstrated impressive performance in face restoration. Yet, their multi-step inference process remains computationally intensive, limiting their applicability in real-world scenarios. Moreover, existing methods often struggle to generate face images that are harmonious, realistic, and consistent with the subject's identity. In this work, we propose OSDFace, a novel one-step diffusion model for face restoration. Specifically, we propose a visual representation embedder (VRE) to better capture prior information and understand the input face. In VRE, low-quality faces are processed by a visual tokenizer and subsequently embedded with a vector-quantized dictionary to generate visual prompts. Additionally, we incorporate a facial identity loss derived from face recognition to further ensure identity consistency. We further employ a generative adversarial network (GAN) as a guidance model to encourage distribution alignment between the restored face and the ground truth. Experimental results demonstrate that OSDFace surpasses current state-of-the-art (SOTA) methods in both visual quality and quantitative metrics, generating high-fidelity, natural face images with high identity consistency. The code and model will be released at this https URL.
https://arxiv.org/abs/2411.17163
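The facial identity loss can be sketched as a cosine distance between embeddings of the restored face and the ground truth from a frozen face recognition model; the stand-in FR network below is an assumption, not OSDFace's actual model.

```python
# Minimal sketch of an identity loss derived from a frozen FR model.
# `fr` below is a placeholder for any embedding network (e.g., an ArcFace
# backbone); it is an assumption.
import torch
import torch.nn as nn
import torch.nn.functional as F

class IdentityLoss(nn.Module):
    def __init__(self, fr_model: nn.Module):
        super().__init__()
        self.fr = fr_model.eval()
        for p in self.fr.parameters():
            p.requires_grad_(False)  # the FR model only provides supervision

    def forward(self, restored, ground_truth):
        e_r = F.normalize(self.fr(restored), dim=-1)
        e_g = F.normalize(self.fr(ground_truth), dim=-1)
        # 1 - cosine similarity: zero when identities match perfectly.
        return (1 - (e_r * e_g).sum(dim=-1)).mean()

fr = nn.Sequential(nn.Flatten(), nn.Linear(3 * 112 * 112, 512))  # stand-in
loss = IdentityLoss(fr)(torch.rand(2, 3, 112, 112), torch.rand(2, 3, 112, 112))
```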
In the realm of cardiovascular medicine, medical imaging plays a crucial role in accurately classifying cardiac diseases and making precise diagnoses. However, the field faces significant challenges when integrating data science techniques, as a significant volume of images is required for these techniques. As a consequence, it is necessary to investigate different avenues to overcome this challenge. In this contribution, we offer an innovative tool to conquer this limitation. In particular, we delve into the application of a well-recognized method, the eigenfaces approach, to classify cardiac diseases. This approach was originally developed to efficiently represent face images using principal component analysis, which provides a set of eigenvectors (aka eigenfaces) explaining the variation between face images. Since this approach has proven effective for face recognition, it motivated us to explore its effectiveness on more complicated databases. In particular, we integrate this approach with convolutional neural networks (CNNs) to classify echocardiography images taken from mice in five distinct cardiac conditions (healthy, diabetic cardiomyopathy, myocardial infarction, obesity, and TAC hypertension). Performing a preprocessing step inspired by the eigenfaces approach on the echocardiography datasets yields sets of POD modes, which we call eigenhearts. To demonstrate the proposed approach, we compare two test cases: (i) supplying the CNN with the original images directly, and (ii) supplying the CNN with images projected onto the obtained POD modes. The results show a substantial and noteworthy enhancement when employing SVD for pre-processing, with classification accuracy increasing by approximately 50%.
https://arxiv.org/abs/2411.16227
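A minimal sketch of the eigenfaces/eigenhearts preprocessing under assumed image sizes: compute POD modes via SVD of flattened, mean-centered training images, then project inputs onto the leading modes before handing them to the CNN.

```python
# Minimal sketch of SVD/POD preprocessing. Image size and mode count are
# illustrative assumptions.
import numpy as np

train = np.random.rand(200, 64 * 64)          # 200 flattened echo images
mean = train.mean(axis=0)
U, S, Vt = np.linalg.svd(train - mean, full_matrices=False)
modes = Vt[:50]                               # leading 50 POD modes ("eigenhearts")

def project(images: np.ndarray) -> np.ndarray:
    # Reconstruct each image from its coefficients in the POD basis.
    coeffs = (images - mean) @ modes.T
    return coeffs @ modes + mean              # projected images, CNN-ready

projected = project(np.random.rand(8, 64 * 64)).reshape(8, 64, 64)
```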
Recognition of low-quality face images remains a challenge due to the invisibility or deformation of partial facial regions. For low-quality images dominated by missing partial facial regions, local region similarity contributes more to face recognition (FR). Conversely, in cases dominated by local face deformation, excessive attention to local regions may lead to misjudgments, while global features exhibit better robustness. However, most existing FR methods neglect the bias in feature quality of low-quality images introduced by these different factors. To address this issue, we propose a Local and Global Feature Attention Fusion (LGAF) network based on feature quality. The network adaptively allocates attention between local and global features according to feature quality and obtains more discriminative, high-quality face features through the complementarity of local and global information. In addition, to effectively obtain fine-grained information at various scales and increase the separability of facial features in high-dimensional space, we introduce a Multi-Head Multi-Scale Local Feature Extraction (MHMS) module. Experimental results demonstrate that LGAF achieves the best average performance on four validation sets (CFP-FP, CPLFW, AgeDB, and CALFW) and outperforms state-of-the-art (SoTA) methods on TinyFace and SCFace.
https://arxiv.org/abs/2411.16169
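One plausible reading of the quality-adaptive fusion, sketched under assumptions (a small MLP as the quality estimator and a softmax convex combination of the two branches):

```python
# Minimal sketch of quality-adaptive fusion of local and global face features.
# The quality MLP and convex-combination fusion are illustrative assumptions.
import torch
import torch.nn as nn

class QualityAdaptiveFusion(nn.Module):
    def __init__(self, dim=512):
        super().__init__()
        self.quality = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(),
                                     nn.Linear(64, 1))

    def forward(self, local_feat, global_feat):
        # Score each branch's feature quality, then softmax the two scores
        # into attention weights that sum to one.
        q = torch.cat([self.quality(local_feat), self.quality(global_feat)], dim=1)
        w = torch.softmax(q, dim=1)                      # (B, 2)
        return w[:, :1] * local_feat + w[:, 1:] * global_feat

fused = QualityAdaptiveFusion()(torch.randn(4, 512), torch.randn(4, 512))
```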
Face Recognition (FR) models are vulnerable to adversarial examples that subtly manipulate benign face images, underscoring the urgent need to improve the transferability of adversarial attacks in order to expose the blind spots of these systems. Existing adversarial attack methods often overlook the potential benefits of augmenting the surrogate model with diverse initializations, which limits the transferability of the generated adversarial examples. To address this gap, we propose a novel attack method called Diverse Parameters Augmentation (DPA), which enhances surrogate models by incorporating diverse parameter initializations, resulting in a broader and more diverse set of surrogate models. Specifically, DPA consists of two key stages: Diverse Parameters Optimization (DPO) and Hard Model Aggregation (HMA). In the DPO stage, we initialize the parameters of the surrogate model using both pre-trained and random parameters. Subsequently, we save the models from the intermediate training process to obtain a diverse set of surrogate models. During the HMA stage, we enhance the feature maps of the diversified surrogate models by incorporating beneficial perturbations, thereby further improving transferability. Experimental results demonstrate that our proposed attack method can effectively enhance the transferability of the crafted adversarial face examples.
https://arxiv.org/abs/2411.15555
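The aggregation idea can be sketched as an iterative attack whose loss sums over a set of diverse surrogate checkpoints; the FGSM-style loop and stand-in surrogates below are illustrative assumptions rather than the paper's exact HMA procedure.

```python
# Sketch of crafting a transferable adversarial face against an ensemble of
# diverse surrogate checkpoints. The iterative loop and stand-in models are
# illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def ensemble_attack(surrogates, image, target_emb, eps=8 / 255, steps=10):
    adv = image.clone().detach()
    alpha = eps / steps
    for _ in range(steps):
        adv.requires_grad_(True)
        # Push the adversarial embedding toward the target across all models.
        loss = sum(1 - F.cosine_similarity(m(adv), target_emb).mean()
                   for m in surrogates)
        grad, = torch.autograd.grad(loss, adv)
        adv = (adv - alpha * grad.sign()).detach()
        adv = image + (adv - image).clamp(-eps, eps)  # stay in the eps-ball
    return adv.clamp(0, 1)

models = [nn.Sequential(nn.Flatten(), nn.Linear(3 * 112 * 112, 512)).eval()
          for _ in range(3)]  # stand-ins for diverse surrogate checkpoints
adv = ensemble_attack(models, torch.rand(1, 3, 112, 112), torch.randn(1, 512))
```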
An important aspect of deploying face recognition (FR) algorithms in real-world applications is their ability to learn new face identities from a continuous data stream. However, the online training of existing deep neural network-based FR algorithms, which are pre-trained offline on large-scale stationary datasets, encounters two major challenges: (I) catastrophic forgetting of previously learned identities, and (II) the need to store past data for complete retraining from scratch, leading to significant storage constraints and privacy concerns. In this paper, we introduce CLFace, a continual learning framework designed to preserve and incrementally extend the learned knowledge. CLFace eliminates the classification layer, resulting in a resource-efficient FR model that remains fixed throughout lifelong learning and provides label-free supervision to a student model, making it suitable for open-set face recognition during incremental steps. We introduce an objective function that employs feature-level distillation to reduce drift between the feature maps of the student and teacher models across multiple stages. Additionally, it incorporates a geometry-preserving distillation scheme to maintain the orientation of the teacher model's feature embedding. Furthermore, a contrastive knowledge distillation is incorporated to continually enhance the discriminative power of the feature representation by matching similarities between new identities. Experiments on several benchmark FR datasets demonstrate that CLFace outperforms baseline approaches and state-of-the-art methods on unseen identities using both in-domain and out-of-domain datasets.
https://arxiv.org/abs/2411.13886
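Two of the distillation terms can be sketched as follows, with the loss weights as assumptions: a feature-level MSE between student and teacher feature maps, plus a geometry-preserving cosine term that keeps the student's embedding direction aligned with the frozen teacher's.

```python
# Sketch of CLFace-style distillation terms; loss weights are assumptions.
import torch
import torch.nn.functional as F

def distillation_loss(student_feat, teacher_feat, student_emb, teacher_emb,
                      w_feat=1.0, w_geo=1.0):
    # Feature-level distillation: reduce drift between intermediate feature maps.
    l_feat = F.mse_loss(student_feat, teacher_feat)
    # Geometry-preserving distillation: match embedding orientation (cosine).
    cos = F.cosine_similarity(student_emb, teacher_emb, dim=-1)
    l_geo = (1 - cos).mean()
    return w_feat * l_feat + w_geo * l_geo

loss = distillation_loss(torch.randn(4, 256, 14, 14), torch.randn(4, 256, 14, 14),
                         torch.randn(4, 512), torch.randn(4, 512))
```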
Face recognition is a core task in computer vision designed to identify and authenticate individuals by analyzing facial patterns and features. This field intersects with artificial intelligence, image processing, and machine learning, with applications in security, authentication, and personalization. Traditional approaches to facial recognition focus on capturing facial features such as the eyes, nose, and mouth and matching these against a database to verify identities. However, challenges such as high false positive rates have persisted, often due to the similarity among individuals' facial features. Recently, Contrastive Language-Image Pretraining (CLIP), a model developed by OpenAI, has shown promising advancements by linking natural language processing with vision tasks, allowing it to generalize across modalities. Using CLIP's vision-language correspondence and single-shot finetuning, the model can achieve lower false positive rates upon deployment without the need for mass facial feature extraction. This integration demonstrates CLIP's potential to address persistent issues in face recognition model performance without complicating the training paradigm.
https://arxiv.org/abs/2411.12319
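A minimal sketch of CLIP-based verification, assuming the openai/CLIP package and an arbitrary similarity threshold (the paper's single-shot finetuning step is omitted):

```python
# Sketch: embed two face crops with CLIP's image encoder and threshold their
# cosine similarity. The 0.85 threshold is an arbitrary assumption.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def verify(img_a: Image.Image, img_b: Image.Image, threshold=0.85) -> bool:
    batch = torch.stack([preprocess(img_a), preprocess(img_b)]).to(device)
    with torch.no_grad():
        emb = model.encode_image(batch)
        emb = emb / emb.norm(dim=-1, keepdim=True)
    return (emb[0] @ emb[1]).item() >= threshold

same = verify(Image.new("RGB", (224, 224)), Image.new("RGB", (224, 224)))
```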
Face recognition datasets are often collected by crawling the Internet without individuals' consent, raising ethical and privacy concerns. Generating synthetic datasets for training face recognition models has emerged as a promising alternative. However, the generation of synthetic datasets remains challenging as it entails adequate inter-class and intra-class variations. While advances in generative models have made it easier to increase intra-class variations in face datasets (such as pose, illumination, etc.), generating sufficient inter-class variation is still a difficult task. In this paper, we formulate dataset generation as a packing problem on the embedding space (represented on a hypersphere) of a face recognition model and propose a new synthetic dataset generation approach, called HyperFace. We formalize our packing problem as an optimization problem and solve it with a gradient descent-based approach. Then, we use a conditional face generator model to synthesize face images from the optimized embeddings. We use our generated datasets to train face recognition models and evaluate the trained models on several benchmark real datasets. Our experimental results show that models trained with HyperFace achieve state-of-the-art performance in training face recognition using synthetic datasets.
https://arxiv.org/abs/2411.08470
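A toy sketch of the packing optimization, using a log-sum-exp relaxation of the maximum pairwise cosine as an assumed differentiable surrogate for the packing objective (the paper's exact formulation may differ):

```python
# Toy sketch of hypersphere packing by gradient descent: push the closest pair
# of identity embeddings apart. The soft-max relaxation is an assumption.
import torch
import torch.nn.functional as F

n_ids, dim = 500, 128
emb = torch.randn(n_ids, dim, requires_grad=True)
opt = torch.optim.Adam([emb], lr=0.01)
mask = torch.eye(n_ids, dtype=torch.bool)

for step in range(200):
    z = F.normalize(emb, dim=-1)                  # points on the unit hypersphere
    cos = (z @ z.T).masked_fill(mask, -2.0)       # ignore self-similarity
    # Penalize the largest pairwise cosine (i.e., the closest pair);
    # logsumexp is a smooth, differentiable stand-in for the max.
    loss = torch.logsumexp(cos.flatten() * 20.0, dim=0) / 20.0
    opt.zero_grad()
    loss.backward()
    opt.step()

# The optimized, normalized embeddings would then condition a face generator.
reference_embeddings = F.normalize(emb.detach(), dim=-1)
```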
In mission-critical domains such as law enforcement and medical diagnosis, the ability to explain and interpret the outputs of deep learning models is crucial for ensuring user trust and supporting informed decision-making. Despite advancements in explainability, existing methods often fall short in providing explanations that mirror the depth and clarity of those given by human experts. Such expert-level explanations are essential for the dependable application of deep learning models in law enforcement and medical contexts. Additionally, we recognize that most explanations in real-world scenarios are communicated primarily through natural language. Addressing these needs, we propose a novel approach that utilizes characteristic descriptors to explain model decisions by identifying their presence in images, thereby generating expert-like explanations. Our method incorporates a concept bottleneck layer within the model architecture, which calculates the similarity between image and descriptor encodings to deliver inherent and faithful explanations. Through experiments in face recognition and chest X-ray diagnosis, we demonstrate that our approach stands in significant contrast to existing techniques, which are often limited to the use of saliency maps. We believe our approach represents a significant step toward making deep learning systems more accountable, transparent, and trustworthy in the critical domains of face recognition and medical diagnosis.
https://arxiv.org/abs/2411.04008
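The concept bottleneck can be sketched as a layer whose logits are computed solely from image-descriptor similarities, so each prediction decomposes into human-readable descriptor evidence; the encoders and descriptor embeddings below are assumptions.

```python
# Sketch of a concept bottleneck over characteristic descriptors. The
# pre-computed descriptor embeddings are a placeholder assumption.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DescriptorBottleneck(nn.Module):
    def __init__(self, descriptor_embs: torch.Tensor, num_classes: int):
        super().__init__()
        # One text embedding per descriptor (e.g., "bushy eyebrows").
        self.register_buffer("descs", F.normalize(descriptor_embs, dim=-1))
        self.head = nn.Linear(descriptor_embs.size(0), num_classes)

    def forward(self, image_emb):
        sims = F.normalize(image_emb, dim=-1) @ self.descs.T  # (B, n_descriptors)
        return self.head(sims), sims          # logits + per-descriptor evidence

model = DescriptorBottleneck(torch.randn(40, 512), num_classes=10)
logits, evidence = model(torch.randn(4, 512))
```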
Face recognition systems extract embedding vectors from face images and use these embeddings to verify or identify individuals. Face reconstruction attack (also known as template inversion) refers to reconstructing face images from face embeddings and using the reconstructed face image to enter a face recognition system. In this paper, we propose to use a face foundation model to reconstruct face images from the embeddings of a blackbox face recognition model. The foundation model is trained with 42M images to generate face images from the facial embeddings of a fixed face recognition model. We propose to use an adapter to translate target embeddings into the embedding space of the foundation model. The generated images are evaluated on different face recognition models and different datasets, demonstrating the effectiveness of our method in translating embeddings of different face recognition models. We also evaluate the transferability of the reconstructed face images when attacking different face recognition models. Our experimental results show that our reconstructed face images outperform previous reconstruction attacks against face recognition models.
https://arxiv.org/abs/2411.03960
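The adapter can be sketched as a small MLP trained with a cosine objective to map blackbox embeddings into the foundation model's embedding space; the sizes and paired training embeddings below are assumptions.

```python
# Sketch of an embedding-translation adapter trained on paired embeddings
# (same faces encoded by both models, computed offline). Sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

adapter = nn.Sequential(nn.Linear(512, 1024), nn.ReLU(), nn.Linear(1024, 512))
opt = torch.optim.Adam(adapter.parameters(), lr=1e-4)

def train_step(blackbox_emb, foundation_emb):
    pred = F.normalize(adapter(blackbox_emb), dim=-1)
    target = F.normalize(foundation_emb, dim=-1)
    loss = (1 - F.cosine_similarity(pred, target, dim=-1)).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

loss = train_step(torch.randn(32, 512), torch.randn(32, 512))
```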
This study focuses on enhancing the Haar Cascade Algorithm to decrease the false positive and false negative rates in face matching and face detection and to increase the accuracy rate even under challenging conditions. The face recognition library was implemented with the Haar Cascade Algorithm, in which 128-dimensional vectors representing the unique features of a face are encoded. A subprocess was applied in which the grayscale image from the Haar Cascade was converted to RGB to improve face encoding. Logical processing and face filtering were also used to decrease non-face detections. The Enhanced Haar Cascade Algorithm produced a 98.39% accuracy rate (a 21.39% increase), a 63.59% precision rate, a 98.30% recall rate, and a 72.23% F1 score. In comparison, the Haar Cascade Algorithm achieved a 46.70% to 77.00% accuracy rate, a 44.15% precision rate, a 98.61% recall rate, and a 47.01% F1 score. Both algorithms were evaluated with a confusion matrix test comprising 301,950 comparisons on the same dataset of 550 images. The 98.39% accuracy rate shows a significant decrease in false positive and false negative rates in facial recognition. Face matching and face detection are more accurate in images with complex backgrounds, lighting variations, and occlusions, or even those with similar attributes.
https://arxiv.org/abs/2411.03831
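A minimal sketch of the pipeline described above, assuming OpenCV and the face_recognition package (the matching tolerance is illustrative): Haar cascade detection on grayscale, conversion to RGB, then 128-dimensional encodings for matching.

```python
# Sketch: Haar cascade detection + grayscale-to-RGB conversion + 128-d face
# encodings. OpenCV and face_recognition are assumed dependencies.
import cv2
import face_recognition
import numpy as np

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def encode_faces(image_bgr: np.ndarray):
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    boxes = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    # Convert to RGB (face_recognition expects RGB) before encoding.
    rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)
    locations = [(y, x + w, y + h, x) for (x, y, w, h) in boxes]  # css order
    return face_recognition.face_encodings(rgb, known_face_locations=locations)

def is_match(known: np.ndarray, candidate: np.ndarray, tol=0.6) -> bool:
    # Euclidean distance between 128-d encodings; tol=0.6 is a common default.
    return np.linalg.norm(known - candidate) <= tol
```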
The accuracy of face recognition systems has improved significantly in the past few years, thanks to the large amount of data collected and advances in neural network architectures. However, these large-scale datasets are often collected without explicit consent, raising ethical and privacy concerns. To address this, there have been proposals to use synthetic datasets for training face recognition models. Yet, such models still rely on real data to train the generative models and generally exhibit inferior performance compared to those trained on real datasets. One of these datasets, DigiFace, uses a graphics pipeline to generate different identities and different intra-class variations without using real data in training the models. However, the performance of this approach is poor on face recognition benchmarks, possibly due to the lack of realism in the images generated from the graphics pipeline. In this work, we introduce a novel framework for realism transfer aimed at enhancing the realism of synthetically generated face images. Our method leverages a large-scale face foundation model, and we adapt the pipeline for realism enhancement. By integrating the controllable aspects of the graphics pipeline with our realism enhancement technique, we generate a large number of realistic variations, combining the advantages of both approaches. Our empirical evaluations demonstrate that models trained using our enhanced dataset significantly improve the performance of face recognition systems over the baseline. The source code and datasets will be made available publicly.
https://arxiv.org/abs/2411.02188
Machine learning models are prone to adversarial attacks, where inputs can be manipulated to cause misclassifications. While previous research has focused on techniques like Generative Adversarial Networks (GANs), there has been limited exploration of GANs and the Synthetic Minority Oversampling Technique (SMOTE) for performing adversarial attacks on text and image classification models. Our study addresses this gap by training various machine learning models and using GANs and SMOTE to generate additional data points aimed at attacking text classification models. Furthermore, we extend our investigation to face recognition models, training a Convolutional Neural Network (CNN) and subjecting it to adversarial attacks with fast gradient sign perturbations on key features identified by GradCAM, a technique used to highlight the key image characteristics CNNs use in classification. Our experiments reveal a significant vulnerability in classification models. Specifically, we observe a 20% decrease in accuracy for the top-performing text classification models post-attack, along with a 30% decrease in facial recognition accuracy. This highlights the susceptibility of these models to manipulation of input data. Adversarial attacks not only compromise security but also undermine the reliability of machine learning systems. By showcasing the impact of adversarial attacks on both text classification and face recognition models, our study underscores the urgent need to develop robust defenses against such vulnerabilities.
https://arxiv.org/abs/2411.03348
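The GradCAM-guided attack can be sketched as an FGSM perturbation restricted to the most salient regions; the mask construction and epsilon below are assumptions.

```python
# Sketch of masked FGSM: perturb only where a saliency map (e.g., from GradCAM)
# marks important regions. Mask threshold and epsilon are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def masked_fgsm(model, image, label, saliency, eps=8 / 255, keep=0.2):
    # Keep only the top `keep` fraction of saliency as the attack region.
    thresh = torch.quantile(saliency.flatten(), 1 - keep)
    mask = (saliency >= thresh).float()
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    adv = image + eps * image.grad.sign() * mask
    return adv.clamp(0, 1).detach()

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))  # stand-in CNN
adv = masked_fgsm(model, torch.rand(1, 3, 32, 32), torch.tensor([3]),
                  saliency=torch.rand(1, 1, 32, 32))
```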
Current face anonymization techniques often depend on identity loss calculated by face recognition models, which can be inaccurate and unreliable. Additionally, many methods require supplementary data such as facial landmarks and masks to guide the synthesis process. In contrast, our approach uses diffusion models with only a reconstruction loss, eliminating the need for facial landmarks or masks while still producing images with intricate, fine-grained details. We validated our results on two public benchmarks through both quantitative and qualitative evaluations. Our model achieves state-of-the-art performance in three key areas: identity anonymization, facial attribute preservation, and image quality. Beyond its primary function of anonymization, our model can also perform face swapping tasks by incorporating an additional facial image as input, demonstrating its versatility and potential for diverse applications. Our code and models are available at this https URL .
https://arxiv.org/abs/2411.00762
Face recognition technologies are increasingly used in various applications, yet they are vulnerable to face spoofing attacks. These spoofing attacks often involve unique 3D structures, such as printed papers or mobile device screens. Although stereo-depth cameras can detect such attacks effectively, their high cost limits their widespread adoption. Conversely, two-sensor systems without extrinsic calibration offer a cost-effective alternative but are unable to calculate depth using stereo techniques. In this work, we propose a method to overcome this challenge by leveraging facial attributes to derive disparity information and estimate relative depth for anti-spoofing purposes, using non-calibrated systems. We introduce a multi-modal anti-spoofing model, coined the Disparity Model, that incorporates the created disparity maps as a third modality alongside the two original sensor modalities. We demonstrate the effectiveness of the Disparity Model in countering various spoof attacks using a comprehensive dataset collected with the Intel RealSense ID Solution F455. Our method outperformed existing methods in the literature, achieving an Equal Error Rate (EER) of 1.71% and a False Negative Rate (FNR) of 2.77% at a False Positive Rate (FPR) of 1%. These errors are 2.45% and 7.94% lower, respectively, than those of the best comparison method. Additionally, we introduce a model ensemble that addresses 3D spoof attacks as well, achieving an EER of 2.04% and an FNR of 3.83% at an FPR of 1%. Overall, our work provides a state-of-the-art solution for the challenging task of anti-spoofing in non-calibrated systems that lack depth information.
https://arxiv.org/abs/2410.24031
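One way to picture the derived-disparity modality, under the assumption that matched facial landmarks are available from both sensors: interpolate per-landmark horizontal offsets into a dense map that joins the two sensor images as a third input.

```python
# Toy sketch of a landmark-derived disparity map for a non-calibrated two-sensor
# system. The landmark pairs and grid interpolation are illustrative assumptions
# about the Disparity Model's third modality.
import numpy as np
from scipy.interpolate import griddata

def disparity_map(lm_a: np.ndarray, lm_b: np.ndarray, shape=(112, 112)):
    # lm_a, lm_b: (N, 2) matching landmark (x, y) coords from the two sensors.
    disp = lm_a[:, 0] - lm_b[:, 0]            # horizontal offset per landmark
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    # Interpolate sparse offsets over the crop; points given in (y, x) order.
    return griddata(lm_a[:, ::-1], disp, (ys, xs), method="linear", fill_value=0.0)

lm_a = np.random.rand(68, 2) * 112            # stand-in landmark sets
lm_b = lm_a + np.random.randn(68, 2)
dmap = disparity_map(lm_a, lm_b)              # (112, 112) third input modality
```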