Face recognition in the wild has attracted considerable attention in recent years, and many face recognition models are designed to verify faces in medium-quality images. In particular, owing to the availability of large training datasets with similar conditions, deep face recognition models perform exceptionally well in such tasks. However, in other tasks where substantially less training data is available, such methods struggle, especially when required to compare high-quality enrollment images with low-quality probes. On the other hand, traditional RankList-based methods have been developed that compare faces indirectly by comparing them to cohort faces with similar conditions. In this paper, we revisit these RankList methods and extend them to use the logits of the state-of-the-art DaliFace network instead of an external cohort. We show that, through a reasonable Logit-Cohort Selection (LoCoS), the performance of RankList-based methods can be improved drastically. Experiments on two challenging face recognition datasets not only demonstrate the enhanced performance of our proposed method but also set the stage for future advancements in handling diverse image qualities.
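The abstract gives no implementation details, so the following is only a minimal illustrative sketch of a RankList-style comparison over network logits with a simple top-k Logit-Cohort Selection; the function names, the choice of Spearman-style rank correlation, and the value of k are my own assumptions, not the authors' code.

```python
import numpy as np

def rank(values: np.ndarray) -> np.ndarray:
    """Return the rank (0 = largest) of each entry of a 1-D score vector."""
    order = np.argsort(-values)           # indices sorted by descending score
    ranks = np.empty_like(order)
    ranks[order] = np.arange(len(values))
    return ranks

def locos_similarity(enroll_logits: np.ndarray,
                     probe_logits: np.ndarray,
                     k: int = 100) -> float:
    """Hypothetical RankList comparison over network logits.

    Instead of an external cohort, the classes behind the logits act as the
    cohort. A simple logit-cohort selection keeps the k classes that score
    highest for the enrollment image, and the two faces are compared by the
    agreement of their rank orders over that cohort.
    """
    cohort = np.argsort(-enroll_logits)[:k]           # top-k cohort classes
    r_enroll = rank(enroll_logits[cohort]).astype(float)
    r_probe = rank(probe_logits[cohort]).astype(float)
    # Spearman-style rank correlation via Pearson correlation of the ranks.
    return float(np.corrcoef(r_enroll, r_probe)[0, 1])

# Toy usage with random "logits" standing in for network outputs.
rng = np.random.default_rng(0)
print(locos_similarity(rng.normal(size=10000), rng.normal(size=10000), k=100))
```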
https://arxiv.org/abs/2410.01498
Deep learning models, such as those used for face recognition and attribute prediction, are susceptible to manipulations like adversarial noise and unintentional noise, including Gaussian and impulse noise. This paper introduces CIAI, a Class-Independent Adversarial Intent detection network built on a modified vision transformer with detection layers. CIAI employs a novel loss function that combines Maximum Mean Discrepancy and Center Loss to detect both intentional (adversarial attacks) and unintentional noise, regardless of the image class. The detector is trained in a multi-step fashion. We also introduce the notion of intent during detection, which can act as an added layer of security. We further showcase the performance of our proposed detector on CelebA, CelebA-HQ, LFW, AgeDB, and CIFAR-10 datasets. Our detector is able to detect both intentional (like FGSM, PGD, and DeepFool) and unintentional (like Gaussian and Salt & Pepper noises) perturbations.
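The exact CIAI loss is not spelled out in the abstract; the sketch below, under my own assumptions, shows one plausible way an RBF-kernel MMD term and a center-loss term could be combined for intent-detection features. The helper names and the weighting `lam` are hypothetical.

```python
import torch

def rbf_mmd2(x: torch.Tensor, y: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """Squared Maximum Mean Discrepancy between two feature batches (RBF kernel)."""
    def k(a, b):
        return torch.exp(-torch.cdist(a, b).pow(2) / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

def center_loss(feats: torch.Tensor, groups: torch.Tensor,
                centers: torch.Tensor) -> torch.Tensor:
    """Pull each feature towards the center of its (clean vs. perturbed) group."""
    return (feats - centers[groups]).pow(2).sum(dim=1).mean()

def ciai_style_loss(clean_feats, perturbed_feats, centers, lam=0.5):
    """Illustrative combination: push the clean and perturbed feature
    distributions apart (MMD) while keeping each group compact around its
    own center (center loss). centers has shape (2, feature_dim)."""
    feats = torch.cat([clean_feats, perturbed_feats])
    groups = torch.cat([torch.zeros(len(clean_feats), dtype=torch.long),
                        torch.ones(len(perturbed_feats), dtype=torch.long)])
    return -rbf_mmd2(clean_feats, perturbed_feats) + lam * center_loss(feats, groups, centers)
```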
https://arxiv.org/abs/2409.19619
Artificial Neural Networks (ANNs), which typically mimic neurons with non-linear functions that output floating-point numbers, receive the same input signal for a data point throughout its forward pass. Unlike ANNs, Spiking Neural Networks (SNNs) receive varying input signals over the forward pass of a data point and simulate neurons in a biologically plausible way, i.e., a neuron produces a spike (a binary value) once its accumulated membrane potential exceeds a threshold. Even though ANNs have achieved remarkable success in multiple tasks, e.g., face recognition and object detection, SNNs have recently obtained attention due to their low power consumption, fast inference, and event-driven properties. While privacy threats against ANNs are widely explored, much less work has been done on SNNs. For instance, it is well-known that ANNs are vulnerable to the Membership Inference Attack (MIA), but whether the same applies to SNNs has not been explored. In this paper, we evaluate the membership privacy of SNNs by considering eight MIAs, seven of which are inspired by MIAs against ANNs. Our evaluation results show that SNNs are more vulnerable (maximum 10% higher in terms of balanced attack accuracy) than ANNs when both are trained with neuromorphic datasets (with time dimension). On the other hand, when training ANNs or SNNs with static datasets (without time dimension), the vulnerability depends on the dataset used. If we convert ANNs trained with static datasets to SNNs, the accuracy of MIAs drops (maximum 11.5% with a reduction of 7.6% on the test accuracy of the target model). Next, we explore the impact factors of MIAs on SNNs by conducting a hyperparameter study. Finally, we show that the basic data augmentation method for static data and two recent data augmentation methods for neuromorphic data can considerably (maximum reduction of 25.7%) decrease MIAs' performance on SNNs.
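As an illustration of the simplest member of this attack family (a confidence-threshold attack, not the specific eight MIAs evaluated in the paper), the following sketch computes the balanced attack accuracy from the target model's confidences; the threshold sweep is an assumption.

```python
import numpy as np

def confidence_mia(member_conf: np.ndarray, nonmember_conf: np.ndarray) -> float:
    """Threshold-based membership inference.

    member_conf / nonmember_conf hold the target model's confidence in the
    true class for training (member) and held-out (non-member) samples.
    A sample is predicted to be a member if its confidence exceeds a
    threshold; we report the best balanced attack accuracy over thresholds.
    """
    scores = np.concatenate([member_conf, nonmember_conf])
    labels = np.concatenate([np.ones_like(member_conf),
                             np.zeros_like(nonmember_conf)])
    best = 0.0
    for t in np.unique(scores):
        pred = (scores >= t).astype(float)
        tpr = pred[labels == 1].mean()           # members correctly flagged
        tnr = 1.0 - pred[labels == 0].mean()     # non-members correctly rejected
        best = max(best, 0.5 * (tpr + tnr))      # balanced accuracy
    return best

# Toy example: members tend to receive higher confidence than non-members.
rng = np.random.default_rng(0)
print(confidence_mia(rng.beta(8, 2, 1000), rng.beta(5, 3, 1000)))
```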
https://arxiv.org/abs/2409.19413
Privacy is a major concern in developing face recognition techniques. Although synthetic face images can partially mitigate potential legal risks while maintaining effective face recognition (FR) performance, FR models trained on face images synthesized by existing generative approaches frequently suffer performance degradation due to the insufficient discriminative quality of the synthesized samples. In this paper, we systematically investigate what contributes to solid face recognition model training, and reveal that face images with a certain degree of similarity to their identity centers are highly effective for the performance of trained FR models. Inspired by this, we propose a novel diffusion-based approach, Center-based Semi-hard Synthetic Face Generation (CemiFace), which produces facial samples with various levels of similarity to the subject center, thus allowing the generation of face datasets containing effective discriminative samples for training face recognition. Experimental results show that with a modest degree of similarity, training on the generated dataset can produce competitive performance compared to previous generation methods.
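The diffusion-based generation itself is not reproduced here; this sketch only illustrates the selection principle the abstract describes, keeping synthesized faces whose embedding similarity to the identity center falls within a "semi-hard" band. The band limits are assumptions.

```python
import numpy as np

def semi_hard_filter(embeddings: np.ndarray, center: np.ndarray,
                     low: float = 0.3, high: float = 0.6) -> np.ndarray:
    """Keep samples with a moderate cosine similarity to the identity center.

    Very high similarity duplicates the center (easy, low diversity); very low
    similarity risks identity drift. The [low, high] band is illustrative.
    Returns the indices of the retained samples.
    """
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    c = center / np.linalg.norm(center)
    sim = e @ c
    return np.where((sim >= low) & (sim <= high))[0]
```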
https://arxiv.org/abs/2409.18876
Synthetic face recognition (SFR) aims to generate synthetic face datasets that mimic the distribution of real face data, which allows for training face recognition models in a privacy-preserving manner. Despite the remarkable potential of diffusion models in image generation, current diffusion-based SFR models struggle with generalization to real-world faces. To address this limitation, we outline three key objectives for SFR: (1) promoting diversity across identities (inter-class diversity), (2) ensuring diversity within each identity by injecting various facial attributes (intra-class diversity), and (3) maintaining identity consistency within each identity group (intra-class identity preservation). Inspired by these goals, we introduce a diffusion-fueled SFR model termed $\text{ID}^3$. $\text{ID}^3$ employs an ID-preserving loss to generate diverse yet identity-consistent facial appearances. Theoretically, we show that minimizing this loss is equivalent to maximizing the lower bound of an adjusted conditional log-likelihood over ID-preserving data. This equivalence motivates an ID-preserving sampling algorithm, which operates over an adjusted gradient vector field, enabling the generation of fake face recognition datasets that approximate the distribution of real-world faces. Extensive experiments across five challenging benchmarks validate the advantages of $\text{ID}^3$.
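The ID-preserving loss and sampling algorithm are derived formally in the paper; the fragment below is only a hedged illustration of the general shape such an objective could take, namely a standard denoising term plus a cosine penalty tying a sample's face embedding to its identity embedding. All names and the weight `lam` are hypothetical.

```python
import torch
import torch.nn.functional as F

def id_preserving_loss(pred_noise: torch.Tensor, true_noise: torch.Tensor,
                       face_embedding: torch.Tensor, id_embedding: torch.Tensor,
                       lam: float = 0.1) -> torch.Tensor:
    """Illustrative shape of a diffusion loss with an identity-preservation term."""
    denoise = F.mse_loss(pred_noise, true_noise)   # usual DDPM-style objective
    id_term = 1.0 - F.cosine_similarity(face_embedding, id_embedding, dim=-1).mean()
    return denoise + lam * id_term
```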
https://arxiv.org/abs/2409.17576
Face Recognition (FR) has advanced significantly with the development of deep learning, achieving high accuracy in several applications. However, the lack of interpretability of these systems raises concerns about their accountability, fairness, and reliability. In the present study, we propose an interactive framework to enhance the explainability of FR models by combining model-agnostic Explainable Artificial Intelligence (XAI) and Natural Language Processing (NLP) techniques. The proposed framework is able to accurately answer various questions from the user through an interactive chatbot. In particular, the explanations generated by our proposed method take the form of natural language text and visual representations, which can, for example, describe how different facial regions contribute to the similarity measure between two faces. This is achieved through the automatic analysis of saliency heatmaps computed on the face images and a BERT question-answering model, providing users with an interface that facilitates a comprehensive understanding of the FR decisions. The proposed approach is interactive, allowing users to ask questions and obtain more precise information according to their background knowledge. More importantly, in contrast to previous studies, our solution does not decrease the face recognition performance. We demonstrate the effectiveness of the method through different experiments, highlighting its potential to make FR systems more interpretable and user-friendly, especially in sensitive applications where decision-making transparency is crucial.
https://arxiv.org/abs/2409.16089
Watermarking is an essential technique for embedding an identifier (i.e., watermark message) within digital images to assert ownership and monitor unauthorized alterations. In face recognition systems, watermarking plays a pivotal role in ensuring data integrity and security. However, an adversary could potentially interfere with the watermarking process, significantly impairing recognition performance. We explore the interaction between watermarking and adversarial attacks on face recognition models. Our findings reveal that while watermarking or input-level perturbation alone may have a negligible effect on recognition accuracy, the combined effect of watermarking and perturbation can result in an adversarial watermarking attack, significantly degrading recognition performance. Specifically, we introduce a novel threat model, the adversarial watermarking attack, which remains stealthy in the absence of watermarking, allowing images to be correctly recognized initially. However, once watermarking is applied, the attack is activated, causing recognition failures. Our study reveals a previously unrecognized vulnerability: adversarial perturbations can exploit the watermark message to evade face recognition systems. Evaluated on the CASIA-WebFace dataset, our proposed adversarial watermarking attack reduces face matching accuracy by 67.2% with an $\ell_\infty$ norm-measured perturbation strength of ${2}/{255}$ and by 95.9% with a strength of ${4}/{255}$.
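A simplified sketch of the threat model described above, in my own formulation rather than the authors' code: a PGD-style search for a perturbation that keeps the un-watermarked image matched to its reference while breaking the match once the watermark is embedded. `fr_embed`, `embed_watermark` (assumed differentiable), and the step sizes are hypothetical.

```python
import torch

def adversarial_watermark_attack(image, reference_emb, fr_embed, embed_watermark,
                                 eps=4 / 255, alpha=1 / 255, steps=40):
    """Find delta (||delta||_inf <= eps) such that image + delta still matches
    the reference embedding, while watermark(image + delta) no longer does."""
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        clean_sim = torch.cosine_similarity(fr_embed(image + delta), reference_emb, dim=-1)
        wm_sim = torch.cosine_similarity(
            fr_embed(embed_watermark(image + delta)), reference_emb, dim=-1)
        # Stay matched without the watermark, fail once it is applied.
        loss = -clean_sim.mean() + wm_sim.mean()
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()
            delta.clamp_(-eps, eps)
            delta.grad.zero_()
    return delta.detach()
```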
https://arxiv.org/abs/2409.16056
Many important decisions in our everyday lives, such as authentication via biometric models, are made by Artificial Intelligence (AI) systems. These can be in poor alignment with human expectations, and testing them on clear-cut existing data may not be enough to uncover those cases. We propose a method to find samples in the latent space of a generative model, designed to be challenging for a decision-making model with regard to matching human expectations. By presenting those samples to both the decision-making model and human raters, we can identify areas where its decisions align with human intuition and where they contradict it. We apply this method to a face recognition model and collect a dataset of 11,200 human ratings from 100 participants. We discuss findings from our dataset and how our approach can be used to explore the performance of AI models in different contexts and for different user groups.
https://arxiv.org/abs/2409.12801
Many real-world applications today like video surveillance and urban governance need to address the recognition of masked faces, where content replacement by diverse masks often brings in incomplete appearance and ambiguous representation, leading to a sharp drop in accuracy. Inspired by recent progress on amodal perception, we propose to migrate the mechanism of amodal completion for the task of masked face recognition with an end-to-end de-occlusion distillation framework, which consists of two modules. The \textit{de-occlusion} module applies a generative adversarial network to perform face completion, which recovers the content under the mask and eliminates appearance ambiguity. The \textit{distillation} module takes a pre-trained general face recognition model as the teacher and transfers its knowledge to train a student for completed faces using massive online synthesized face pairs. Especially, the teacher knowledge is represented with structural relations among instances in multiple orders, which serves as a posterior regularization to enable the adaptation. In this way, the knowledge can be fully distilled and transferred to identify masked faces. Experiments on synthetic and realistic datasets show the efficacy of the proposed approach.
https://arxiv.org/abs/2409.12385
Over the past decade, there has been a steady advancement in enhancing face recognition algorithms leveraging advanced machine learning methods. The loss function plays a pivotal, often game-changing, role in addressing face verification problems. These loss functions have mainly explored variations in intra-class or inter-class separation. This research examines the natural phenomenon of facial symmetry in the face verification problem. The symmetry between the left and right hemi-faces has been widely used in many research areas in recent decades. This paper adopts this simple approach judiciously by splitting the face image vertically into two halves. With the assumption that the natural phenomenon of facial symmetry can enhance face verification methodology, we hypothesize that the two output embedding vectors of the split faces must project close to each other in the output embedding space. Inspired by this concept, we penalize the network based on the disparity between the embeddings of the symmetrical pair of split faces. The symmetrical loss has the potential to minimize minor asymmetric features due to facial expression and lighting conditions, hence significantly increasing the inter-class variance among the classes and leading to more reliable face embeddings. This loss function propels any network to outperform its baseline performance across all existing network architectures and configurations, enabling us to achieve SoTA results.
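A minimal sketch of the symmetry penalty, assuming the two vertical halves are embedded by the same backbone and the right half is mirrored first; the distance measure and weighting are not specified in the abstract and are my assumptions.

```python
import torch

def symmetrical_loss(backbone, faces: torch.Tensor) -> torch.Tensor:
    """Penalise the disparity between embeddings of the two vertical face halves.

    faces: batch of aligned face images, shape (B, C, H, W). The halves may need
    resizing to the backbone's expected input size; omitted here for brevity.
    """
    w = faces.shape[-1]
    left = faces[..., : w // 2]
    right = torch.flip(faces[..., w - w // 2:], dims=[-1])  # mirror the right half
    e_left, e_right = backbone(left), backbone(right)
    return (1.0 - torch.cosine_similarity(e_left, e_right, dim=-1)).mean()

# Hypothetical use: total = identity_loss(logits, labels) + lam * symmetrical_loss(backbone, faces)
```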
https://arxiv.org/abs/2409.11816
Face recognition in the wild is now advancing towards light-weight models, fast inference speed and resolution-adapted capability. In this paper, we propose a bridge distillation approach to turn a complex face model pretrained on private high-resolution faces into a light-weight one for low-resolution face recognition. In our approach, such a cross-dataset resolution-adapted knowledge transfer problem is solved via two-step distillation. In the first step, we conduct cross-dataset distillation to transfer the prior knowledge from private high-resolution faces to public high-resolution faces and generate compact and discriminative features. In the second step, the resolution-adapted distillation is conducted to further transfer the prior knowledge to synthetic low-resolution faces via multi-task learning. By learning low-resolution face representations and mimicking the adapted high-resolution knowledge, a light-weight student model can be constructed with high efficiency and promising accuracy in recognizing low-resolution faces. Experimental results show that the student model performs impressively in recognizing low-resolution faces with only 0.21M parameters and 0.057MB memory. Meanwhile, its speed reaches up to 14,705, ~934 and 763 faces per second on GPU, CPU and mobile phone, respectively.
https://arxiv.org/abs/2409.11786
Despite the remarkable performance of deep neural networks for face detection and recognition tasks in the visible spectrum, their performance on more challenging non-visible domains is comparatively still lacking. While significant research has been done in the fields of domain adaptation and domain generalization, in this paper we tackle scenarios in which these methods have limited applicability owing to the lack of training data from target domains. We focus on the problem of single-source (visible) and multi-target (SWIR, long-range/remote, surveillance, and body-worn) face recognition task. We show through experiments that a good template generation algorithm becomes crucial as the complexity of the target domain increases. In this context, we introduce a template generation algorithm called Norm Pooling (and a variant known as Sparse Pooling) and show that it outperforms average pooling across different domains and networks, on the IARPA JANUS Benchmark Multi-domain Face (IJB-MDF) dataset.
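Norm Pooling is only named in the abstract; the sketch below shows one plausible reading, where per-image feature norms act as quality weights when aggregating a template, contrasted with plain average pooling. This is an assumption, not necessarily the authors' exact formulation.

```python
import numpy as np

def average_pooling(features: np.ndarray) -> np.ndarray:
    """Baseline: unweighted mean of the per-image features of a template."""
    return features.mean(axis=0)

def norm_pooling(features: np.ndarray, p: float = 1.0) -> np.ndarray:
    """Weight each feature by (a power of) its L2 norm before averaging, so that
    confidently embedded images dominate the template."""
    norms = np.linalg.norm(features, axis=1) ** p
    weights = norms / norms.sum()
    return (weights[:, None] * features).sum(axis=0)
```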
https://arxiv.org/abs/2409.09832
Adversarial patches present significant challenges to the robustness of deep learning models, making the development of effective defenses critical for real-world applications. This paper introduces DIFFender, a novel DIFfusion-based DeFender framework that leverages the power of a text-guided diffusion model to counter adversarial patch attacks. At the core of our approach is the discovery of the Adversarial Anomaly Perception (AAP) phenomenon, which enables the diffusion model to accurately detect and locate adversarial patches by analyzing distributional anomalies. DIFFender seamlessly integrates the tasks of patch localization and restoration within a unified diffusion model framework, enhancing defense efficacy through their close interaction. Additionally, DIFFender employs an efficient few-shot prompt-tuning algorithm, facilitating the adaptation of the pre-trained diffusion model to defense tasks without the need for extensive retraining. Our comprehensive evaluation, covering image classification and face recognition tasks as well as real-world scenarios, demonstrates DIFFender's robust performance against adversarial attacks. The framework's versatility and generalizability across various settings, classifiers, and attack methodologies mark a significant advancement in adversarial patch defense strategies. Beyond the popular visible domain, we have identified another advantage of DIFFender: it can easily be extended to the infrared domain. Consequently, we demonstrate the flexibility of DIFFender, which can defend against both infrared and visible adversarial patch attacks using a single universal defense framework.
https://arxiv.org/abs/2409.09406
Demographic bias is one of the major challenges for face recognition systems. The majority of existing studies on demographic bias are heavily dependent on specific demographic groups or a demographic classifier, making it difficult to address performance for unrecognised groups. This paper introduces ``LabellessFace'', a novel framework that mitigates demographic bias in face recognition without requiring the demographic group labels typically needed for fairness considerations. We propose a novel fairness enhancement metric called the class favoritism level, which assesses the extent of favoritism towards specific classes across the dataset. Leveraging this metric, we introduce the fair class margin penalty, an extension of existing margin-based metric learning. This method dynamically adjusts learning parameters based on class favoritism levels, promoting fairness across all attributes. By treating each class as an individual in facial recognition systems, we facilitate learning that minimizes biases in authentication accuracy among individuals. Comprehensive experiments have demonstrated that our proposed method is effective for enhancing fairness while maintaining authentication accuracy.
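The class favoritism level and fair class margin penalty are only named in the abstract; the following hedged sketch shows how a per-class margin could be modulated by a favoritism estimate inside an ArcFace-style loss. The favoritism measure (standardized mean positive similarity per class) and the scaling constants are assumptions.

```python
import torch

def fair_class_margins(mean_pos_sim: torch.Tensor, base_margin: float = 0.5,
                       strength: float = 0.2) -> torch.Tensor:
    """Assign larger margins to 'favored' classes (high mean positive similarity)
    and smaller margins to disadvantaged ones, rebalancing training pressure."""
    favoritism = (mean_pos_sim - mean_pos_sim.mean()) / (mean_pos_sim.std() + 1e-8)
    return base_margin + strength * favoritism

def margin_logits(cos_theta: torch.Tensor, labels: torch.Tensor,
                  margins: torch.Tensor, scale: float = 64.0) -> torch.Tensor:
    """ArcFace-style logits with a per-class (instead of global) angular margin."""
    theta = torch.acos(cos_theta.clamp(-1 + 1e-7, 1 - 1e-7))
    m = margins[labels]                                   # margin of each sample's class
    target = torch.cos(theta.gather(1, labels[:, None]).squeeze(1) + m)
    logits = cos_theta.clone()
    logits[torch.arange(len(labels)), labels] = target    # apply margin to the true class only
    return scale * logits
```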
https://arxiv.org/abs/2409.09274
Face anti-spoofing (FAS) plays a vital role in protecting face recognition (FR) systems from presentation attacks. Nowadays, FAS systems face the challenge of domain shift, impacting the generalization performance of existing FAS methods. In this paper, we rethink the nature of domain shift and deconstruct it into two factors: image style and image quality. Quality influences the purity of the presentation of spoof information, while style affects the manner in which spoof information is presented. Based on our analysis, we propose the DiffFAS framework, which quantifies quality as prior information input into the network to counter image quality shift, and performs diffusion-based high-fidelity cross-domain and cross-attack-type generation to counter image style shift. DiffFAS transforms easily collectible live faces into high-fidelity attack faces with precise labels while maintaining consistency between live and spoof face identities, which also alleviates the scarcity of labeled data for the novel attack types faced by today's FAS systems. We demonstrate the effectiveness of our framework on challenging cross-domain and cross-attack FAS datasets, achieving state-of-the-art performance. Available at this https URL.
https://arxiv.org/abs/2409.08572
As Artificial Intelligence applications expand, the evaluation of models faces heightened scrutiny. Ensuring public readiness requires evaluation datasets, which differ from training data by being disjoint and ethically sourced in compliance with privacy regulations. The performance and fairness of face recognition systems depend significantly on the quality and representativeness of these evaluation datasets. Such data is sometimes scraped from the internet without users' consent, causing ethical concerns that can prohibit its use without proper releases. In rare cases, data is collected in a controlled environment with consent; however, this process is time-consuming, expensive, and logistically difficult to execute. This creates a barrier for those unable to marshal the immense resources required to gather ethically sourced evaluation datasets. To address these challenges, we introduce the Synthetic Identity Generation pipeline, or SIG, which allows for the targeted creation of ethical, balanced datasets for face recognition evaluation. Our proposed and demonstrated pipeline generates high-quality images of synthetic identities with controllable pose, facial features, and demographic attributes, such as race, gender, and age. We also release an open-source evaluation dataset named ControlFace10k, consisting of 10,008 face images of 3,336 unique synthetic identities balanced across race, gender, and age, generated using the proposed SIG pipeline. We analyze ControlFace10k along with the non-synthetic BUPT dataset using state-of-the-art face recognition algorithms to demonstrate its effectiveness as an evaluation tool. This analysis highlights the dataset's characteristics and its utility in assessing algorithmic bias across different demographic groups.
https://arxiv.org/abs/2409.08345
Foundational models, trained on vast and diverse datasets, have demonstrated remarkable capabilities in generalizing across different domains and distributions for various zero-shot tasks. Our work addresses the challenge of retaining these powerful generalization capabilities when adapting foundational models to specific downstream tasks through fine-tuning. To this end, we introduce a novel approach we call "similarity loss", which can be incorporated into the fine-tuning process of any task. By minimizing the distortion of fine-tuned embeddings from the pre-trained embeddings, our method strikes a balance between task-specific adaptation and preserving broad generalization abilities. We evaluate our approach on two diverse tasks: image classification on satellite imagery and face recognition, focusing on open-class and domain shift scenarios to assess out-of-distribution (OOD) performance. We demonstrate that this approach significantly improves OOD performance while maintaining strong in-distribution (ID) performance.
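A minimal sketch of the similarity-loss idea as described in the abstract: during fine-tuning, penalize how far the new embeddings drift from the frozen pre-trained embeddings of the same inputs. The use of cosine distance and the weighting in the usage note are assumptions.

```python
import torch

def similarity_loss(finetuned_emb: torch.Tensor,
                    pretrained_emb: torch.Tensor) -> torch.Tensor:
    """Distortion of fine-tuned embeddings w.r.t. the frozen foundation model."""
    return (1.0 - torch.cosine_similarity(finetuned_emb, pretrained_emb, dim=-1)).mean()

# Hypothetical training step:
#   with torch.no_grad():
#       anchor = frozen_foundation_model(x)
#   emb = finetuned_model(x)
#   loss = task_loss(head(emb), y) + lam * similarity_loss(emb, anchor)
```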
https://arxiv.org/abs/2409.07582
In the current landscape of biometrics and surveillance, the ability to accurately recognize faces in uncontrolled settings is paramount. The Watchlist Challenge addresses this critical need by focusing on face detection and open-set identification in real-world surveillance scenarios. This paper presents a comprehensive evaluation of participating algorithms, using the enhanced UnConstrained College Students (UCCS) dataset with new evaluation protocols. In total, four participants submitted four face detection and nine open-set face recognition systems. The evaluation demonstrates that while detection capabilities are generally robust, closed-set identification performance varies significantly, with models pre-trained on large-scale datasets showing superior performance. However, open-set scenarios require further improvement, especially at higher true positive identification rates, i.e., lower thresholds.
https://arxiv.org/abs/2409.07220
During the COVID-19 pandemic, face masks have become ubiquitous in our lives. Face masks can cause some face recognition models to fail since they cover a significant portion of the face. In addition, removing face masks from captured images or videos can be desirable, e.g., for better social interaction and for image/video editing and enhancement purposes. Hence, we propose a generative face inpainting method to effectively recover/reconstruct the masked part of a face. Face inpainting is more challenging compared to traditional inpainting, since it requires high fidelity while maintaining the identity at the same time. Our proposed method includes a Multi-scale Channel-Spatial Attention Module (M-CSAM) to mitigate the spatial information loss and learn the inter- and intra-channel correlation. In addition, we introduce an approach enforcing the supervised signal to focus on masked regions instead of the whole image. We also synthesize our own Masked-Faces dataset from the CelebA dataset by incorporating five different types of face masks, including surgical masks, regular masks, and scarves that also cover the neck area. The experimental results show that our proposed method outperforms different baselines in terms of structural similarity index measure, peak signal-to-noise ratio and $\ell_1$ loss, while also providing better outputs qualitatively. The code will be made publicly available on GitHub.
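A brief sketch of the masked-region supervision mentioned above (the M-CSAM module itself is not reproduced): the reconstruction loss is restricted to pixels under the mask rather than the whole image. The normalization by mask area is an assumption.

```python
import torch

def masked_region_l1(output: torch.Tensor, target: torch.Tensor,
                     mask: torch.Tensor) -> torch.Tensor:
    """L1 loss computed only on masked pixels (mask == 1 where the face mask was).

    mask is a float tensor broadcastable to the shape of output / target.
    """
    diff = (output - target).abs() * mask
    return diff.sum() / mask.sum().clamp(min=1.0)
```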
https://arxiv.org/abs/2409.06845
Very low-resolution face recognition is challenging due to the serious loss of informative facial details in resolution degradation. In this paper, we propose a generative-discriminative representation distillation approach that combines generative representation with cross-resolution aligned knowledge distillation. This approach facilitates very low-resolution face recognition by jointly distilling generative and discriminative models via two distillation modules. Firstly, the generative representation distillation takes the encoder of a diffusion model pretrained for face super-resolution as the generative teacher to supervise the learning of the student backbone via feature regression, and then freezes the student backbone. After that, the discriminative representation distillation further considers a pretrained face recognizer as the discriminative teacher to supervise the learning of the student head via cross-resolution relational contrastive distillation. In this way, the general backbone representation can be transformed into discriminative head representation, leading to a robust and discriminative student model for very low-resolution face recognition. Our approach improves the recovery of the missing details in very low-resolution faces and achieves better knowledge transfer. Extensive experiments on face datasets demonstrate that our approach enhances the recognition accuracy of very low-resolution faces, showcasing its effectiveness and adaptability.
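As a hedged outline of the two distillation stages described above: the names are illustrative, and the cross-resolution relational contrastive term is simplified here to a plain relation-matching loss.

```python
import torch
import torch.nn.functional as F

def stage1_generative_distill(student_backbone, diffusion_encoder, lr_faces, hr_faces):
    """Feature regression: the student (low-resolution input) mimics the frozen
    generative teacher's features of the corresponding high-resolution faces."""
    with torch.no_grad():
        teacher_feat = diffusion_encoder(hr_faces)
    return F.mse_loss(student_backbone(lr_faces), teacher_feat)

def stage2_discriminative_distill(student_head, student_feat, fr_teacher_emb):
    """Relation matching: the pairwise similarity structure of the student head's
    embeddings should follow that of the pretrained face recognizer."""
    s = F.normalize(student_head(student_feat), dim=-1)
    t = F.normalize(fr_teacher_emb, dim=-1)
    return F.mse_loss(s @ s.t(), t @ t.t())
```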
https://arxiv.org/abs/2409.06371