Due to privacy and security concerns, the need to erase unwanted information from pre-trained vision models is becoming evident. In real-world scenarios, erasure requests originate at any time from both users and model owners, and these requests usually form a sequence. Under such a setting, selected information is expected to be continuously removed from a pre-trained model while the rest is maintained. We define this problem as continual forgetting and identify three key challenges. (i) For unwanted knowledge, efficient and effective deletion is crucial. (ii) For remaining knowledge, the impact brought by the forgetting procedure should be minimal. (iii) In real-world scenarios, the training samples may be scarce or partially missing during the forgetting process. To address them, we first propose Group Sparse LoRA (GS-LoRA). Specifically, towards (i), we introduce LoRA modules to fine-tune the FFN layers in Transformer blocks for each forgetting task independently, and towards (ii), a simple group sparse regularization is adopted, enabling automatic selection of specific LoRA groups and zeroing out the others. To further extend GS-LoRA to more practical scenarios, we incorporate prototype information as additional supervision and introduce a more practical approach, GS-LoRA++. For each forgotten class, we move the logits away from its original prototype; for the remaining classes, we pull the logits closer to their respective prototypes. We conduct extensive experiments on face recognition, object detection, and image classification and demonstrate that our method manages to forget specific classes with minimal impact on other classes. Code has been released at this https URL.
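As a rough illustration of the mechanism described above (a sketch, not the authors' code: class and variable names are hypothetical, and the penalty is the standard L2,1 instantiation of group sparsity), a frozen linear layer with a LoRA residual plus a group-wise regularizer that can drive whole LoRA groups to zero might look like:

```python
# Minimal sketch of group-sparse LoRA: each layer's (A, B) pair is one group.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer with a trainable low-rank residual B @ A."""
    def __init__(self, base: nn.Linear, rank: int = 4):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init => no initial shift

    def forward(self, x):
        return self.base(x) + x @ self.A.T @ self.B.T

def group_sparse_penalty(lora_layers):
    """L2,1-style penalty: sum over groups of each group's Frobenius norm,
    which pushes entire LoRA groups toward exactly zero."""
    return sum(torch.sqrt((l.A ** 2).sum() + (l.B ** 2).sum() + 1e-12)
               for l in lora_layers)

# Usage sketch: total_loss = forgetting_loss + lambda_reg * group_sparse_penalty(loras)
```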
https://arxiv.org/abs/2501.09705
Face recognition technology has dramatically transformed the landscape of security, surveillance, and authentication systems, offering a user-friendly and non-invasive biometric solution. However, despite its significant advantages, face recognition systems face increasing threats from physical and digital spoofing attacks. Current research typically treats face recognition and attack detection as distinct classification challenges. This approach necessitates the implementation of separate models for each task, leading to considerable computational complexity, particularly on devices with limited resources. Such inefficiencies can stifle scalability and hinder performance. In response to these challenges, this paper introduces an innovative unified model designed for face recognition and detection of physical and digital attacks. By leveraging the advanced Swin Transformer backbone and incorporating HiLo attention in a convolutional neural network framework, we address unified face recognition and spoof attack detection more effectively. Moreover, we introduce augmentation techniques that replicate the traits of physical and digital spoofing cues, significantly enhancing our model's robustness. Through comprehensive experimental evaluation across various datasets, we showcase the effectiveness of our model in unified face recognition and spoof detection. Additionally, we confirm its resilience against unseen physical and digital spoofing attacks, underscoring its potential for real-world applications.
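The abstract does not spell out the augmentation recipe; as a hedged illustration of the kind of spoofing-cue augmentation it describes, one could overlay a synthetic moiré-like interference pattern, a common screen-replay artifact. The recipe and all parameters below are illustrative assumptions, not the paper's method:

```python
# Sketch: add a sinusoidal interference pattern mimicking replay-attack moiré.
import numpy as np

def add_moire(img: np.ndarray, freq: float = 0.3, strength: float = 12.0) -> np.ndarray:
    """img: HxWx3 uint8 array; returns a copy with a 2-D interference
    pattern added, loosely mimicking screen-replay moiré artifacts."""
    h, w = img.shape[:2]
    yy, xx = np.mgrid[0:h, 0:w]
    pattern = np.sin(freq * xx) * np.cos(freq * yy)           # 2-D interference
    out = img.astype(np.float32) + strength * pattern[..., None]
    return np.clip(out, 0, 255).astype(np.uint8)
```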
https://arxiv.org/abs/2501.09635
Facial brightness is a key image quality factor impacting face recognition accuracy differentials across demographic groups. In this work, we aim to decrease the accuracy gap between the similarity score distributions for Caucasian and African American female mated image pairs, as measured by d' between distributions. To balance brightness across demographic groups, we conduct three experiments, interpreting brightness in the face skin region either as median pixel value or as the distribution of pixel values. Balancing based on median brightness alone yields up to a 46.8% decrease in d', while balancing based on brightness distribution yields up to a 57.6% decrease. In all three cases, the similarity scores of the individual distributions improve, with mean scores improving by up to 5.9% for Caucasian females and 3.7% for African American females.
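For reference, a sketch of one common formulation of the d' statistic used here to quantify the separation between two score distributions (the authors may use a variant):

```python
# d' (d-prime): |mu_a - mu_b| / sqrt((var_a + var_b) / 2)
import numpy as np

def d_prime(scores_a: np.ndarray, scores_b: np.ndarray) -> float:
    """Separation between two similarity-score distributions; smaller d'
    between demographic groups means a smaller accuracy gap."""
    mu_a, mu_b = scores_a.mean(), scores_b.mean()
    var_a, var_b = scores_a.var(ddof=1), scores_b.var(ddof=1)
    return abs(mu_a - mu_b) / np.sqrt((var_a + var_b) / 2.0)
```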
https://arxiv.org/abs/2501.08910
In this work, we propose a novel pipeline for face recognition and out-of-distribution (OOD) detection using short-range FMCW radar. The proposed system utilizes Range-Doppler and micro Range-Doppler images. The architecture features a primary path (PP) responsible for the classification of in-distribution (ID) faces, complemented by intermediate paths (IPs) dedicated to OOD detection. The network is trained in two stages: first, the PP is trained using triplet loss to optimize ID face classification. In the second stage, the PP is frozen, and the IPs, comprising simple linear autoencoder networks, are trained specifically for OOD detection. Using our dataset generated with a 60 GHz FMCW radar, our method achieves an ID classification accuracy of 99.30% and an OOD detection AUROC of 96.91%.
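A minimal sketch of the second training stage as described, with hypothetical names and sizes: a linear autoencoder fit on frozen PP features scores OOD faces by reconstruction error (the exact design is an assumption):

```python
# Sketch: reconstruction-error OOD scoring on frozen primary-path features.
import torch
import torch.nn as nn

class LinearAutoencoder(nn.Module):
    def __init__(self, dim: int, bottleneck: int = 32):
        super().__init__()
        self.enc = nn.Linear(dim, bottleneck)
        self.dec = nn.Linear(bottleneck, dim)

    def forward(self, feats):
        return self.dec(self.enc(feats))

def ood_score(ae: LinearAutoencoder, feats: torch.Tensor) -> torch.Tensor:
    """Higher reconstruction error => more likely out-of-distribution,
    since the autoencoder is trained only on in-distribution features."""
    with torch.no_grad():
        recon = ae(feats)
    return ((recon - feats) ** 2).mean(dim=-1)
```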
https://arxiv.org/abs/2501.08440
A face image is a mandatory part of ID and travel documents. Obtaining high-quality face images when issuing such documents is crucial for both human examiners and automated face recognition systems. In several international standards, face image quality requirements are intricate and defined in detail. Identifying and understanding non-compliance or defects in the submitted face images is crucial for both issuing authorities and applicants. In this work, we introduce FaceOracle, an LLM-powered AI assistant that helps its users analyze a face image in a natural conversational manner using standards-compliant algorithms. Leveraging the power of LLMs, users can get explanations of various face image quality concepts as well as interpret the outcome of face image quality assessment (FIQA) algorithms. We implement a proof-of-concept that demonstrates how experts at an issuing authority could integrate FaceOracle into their workflow to analyze, understand, and communicate their decisions more efficiently, resulting in enhanced productivity.
https://arxiv.org/abs/2501.07202
Acquiring face images of sufficiently high quality is important for online ID and travel document issuance applications using face recognition systems (FRS). Low-quality, manipulated (intentionally or unintentionally), or distorted images degrade FRS performance and facilitate document misuse. Securing quality for enrolment images, especially in the unsupervised self-enrolment scenario via a smartphone, becomes important to assure FRS performance. In this work, we focus on the less studied area of radial distortion (a.k.a. the fish-eye effect) in face images and its impact on FRS performance. We introduce an effective radial distortion detection model that can detect and flag radial distortion in the enrolment scenario. We formalize the detection model as a face image quality assessment (FIQA) algorithm and provide a careful inspection of the effect of radial distortion on FRS performance. Evaluation results show excellent detection results for the proposed models, and the study on the impact on FRS uncovers valuable insights into how to best use these models in operational systems.
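For background, a sketch of the usual single-parameter radial distortion model (an assumption for illustration; the paper may model the effect differently):

```python
# Sketch: single-parameter radial (fish-eye) distortion, r_d = r * (1 + k * r^2).
import numpy as np

def radially_distort(points: np.ndarray, k: float) -> np.ndarray:
    """points: Nx2 normalized image coordinates centered at the principal
    point. k > 0 gives barrel distortion; k < 0 gives pincushion."""
    r2 = (points ** 2).sum(axis=1, keepdims=True)
    return points * (1.0 + k * r2)
```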
https://arxiv.org/abs/2501.07179
Fair operational systems are crucial in gaining and maintaining society's trust in face recognition systems (FRS). An FRS starts by capturing an image and assessing its quality before using it further for enrollment or verification. Fair face image quality assessment (FIQA) schemes therefore become equally important in the context of fair FRS. This work examines the sclera as a quality assessment region for obtaining a fair FIQA. The sclera region is agnostic to demographic variation and skin colour when assessing the quality of a face image. We analyze three skin-tone-related ISO/IEC face image quality assessment measures and assess the sclera region as an alternative area for assessing FIQ. Our analysis of a face dataset of individuals from demographic groups representing different skin tones indicates that the sclera region alone suffices to measure the dynamic range and over- and under-exposure of a face. The sclera region, being agnostic to skin tone, i.e., demographic factors, provides equal utility as a fair FIQA, as shown by our Error-versus-Discard Characteristic (EDC) curve analysis.
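A hedged sketch of sclera-based measures of this kind (the percentiles and thresholds below are illustrative assumptions, not the ISO/IEC-specified values):

```python
# Sketch: exposure measures computed from segmented sclera pixels alone.
import numpy as np

def sclera_quality(sclera_pixels: np.ndarray) -> dict:
    """sclera_pixels: 1-D array of grayscale values (0-255) sampled from
    the segmented sclera region of a face image."""
    p = sclera_pixels.astype(np.float32)
    return {
        "dynamic_range": float(np.percentile(p, 99) - np.percentile(p, 1)),
        "over_exposed": float((p >= 247).mean()),   # fraction of near-white pixels
        "under_exposed": float((p <= 8).mean()),    # fraction of near-black pixels
    }
```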
https://arxiv.org/abs/2501.07158
Despite the considerable performance improvements of face recognition algorithms in recent years, the same scientific advances responsible for this progress can also be used to create efficient ways to attack them, posing a threat to their secure deployment. Morphing attack detection (MAD) systems aim to detect a specific type of threat, morphing attacks, at an early stage, preventing them from being considered for verification in critical processes. Foundation models (FM) learn from extensive amounts of unlabeled data, achieving remarkable zero-shot generalization to unseen domains. Although this generalization capacity might be weak when dealing with domain-specific downstream tasks such as MAD, FMs can easily adapt to these settings while retaining the built-in knowledge acquired during pre-training. In this work, we recognize the potential of FMs to perform well in the MAD task when properly adapted to its specificities. To this end, we adapt FM CLIP architectures with LoRA weights while simultaneously training a classification header. The proposed framework, MADation, surpasses our alternative FM- and transformer-based frameworks and constitutes the first adaptation of FMs to the MAD task. MADation presents competitive results with current MAD solutions in the literature and even surpasses them in several evaluation scenarios. To encourage reproducibility and facilitate further research in MAD, we publicly release the implementation of MADation at https://github.com/gurayozgur/MADation
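A sketch of this adaptation recipe under stated assumptions (Hugging Face `transformers` and `peft` as the tooling, attention projections as LoRA targets, a binary bona fide/morph head); this is an illustration, not the released MADation code:

```python
# Sketch: LoRA-adapt a CLIP vision encoder and train a classification head.
import torch.nn as nn
from transformers import CLIPVisionModel
from peft import LoraConfig, get_peft_model

backbone = CLIPVisionModel.from_pretrained("openai/clip-vit-base-patch16")
lora_cfg = LoraConfig(r=8, lora_alpha=16,
                      target_modules=["q_proj", "v_proj"])  # attention projections
backbone = get_peft_model(backbone, lora_cfg)   # only LoRA params stay trainable

head = nn.Linear(backbone.config.hidden_size, 2)  # bona fide vs. morph

def logits(pixel_values):
    feats = backbone(pixel_values=pixel_values).pooler_output
    return head(feats)
```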
https://arxiv.org/abs/2501.03800
The assessment of face image quality is crucial to ensure reliable face recognition. In order to provide data subjects and operators with explainable and actionable feedback regarding captured face images, relevant quality components have to be measured. Quality components that are known to negatively impact the utility of face images include JPEG and JPEG 2000 compression artefacts, among others. Compression can result in a loss of important image details which may impair recognition performance. In this work, deep neural networks are trained to detect compression artefacts in face images. For this purpose, artefact-free facial images are compressed with the JPEG and JPEG 2000 compression algorithms. Subsequently, the PSNR and SSIM metrics are employed to obtain training labels, based on which a single neural network is trained to detect JPEG and JPEG 2000 artefacts, respectively. The evaluation of the proposed method shows promising results: in terms of detection accuracy, error rates of 2-3% are obtained when utilizing PSNR labels during training. In addition, we show that error rates of different open-source and commercial face recognition systems can be significantly reduced by discarding face images exhibiting severe compression artefacts. To minimize resource consumption, EfficientNetV2 serves as the basis for the presented algorithm, which is available as part of the OFIQ software.
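A sketch of the label-generation step as described; the JPEG quality setting and PSNR threshold below are illustrative assumptions:

```python
# Sketch: derive a binary training label from PSNR after JPEG compression.
import io
import numpy as np
from PIL import Image

def psnr(a: np.ndarray, b: np.ndarray) -> float:
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(255.0 ** 2 / mse)

def compression_label(img: Image.Image, quality: int = 30, thresh: float = 35.0) -> int:
    """Returns 1 ('severe artefacts') if PSNR between the artefact-free image
    and its JPEG-compressed version falls below the threshold, else 0."""
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    compressed = Image.open(buf)
    return int(psnr(np.array(img), np.array(compressed)) < thresh)
```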
https://arxiv.org/abs/2501.03619
Although face recognition systems have seen a massive performance enhancement in recent years, they are still targeted by threats such as presentation attacks, leading to the need for generalizable presentation attack detection (PAD) algorithms. Current PAD solutions suffer from two main problems: low generalization to unknown scenarios and large training data requirements. Foundation models (FM) are pre-trained on extensive datasets, achieving remarkable results when generalizing to unseen domains and allowing for efficient task-specific adaptation even when little training data is available. In this work, we recognize the potential of FMs to address common PAD problems and tackle the PAD task with an adapted FM for the first time. The FM under consideration is adapted with LoRA weights while simultaneously training a classification header. The resultant architecture, FoundPAD, is highly generalizable to unseen domains, achieving competitive results in several settings under different data availability scenarios and even when using synthetic training data. To encourage reproducibility and facilitate further research in PAD, we publicly release the implementation of FoundPAD at this https URL.
https://arxiv.org/abs/2501.02892
Generalized age feature extraction is crucial for age-related facial analysis tasks, such as age estimation and age-invariant face recognition (AIFR). Despite the recent successes of models in homogeneous-dataset experiments, their performance drops significantly in cross-dataset evaluations. Most of these models fail to extract generalized age features as they only attempt to map extracted features to training age labels directly without explicitly modeling the natural progression of aging. In this paper, we propose Order-Enhanced Contrastive Learning (OrdCon), which aims to extract generalized age features to minimize the domain gap across different datasets and scenarios. OrdCon aligns the direction vector of two features with either the natural aging direction or its reverse to effectively model the aging process. The method also leverages metric learning, incorporated with a novel soft proxy matching loss, to ensure that features are positioned around the center of each age cluster with minimum intra-class variance. We demonstrate that our proposed method achieves comparable results to state-of-the-art methods on various benchmark datasets in homogeneous-dataset evaluations for both age estimation and AIFR. In cross-dataset experiments, our method reduces the mean absolute error for the age estimation task by about 1.38 on average and boosts the average accuracy for AIFR by 1.87%.
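A hedged sketch of the direction-alignment idea (the loss form, names, and the global aging-direction vector are our assumptions, not the paper's exact formulation):

```python
# Sketch: encourage the feature displacement between two faces of the same
# person to align with an aging direction (or its reverse) per their age order.
import torch
import torch.nn.functional as F

def order_loss(feat_a, feat_b, age_a, age_b, aging_dir):
    """feat_*: B x D features; age_*: B ages (pairs assumed to differ in age);
    aging_dir: D-dimensional unit vector."""
    delta = F.normalize(feat_b - feat_a, dim=1)
    sign = torch.sign(age_b - age_a).unsqueeze(1)         # +1 if b is older
    cos = (delta * aging_dir.unsqueeze(0)).sum(dim=1, keepdim=True)
    return (1.0 - sign * cos).mean()                      # push cos toward sign
```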
https://arxiv.org/abs/2501.01760
Face recognition has witnessed remarkable advancements in recent years, thanks to the development of deep learning techniques. However, an off-the-shelf face recognition model as a commercial service could be stolen by model stealing attacks, posing great threats to the rights of the model owner. Model fingerprinting, as a model stealing detection method, aims to verify whether a suspect model is stolen from the victim model, gaining more and more attention nowadays. Previous methods always utilize transferable adversarial examples as the model fingerprint, but this method is known to be sensitive to adversarial defense and transfer learning techniques. To address this issue, we consider the pairwise relationship between samples instead and propose a novel yet simple model stealing detection method based on SAmple Correlation (SAC). Specifically, we present SAC-JC, which selects JPEG compressed samples as model inputs and calculates the correlation matrix among their model outputs. Experimental results validate that SAC successfully defends against various model stealing attacks in deep face recognition, encompassing face verification and face emotion recognition, exhibiting the highest performance in terms of AUC, p-value and F1 score. Furthermore, we extend our evaluation of SAC-JC to object recognition datasets including Tiny-ImageNet and CIFAR10, which also demonstrates the superior performance of SAC-JC over previous methods. The code will be available at \url{this https URL}.
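A hedged sketch of the core SAC-JC statistic as described (function names and the distance used to compare fingerprints are assumptions):

```python
# Sketch: compare victim and suspect models via the correlation matrices of
# their outputs on a fixed set of JPEG-compressed probe samples.
import torch

def output_correlation(outputs: torch.Tensor) -> torch.Tensor:
    """outputs: N x D matrix of model outputs for N probe samples.
    Returns the N x N cosine-similarity (correlation) matrix."""
    z = torch.nn.functional.normalize(outputs, dim=1)
    return z @ z.T

def fingerprint_distance(corr_victim: torch.Tensor,
                         corr_suspect: torch.Tensor) -> float:
    """Smaller distance between correlation matrices suggests the suspect
    model was derived from the victim model."""
    return (corr_victim - corr_suspect).abs().mean().item()
```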
https://arxiv.org/abs/2412.20768
With the rapidly growing use of face recognition in people's daily life, face anti-spoofing becomes increasingly important to avoid malicious attacks. Recent face anti-spoofing models can reach high classification accuracy on multiple datasets, but these models can only tell people ``this face is fake'' while lacking the explanation to answer ``why it is fake''. Such a system undermines trustworthiness and causes user confusion, as it denies users' requests without providing any explanation. In this paper, we incorporate XAI into face anti-spoofing and propose a new problem termed X-FAS (eXplainable Face Anti-Spoofing), empowering face anti-spoofing models to provide an explanation. We propose SPED (SPoofing Evidence Discovery), an X-FAS method which can discover spoof concepts and provide reliable explanations on the basis of the discovered concepts. To evaluate the quality of X-FAS methods, we propose an X-FAS benchmark with spoofing evidence annotated by experts. We analyze SPED explanations on a face anti-spoofing dataset and compare SPED quantitatively and qualitatively with previous XAI methods on the proposed X-FAS benchmark. Experimental results demonstrate SPED's ability to generate reliable explanations.
https://arxiv.org/abs/2412.17541
While face recognition (FR) models have brought remarkable convenience in face verification and identification, they also pose substantial privacy risks to the public. Existing facial privacy protection schemes usually adopt adversarial examples to disrupt face verification of FR models. However, these schemes often suffer from weak transferability against black-box FR models and permanently damage identifiable information, so they cannot fulfill the requirements of authorized operations such as forensics and authentication. To address these limitations, we propose ErasableMask, a robust and erasable privacy protection scheme against black-box FR models. Specifically, by rethinking the inherent relationship between surrogate FR models, ErasableMask introduces a novel meta-auxiliary attack, which boosts black-box transferability by learning more general features in a stable and balanced optimization strategy. It also offers a perturbation erasion mechanism that supports the erasion of semantic perturbations in the protected face without degrading image quality. To further improve performance, ErasableMask employs a curriculum learning strategy to mitigate optimization conflicts between adversarial attack and perturbation erasion. Extensive experiments on the CelebA-HQ and FFHQ datasets demonstrate that ErasableMask achieves state-of-the-art performance in transferability, achieving over 72% confidence on average in commercial FR systems. Moreover, ErasableMask also exhibits outstanding perturbation erasion performance, achieving over 90% erasion success rate.
https://arxiv.org/abs/2412.17038
Diffusion models represent the state-of-the-art in generative modeling. Due to their high training costs, many works leverage pre-trained diffusion models' powerful representations for downstream tasks, such as face super-resolution (FSR), through fine-tuning or prior-based methods. However, relying solely on priors without supervised training makes it challenging to meet the pixel-level accuracy requirements of discrimination tasks. Although prior-based methods can achieve high fidelity and high-quality results, ensuring consistency remains a significant challenge. In this paper, we propose a masking strategy with strong and weak constraints and iterative refinement for real-world FSR, termed Diffusion Prior Interpolation (DPI). We introduce conditions and constraints on consistency by masking different sampling stages based on the structural characteristics of the face. Furthermore, we propose a condition Corrector (CRT) to establish a reciprocal posterior sampling process, enhancing FSR performance by mutual refinement of conditions and samples. DPI can balance consistency and diversity and can be seamlessly integrated into pre-trained models. In extensive experiments conducted on synthetic and real datasets, along with consistency validation in face recognition, DPI demonstrates superiority over SOTA FSR methods. The code is available at \url{this https URL}.
https://arxiv.org/abs/2412.16552
Face recognition has made remarkable strides, driven by the expanding scale of datasets and advancements in various backbones and discriminative losses. However, face recognition performance is heavily affected by label noise, especially closed-set noise. While numerous studies have focused on handling label noise, addressing closed-set noise still poses challenges. This paper traces the challenge to two issues: training is not robust to noise in its early stage, and samples with low confidence, which are often misclassified as closed-set noise in later training phases, necessitate an appropriate learning strategy. To address these issues, we propose a new framework that stabilizes training at early stages and splits the samples into clean, ambiguous, and noisy groups, each with a separate training strategy. Initially, we employ generated auxiliary closed-set noisy samples to enable the model to identify noisy data at the early stages of training. Subsequently, we split samples into clean, ambiguous, and noisy groups by their similarity to the positive and nearest negative centers. Then we perform label fusion for ambiguous samples by incorporating accumulated model predictions. Finally, we apply label smoothing within the closed set, adjusting the label to a point between the nearest negative class and the initially assigned label. Extensive experiments validate the effectiveness of our method on mainstream face datasets, achieving state-of-the-art results. The code will be released upon acceptance.
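A sketch of the splitting rule as described, with an illustrative margin threshold (the actual criterion and threshold are the paper's; the form below is an assumption):

```python
# Sketch: route each sample to clean / ambiguous / noisy by comparing its
# similarity to the positive class center vs. the nearest negative center.
import torch

def split_samples(feats, pos_centers, neg_centers, margin: float = 0.1):
    """feats, pos_centers: N x D (center of each sample's labeled class);
    neg_centers: N x D (nearest other-class center for each sample)."""
    sim_pos = torch.cosine_similarity(feats, pos_centers, dim=1)
    sim_neg = torch.cosine_similarity(feats, neg_centers, dim=1)
    gap = sim_pos - sim_neg
    clean = gap > margin
    noisy = gap < -margin
    ambiguous = ~clean & ~noisy    # later handled by label fusion
    return clean, ambiguous, noisy
```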
https://arxiv.org/abs/2412.12031
Face alignment is a crucial step in preparing face images for feature extraction in facial analysis tasks. For applications such as face recognition, facial expression recognition, and facial attribute classification, alignment is widely utilized during both training and inference to standardize the positions of key landmarks in the face. It is well known that the application and method of face alignment significantly affect the performance of facial analysis models. However, the impact of alignment on face image quality has not been thoroughly investigated. Current face image quality assessment (FIQA) studies often assume alignment as a prerequisite but do not explicitly evaluate how alignment affects quality metrics, especially with the advent of modern deep learning-based detectors that integrate detection and landmark localization. To address this need, our study examines the impact of face alignment on face image quality scores. We conducted experiments on the LFW, IJB-B, and SCFace datasets, employing MTCNN and RetinaFace models for face detection and alignment. To evaluate face image quality, we utilized several assessment methods, including SER-FIQ, FaceQAN, DifFIQA, and SDD-FIQA. Our analysis included examining quality score distributions for the LFW and IJB-B datasets and analyzing average quality scores at varying distances in the SCFace dataset. Our findings reveal that face image quality assessment methods are sensitive to alignment. Moreover, this sensitivity increases under challenging real-life conditions, highlighting the importance of evaluating alignment's role in quality assessment.
https://arxiv.org/abs/2412.11779
Traditional adversarial attacks typically produce adversarial examples under norm-constrained conditions, whereas unrestricted adversarial examples are free-form with semantically meaningful perturbations. Current unrestricted adversarial impersonation attacks exhibit limited control over adversarial face attributes and often suffer from low transferability. In this paper, we propose a novel Text Controlled Attribute Attack (TCA$^2$) to generate photorealistic adversarial impersonation faces guided by natural language. Specifically, the category-level personal softmax vector is employed to precisely guide the impersonation attacks. Additionally, we propose both data and model augmentation strategies to achieve transferable attacks on unknown target models. Finally, a generative model, \textit{i.e.}, Style-GAN, is utilized to synthesize impersonated faces with desired attributes. Extensive experiments on two high-resolution face recognition datasets validate that our TCA$^2$ method can generate natural text-guided adversarial impersonation faces with high transferability. We also evaluate our method on real-world face recognition systems, \textit{i.e.}, Face++ and Aliyun, further demonstrating the practical potential of our approach.
https://arxiv.org/abs/2412.11735
The task of privacy-preserving face recognition (PPFR) currently faces two major unsolved challenges: (1) existing methods are typically effective only on specific face recognition models and struggle to generalize to black-box face recognition models; (2) current methods employ data-driven reversible representation encoding for privacy protection, making them susceptible to adversarial learning and reconstruction of the original image. We observe that face recognition models primarily rely on local features (e.g., face contour, skin texture, and so on) for identification. Thus, by disrupting global features while enhancing local features, we achieve effective recognition even in black-box environments. Additionally, to prevent adversarial models from learning and reversing the anonymization process, we adopt an adversarial learning-based approach with irreversible stochastic injection to ensure the stochastic nature of the anonymization. Experimental results demonstrate that our method achieves an average recognition accuracy of 94.21\% on black-box models, outperforming existing methods in both privacy protection and anti-reconstruction capabilities.
https://arxiv.org/abs/2412.08276
Pre-training on large-scale datasets and utilizing margin-based loss functions have been highly successful in training models for high-resolution face recognition. However, these models struggle with low-resolution face datasets, in which the faces lack the facial attributes necessary for distinguishing different faces. Full fine-tuning on low-resolution datasets, a naive method for adapting the model, yields inferior performance due to catastrophic forgetting of pre-trained knowledge. Additionally, the domain difference between high-resolution (HR) gallery images and low-resolution (LR) probe images in low-resolution datasets leads to poor convergence for a single model to adapt to both gallery and probe after fine-tuning. To this end, we propose PETALface, a Parameter-Efficient Transfer Learning approach for low-resolution face recognition. Through PETALface, we attempt to solve both of the aforementioned problems. (1) We solve catastrophic forgetting by leveraging the power of parameter-efficient fine-tuning (PEFT). (2) We introduce two low-rank adaptation modules to the backbone, with weights adjusted based on the input image quality to account for the difference in quality between the gallery and probe images. To the best of our knowledge, PETALface is the first work leveraging the power of PEFT for low-resolution face recognition. Extensive experiments demonstrate that the proposed method outperforms full fine-tuning on low-resolution datasets while preserving performance on high-resolution and mixed-quality datasets, all while using only 0.48% of the parameters. Code: this https URL
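A hedged sketch of the second idea: two LoRA branches mixed by an input-quality score, so HR gallery and LR probe images emphasize different adapters (the gating form and all names below are our assumptions):

```python
# Sketch: quality-gated dual LoRA on a frozen linear layer.
import torch
import torch.nn as nn

class QualityGatedLoRA(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False
        self.A_hr = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B_hr = nn.Parameter(torch.zeros(base.out_features, rank))
        self.A_lr = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B_lr = nn.Parameter(torch.zeros(base.out_features, rank))

    def forward(self, x, quality: torch.Tensor):
        """quality in [0, 1] per sample: 1 => high-quality input,
        which weights the HR branch; low quality weights the LR branch."""
        w = quality.view(-1, 1)
        hr = x @ self.A_hr.T @ self.B_hr.T
        lr = x @ self.A_lr.T @ self.B_lr.T
        return self.base(x) + w * hr + (1.0 - w) * lr
```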
https://arxiv.org/abs/2412.07771