With the comprehensive research conducted on various face analysis tasks, there is growing interest among researchers in developing a unified approach to face perception. Existing methods mainly discuss unified representation and training, and lack task extensibility and application efficiency. To tackle this issue, we focus on the unified model structure, exploring a face generalist model. As an intuitive design, Naive Faceptor enables tasks with the same output shape and granularity to share the structural design of a standardized output head, achieving improved task extensibility. Furthermore, we propose Faceptor, which adopts a well-designed single-encoder dual-decoder architecture, allowing task-specific queries to represent newly introduced semantics. This design enhances the unification of the model structure while improving application efficiency in terms of storage overhead. Additionally, we introduce Layer-Attention into Faceptor, enabling the model to adaptively select features from the optimal layers for each task. Through joint training on 13 face perception datasets, Faceptor achieves exceptional performance in facial landmark localization, face parsing, age estimation, expression recognition, binary attribute classification, and face recognition, matching or surpassing specialized methods on most tasks. Our training framework can also be applied to auxiliary supervised learning, significantly improving performance on data-sparse tasks such as age estimation and expression recognition. The code and models will be made publicly available at this https URL.
https://arxiv.org/abs/2403.09500
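The Layer-Attention mechanism described in the abstract above can be illustrated with a minimal sketch: each task holds its own learnable logits over the encoder's layers, and the fused feature is a softmax-weighted sum of the per-layer features. The function names and toy dimensions here are illustrative assumptions, not the paper's actual implementation.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def layer_attention(layer_feats, task_logits):
    """Fuse per-layer features with task-specific attention weights.

    layer_feats: one feature vector per encoder layer.
    task_logits: learnable scores (one per layer) for the current task,
                 letting each task emphasize the layers that suit it best.
    """
    weights = softmax(task_logits)
    dim = len(layer_feats[0])
    fused = [0.0] * dim
    for w, feat in zip(weights, layer_feats):
        for i, v in enumerate(feat):
            fused[i] += w * v
    return fused
```

In joint training, a dense task like face parsing might learn logits favouring shallow layers while recognition favours deep ones; here those preferences would emerge from the learned `task_logits`.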
Due to advancements in digital cameras, it is easy to gather multiple images (or videos) of an object under different conditions. Image-set classification has therefore attracted increasing attention, and various solutions have been proposed to model image sets. A popular way to model image sets is as subspaces, which form a manifold called the Grassmann manifold. In this contribution, we extend Generalized Relevance Learning Vector Quantization to the Grassmann manifold. The proposed model returns a set of prototype subspaces and a relevance vector. While the prototypes model typical behaviours within classes, the relevance factors specify the most discriminative principal vectors (or images) for the classification task. Both provide insight into the model's decisions by highlighting the images and pixels that influence its predictions. Moreover, because it learns prototypes, the new method's model complexity during inference is independent of dataset size, unlike previous works. We applied it to several recognition tasks, including handwritten digit recognition, face recognition, activity recognition, and object recognition. Experiments demonstrate that it outperforms previous works with lower complexity and can successfully model variation such as handwriting style or lighting conditions. Moreover, the relevance factors make the model robust to the choice of subspace dimensionality.
https://arxiv.org/abs/2403.09183
The utilization of personal sensitive data in training face recognition (FR) models poses significant privacy concerns, as adversaries can employ model inversion attacks (MIA) to infer the original training data. Existing defense methods, such as data augmentation and differential privacy, have been employed to mitigate this issue. However, these methods often fail to strike an optimal balance between privacy and accuracy. To address this limitation, this paper introduces an adaptive hybrid masking algorithm against MIA. Specifically, face images are masked in the frequency domain using an adaptive MixUp strategy. Unlike the traditional MixUp algorithm, which is predominantly used for data augmentation, our modified approach incorporates frequency-domain mixing. Previous studies have shown that increasing the number of images mixed in MixUp can enhance privacy preservation, but at the expense of reduced face recognition accuracy. To overcome this trade-off, we develop an enhanced adaptive MixUp strategy based on reinforcement learning, which enables us to mix a larger number of images while maintaining satisfactory recognition accuracy. To optimize privacy protection, we propose maximizing the reward function (i.e., the loss function of the FR system) during the training of the strategy network, while the loss function of the FR network is minimized during the FR network's own training phase. The strategy network and the face recognition network can thus be viewed as antagonistic entities in the training process, ultimately reaching a more balanced trade-off. Experimental results demonstrate that our proposed hybrid masking scheme outperforms existing defense algorithms in terms of privacy preservation and recognition accuracy against MIA.
https://arxiv.org/abs/2403.10558
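One plausible reading of the frequency-domain MixUp described above is to mix the magnitude spectra of several face images while keeping the phase of the anchor image. The sketch below does this for 1-D signals with a naive DFT; the exact 2-D mixing rule and the RL-chosen mixing weights are the paper's and are not reproduced here.

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform of a real 1-D signal."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n)) for k in range(n)]

def idft(spec):
    """Inverse DFT, returning the real part of the reconstruction."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n for t in range(n)]

def frequency_mixup(anchor, others, weights):
    """Mix magnitude spectra with the given weights (anchor weight first),
    keep the anchor's phase, then invert back to the signal domain."""
    specs = [dft(anchor)] + [dft(o) for o in others]
    mixed = []
    for k in range(len(anchor)):
        mag = sum(w * abs(s[k]) for w, s in zip(weights, specs))
        phase = cmath.phase(specs[0][k])
        mixed.append(mag * cmath.exp(1j * phase))
    return idft(mixed)
```

Mixing magnitudes rather than raw pixels matters here: the DFT is linear, so mixing full spectra would be identical to pixel-domain MixUp, whereas magnitude-only mixing with preserved phase is a genuinely different operation.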
Deep learning-based face recognition continues to face challenges due to its reliance on huge datasets obtained from web crawling, which can be costly to gather and raise significant real-world privacy concerns. To address this issue, we propose VIGFace, a novel framework capable of generating synthetic facial images. Initially, we train the face recognition model using a real face dataset and create a feature space for both real and virtual IDs where virtual prototypes are orthogonal to other prototypes. Subsequently, we generate synthetic images by using the diffusion model based on the feature space. Our proposed framework provides two significant benefits. Firstly, it allows for creating virtual facial images without concerns about portrait rights, guaranteeing that the generated virtual face images are clearly differentiated from existing individuals. Secondly, it serves as an effective augmentation method by incorporating real existing images. Further experiments demonstrate the efficacy of our framework, achieving state-of-the-art results from both perspectives without any external data.
https://arxiv.org/abs/2403.08277
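The abstract above places virtual-ID prototypes orthogonal to existing prototypes in feature space. A standard way to realize such a constraint is Gram-Schmidt projection, sketched below; whether VIGFace constructs its prototypes this way is an assumption, and the helper names are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def orthogonal_prototype(candidate, prototypes):
    """Project out the components of `candidate` along each existing
    prototype (assumed orthonormal), leaving a unit vector orthogonal
    to all of them -- a feature-space slot for a virtual identity."""
    v = list(candidate)
    for p in prototypes:
        c = dot(v, p)
        v = [vi - c * pi for vi, pi in zip(v, p)]
    return normalize(v)
```

Orthogonality guarantees the virtual prototype has zero cosine similarity with every real identity's prototype, which is one way to ensure generated virtual faces are clearly separated from existing individuals.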
In this work we focus on learning facial representations that can be adapted to train effective face recognition models, particularly in the absence of labels. First, the real world contains a vastly larger number of unlabeled faces than existing labelled face datasets. We explore a learning strategy for these unlabeled facial images through self-supervised pretraining to transfer generalized face recognition performance. Moreover, motivated by the recent finding that the face saliency area is critical for face recognition, instead of utilizing randomly cropped blocks of images to construct augmentations in pretraining, we utilize patches localized by extracted facial landmarks. This enables our method, namely LAndmark-based Facial Self-supervised learning (LAFS), to learn key representations that are more critical for face recognition. We also incorporate two landmark-specific augmentations, which introduce more diverse landmark information to further regularize the learning. With the learned landmark-based facial representations, we further adapt the representation for face recognition with a regularization that mitigates variations in landmark positions. Our method achieves significant improvement over the state-of-the-art on multiple face recognition benchmarks, especially in more challenging few-shot scenarios.
https://arxiv.org/abs/2403.08161
Facial attribute editing using generative models can impair automated face recognition. This degradation persists even with recent identity-preserving models such as InstantID. To mitigate this issue, we propose two techniques that perform local and global attribute editing. Local editing operates on finer details via a regularization-free method based on ControlNet conditioned on depth maps and auxiliary semantic segmentation masks. Global editing operates on coarser details via a regularization-based method guided by a custom loss and regularization set. In this work, we empirically ablate twenty-six facial semantic, demographic, and expression-based attributes altered using state-of-the-art generative models, and evaluate them using ArcFace and AdaFace matchers on the CelebA, CelebAMaskHQ, and LFW datasets. Finally, we use LLaVA, a vision-language framework, for attribute prediction to validate our editing techniques. Our methods outperform SoTA (BLIP, InstantID) at facial editing while retaining identity.
https://arxiv.org/abs/2403.08092
This paper introduces the concept of uniform classification, which employs a unified threshold to classify all samples rather than an adaptive threshold for each individual sample. We also propose uniform classification accuracy as a metric to measure the model's performance in uniform classification. Furthermore, beginning with a naive loss, we mathematically derive a loss function suitable for uniform classification, namely the BCE function integrated with a unified bias. We demonstrate that the unified threshold can be learned via this bias. Extensive experiments on six classification datasets and three feature extraction models show that, compared to the SoftMax loss, models trained with the BCE loss exhibit not only higher uniform classification accuracy but also higher sample-wise classification accuracy. In addition, the bias learned from the BCE loss is very close to the unified threshold used in uniform classification. The features extracted by models trained with the BCE loss not only possess uniformity but also demonstrate better intra-class compactness and inter-class distinctiveness, yielding superior performance on open-set tasks such as face recognition.
https://arxiv.org/abs/2403.07289
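A minimal sketch of the loss derived in the abstract above: binary cross-entropy over all class logits with a single learnable bias b, so that "logit > b" becomes a unified accept threshold. This is the generic form under that reading; any per-class weighting the paper may use is not reproduced.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce_unified_bias(logits, target, bias):
    """BCE over every class with one shared bias: the positive class is
    pushed above `bias` and all negatives below it, so `bias` acts as a
    unified classification threshold learned during training."""
    loss = 0.0
    for j, s in enumerate(logits):
        p = sigmoid(s - bias)
        y = 1.0 if j == target else 0.0
        # clamp for numerical safety in this toy version
        p = min(max(p, 1e-12), 1.0 - 1e-12)
        loss += -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    return loss / len(logits)
```

Unlike SoftMax, which only compares logits against each other, every term here is measured against the same absolute reference `bias`, which is why the learned bias ends up close to a usable unified threshold.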
2D face recognition encounters challenges in unconstrained environments due to varying illumination, occlusion, and pose. Recent studies focus on RGB-D face recognition to improve robustness by incorporating depth information. However, collecting sufficient paired RGB-D training data is expensive and time-consuming, hindering wide deployment. In this work, we first construct a diverse depth dataset generated by 3D Morphable Models for depth model pre-training. Then, we propose a domain-independent pre-training framework that utilizes readily available pre-trained RGB and depth models to separately perform face recognition without needing additional paired data for retraining. To seamlessly integrate the two distinct networks and harness the complementary benefits of RGB and depth information for improved accuracy, we propose an innovative Adaptive Confidence Weighting (ACW). This mechanism is designed to learn confidence estimates for each modality to achieve modality fusion at the score level. Our method is simple and lightweight, only requiring ACW training beyond the backbone models. Experiments on multiple public RGB-D face recognition benchmarks demonstrate state-of-the-art performance surpassing previous methods based on depth estimation and feature fusion, validating the efficacy of our approach.
https://arxiv.org/abs/2403.06529
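Score-level fusion with Adaptive Confidence Weighting, as described above, can be sketched as a learned confidence per modality, softmax-normalized and applied to the match scores. How ACW actually estimates per-sample confidence is not given in the abstract, so the scalar inputs here are an assumption.

```python
import math

def acw_fuse(scores, confidences):
    """Fuse per-modality match scores (e.g. RGB and depth) with
    softmax-normalized confidence estimates, so the more trusted
    modality dominates the final score."""
    m = max(confidences)
    exps = [math.exp(c - m) for c in confidences]
    total = sum(exps)
    weights = [e / total for e in exps]
    return sum(w * s for w, s in zip(weights, scores))
```

With equal confidences this reduces to plain score averaging; the benefit appears when, say, the depth sensor degrades and its learned confidence drops, shifting weight to the RGB score.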
State-of-the-art face recognition systems are typically trained on a single computer, utilizing extensive image datasets collected from a large number of users. However, these datasets often contain sensitive personal information that users may hesitate to disclose. To address potential privacy concerns, we explore the application of federated learning, both with and without secure aggregators, in the context of both supervised and unsupervised face recognition systems. Federated learning facilitates the training of a shared model without necessitating the sharing of individual private data, achieving this by training models on the decentralized edge devices housing the data. In our proposed system, each edge device independently trains its own model, which is subsequently transmitted either to a secure aggregator or directly to the central server. To introduce diverse data without the need for data transmission, we employ generative adversarial networks to generate imposter data at the edge. Following this, the secure aggregator or central server combines these individual models to construct a global model, which is then relayed back to the edge devices. Experimental findings based on the CelebA datasets reveal that employing federated learning in both supervised and unsupervised face recognition systems offers dual benefits. First, it safeguards privacy, since the original data remain on the edge devices. Second, the experimental results demonstrate that the aggregated model yields nearly identical performance compared to the individual models, particularly when the federated model does not utilize a secure aggregator. Hence, our results shed light on the practical challenges associated with privacy-preserving face image training, particularly the balance between privacy and accuracy.
https://arxiv.org/abs/2403.05344
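The aggregation step the system above relies on is, in its simplest form, federated averaging: the server (or secure aggregator) combines client model parameters weighted by local dataset size. This sketch omits the GAN-generated imposter data and any secure-aggregation protocol, and treats each model as a flat parameter vector.

```python
def federated_average(client_params, client_sizes):
    """Weighted average of flat parameter vectors, one per edge device,
    weighted by how many samples each device trained on (FedAvg)."""
    total = sum(client_sizes)
    dim = len(client_params[0])
    global_params = [0.0] * dim
    for params, size in zip(client_params, client_sizes):
        for i, p in enumerate(params):
            global_params[i] += p * size / total
    return global_params
```

Only these parameter vectors ever leave the edge devices; the raw face images stay local, which is the privacy benefit the abstract highlights.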
Recent years have witnessed significant advancement in face recognition (FR) techniques, with their applications widely spread in people's lives and security-sensitive areas. There is a growing need for reliable interpretations of the decisions of such systems. Existing studies relying on various mechanisms have investigated the usage of saliency maps as an explanation approach, but they suffer from different limitations. This paper first explores the spatial relationship between a face image and its deep representation via gradient backpropagation. We then conceive a new explanation approach, FGGB, which provides precise and insightful similarity and dissimilarity saliency maps to explain the "Accept" and "Reject" decisions of an FR system. Extensive visual presentation and quantitative measurement show that FGGB achieves superior performance in both similarity and dissimilarity maps when compared to current state-of-the-art explainable face verification approaches.
https://arxiv.org/abs/2403.04549
This paper explores the application of large language models (LLMs), like ChatGPT, for biometric tasks. We specifically examine the capabilities of ChatGPT in performing biometric-related tasks, with an emphasis on face recognition, gender detection, and age estimation. Since biometrics are considered as sensitive information, ChatGPT avoids answering direct prompts, and thus we crafted a prompting strategy to bypass its safeguard and evaluate the capabilities for biometrics tasks. Our study reveals that ChatGPT recognizes facial identities and differentiates between two facial images with considerable accuracy. Additionally, experimental results demonstrate remarkable performance in gender detection and reasonable accuracy for the age estimation tasks. Our findings shed light on the promising potentials in the application of LLMs and foundation models for biometrics.
https://arxiv.org/abs/2403.02965
Face Anti-Spoofing (FAS) is crucial for securing face recognition systems against presentation attacks. With advancements in sensor manufacturing and multi-modal learning techniques, many multi-modal FAS approaches have emerged. However, they face challenges in generalizing to unseen attacks and deployment conditions. These challenges arise from (1) modality unreliability, where some modality sensors, such as depth and infrared, undergo significant domain shifts in varying environments, leading to the spread of unreliable information during cross-modal feature fusion, and (2) modality imbalance, where training that overly relies on a dominant modality hinders the convergence of the others, reducing effectiveness against attack types that are indistinguishable using the dominant modality alone. To address modality unreliability, we propose the Uncertainty-Guided Cross-Adapter (U-Adapter) to recognize unreliably detected regions within each modality and suppress the impact of unreliable regions on other modalities. For modality imbalance, we propose a Rebalanced Modality Gradient Modulation (ReGrad) strategy that rebalances the convergence speed of all modalities by adaptively adjusting their gradients. Besides, we provide the first large-scale benchmark for evaluating multi-modal FAS performance under domain generalization scenarios. Extensive experiments demonstrate that our method outperforms state-of-the-art methods. Source code and protocols will be released at this https URL.
https://arxiv.org/abs/2402.19298
In recent years, model quantization for face recognition has gained prominence. Traditionally, compressing models has involved vast datasets, such as the 5.8 million-image MS1M dataset, as well as extensive training times, raising the question of whether such enormous data volumes are essential. This paper addresses this by introducing an efficiency-driven approach, fine-tuning the model with just up to 14,000 images, 440 times fewer than MS1M. We demonstrate that effective quantization is achievable with a smaller dataset, presenting a new paradigm. Moreover, we incorporate an evaluation-based metric loss and achieve an outstanding 96.15% accuracy on the IJB-C dataset, establishing a new state of the art in compressed-model training for face recognition. The subsequent analysis delves into potential applications, emphasizing the transformative power of this approach. This paper advances model quantization by demonstrating that optimal results are attainable with little data and training time.
https://arxiv.org/abs/2402.18163
JPEG compression can significantly impair the performance of adversarial face examples, which previous adversarial attacks on face recognition (FR) have not adequately addressed. Considering this challenge, we propose a novel adversarial attack on FR that aims to improve the resistance of adversarial examples against JPEG compression. Specifically, during the iterative process of generating adversarial face examples, we interpolate the adversarial face examples down to a smaller size. We then utilize these interpolated adversarial face examples to create the adversarial examples for the next iteration. Subsequently, we restore the adversarial face examples to their original size via interpolation. Throughout the entire process, our proposed method smooths the adversarial perturbations, effectively mitigating the high-frequency signals in the crafted adversarial face examples that JPEG compression typically eliminates. Our experimental results demonstrate the effectiveness of our proposed method in improving the JPEG resistance of adversarial face examples.
https://arxiv.org/abs/2402.16586
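The down-then-up interpolation that the attack above uses to smooth adversarial perturbations can be sketched in 1-D with linear interpolation; the actual method operates on 2-D face images inside the attack's iterative loop, and the resize factor is a free parameter here.

```python
def resize_linear(x, new_len):
    """Resample a 1-D signal to new_len samples via linear interpolation."""
    if new_len == 1:
        return [x[0]]
    out = []
    scale = (len(x) - 1) / (new_len - 1)
    for i in range(new_len):
        pos = i * scale
        lo = int(pos)
        hi = min(lo + 1, len(x) - 1)
        frac = pos - lo
        out.append(x[lo] * (1.0 - frac) + x[hi] * frac)
    return out

def smooth_perturbation(delta, small_len):
    """Shrink then restore the perturbation: high-frequency content that
    JPEG would discard is attenuated by the round trip, so the surviving
    perturbation is more compression-resistant."""
    return resize_linear(resize_linear(delta, small_len), len(delta))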
Recent advancements in deep learning have revolutionized technology and security measures, necessitating robust identification methods. Biometric approaches, leveraging personalized characteristics, offer a promising solution. However, Face Recognition Systems are vulnerable to sophisticated attacks, notably face morphing techniques, which enable the creation of fraudulent documents. In this study, we introduce a novel quadruplet loss function for increasing the robustness of face recognition systems against morphing attacks. Our approach involves specific sampling of face image quadruplets, combined with face morphs, for network training. Experimental results demonstrate the efficacy of our strategy in improving the robustness of face recognition networks against morphing attacks.
https://arxiv.org/abs/2402.14665
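The abstract above does not spell out its quadruplet loss, so here is the standard quadruplet formulation (two margins and an extra negative pair) as a hedged stand-in; the paper's morph-aware sampling of the quadruplets is what distinguishes its approach, and is not reproduced here.

```python
def quadruplet_loss(d_ap, d_an, d_n1n2, margin1=0.5, margin2=0.25):
    """Standard quadruplet loss on precomputed embedding distances:
    d_ap   -- anchor-positive distance (same identity)
    d_an   -- anchor-negative distance (different identity)
    d_n1n2 -- distance between two negatives unrelated to the anchor
    The second term pushes d_ap below even unrelated negative pairs,
    enforcing a tighter intra-class margin than triplet loss alone.
    """
    term1 = max(0.0, d_ap - d_an + margin1)
    term2 = max(0.0, d_ap - d_n1n2 + margin2)
    return term1 + term2
```

In a morphing-attack setting, one would plausibly sample morphs into the negative slots so that a morphed face is driven away from both contributing identities.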
AI-based Face Recognition Systems (FRSs) are now widely distributed and deployed as MLaaS solutions all over the world, more so since the COVID-19 pandemic, for tasks ranging from validating individuals' faces while buying SIM cards to surveillance of citizens. Extensive biases have been reported against marginalized groups in these systems, leading to highly discriminatory outcomes. The post-pandemic world has normalized wearing face masks, but FRSs have not kept up with the changing times. As a result, these systems are susceptible to mask-based face occlusion. In this study, we audit four commercial and nine open-source FRSs on the task of face re-identification between different varieties of masked and unmasked images across five benchmark datasets (14,722 images in total). These simulate a realistic validation/surveillance task as deployed in all major countries around the world. Three of the commercial and five of the open-source FRSs are highly inaccurate; they further perpetuate biases against non-White individuals, with the lowest accuracy being 0%. A survey of 85 human participants on the same task also yields a low accuracy of 40%. Thus, human-in-the-loop moderation in the pipeline does not alleviate the concerns, as has been frequently hypothesized in the literature. Our large-scale study shows that developers, lawmakers, and users of such services need to rethink the design principles behind FRSs, especially for the task of face re-identification, taking cognizance of the observed biases.
https://arxiv.org/abs/2402.13771
Recently, there has been an explosion of mobile applications that perform computationally intensive tasks such as video streaming, data mining, virtual reality, augmented reality, image processing, video processing, face recognition, and online gaming. However, user devices (UDs), such as tablets and smartphones, have a limited ability to meet the computational needs of these tasks. Mobile edge computing (MEC) has emerged as a promising technology to meet the increasing computing demands of UDs. Task offloading in MEC is a strategy that meets the demands of UDs by distributing tasks between UDs and MEC servers. Deep reinforcement learning (DRL) is gaining attention for task-offloading problems because it can adapt to dynamic changes and minimize online computational complexity. However, the various types of continuous and discrete resource constraints on UDs and MEC servers pose challenges to the design of an efficient DRL-based task-offloading strategy. Existing DRL-based task-offloading algorithms focus on the constraints of the UDs, assuming the availability of sufficient storage resources on the server. Moreover, existing multi-agent DRL (MADRL)-based task-offloading algorithms use homogeneous agents and treat homogeneous constraints as a penalty in their reward function. We propose a novel combinatorial client-master MADRL (CCM_MADRL) algorithm for task offloading in MEC (CCM_MADRL_MEC) that enables UDs to decide their resource requirements and the server to make a combinatorial decision based on the requirements of the UDs. CCM_MADRL_MEC is the first MADRL approach to task offloading that considers server storage capacity in addition to the constraints on the UDs. By taking advantage of combinatorial action selection, CCM_MADRL_MEC has shown superior convergence over existing MADDPG and heuristic algorithms.
https://arxiv.org/abs/2402.11653
Deep neural networks are extensively applied to real-world tasks, such as face recognition and medical image classification, where privacy and data protection are critical. Image data, if not protected, can be exploited to infer personal or contextual information. Existing privacy preservation methods, like encryption, generate perturbed images that are unrecognizable even to humans. Adversarial attack approaches prohibit automated inference even for authorized stakeholders, limiting practical incentives for commercial and widespread adoption. This pioneering study tackles an unexplored practical privacy-preservation use case by generating human-perceivable images that maintain accurate inference by an authorized model while evading other unauthorized black-box models with similar or dissimilar objectives, addressing the gaps in previous research. The datasets employed are ImageNet for image classification, the Celeba-HQ dataset for identity classification, and AffectNet for emotion classification. Our results show that the generated images can successfully maintain the accuracy of a protected model and degrade the average accuracy of the unauthorized black-box models to 11.97%, 6.63%, and 55.51% on the ImageNet, Celeba-HQ, and AffectNet datasets, respectively.
https://arxiv.org/abs/2402.09316
This paper introduces the Membership Inference Test (MINT), a novel approach that aims to empirically assess whether specific data was used during the training of Artificial Intelligence (AI) models. Specifically, we propose two novel MINT architectures designed to learn the distinct activation patterns that emerge when an audited model is exposed to data used during its training process. The first architecture is based on a Multilayer Perceptron (MLP) network, and the second is based on Convolutional Neural Networks (CNNs). The proposed MINT architectures are evaluated on a challenging face recognition task, considering three state-of-the-art face recognition models. Experiments are carried out using six publicly available databases, comprising over 22 million face images in total. Different experimental scenarios are also considered depending on the context available about the AI model under test. Promising results, up to 90% accuracy, are achieved using our proposed MINT approach, suggesting that it is possible to recognize whether an AI model has been trained with specific data.
https://arxiv.org/abs/2402.09225
Ensuring robustness in face recognition systems across various challenging conditions is crucial for their versatility. State-of-the-art methods often incorporate additional information, such as depth, thermal, or angular data, to enhance performance. However, light field-based face recognition approaches that leverage angular information face computational limitations. This paper investigates the fundamental trade-off between spatio-angular resolution in light field representation to achieve improved face recognition performance. By utilizing macro-pixels with varying angular resolutions while maintaining the overall image size, we aim to quantify the impact of angular information at the expense of spatial resolution, while considering computational constraints. Our experimental results demonstrate a notable performance improvement in face recognition systems by increasing the angular resolution, up to a certain extent, at the cost of spatial resolution.
https://arxiv.org/abs/2402.07263