With the increasing deployment of intelligent CCTV systems in outdoor environments, there is a growing demand for face recognition systems optimized for challenging weather conditions. Adverse weather significantly degrades image quality, which in turn reduces recognition accuracy. Although recent face image restoration (FIR) models based on generative adversarial networks (GANs) and diffusion models have shown progress, their performance remains limited due to the lack of dedicated modules that explicitly address weather-induced degradations. This leads to distorted facial textures and structures. To address these limitations, we propose a novel GAN-based blind FIR framework that integrates two key components: local Statistical Facial Feature Transformation (SFFT) and Degradation-Agnostic Feature Embedding (DAFE). The local SFFT module enhances facial structure and color fidelity by aligning the local statistical distributions of low-quality (LQ) facial regions with those of high-quality (HQ) counterparts. Complementarily, the DAFE module enables robust statistical facial feature extraction under adverse weather conditions by aligning LQ and HQ encoder representations, thereby making the restoration process adaptive to severe weather-induced degradations. Experimental results demonstrate that the proposed degradation-agnostic SFFT model outperforms existing state-of-the-art FIR methods based on GAN and diffusion models, particularly in suppressing texture distortions and accurately reconstructing facial structures. Furthermore, both the SFFT and DAFE modules are empirically validated as enhancing structural fidelity and perceptual quality in face restoration under challenging weather scenarios.
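To make the local-statistics idea concrete, below is a minimal sketch of an AdaIN-style per-region alignment, assuming PyTorch feature maps and hypothetical per-cell HQ statistics `hq_mean` and `hq_std` of shape `(B, C, grid, grid)`; this illustrates the flavor of local SFFT, not the authors' implementation:

```python
import torch

def local_sfft(lq_feat, hq_mean, hq_std, grid=4, eps=1e-5):
    """AdaIN-style local statistic alignment (illustrative sketch only).

    lq_feat: (B, C, H, W) features of the low-quality face.
    hq_mean, hq_std: (B, C, grid, grid) target statistics per spatial cell,
    assumed to come from an HQ-trained branch.
    """
    b, c, h, w = lq_feat.shape
    out = lq_feat.clone()
    gh, gw = h // grid, w // grid
    for i in range(grid):
        for j in range(grid):
            patch = out[:, :, i * gh:(i + 1) * gh, j * gw:(j + 1) * gw]
            mu = patch.mean(dim=(2, 3), keepdim=True)
            sd = patch.std(dim=(2, 3), keepdim=True) + eps
            # whiten the local patch, then re-color it with the HQ statistics
            out[:, :, i * gh:(i + 1) * gh, j * gw:(j + 1) * gw] = (
                (patch - mu) / sd * hq_std[:, :, i:i + 1, j:j + 1]
                + hq_mean[:, :, i:i + 1, j:j + 1]
            )
    return out

feats = torch.randn(2, 64, 32, 32)
aligned = local_sfft(feats, torch.zeros(2, 64, 4, 4), torch.ones(2, 64, 4, 4))
print(aligned.shape)  # torch.Size([2, 64, 32, 32])
```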
https://arxiv.org/abs/2507.07464
This study presents findings from long-term biometric evaluations conducted at the Biometric Evaluation Center (bez). Over the course of two and a half years, more than 400 participants representing diverse ethnicities, genders, and age groups were regularly assessed using a variety of biometric tools and techniques at the controlled testing facilities. Our findings are based on the General Data Protection Regulation-compliant local bez database with more than 238,000 biometric data sets categorized into multiple biometric modalities, such as face and finger. We used state-of-the-art face recognition algorithms to analyze long-term comparison scores. Our results show that these scores fluctuate more significantly between individual days than over the entire measurement period. These findings highlight the importance of testing the biometric characteristics of the same individuals over a longer period of time in a controlled measurement environment, and they lay the groundwork for future advancements in biometric data analysis.
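One way to operationalize "day-to-day versus whole-period fluctuation" is sketched below on a hypothetical score log (the real bez database schema is not public, so the columns here are assumptions); it contrasts the spread of successive daily mean scores with the drift implied by a linear fit over the full period:

```python
import numpy as np
import pandas as pd

# Hypothetical mated-comparison score log (subject, day index, score).
rng = np.random.default_rng(0)
n = 5000
df = pd.DataFrame({"subject": rng.integers(0, 400, n),
                   "day": rng.integers(0, 900, n)})      # ~2.5 years of visits
df["score"] = 0.82 + 0.03 * rng.standard_normal(n)

def day_vs_period(g: pd.DataFrame) -> pd.Series:
    daily = g.groupby("day")["score"].mean().sort_index()
    if daily.size < 3:
        return pd.Series({"day_to_day": np.nan, "long_term_drift": np.nan})
    short_term = daily.diff().std()                      # day-to-day fluctuation
    slope = np.polyfit(daily.index, daily.values, 1)[0]  # long-term trend
    drift = abs(slope) * (daily.index.max() - daily.index.min())
    return pd.Series({"day_to_day": short_term, "long_term_drift": drift})

# Per-subject comparison: day-to-day spread vs drift over the whole period.
print(df.groupby("subject").apply(day_vs_period).mean())
```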
https://arxiv.org/abs/2507.06858
In this paper, we address the following question: How do generic foundation models (e.g., CLIP, BLIP, LLaVa, DINO) compare against a domain-specific face recognition model (viz., AdaFace or ArcFace) on the face recognition task? Through a series of experiments involving several foundation models and benchmark datasets, we report the following findings: (a) In all datasets considered, domain-specific models outperformed zero-shot foundation models. (b) Zero-shot generic foundation models perform better on over-segmented face images than on tightly cropped faces, suggesting the importance of contextual clues. For example, at a False Match Rate (FMR) of 0.01%, the True Match Rate (TMR) of OpenCLIP improved from 64.97% to 81.73% on the LFW dataset as the face crop increased from 112x112 to 250x250, while the TMR of the domain-specific AdaFace dropped from 99.09% to 77.31%. (c) A simple score-level fusion of a foundation model with a domain-specific FR model improved the accuracy at low FMRs. For example, the TMR of AdaFace when fused with BLIP improved from 72.64% to 83.31% at an FMR of 0.0001% on the IJB-B dataset, and from 73.17% to 85.81% on the IJB-C dataset. (d) Foundation models, such as ChatGPT, can be used to impart explainability to the FR pipeline (e.g., "Despite minor lighting and head tilt differences, the two left-profile images show high consistency in forehead slope, nose shape, chin contour..."). In some instances, foundation models are even able to resolve low-confidence decisions made by AdaFace (e.g., "Although AdaFace assigns a low similarity score of 0.21, both images exhibit visual similarity...and the pair is likely of the same person"), thereby reiterating the importance of combining domain-specific FR models with generic foundation models in a judicious manner.
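A minimal sketch of the score-level fusion baseline described in (c), assuming min-max normalization and equal-weight sum fusion on synthetic score arrays (the paper's exact normalization and weights may differ):

```python
import numpy as np

def minmax(s):
    return (s - s.min()) / (s.max() - s.min() + 1e-12)

def tmr_at_fmr(genuine, impostor, fmr):
    thr = np.quantile(impostor, 1.0 - fmr)   # threshold giving the target FMR
    return (genuine >= thr).mean()

# Hypothetical score arrays from the two matchers on the same comparison list.
rng = np.random.default_rng(1)
ada_g, ada_i = rng.normal(0.6, 0.15, 10000), rng.normal(0.1, 0.1, 100000)
blip_g, blip_i = rng.normal(0.7, 0.2, 10000), rng.normal(0.3, 0.15, 100000)

# Simple sum fusion after min-max normalization (a common baseline choice).
all_ada = minmax(np.concatenate([ada_g, ada_i]))
all_blip = minmax(np.concatenate([blip_g, blip_i]))
fused = 0.5 * all_ada + 0.5 * all_blip
f_g, f_i = fused[:len(ada_g)], fused[len(ada_g):]
print(tmr_at_fmr(ada_g, ada_i, 1e-6), tmr_at_fmr(f_g, f_i, 1e-6))
```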
https://arxiv.org/abs/2507.03541
Face identification systems operating in the ciphertext domain have garnered significant attention due to increasing privacy concerns and the potential recovery of original facial data. However, as the size of ciphertext template libraries grows, the face retrieval process becomes progressively more time-intensive. To address this challenge, we propose a novel and efficient scheme for face retrieval in the ciphertext domain, termed Privacy-Preserving Preselection for Face Identification Based on Packing (PFIP). PFIP incorporates an innovative preselection mechanism to reduce computational overhead and a packing module to enhance the flexibility of biometric systems during the enrollment stage. Extensive experiments conducted on the LFW and CASIA datasets demonstrate that PFIP preserves the accuracy of the original face recognition model, achieving a 100% hit rate while retrieving 1,000 ciphertext face templates within 300 milliseconds. Compared to existing approaches, PFIP achieves a nearly 50x improvement in retrieval efficiency.
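Generically, preselection means a cheap first stage that prunes the gallery before expensive ciphertext-domain comparisons. The sketch below is an assumption-laden illustration using sign-based binary codes and Hamming distance; PFIP's actual packing and preselection constructions operate on ciphertexts and are not reproduced here:

```python
import numpy as np

def binary_code(template: np.ndarray) -> np.ndarray:
    return (template > 0).astype(np.uint8)     # sign-based compact code

def preselect(probe_code, gallery_codes, keep=100):
    dists = (gallery_codes != probe_code).sum(axis=1)   # Hamming distance
    return np.argsort(dists)[:keep]            # candidate indices for stage 2

rng = np.random.default_rng(2)
gallery = rng.standard_normal((1000, 512))     # toy plaintext templates
probe = gallery[42] + 0.1 * rng.standard_normal(512)
codes = binary_code(gallery)
candidates = preselect(binary_code(probe), codes, keep=100)
print(42 in candidates)   # the true match should survive preselection
```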
https://arxiv.org/abs/2507.02414
The widespread use of deep learning face recognition raises several security concerns. Although prior works point to existing vulnerabilities, DNN backdoor attacks against real-life, unconstrained systems dealing with images captured in the wild remain a blind spot in the literature. This paper conducts the first system-level study of backdoors in deep learning-based face recognition systems, making four contributions by exploring the feasibility of DNN backdoors on these pipelines in a holistic fashion. We demonstrate for the first time two backdoor attacks on the face detection task: face generation and face landmark shift attacks. We then show that face feature extractors trained with large-margin losses also fall victim to backdoor attacks. Combining our models, we then show, using 20 possible pipeline configurations and 15 attack cases, that a single backdoor enables an attacker to bypass the entire function of a system. Finally, we provide stakeholders with several best practices and countermeasures.
https://arxiv.org/abs/2507.01607
This study presents a novel classroom surveillance system that integrates multiple modalities, including drowsiness detection, mobile phone usage tracking, and face recognition, to assess student attentiveness with enhanced accuracy. The system leverages the YOLOv8 model to detect both mobile phone usage and sleep (Ghatge et al., 2024), while facial recognition is achieved through LResNet Occ FC, with body tracking using YOLO and MTCNN (Durai et al., 2024). These models work in synergy to provide comprehensive, real-time monitoring, offering insights into student engagement and behavior (S et al., 2023). The framework is trained on specialized datasets, such as the RMFD dataset for face recognition and a Roboflow dataset for mobile phone detection. An extensive evaluation of the system shows promising results: sleep detection achieves 97.42% mAP@50, face recognition achieves 86.45% validation accuracy, and mobile phone detection reaches 85.89% mAP@50. The system is implemented within a core PHP web application and utilizes ESP32-CAM hardware for seamless data capture (Neto et al., 2024). This integrated approach not only enhances classroom monitoring but also ensures automatic attendance recording via face recognition as students remain seated in the classroom, offering scalability for diverse educational environments (Banada, 2025).
https://arxiv.org/abs/2507.01590
In current practical face authentication systems, most face recognition (FR) algorithms are based on cosine similarity with softmax classification. Despite its reliable classification performance, this method struggles with hard samples. A popular strategy to improve FR performance is to incorporate angular or cosine margins. However, this does not take face quality or recognition hardness into account, simply increasing the margin value and thus causing an overly uniform training strategy. To address this problem, a novel loss function is proposed, named Loss function for Hard High-quality Face (LH2Face). Firstly, a similarity measure based on the von Mises-Fisher (vMF) distribution is introduced, specifically focusing on the logarithm of its Probability Density Function (PDF), which represents the distance between a probability distribution and a vector. Then, an adaptive margin-based multi-classification method using softmax, called the Uncertainty-Aware Margin Function, is implemented. Furthermore, proxy-based loss functions are used to apply extra constraints between the proxy and sample to optimize their representation space distribution. Finally, a renderer is constructed that optimizes FR through face reconstruction and vice versa. Our LH2Face is superior to similar schemes on hard high-quality face datasets, achieving 49.39% accuracy on the IJB-B dataset and surpassing the second-place method by 2.37%.
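The vMF log-PDF similarity can be written down directly. A small sketch, assuming unit-norm embeddings and using scipy's exponentially scaled Bessel function for numerical stability; this evaluates the standard vMF log-density, not LH2Face's full loss:

```python
import numpy as np
from scipy.special import ive

def vmf_log_pdf(x, mu, kappa):
    """Log-density of a von Mises-Fisher distribution on the unit sphere.

    x, mu: unit-norm vectors of dimension p; kappa: concentration.
    log C_p(k) = (p/2 - 1) log k - (p/2) log(2*pi) - log I_{p/2-1}(k),
    with log I_v(k) = log(ive(v, k)) + k for numerical stability.
    """
    p = x.shape[-1]
    v = p / 2.0 - 1.0
    log_norm = (v * np.log(kappa) - (p / 2.0) * np.log(2 * np.pi)
                - (np.log(ive(v, kappa)) + kappa))
    return log_norm + kappa * np.dot(mu, x)

x = np.ones(512) / np.sqrt(512)
mu = x.copy()
print(vmf_log_pdf(x, mu, kappa=100.0))  # highest when x aligns with mu
```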
https://arxiv.org/abs/2506.23555
Burstiness, a phenomenon observed in text and image retrieval, refers to particular elements appearing more often in a set than a statistically independent model would assume. We argue that in the context of set-based face recognition (SFR), burstiness exists widely and degrades performance in two respects. Firstly, bursty faces, i.e., faces with particular attributes that appear frequently in a face set, dominate the training face sets and lead to poor generalization to unconstrained scenarios. Secondly, bursty faces dominating the evaluation sets interfere with the similarity comparison in set verification and identification. To detect the bursty faces in a set, we propose three strategies based on Quickshift++, feature self-similarity, and generalized max-pooling (GMP). We apply the burst detection results at the training and evaluation stages to enhance the sampling ratios or contributions of the infrequent faces. At evaluation, we additionally propose a quality-aware GMP that makes the original GMP aware of face quality and robust to low-quality faces. We provide illustrations and extensive experiments on the SFR benchmarks to demonstrate that burstiness is widespread and that suppressing burstiness considerably improves recognition performance.
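For reference, standard ridge-regularized GMP solves for a pooled vector whose inner product with every set member is approximately one, which equalizes bursty and rare faces. A minimal numpy sketch of that classic formulation (the paper's quality-aware variant adds quality weighting on top, which is omitted here):

```python
import numpy as np

def generalized_max_pooling(feats: np.ndarray, lam: float = 1.0) -> np.ndarray:
    """GMP: pool n d-dim features into one vector xi with feats @ xi ~= 1,
    via the ridge solution xi = feats.T (feats feats.T + lam I)^{-1} 1."""
    n = feats.shape[0]
    gram = feats @ feats.T                      # n x n Gram matrix
    weights = np.linalg.solve(gram + lam * np.eye(n), np.ones(n))
    return feats.T @ weights                    # d-dim pooled descriptor

rng = np.random.default_rng(3)
common = rng.standard_normal(256)
face_set = np.vstack([common + 0.05 * rng.standard_normal(256)
                      for _ in range(20)]
                     + [rng.standard_normal(256)])   # 20 bursty + 1 rare face
sets = face_set / np.linalg.norm(face_set, axis=1, keepdims=True)
pooled = generalized_max_pooling(sets)
print(sets @ pooled)   # responses are roughly equalized across members
```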
https://arxiv.org/abs/2506.20312
Backdoor attacks embed a hidden functionality into deep neural networks, causing the network to display anomalous behavior when activated by a predetermined pattern (trigger) in the input, while behaving well otherwise on public test data. Recent works have shown that backdoored face recognition (FR) systems can respond to natural-looking triggers, such as a particular pair of sunglasses. Such attacks pose a serious threat to the applicability of FR systems in high-security applications. We propose a novel technique to (1) detect whether an FR network is compromised with a natural, physically realizable trigger, and (2) identify such triggers given a compromised network. We demonstrate the effectiveness of our methods on a compromised FR network, where we identify the trigger (e.g., green sunglasses or red hat) with a top-5 accuracy of 74%, whereas a naive brute-force baseline achieves 56% accuracy.
https://arxiv.org/abs/2506.19533
Face identity provides a powerful signal for deepfake detection. Prior studies show that even when not explicitly modeled, classifiers often learn identity features implicitly. This has led to conflicting views: some suppress identity cues to reduce bias, while others rely on them as forensic evidence. To reconcile these views, we analyze two hypotheses: (1) whether face identity alone is discriminative for detecting deepfakes, and (2) whether such identity features generalize poorly across manipulation methods. Our experiments confirm that identity is informative but context-dependent. While some manipulations preserve identity-consistent artifacts, others distort identity cues and harm generalization. We argue that identity features should neither be blindly suppressed nor relied upon, but instead be explicitly modeled and adaptively controlled based on per-sample relevance. We propose SELFI (SELective Fusion of Identity), a generalizable detection framework that dynamically modulates identity usage. SELFI consists of: (1) a Forgery-Aware Identity Adapter (FAIA) that extracts identity embeddings from a frozen face recognition model and projects them into a forgery-relevant space via auxiliary supervision; and (2) an Identity-Aware Fusion Module (IAFM) that selectively integrates identity and visual features using a relevance-guided fusion mechanism. Experiments on four benchmarks show that SELFI improves cross-manipulation generalization, outperforming prior methods by an average of 3.1% AUC. On the challenging DFDC dataset, SELFI exceeds the previous best by 6%. Code will be released upon paper acceptance.
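A minimal sketch of what a relevance-guided fusion of identity and visual features could look like; all layer names and dimensions below are assumptions for illustration, not the authors' FAIA/IAFM code:

```python
import torch
import torch.nn as nn

class RelevanceGuidedFusion(nn.Module):
    """IAFM-flavored fusion sketch: a small gate predicts a per-sample
    relevance weight for the identity branch and blends the two streams."""
    def __init__(self, dim_id=512, dim_vis=768, dim_out=512):
        super().__init__()
        self.proj_id = nn.Linear(dim_id, dim_out)    # FAIA-like projection
        self.proj_vis = nn.Linear(dim_vis, dim_out)
        self.gate = nn.Sequential(nn.Linear(2 * dim_out, 128),
                                  nn.ReLU(),
                                  nn.Linear(128, 1),
                                  nn.Sigmoid())      # relevance in (0, 1)

    def forward(self, id_emb, vis_feat):
        zi, zv = self.proj_id(id_emb), self.proj_vis(vis_feat)
        w = self.gate(torch.cat([zi, zv], dim=-1))   # per-sample weight
        return w * zi + (1 - w) * zv                 # selective fusion

fusion = RelevanceGuidedFusion()
out = fusion(torch.randn(4, 512), torch.randn(4, 768))
print(out.shape)  # torch.Size([4, 512])
```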
https://arxiv.org/abs/2506.17592
Demographic bias in high-performance face recognition (FR) systems often eludes detection by existing metrics, especially with respect to subtle disparities in the tails of the score distribution. We introduce the Comprehensive Equity Index (CEI), a novel metric designed to address this limitation. CEI uniquely analyzes genuine and impostor score distributions separately, enabling a configurable focus on tail probabilities while also considering overall distribution shapes. Our extensive experiments (evaluating state-of-the-art FR systems, intentionally biased models, and diverse datasets) confirm CEI's superior ability to detect nuanced biases where previous methods fall short. Furthermore, we present CEI^A, an automated version of the metric that enhances objectivity and simplifies practical application. CEI provides a robust and sensitive tool for operational FR fairness assessment. The proposed methods have been developed particularly for bias evaluation in face biometrics but, in general, they are applicable for comparing statistical distributions in any problem where one is interested in analyzing the distribution tails.
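Since the exact CEI weighting is defined in the paper, the sketch below is only an illustrative composite in the same spirit: genuine and impostor distributions are compared separately per demographic group, with a configurable emphasis on the error-critical tails:

```python
import numpy as np

def tail_mean(scores, q, upper):
    """Mean of the upper (or lower) q-tail of a score sample."""
    thr = np.quantile(scores, 1 - q if upper else q)
    return scores[scores >= thr].mean() if upper else scores[scores <= thr].mean()

def cei_like(gen_by_group, imp_by_group, q=0.05, w_tail=0.75):
    """CEI-style two-group comparison (hypothetical composite, not the
    paper's formula): genuine errors live in the LOWER genuine tail,
    impostor errors in the UPPER impostor tail."""
    (g1, g2), (i1, i2) = gen_by_group, imp_by_group
    tail = (abs(tail_mean(g1, q, False) - tail_mean(g2, q, False))
            + abs(tail_mean(i1, q, True) - tail_mean(i2, q, True)))
    body = abs(g1.mean() - g2.mean()) + abs(i1.mean() - i2.mean())
    return w_tail * tail + (1 - w_tail) * body

rng = np.random.default_rng(4)
gen = (rng.normal(0.75, 0.07, 20000), rng.normal(0.75, 0.10, 20000))
imp = (rng.normal(0.10, 0.05, 20000), rng.normal(0.10, 0.05, 20000))
print(cei_like(gen, imp))  # nonzero: tail disparity found despite equal means
```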
https://arxiv.org/abs/2506.10564
This paper introduces FaceLiVT, a lightweight yet powerful face recognition model that integrates a hybrid Convolutional Neural Network (CNN)-Transformer architecture with an innovative, lightweight Multi-Head Linear Attention (MHLA) mechanism. By combining MHLA with a reparameterized token mixer, FaceLiVT effectively reduces computational complexity while preserving competitive accuracy. Extensive evaluations on challenging benchmarks, including LFW, CFP-FP, AgeDB-30, IJB-B, and IJB-C, highlight its superior performance compared to state-of-the-art lightweight models. MHLA notably improves inference speed, allowing FaceLiVT to deliver high accuracy with lower latency on mobile devices. Specifically, FaceLiVT is 8.6× faster than EdgeFace, a recent hybrid CNN-Transformer model optimized for edge devices, and 21.2× faster than a pure ViT-based model. With its balanced design, FaceLiVT offers an efficient and practical solution for real-time face recognition on resource-constrained platforms.
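MHLA's details are specific to the paper, but the generic O(N) linear-attention formulation such mechanisms build on can be sketched as follows (kernel feature map φ = ELU + 1, single head; the multi-head and reparameterized-mixer parts are omitted):

```python
import torch

def linear_attention(q, k, v, eps=1e-6):
    """Softmax-free linear attention: Attn = phi(Q) (phi(K)^T V) / normalizer,
    computed in O(N * d * e) instead of O(N^2). Generic sketch, not MHLA."""
    q = torch.nn.functional.elu(q) + 1
    k = torch.nn.functional.elu(k) + 1
    kv = torch.einsum("bnd,bne->bde", k, v)              # d x e summary
    z = 1.0 / (torch.einsum("bnd,bd->bn", q, k.sum(dim=1)) + eps)
    return torch.einsum("bnd,bde,bn->bne", q, kv, z)

out = linear_attention(torch.randn(2, 196, 64), torch.randn(2, 196, 64),
                       torch.randn(2, 196, 64))
print(out.shape)  # torch.Size([2, 196, 64])
```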
https://arxiv.org/abs/2506.10361
In this paper, we propose ScoreMix, a novel yet simple data augmentation strategy that leverages the score compositional properties of diffusion models to enhance discriminator performance, particularly in scenarios with limited labeled data. By convexly mixing the scores from different class-conditioned trajectories during diffusion sampling, we generate challenging synthetic samples that significantly improve discriminative capabilities in all studied benchmarks. We systematically investigate class-selection strategies for mixing and discover that greater performance gains arise when combining classes that are distant in the discriminator's embedding space rather than close in the generator's condition space. Moreover, we empirically show that, under standard metrics, the correlation between the generator's learned condition space and the discriminator's embedding space is minimal. Our approach achieves notable performance improvements without extensive parameter searches, demonstrating practical advantages for training discriminative models while effectively mitigating problems regarding the collection of large datasets. Paper website: this https URL
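The core mixing operation is simple to state: at each sampling step, convexly combine the class-conditioned score (noise) predictions of two classes. A sketch assuming an epsilon-prediction interface `model(x, t, cond)` (an assumption, not the authors' code):

```python
import torch

def scoremix_eps(model, x_t, t, c1, c2, lam=0.5):
    """Convex mixing of two class-conditioned noise predictions at one
    diffusion sampling step; lam in [0, 1] is the mixing coefficient."""
    return lam * model(x_t, t, c1) + (1 - lam) * model(x_t, t, c2)

# Toy stand-in model so the sketch runs end to end.
model = lambda x, t, c: x * 0.1 + c.view(-1, 1, 1, 1)
x_t = torch.randn(4, 3, 64, 64)
c1, c2 = torch.zeros(4), torch.ones(4)
eps = scoremix_eps(model, x_t, torch.tensor(500), c1, c2, lam=0.5)
print(eps.shape)  # torch.Size([4, 3, 64, 64])
```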
https://arxiv.org/abs/2506.10226
This study conducts an empirical examination of MLP networks through a rigorous, methodical experimentation process involving three diverse datasets: TinyFace, Heart Disease, and Iris. Study overview: the study includes three key methods: a) baseline training using the default settings of the Multi-Layer Perceptron (MLP); b) feature selection using Genetic Algorithm (GA) based refinement; and c) Principal Component Analysis (PCA) based dimension reduction. The results provide important insights into how these techniques affect performance. While PCA showed benefits in low-dimensional and noise-free datasets, GA consistently increased accuracy in complex datasets by accurately identifying critical features. The comparison reveals that feature selection and dimensionality reduction play interdependent roles in enhancing MLP performance. The study contributes to the literature on feature engineering and neural network parameter optimization, offering practical guidelines for a wide range of machine learning tasks.
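A compact sketch of the three compared methods on one of the study's datasets (Iris, via scikit-learn), with the GA reduced to a mutation-only toy loop for brevity; the study's actual GA operators and hyperparameters are not specified here:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)
make_mlp = lambda: MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000,
                                 random_state=0)

# a) baseline MLP
base = cross_val_score(make_mlp(), X, y, cv=5).mean()

# b) toy GA-flavored feature selection: mutate one feature bit per step
rng = np.random.default_rng(0)
best_mask, best_fit = np.ones(X.shape[1], bool), 0.0
for _ in range(20):
    mask = best_mask.copy()
    mask[rng.integers(X.shape[1])] ^= True
    if mask.any():
        fit = cross_val_score(make_mlp(), X[:, mask], y, cv=5).mean()
        if fit >= best_fit:
            best_mask, best_fit = mask, fit

# c) PCA-based dimension reduction to two components
pca = cross_val_score(make_mlp(), PCA(n_components=2).fit_transform(X), y,
                      cv=5).mean()
print(f"baseline={base:.3f}  GA-selected={best_fit:.3f}  PCA={pca:.3f}")
```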
https://arxiv.org/abs/2506.10184
Reconstructing facial images from black-box recognition models poses a significant privacy threat. While many methods require access to embeddings, we address the more challenging scenario of model inversion using only similarity scores. This paper introduces DarkerBB, a novel approach that reconstructs color faces by performing zero-order optimization within a PCA-derived eigenface space. Despite this highly limited information, experiments on LFW, AgeDB-30, and CFP-FP benchmarks demonstrate that DarkerBB achieves state-of-the-art verification accuracies in the similarity-only setting, with competitive query efficiency.
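The general recipe, zeroth-order optimization of PCA coefficients against a black-box similarity score, can be sketched with a two-point gradient estimator; DarkerBB's exact estimator, step schedule, and eigenface basis are replaced here by assumptions and a toy black box:

```python
import numpy as np

def zo_invert(score_fn, basis, mean_face, dim, iters=1000, mu=0.05, lr=0.5):
    """Two-point zeroth-order ascent on eigenface coefficients, driven
    only by a black-box similarity score (generic sketch, not DarkerBB)."""
    rng = np.random.default_rng(0)
    w = np.zeros(dim)                              # eigenface coefficients
    for _ in range(iters):
        u = rng.standard_normal(dim)
        g = (score_fn(mean_face + basis @ (w + mu * u))
             - score_fn(mean_face + basis @ (w - mu * u))) / (2 * mu) * u
        w += lr * g                                # climb the similarity
    return mean_face + basis @ w

# Toy black box: cosine similarity to a hidden target image.
d, k = 32 * 32, 50
rng = np.random.default_rng(1)
basis = np.linalg.qr(rng.standard_normal((d, k)))[0]   # orthonormal "eigenfaces"
mean_face = np.zeros(d)
target = basis @ rng.standard_normal(k)
score = lambda img: img @ target / (np.linalg.norm(img) * np.linalg.norm(target) + 1e-9)
recon = zo_invert(score, basis, mean_face, k)
print(score(recon))   # approaches 1.0 as the reconstruction aligns
```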
https://arxiv.org/abs/2506.09777
Face recognition using 3D point clouds is gaining growing interest, but raw point clouds often contain a significant amount of noise due to imperfect sensors. In this paper, an end-to-end 3D face recognition method for noisy point clouds is proposed, which synergistically integrates denoising and recognition modules. Specifically, a Conditional Generative Adversarial Network on Three Orthogonal Planes (cGAN-TOP) is designed to effectively remove the noise in the point cloud and recover the underlying features for subsequent recognition. A Linked Dynamic Graph Convolutional Neural Network (LDGCNN) is then adapted to recognize faces from the processed point cloud, hierarchically linking both local point features and neighboring features of multiple scales. The proposed method is validated on the Bosphorus dataset and significantly improves the recognition accuracy under all noise settings, with a maximum gain of 14.81%.
https://arxiv.org/abs/2506.06864
Surveillance systems play a critical role in security and reconnaissance, but their performance is often compromised by low-quality images and videos, leading to reduced accuracy in face recognition. Additionally, existing AI-based facial analysis models suffer from biases related to skin tone variations and partially occluded faces, further limiting their effectiveness in diverse real-world scenarios. These challenges are the result of data limitations and imbalances: available training datasets lack sufficient diversity, resulting in unfair and unreliable facial recognition performance. To address these issues, we propose a data-driven platform that enhances surveillance capabilities by generating synthetic training data tailored to compensate for dataset biases. Our approach leverages deep learning-based facial attribute manipulation and reconstruction using autoencoders and Generative Adversarial Networks (GANs) to create diverse and high-quality facial datasets. Additionally, our system integrates an image enhancement module, improving the clarity of low-resolution or occluded faces in surveillance footage. We evaluate our approach using the CelebA dataset, demonstrating that the proposed platform enhances both training data diversity and model fairness. This work contributes to reducing bias in AI-based facial analysis and improving surveillance accuracy in challenging environments, leading to fairer and more reliable security applications.
https://arxiv.org/abs/2506.06578
Face recognition under extreme head poses is a challenging task. Ideally, a face recognition system should perform well across different head poses, which is known as pose-invariant face recognition. To achieve pose invariance, current approaches rely on sophisticated methods, such as face frontalization and various facial feature extraction model architectures. However, these methods are somewhat impractical in real-life settings and are typically evaluated on small scientific datasets, such as Multi-PIE. In this work, we propose the inverse of face frontalization, called face defrontalization, to augment the training dataset of the facial feature extraction model. The method introduces no time overhead at inference. It consists of: 1) training an adapted face defrontalization FFWM model on a frontal-profile pairs dataset, preprocessed using our proposed face alignment method; and 2) training a ResNet-50 facial feature extraction model with ArcFace loss on a raw and randomly defrontalized large-scale dataset, where defrontalization is performed with our previously trained face defrontalization model. Our method was compared with existing approaches on four open-access datasets: LFW, AgeDB, CFP, and Multi-PIE. Defrontalization shows improved results compared to models without defrontalization, and the proposed adjustments show clear superiority over the state-of-the-art face frontalization FFWM method on three larger open-access datasets, but not on the small Multi-PIE dataset for extreme poses (75 and 90 degrees). The results suggest that at least some of the current methods may be overfitted to small datasets.
https://arxiv.org/abs/2506.04496
Bias has been a constant in face recognition models. Over the years, researchers have looked at it from both the model and the data point of view. However, their approaches to mitigating data bias have been limited and lacked insight into the real nature of the problem. In this work, we propose treating ethnicity labels as a continuous variable instead of a discrete value per identity. We validate our formulation both experimentally and theoretically, showing that not all identities from one ethnicity contribute equally to the balance of the dataset; thus, having the same number of identities per ethnicity does not yield a balanced dataset. We further show that models trained on datasets balanced in the continuous space consistently outperform models trained on data balanced in the discrete space. In total, we trained more than 65 different models and created more than 20 subsets of the original datasets.
https://arxiv.org/abs/2506.01532
We consider the problem of estimating a regularization parameter, or shrinkage coefficient $\alpha \in (0,1)$, for the Regularized Tyler's M-estimator (RTME). In particular, we propose to estimate an optimal shrinkage coefficient by setting $\alpha$ as the solution to a suitably chosen objective function, namely the leave-one-out cross-validated (LOOCV) log-likelihood loss. Since LOOCV is computationally prohibitive even for moderate sample size $n$, we propose a computationally efficient approximation for the LOOCV log-likelihood loss that eliminates the need for invoking the RTME procedure $n$ times, once for each sample left out during the LOOCV procedure. This approximation yields an $O(n)$ reduction in the running time complexity of the LOOCV procedure, which results in a significant speedup for computing the LOOCV estimate. We demonstrate the efficiency and accuracy of the proposed approach on synthetic high-dimensional data sampled from heavy-tailed elliptical distributions, as well as on real high-dimensional datasets for object recognition, face recognition, and handwritten digit recognition. Our experiments show that the proposed approach is efficient and consistently more accurate than other methods in the literature for shrinkage coefficient estimation.
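A sketch of the ingredients, assuming the standard fixed-point iteration for the RTME and a naive holdout surrogate in place of the paper's closed-form LOOCV approximation (which is the actual contribution and is omitted here):

```python
import numpy as np

def rtme(X, alpha, iters=50):
    """Fixed-point iteration for the regularized Tyler M-estimator:
    S <- (1 - alpha) * (p/n) * sum_i x_i x_i^T / (x_i^T S^{-1} x_i) + alpha * I,
    with the scale fixed by trace(S) = p."""
    n, p = X.shape
    S = np.eye(p)
    for _ in range(iters):
        w = 1.0 / np.einsum("ij,jk,ik->i", X, np.linalg.inv(S), X)
        S = (1 - alpha) * (p / n) * (X.T * w) @ X + alpha * np.eye(p)
        S = p * S / np.trace(S)
    return S

def neg_loglik(X, S):
    """Tyler-style negative log-likelihood surrogate (up to constants):
    (p/n) * sum_i log(x_i^T S^{-1} x_i) + log det S."""
    n, p = X.shape
    q = np.einsum("ij,jk,ik->i", X, np.linalg.inv(S), X)
    return (p / n) * np.log(q).sum() + np.log(np.linalg.det(S))

# Naive grid search over alpha with a holdout set standing in for LOOCV.
rng = np.random.default_rng(0)
X = rng.standard_t(df=3, size=(200, 20))   # heavy-tailed synthetic data
Xtr, Xval = X[:150], X[150:]
alphas = np.linspace(0.05, 0.95, 10)
best = min(alphas, key=lambda a: neg_loglik(Xval, rtme(Xtr, a)))
print(f"selected alpha ~ {best:.2f}")
```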
https://arxiv.org/abs/2505.24781