Face anti-spoofing (FAS) and adversarial detection (FAD) have been regarded as critical technologies for ensuring the safety of face recognition systems. Because handling the two threats separately limits practicality and generalization, some existing methods aim to devise a single framework capable of detecting both concurrently. Nevertheless, these methods still suffer from insufficient generalization and suboptimal robustness, potentially owing to the inherent drawbacks of discriminative models. Motivated by the rich structural and detailed features of face generative models, we propose FaceCat, which utilizes a face generative model as a pre-trained model to improve the performance of FAS and FAD. Specifically, FaceCat elaborately designs a hierarchical fusion mechanism to capture the generative model's rich facial semantic features. These features then serve as a robust foundation for a lightweight head designed to execute the FAS and FAD tasks simultaneously. As relying solely on single-modality data often leads to suboptimal performance, we further propose a novel text-guided multi-modal alignment strategy that utilizes text prompts to enrich the feature representation, thereby enhancing performance. For fair evaluation, we build a comprehensive protocol covering a wide range of 28 attack types to benchmark performance. Extensive experiments validate that FaceCat generalizes significantly better and obtains excellent robustness against input transformations.
https://arxiv.org/abs/2404.09193
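The hierarchical fusion mechanism described above can be sketched roughly as follows. This is a hedged illustration, not the paper's implementation: it assumes feature maps taken from several depths of a frozen generative backbone, resizes them to a common spatial size with nearest-neighbor indexing, and concatenates them along channels as input for a lightweight FAS/FAD head.

```python
import numpy as np

def hierarchical_fusion(feature_maps, out_hw):
    """Fuse multi-level backbone features (illustrative sketch).

    Each element of `feature_maps` is an h x w x c array from a different
    depth of the (hypothetical) generative backbone. All maps are resized
    to `out_hw` via nearest-neighbor indexing and concatenated on channels.
    The paper's actual mechanism may weight or align levels differently.
    """
    H, W = out_hw
    resized = []
    for f in feature_maps:
        h, w, _ = f.shape
        ys = np.arange(H) * h // H   # nearest-neighbor row indices
        xs = np.arange(W) * w // W   # nearest-neighbor column indices
        resized.append(f[ys][:, xs])
    return np.concatenate(resized, axis=-1)
```

A lightweight classification head would then operate on the fused channel stack.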
Face recognition systems are frequently subjected to a variety of physical and digital attacks. Previous methods have achieved satisfactory performance in scenarios that address physical attacks and digital attacks, respectively. However, few methods integrate both into a single model that simultaneously addresses physical and digital attacks, which implies the necessity of developing and maintaining multiple models. To jointly detect physical and digital attacks within a single model, we propose an innovative approach that can adapt to any network architecture. Our approach mainly contains two types of data augmentation, which we call Simulated Physical Spoofing Clues augmentation (SPSC) and Simulated Digital Spoofing Clues augmentation (SDSC). SPSC and SDSC augment live samples into simulated attack samples by simulating the spoofing clues of physical and digital attacks, respectively, which significantly improves the model's capability to detect "unseen" attack types. Extensive experiments show that SPSC and SDSC achieve state-of-the-art generalization in Protocols 2.1 and 2.2 of the UniAttackData dataset, respectively. Our method won first place in the "Unified Physical-Digital Face Attack Detection" track of the 5th Face Anti-spoofing Challenge@CVPR2024. Our final submission obtains 3.75% APCER, 0.93% BPCER, and 2.34% ACER. Our code is available at this https URL.
https://arxiv.org/abs/2404.08450
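The SPSC/SDSC idea above, turning live samples into simulated attacks by injecting spoofing clues, can be sketched as below. The specific clues are assumptions for illustration only (a moiré-like interference pattern as a stand-in for print/replay artifacts, and a blended donor region as a stand-in for forgery boundary artifacts); the paper's augmentations may differ.

```python
import numpy as np

def simulate_physical_clues(live, strength=0.15, period=4, seed=0):
    """Overlay a moire-like pattern, a common print/replay artifact.
    `live` is an HxWx3 float image in [0, 1]. Parameters are illustrative."""
    rng = np.random.default_rng(seed)
    h, w, _ = live.shape
    yy, xx = np.mgrid[0:h, 0:w]
    phase = rng.uniform(0, 2 * np.pi)
    pattern = 0.5 * (1 + np.sin(2 * np.pi * (xx + yy) / period + phase))
    out = live * (1 - strength) + strength * pattern[..., None]
    return np.clip(out, 0.0, 1.0)

def simulate_digital_clues(live, donor, seed=0):
    """Blend a rectangular donor-face region into the live face to mimic
    swap/forgery blending artifacts (a crude stand-in for SDSC)."""
    rng = np.random.default_rng(seed)
    h, w, _ = live.shape
    out = live.copy()
    y0, x0 = rng.integers(0, h // 2), rng.integers(0, w // 2)
    y1, x1 = y0 + h // 4, x0 + w // 4
    out[y0:y1, x0:x1] = 0.5 * live[y0:y1, x0:x1] + 0.5 * donor[y0:y1, x0:x1]
    return out
```

Training on such augmented "attack" samples is what lets a single binary model see proxies for both threat families.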
With the advent of social media, fun selfie filters have come into tremendous mainstream use, affecting the functioning of facial biometric systems as well as image recognition systems. These filters range from beautification and Augmented Reality (AR)-based filters to filters that modify facial landmarks. Hence, there is a need to assess the impact of such filters on the performance of existing face recognition systems. The limitation of existing solutions is that they focus mostly on beautification filters. However, AR-based filters and filters that distort facial key points are currently in vogue and make faces highly unrecognizable, even to the naked eye. Also, the filters considered in prior work are mostly obsolete, with limited variation. To mitigate these limitations, we aim to perform a holistic impact analysis of the latest filters and propose a user recognition model for the filtered images. We utilize a benchmark dataset for baseline images and apply the latest filters to them to generate a beautified/filtered dataset. Next, we introduce FaceFilterNet, a model for beautified user recognition. Within this framework, we also utilize our model to infer various attributes of the person, including age, gender, and ethnicity. In addition, we present a filter-wise impact analysis on face recognition, age estimation, and gender and ethnicity prediction. The proposed method affirms the efficacy of our dataset with an accuracy of 87.25% and optimal accuracy for facial attribute analysis.
https://arxiv.org/abs/2404.08277
The advent of morphing attacks has posed significant security concerns for automated Face Recognition systems, raising a pressing need for robust and effective Morphing Attack Detection (MAD) methods able to address this issue. In this paper, we focus on Differential MAD (D-MAD), where a trusted live capture, usually representing the criminal, is compared with the document image to classify it as morphed or bona fide. We show that approaches based on identity features are effective when the morphed image and the live one are sufficiently diverse; unfortunately, their effectiveness is significantly reduced when the same approaches are applied to look-alike subjects, or more generally whenever the similarity between the two compared images is high (e.g., a comparison between the morphed image and the accomplice). Therefore, in this paper, we propose ACIdA, a modular D-MAD system consisting of a module for attempt-type classification and two modules for identity and artifact analysis of the input images. Successfully addressing this task would broaden D-MAD applications to include, for instance, the document enrollment stage, which currently relies entirely on human evaluation, thus limiting the possibility of releasing ID documents with manipulated images, as well as automated gates able to detect both accomplices and criminals. An extensive cross-dataset experimental evaluation conducted on the introduced scenario shows that ACIdA achieves state-of-the-art results, outperforming literature competitors while maintaining good performance on traditional D-MAD benchmarks.
https://arxiv.org/abs/2404.07667
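The modular design above, an attempt-type classifier steering identity and artifact analysis, can be illustrated with a toy fusion rule. Everything here is an assumption about module interfaces, not ACIdA's actual architecture: when the live capture is dissimilar to the document (criminal-like attempt), identity cues are informative; for look-alike/accomplice pairs, artifact analysis must carry the decision.

```python
def acida_style_score(doc_live_similarity, identity_score, artifact_score):
    """Toy fusion of a modular D-MAD pipeline (illustrative only).

    `doc_live_similarity` in [0, 1] stands in for the attempt-type module:
    high similarity suggests an accomplice-like pair, where identity features
    are uninformative, so the artifact module's score gets more weight.
    Scores are assumed morph-probabilities in [0, 1].
    """
    w_artifact = min(max(doc_live_similarity, 0.0), 1.0)  # clamp to [0, 1]
    return (1.0 - w_artifact) * identity_score + w_artifact * artifact_score
```

The point of the sketch is the routing: a single fixed detector would apply one rule to both attempt types, while the modular system adapts its evidence source to the classified attempt.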
The illegal disposal of trash is a major public health and environmental concern. Disposing of trash in unplanned places poses serious health and environmental risks; trash disposal should be restricted to designated public trash cans as much as possible. This research focuses on automating the penalization of litterbugs, addressing the persistent problem of littering in public places. Traditional approaches relying on manual intervention and witness reporting suffer from delays, inaccuracies, and anonymity issues. To overcome these challenges, this paper proposes a fully automated system that utilizes surveillance cameras and advanced computer vision algorithms for litter detection, object tracking, and face recognition. The system accurately identifies and tracks individuals engaged in littering, establishes their identities through face recognition, and enables efficient enforcement of anti-littering policies. By reducing reliance on manual intervention, minimizing human error, and providing prompt identification, the proposed system offers significant advantages in addressing littering incidents. The primary contribution of this research lies in the implementation of the proposed system, leveraging advanced technologies to enhance surveillance operations and automate the penalization of litterbugs.
https://arxiv.org/abs/2404.07467
Face morphing attacks present an emerging threat to face recognition systems. On top of that, printing and scanning the morphed images can obscure the artifacts generated during the morphing process, which makes morphed-image detection even harder. In this work, we investigate the impact that printing and scanning have on morphing attacks through a series of heterogeneous tests. Our experiments show that providing an image that has been printed and scanned, regardless of whether it is morphed or bona fide, to a Face Recognition (FR) system can increase the possibility of a false match by up to 5.64% for DiM and 16.00% for StyleGAN2. Likewise, under the Fréchet Inception Distance (FID) metric, print-scanned morphing attacks performed on average 9.185% stronger than non-print-scanned digital morphs.
https://arxiv.org/abs/2404.06559
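Because physically printing and scanning each test image is slow, studies like the one above are often complemented by a simulated print-scan channel. The sketch below is one crude, assumed proxy (not the paper's pipeline): a box blur for ink/optics softening, a gamma shift for tone reproduction, and additive sensor noise.

```python
import numpy as np

def simulate_print_scan(img, noise_sigma=0.02, gamma=1.1, seed=0):
    """Crude print-scan proxy (illustrative assumption, not the paper's
    method). `img` is an HxWx3 float image in [0, 1]."""
    rng = np.random.default_rng(seed)
    # 3x3 box blur via padded neighborhood averaging (optics/ink spread)
    padded = np.pad(img, ((1, 1), (1, 1), (0, 0)), mode="edge")
    blurred = sum(
        padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
        for dy in range(3) for dx in range(3)
    ) / 9.0
    toned = np.power(blurred, gamma)                 # tone-curve distortion
    noisy = toned + rng.normal(0, noise_sigma, img.shape)  # scanner noise
    return np.clip(noisy, 0.0, 1.0)
```

Such a channel blurs exactly the high-frequency residue that morph detectors rely on, which is consistent with the paper's finding that print-scanned morphs match more strongly.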
Face Anti-Spoofing (FAS) is crucial to safeguarding Face Recognition (FR) systems. In real-world scenarios, FR systems are confronted with both physical and digital attacks. However, existing algorithms often address only one type of attack at a time, which poses significant limitations in real-world scenarios where FR systems face hybrid physical-digital threats. To facilitate research on Unified Attack Detection (UAD) algorithms, the large-scale UniAttackData dataset has been collected. UniAttackData is the largest public dataset for Unified Attack Detection, with a total of 28,706 videos, where each unique identity encompasses all advanced attack types. Based on this dataset, we organized a Unified Physical-Digital Face Attack Detection Challenge to boost research on Unified Attack Detection. It attracted 136 teams in the development phase, with 13 qualifying for the final round. The results, re-verified by the organizing team, were used for the final ranking. This paper comprehensively reviews the challenge, detailing the dataset, the protocol definition, the evaluation criteria, and a summary of published results. Finally, we provide a detailed analysis of the highest-performing algorithms and offer potential directions for unified physical-digital attack detection inspired by this competition. Challenge website: this https URL.
https://arxiv.org/abs/2404.06211
Morphing attacks, which aim to create a single image that contains the biometric information of multiple identities, are an emerging threat to state-of-the-art Face Recognition (FR) systems. Diffusion Morphs (DiM) are a recently proposed morphing attack that has achieved state-of-the-art performance among representation-based morphing attacks. However, none of the existing research on DiMs has leveraged their iterative nature, instead leaving the DiM model as a black box, treated no differently than a Generative Adversarial Network (GAN) or Variational AutoEncoder (VAE). We propose a greedy strategy over the iterative sampling process of DiM models that searches for an optimal step guided by an identity-based heuristic function. We compare our proposed algorithm against ten other state-of-the-art morphing algorithms using the open-source SYN-MAD 2022 competition dataset. We find that our proposed algorithm is unreasonably effective, fooling all of the tested FR systems with an MMPMR of 100% and outperforming all other morphing algorithms compared.
https://arxiv.org/abs/2404.06025
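The greedy idea above can be sketched per sampling step: among candidate intermediate states produced by the diffusion sampler, keep the one that scores best under an identity-based heuristic. The heuristic below (worst-case cosine similarity to both contributing identities) and the candidate abstraction are assumptions for illustration; the paper's exact heuristic and search may differ.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def greedy_step(candidates, embed, id_a, id_b):
    """Pick the candidate whose embedding maximizes the minimum similarity
    to both contributing identities (a hypothetical min-similarity
    heuristic). `embed` maps a candidate state to an FR embedding."""
    scores = [
        min(cosine_sim(embed(x), id_a), cosine_sim(embed(x), id_b))
        for x in candidates
    ]
    best = int(np.argmax(scores))
    return candidates[best], scores[best]
```

Running this selection at every step of the iterative sampler, instead of accepting the sampler's default trajectory, is what distinguishes the greedy strategy from treating the model as a black box.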
Large-scale face recognition datasets are collected by crawling the Internet without individuals' consent, raising legal, ethical, and privacy concerns. With recent advances in generative models, several works have proposed generating synthetic face recognition datasets to mitigate the concerns around web-crawled face recognition datasets. This paper presents a summary of the Synthetic Data for Face Recognition (SDFR) Competition, held in conjunction with the 18th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2024) and established to investigate the use of synthetic data for training face recognition models. The SDFR competition was split into two tasks, allowing participants to train face recognition systems using new synthetic datasets and/or existing ones. In the first task, the face recognition backbone was fixed and the dataset size was limited, while the second task provided almost complete freedom regarding the model backbone, the dataset, and the training pipeline. The submitted models were trained on existing as well as new synthetic datasets and used clever methods to improve training with synthetic data. The submissions were evaluated and ranked on a diverse set of seven benchmarking datasets. The paper gives an overview of the submitted face recognition models and reports the achieved performance compared to baseline models trained on real and synthetic datasets. Furthermore, the evaluation of submissions is extended to a bias assessment across different demographic groups. Lastly, an outlook on the current state of research in training face recognition models using synthetic data is presented, and existing problems as well as potential future directions are discussed.
https://arxiv.org/abs/2404.04580
Recent advances in deep face recognition have spurred a growing demand for large, diverse, and manually annotated face datasets. Acquiring authentic, high-quality data for face recognition has proven to be a challenge, primarily due to privacy concerns: large face datasets are primarily sourced from web-based images, lacking explicit user consent. In this paper, we examine whether and how synthetic face data can be used to train effective face recognition models with reduced reliance on authentic images, thereby mitigating data-collection concerns. First, we explore the performance gap among recent state-of-the-art face recognition models trained with synthetic data only and with (scarce) authentic data only. Then, we deepen our analysis by training a state-of-the-art backbone with various combinations of synthetic and authentic data, gaining insights into optimizing the limited use of the latter for verification accuracy. Finally, we assess the effectiveness of data augmentation approaches on synthetic and authentic data, with the same goal in mind. Our results highlight the effectiveness of FR models trained on combined datasets, particularly when paired with appropriate augmentation techniques.
https://arxiv.org/abs/2404.03537
The vulnerability of deep neural networks to adversarial patches has motivated numerous defense strategies for boosting model robustness. However, the prevailing defenses depend on a single observation or pre-established adversary information to counter adversarial patches, often failing against unseen or adaptive adversarial attacks and exhibiting unsatisfactory performance in dynamic 3D environments. Inspired by active human perception and recurrent feedback mechanisms, we develop Embodied Active Defense (EAD), a proactive defensive strategy that actively contextualizes environmental information to address misaligned adversarial patches in 3D real-world settings. To achieve this, EAD develops two central recurrent sub-modules, i.e., a perception module and a policy module, to implement two critical functions of active vision. These modules recurrently process a series of beliefs and observations, facilitating progressive refinement of their comprehension of the target object and enabling the development of strategic actions to counter adversarial patches in 3D environments. To optimize learning efficiency, we incorporate a differentiable approximation of environmental dynamics and deploy patches that are agnostic to the adversary's strategies. Extensive experiments demonstrate that EAD substantially enhances robustness against a variety of patches within just a few steps through its action policy in safety-critical tasks (e.g., face recognition and object detection), without compromising standard accuracy. Furthermore, due to its attack-agnostic characteristic, EAD generalizes excellently to unseen attacks, diminishing the average attack success rate by 95% across a range of unseen adversarial attacks.
https://arxiv.org/abs/2404.00540
Cattle face recognition holds paramount significance in domains such as animal husbandry and behavioral research. Despite significant progress in confined environments, applying these accomplishments in wild settings remains challenging. Thus, we create the first large-scale cattle face recognition dataset for wild environments, ICRWE. It encompasses 483 cattle and 9,816 high-resolution image samples, each annotated for face features, light conditions, and face orientation. Furthermore, we introduce a novel parallel attention network, PANet, comprising several cascaded Transformer modules, each incorporating two parallel Position Attention Modules (PAM) and Feature Mapping Modules (FMM). PAM focuses on local and global features at each image position through parallel channel attention, and FMM captures intricate feature patterns through non-linear mappings. Experimental results indicate that PANet achieves a recognition accuracy of 88.03% on the ICRWE dataset, establishing itself as the current state-of-the-art approach. The source code is available in the supplementary materials.
https://arxiv.org/abs/2403.19980
Atomic force microscopy (AFM or SPM) imaging is one of the best matches with machine learning (ML) analysis among microscopy techniques. The digital format of AFM images allows for direct utilization in ML algorithms without the need for additional processing. Additionally, AFM enables the simultaneous imaging of distributions of over a dozen different physicochemical properties of sample surfaces, a process known as multidimensional imaging. While this wealth of information can be challenging to analyze using traditional methods, ML provides a seamless approach to the task. However, the relatively slow speed of AFM imaging poses a challenge to applying the deep learning methods broadly used in image recognition. This Prospective focuses on ML recognition/classification using a relatively small number of AFM images, i.e., a small database. We discuss ML methods other than the popular deep-learning neural networks. The described approach has already been used successfully to analyze and classify the surfaces of biological cells. It can be applied to recognizing medical images, to specific material processing, to forensic studies, and even to identifying the authenticity of art. A general template for ML analysis specific to AFM is suggested, with a specific example of the identification of cell phenotype. Special attention is given to analyzing the statistical significance of the obtained results, an important aspect that is often overlooked in papers dealing with machine learning. A simple method for establishing statistical significance is also described.
https://arxiv.org/abs/2403.16230
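One concrete, simple way to check statistical significance of classification results on a small image database, in the spirit of the Prospective's advice, is a permutation test on per-group scores. This is an illustrative stand-in; the article's own method may differ.

```python
import numpy as np

def permutation_p_value(group_a, group_b, n_perm=2000, seed=0):
    """Two-sided permutation test of the difference in means between two
    groups of per-image scores (e.g., classifier outputs for two cell
    phenotypes). Labels are repeatedly shuffled; the p-value is the
    fraction of shuffles whose mean difference is at least as extreme
    as the observed one (with a +1 correction to avoid p = 0)."""
    rng = np.random.default_rng(seed)
    observed = abs(np.mean(group_a) - np.mean(group_b))
    pooled = np.concatenate([group_a, group_b])
    n_a = len(group_a)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(np.mean(pooled[:n_a]) - np.mean(pooled[n_a:]))
        if diff >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)
```

Because it makes no distributional assumptions, the test stays valid at the small sample sizes typical of slow AFM acquisition.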
In this work, we propose the Cell Variational Information Bottleneck Network (cellVIB), a convolutional neural network using an information-bottleneck mechanism that can be combined with the latest feedforward network architectures and trained end-to-end. Our Cell Variational Information Bottleneck Network is constructed by stacking VIB cells, which generate feature maps with uncertainty. As the layers go deeper, the regularization effect gradually increases, instead of excessive regularization constraints being added directly to the output layer of the model as in Deep VIB. In each VIB cell, the feedforward process learns an independent mean term and a standard deviation term, and predicts a Gaussian distribution based on them. The feedback process relies on the reparameterization trick for effective training. This work performs an extensive analysis on the MNIST dataset to verify the effectiveness of each VIB cell and provides an insightful analysis of how the VIB cells affect mutual information. Experiments conducted on CIFAR-10 also prove that cellVIB is robust against noisy labels during training and against corrupted images during testing. We then validate our method on the PACS dataset, whose results show that the VIB cells can significantly improve the generalization performance of the basic model. Finally, on a more complex representation learning task, face recognition, our network structure also achieves very competitive results.
https://arxiv.org/abs/2403.15082
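The core of a VIB cell, mean and standard-deviation heads plus the reparameterization trick, can be sketched in a few lines. This numpy version uses dense weights for clarity (the paper's cells are convolutional and run under autograd); the KL term to a standard normal is the usual information-bottleneck penalty.

```python
import numpy as np

def vib_cell(x, w_mu, w_sigma, rng):
    """One VIB cell (simplified dense sketch).

    The feedforward path predicts a diagonal Gaussian per feature:
    a mean head and a std head (parameterized via log-variance so
    sigma > 0). The reparameterization z = mu + sigma * eps makes the
    sample differentiable w.r.t. the heads in a real autograd framework.
    Returns the stochastic feature map and its KL penalty to N(0, I).
    """
    mu = x @ w_mu                          # mean head
    sigma = np.exp(0.5 * (x @ w_sigma))    # std head via log-variance
    eps = rng.standard_normal(mu.shape)
    z = mu + sigma * eps                   # reparameterized sample
    kl = 0.5 * np.sum(mu**2 + sigma**2 - 2 * np.log(sigma) - 1)
    return z, kl
```

Stacking such cells adds a KL penalty at every depth, which is how the regularization effect accumulates gradually through the network rather than being applied only at the output layer.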
In this paper, we address the challenge of making ViT models more robust to unseen affine transformations. Such robustness becomes useful in various recognition tasks, such as face recognition, when image alignment failures occur. We propose a novel method called KP-RPE, which leverages key points (e.g., facial landmarks) to make ViT more resilient to scale, translation, and pose variations. We begin with the observation that Relative Position Encoding (RPE) is a good way to bring affine-transform generalization to ViTs. RPE, however, can only inject the model with the prior knowledge that nearby pixels are more important than far pixels. Keypoint RPE (KP-RPE) is an extension of this principle, where the significance of pixels is not solely dictated by their proximity but also by their relative positions to specific keypoints within the image. By anchoring the significance of pixels around keypoints, the model can more effectively retain spatial relationships, even when those relationships are disrupted by affine transformations. We show the merit of KP-RPE in face and gait recognition. The experimental results demonstrate its effectiveness in improving face recognition performance on low-quality images, particularly where alignment is prone to failure. Code and pre-trained models are available.
https://arxiv.org/abs/2403.14852
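The KP-RPE principle, pixel significance driven by proximity and by relative position to keypoints, can be illustrated as an additive attention bias. The functional form below (negative distances, a shared temperature) is a hypothetical simplification; the paper learns its bias rather than fixing it.

```python
import numpy as np

def kp_rpe_bias(positions, keypoints, tau=1.0):
    """Hypothetical keypoint-conditioned attention bias.

    Plain RPE would use only the pairwise distance between positions
    (`d_rel`). This sketch additionally decays each key position's weight
    with its distance to the nearest keypoint (`d_kp`), e.g., a facial
    landmark, so attention stays anchored to landmarks even when an affine
    transform moves the face within the frame. Returns an NxN additive bias.
    """
    d_kp = np.min(
        np.linalg.norm(positions[:, None, :] - keypoints[None, :, :], axis=-1),
        axis=1,
    )
    d_rel = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    return -(d_rel + d_kp[None, :]) / tau
```

Because the keypoints move together with the face under scale, translation, or pose changes, a bias expressed relative to them is far more stable than one expressed in fixed image coordinates.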
The widespread adoption of face recognition has led to increasing privacy concerns, as unauthorized access to face images can expose sensitive personal information. This paper explores face image protection against viewing and recovery attacks. Inspired by image compression, we propose creating a visually uninformative face image through feature subtraction between an original face and its model-produced regeneration. Recognizable identity features within the image are encouraged by co-training a recognition model on its high-dimensional feature representation. To enhance privacy, the high-dimensional representation is crafted through random channel shuffling, resulting in randomized recognizable images devoid of attacker-leverageable texture details. We distill our methodologies into a novel privacy-preserving face recognition method, MinusFace. Experiments demonstrate its high recognition accuracy and effective privacy protection. Its code is available at this https URL.
https://arxiv.org/abs/2403.12457
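The MinusFace recipe above, feature subtraction followed by a high-dimensional lift and random channel shuffling, can be sketched at the abstract's level of detail. The "regeneration" input and the replication-based lift are stand-ins for the paper's learned components.

```python
import numpy as np

def protect_face(face, regeneration, seed=0):
    """MinusFace-style protection (illustrative sketch of the abstract's
    pipeline; learned components are replaced by toy operations).

    Subtracting the model-produced regeneration leaves a visually
    uninformative residue; lifting it to more channels and randomly
    shuffling them removes attacker-leverageable texture ordering while a
    co-trained recognizer (not shown) can still read identity from it.
    """
    residue = face - regeneration                  # visually uninformative
    high_dim = np.repeat(residue, 4, axis=-1)      # toy high-dimensional lift
    rng = np.random.default_rng(seed)
    perm = rng.permutation(high_dim.shape[-1])     # random channel shuffling
    return high_dim[..., perm], perm
```

The recognition model would be trained directly on the shuffled representation, so matching works without ever reconstructing a viewable face.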
Face inpainting, the technique of restoring missing or damaged regions in facial images, is pivotal for applications like face recognition in occluded scenarios and image analysis with poor-quality captures. This process not only needs to produce realistic visuals but must also preserve individual identity characteristics. The aim of this paper is to inpaint a face from a given periocular region (eyes-to-face) through a proposed new Generative Adversarial Network (GAN)-based model called the Eyes-to-Face Network (E2F-Net). The proposed approach extracts identity and non-identity features from the periocular region using two dedicated encoders. The extracted features are then mapped to the latent space of a pre-trained StyleGAN generator to benefit from its state-of-the-art performance and its rich, diverse, and expressive latent space without any additional training. We further improve the StyleGAN output by finding the optimal code in the latent space using a new optimization technique for GAN inversion. Our E2F-Net requires minimal training, reducing computational complexity as a secondary benefit. Through extensive experiments, we show that our method successfully reconstructs the whole face with high quality, surpassing current techniques despite significantly less training and supervision effort. We have generated seven eyes-to-face datasets based on well-known public face datasets for training and verifying our proposed method. The code and datasets are publicly available.
https://arxiv.org/abs/2403.12197
This paper presents Arc2Face, an identity-conditioned face foundation model which, given the ArcFace embedding of a person, can generate diverse photo-realistic images with a greater degree of face similarity than existing models. Despite previous attempts to decode face recognition features into detailed images, we find that common high-resolution datasets (e.g., FFHQ) lack sufficient identities to reconstruct any subject. To that end, we meticulously upsample a significant portion of the WebFace42M database, the largest public dataset for face recognition (FR). Arc2Face builds upon a pretrained Stable Diffusion model, yet adapts it to the task of ID-to-face generation, conditioned solely on ID vectors. Deviating from recent works that combine ID with text embeddings for zero-shot personalization of text-to-image models, we emphasize the compactness of FR features, which can fully capture the essence of the human face, as opposed to hand-crafted prompts. Crucially, text-augmented models struggle to decouple identity and text, usually necessitating some description of the given face to achieve satisfactory similarity. Arc2Face, however, needs only the discriminative features of ArcFace to guide the generation, offering a robust prior for a plethora of tasks where ID consistency is of paramount importance. As an example, we train an FR model on synthetic images from our model and achieve performance superior to that of existing synthetic datasets.
https://arxiv.org/abs/2403.11641
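The ID consistency that Arc2Face optimizes for is conventionally measured with ArcFace-style embeddings: vectors are L2-normalized onto the unit hypersphere, so identity similarity reduces to a cosine (dot) product. The sketch below illustrates only that scoring convention; the random vectors stand in for the outputs of a real FR network.

```python
import numpy as np

def l2_normalize(v):
    # ArcFace-style embeddings are compared on the unit hypersphere,
    # so similarity between two faces is just a dot product (cosine).
    return v / np.linalg.norm(v)

def id_similarity(emb_a, emb_b):
    return float(l2_normalize(emb_a) @ l2_normalize(emb_b))

rng = np.random.default_rng(1)

# Toy stand-ins for FR embeddings of: a reference photo, a generated
# image of the same person (reference plus a small perturbation), and
# an unrelated identity. A real pipeline would embed actual images.
ref = rng.standard_normal(512)
same_id = ref + 0.1 * rng.standard_normal(512)
other_id = rng.standard_normal(512)

sim_same = id_similarity(ref, same_id)
sim_other = id_similarity(ref, other_id)
print(sim_same > sim_other)  # prints True: the ID-consistent sample scores higher
```

In high dimensions, embeddings of unrelated identities are nearly orthogonal (cosine close to 0), which is why a single compact 512-dim vector is a strong conditioning signal.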
Driven by privacy and security concerns, the need to erase unwanted information from pre-trained vision models is becoming increasingly evident. In real-world scenarios, erasure requests originate at any time from both users and model owners. These requests usually form a sequence. Therefore, under such a setting, selective information is expected to be continuously removed from a pre-trained model while the rest is maintained. We define this problem as continual forgetting and identify two key challenges. (i) For unwanted knowledge, efficient and effective deletion is crucial. (ii) For remaining knowledge, the impact of the forgetting procedure should be minimal. To address them, we propose Group Sparse LoRA (GS-LoRA). Specifically, towards (i), we use LoRA modules to fine-tune the FFN layers in Transformer blocks for each forgetting task independently, and towards (ii), a simple group sparse regularization is adopted, enabling automatic selection of specific LoRA groups and zeroing out the others. GS-LoRA is effective, parameter-efficient, data-efficient, and easy to implement. We conduct extensive experiments on face recognition, object detection and image classification and demonstrate that GS-LoRA manages to forget specific classes with minimal impact on other classes. Code will be released at \url{this https URL}.
https://arxiv.org/abs/2403.11530
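The group sparse regularization that zeroes out entire LoRA groups is typically realized with a group-lasso penalty, whose proximal operator shrinks each group by its L2 norm and kills groups below a threshold. The sketch below shows only that proximal step on toy vectors; it is not the paper's training code, and the group sizes and penalty weight are illustrative.

```python
import numpy as np

def group_prox(groups, lam):
    """Proximal step for the group-lasso penalty lam * sum_g ||theta_g||_2.

    Each group is shrunk toward zero by lam in norm; groups whose norm
    is below lam are zeroed out entirely. This all-or-nothing behavior
    is what lets group sparsity select which LoRA groups stay active.
    """
    out = []
    for g in groups:
        norm = np.linalg.norm(g)
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        out.append(scale * g)
    return out

# Toy LoRA parameter groups: two with large updates, one nearly zero.
groups = [np.array([3.0, 4.0]),    # norm 5.0  -> kept (shrunk)
          np.array([0.1, -0.1]),   # norm ~0.14 -> pruned whole
          np.array([0.0, 2.0])]    # norm 2.0  -> kept (shrunk)
shrunk = group_prox(groups, lam=0.5)
kept = [int(np.linalg.norm(g) > 0) for g in shrunk]
print(kept)  # prints [1, 0, 1]: the small group is zeroed as a whole
```

In practice the penalty is applied per LoRA module (a group being one module's low-rank factors) inside the fine-tuning loss, so modules irrelevant to a forgetting task collapse to exactly zero and the pre-trained weights there are untouched.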
Face morphing attacks circumvent face recognition systems (FRSs) by creating a morphed image that contains multiple identities. However, existing face morphing attack methods either sacrifice image quality or compromise identity preservation. Consequently, these attacks cannot simultaneously bypass FRS verification and deceive human observers. Such methods typically rely on global information from the contributing images, ignoring detailed information from effective facial regions. To address these issues, we propose a novel morphing attack method that improves the quality of morphed images and better preserves the contributing identities. Our proposed method leverages a hierarchical generative network to capture both local detail and global consistency information. Additionally, a mask-guided image blending module is dedicated to removing artifacts from areas outside the face to improve the image's visual quality. The proposed attack method is compared to state-of-the-art methods on three public datasets in terms of FRS vulnerability, attack detectability, and image quality. The results show our method's potential threat of deceiving FRSs while being capable of passing multiple morphing attack detection (MAD) scenarios.
https://arxiv.org/abs/2403.11101
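The mask-guided blending idea, keeping the morphed pixels inside the face region while restoring clean reference pixels outside it, reduces to a per-pixel convex combination. The sketch below assumes a precomputed soft face mask in [0, 1]; the paper's actual module and mask source are not specified here, so this is purely illustrative.

```python
import numpy as np

def mask_guided_blend(morphed, reference, mask):
    """Blend a morphed face into a clean reference frame.

    Inside the face mask (mask == 1) morphed pixels are kept; outside
    it (mask == 0) reference pixels replace morphing artifacts in hair
    and background. A soft mask gives a seamless boundary transition.
    """
    if morphed.ndim == 3:          # broadcast a 2-D mask over channels
        mask = mask[..., None]
    return mask * morphed + (1.0 - mask) * reference

# Toy 4x4 grayscale example: the morphed image differs everywhere,
# but only the masked central 2x2 "face" region survives blending.
morphed = np.full((4, 4), 0.8)
reference = np.full((4, 4), 0.2)
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0

out = mask_guided_blend(morphed, reference, mask)
print(float(out[1, 1]), float(out[0, 0]))  # prints 0.8 0.2
```

With a feathered (Gaussian-blurred) mask instead of a binary one, the same formula removes visible seams at the face boundary, which is what makes the blended result harder for both humans and MAD detectors to flag.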