This work is unique in its use of discrete wavelets built from Chebyshev polynomials of the second and third kind, from which two effective filters are derived: the Discrete Second Chebyshev Wavelets Transform (DSCWT) and the Filter Discrete Third Chebyshev Wavelets Transform (FDTCWT). These are used to analyze color images and to remove the noise and impurities that accompany an image, which is necessary because of the large amount of data that makes up an image as it is captured. These data are massive and difficult to handle during transmission. To address this issue, an image compression technique is applied without the image losing information, and the readings obtained were satisfactory. Mean Square Error (MSE), Peak Signal-to-Noise Ratio (PSNR), Bits Per Pixel (BPP), and Compression Ratio (CR) are used to evaluate the initial processing stage, while the processing stage itself trains Convolutional Neural Networks (CNN) as the Discrete Second Chebyshev Wavelets Convolutional Neural Network (DSCWCNN) and the Discrete Third Chebyshev Wavelets Convolutional Neural Network (DTCWCNN) to create an efficient face recognition algorithm; the best results were achieved in accuracy and in the least amount of time. Two samples of color images were implemented. The proposed method produced fast and good results, as shown in the reported tables.
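The four metrics named in the abstract above have standard definitions; a minimal sketch (assuming 8-bit images stored as numpy arrays, not the authors' code) might look like:

```python
import numpy as np

def mse(original, reconstructed):
    """Mean Square Error between two images."""
    diff = original.astype(np.float64) - reconstructed.astype(np.float64)
    return np.mean(diff ** 2)

def psnr(original, reconstructed, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB for 8-bit images."""
    error = mse(original, reconstructed)
    return float("inf") if error == 0 else 10.0 * np.log10(peak ** 2 / error)

def bpp(compressed_bytes, width, height):
    """Bits Per Pixel of the compressed representation."""
    return 8.0 * compressed_bytes / (width * height)

def cr(original_bytes, compressed_bytes):
    """Compression Ratio: original size over compressed size."""
    return original_bytes / compressed_bytes
```

Higher PSNR and CR at a given BPP indicate better compression, which is the trade-off the abstract reports.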
https://arxiv.org/abs/2303.13158
Face recognition models embed a face image into a low-dimensional identity vector containing abstract encodings of identity-specific facial features that allow individuals to be distinguished from one another. We tackle the challenging task of inverting the latent space of pre-trained face recognition models without full model access (i.e. black-box setting). A variety of methods have been proposed in literature for this task, but they have serious shortcomings such as a lack of realistic outputs, long inference times, and strong requirements for the data set and accessibility of the face recognition model. Through an analysis of the black-box inversion problem, we show that the conditional diffusion model loss naturally emerges and that we can effectively sample from the inverse distribution even without an identity-specific loss. Our method, named identity denoising diffusion probabilistic model (ID3PM), leverages the stochastic nature of the denoising diffusion process to produce high-quality, identity-preserving face images with various backgrounds, lighting, poses, and expressions. We demonstrate state-of-the-art performance in terms of identity preservation and diversity both qualitatively and quantitatively. Our method is the first black-box face recognition model inversion method that offers intuitive control over the generation process and does not suffer from any of the common shortcomings from competing methods.
https://arxiv.org/abs/2303.13006
A hard challenge in developing practical face recognition (FR) attacks is the black-box nature of the target FR model, i.e., gradient and parameter information is inaccessible to attackers. While recent research took an important step towards attacking black-box FR models by leveraging transferability, their performance is still limited, especially against online commercial FR systems, where results can be pessimistic (e.g., a less than 50% ASR--attack success rate--on average). Motivated by this, we present Sibling-Attack, a new FR attack technique that, for the first time, explores a novel multi-task perspective (i.e., leveraging extra information from multi-correlated tasks to boost attacking transferability). Intuitively, Sibling-Attack selects a set of tasks correlated with FR and picks the Attribute Recognition (AR) task as the sibling task based on theoretical and quantitative analysis. Sibling-Attack then develops an optimization framework that fuses adversarial gradient information through (1) constraining the cross-task features to be under the same space, (2) a joint-task meta-optimization framework that enhances the gradient compatibility among tasks, and (3) a cross-task gradient stabilization method which mitigates the oscillation effect during attacking. Extensive experiments demonstrate that Sibling-Attack outperforms state-of-the-art FR attack techniques by a non-trivial margin, boosting ASR by 12.61% and 55.77% on average on state-of-the-art pre-trained FR models and two well-known, widely used commercial FR systems, respectively.
https://arxiv.org/abs/2303.12512
The practical needs of the ``right to be forgotten'' and poisoned data removal call for efficient \textit{machine unlearning} techniques, which enable machine learning models to unlearn, or forget, a fraction of training data and its lineage. Recent studies on machine unlearning for deep neural networks (DNNs) attempt to destroy the influence of the forgetting data by scrubbing the model parameters. However, this is prohibitively expensive due to the large dimension of the parameter space. In this paper, we refocus our attention from the parameter space to the decision space of the DNN model, and propose Boundary Unlearning, a rapid yet effective way to unlearn an entire class from a trained DNN model. The key idea is to shift the decision boundary of the original DNN model to imitate the decision behavior of a model retrained from scratch. We develop two novel boundary shift methods, namely Boundary Shrink and Boundary Expanding, both of which can rapidly achieve the utility and privacy guarantees. We extensively evaluate Boundary Unlearning on the CIFAR-10 and Vggface2 datasets, and the results show that Boundary Unlearning can effectively forget the forgetting class on image classification and face recognition tasks, with an expected speed-up of $17\times$ and $19\times$, respectively, compared with retraining from scratch.
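Boundary Shrink's core move, reassigning each forgetting-class sample to its nearest incorrect class before fine-tuning, can be sketched with class prototypes; this is a strong simplification of the paper's actual neighbor search, and all names below are illustrative:

```python
import numpy as np

def nearest_incorrect_labels(feats, forget_class, prototypes):
    """For each feature of the forgetting class, return the label of the
    nearest class prototype, excluding the forgetting class itself.
    Fine-tuning on these reassigned labels would then shrink the
    forgetting class's decision region."""
    labels = []
    for f in feats:
        best, best_d = None, np.inf
        for cls, proto in prototypes.items():
            if cls == forget_class:
                continue  # never reassign back to the class being forgotten
            d = np.linalg.norm(f - proto)
            if d < best_d:
                best, best_d = cls, d
        labels.append(best)
    return labels
```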
https://arxiv.org/abs/2303.11570
The function of constructing the hierarchy of objects is important to the visual process of the human brain. Previous studies have successfully adopted capsule networks to decompose the digits and faces into parts in an unsupervised manner to investigate the similar perception mechanism of neural networks. However, their descriptions are restricted to the 2D space, limiting their capacities to imitate the intrinsic 3D perception ability of humans. In this paper, we propose an Inverse Graphics Capsule Network (IGC-Net) to learn the hierarchical 3D face representations from large-scale unlabeled images. The core of IGC-Net is a new type of capsule, named graphics capsule, which represents 3D primitives with interpretable parameters in computer graphics (CG), including depth, albedo, and 3D pose. Specifically, IGC-Net first decomposes the objects into a set of semantic-consistent part-level descriptions and then assembles them into object-level descriptions to build the hierarchy. The learned graphics capsules reveal how the neural networks, oriented at visual perception, understand faces as a hierarchy of 3D models. Besides, the discovered parts can be deployed to the unsupervised face segmentation task to evaluate the semantic consistency of our method. Moreover, the part-level descriptions with explicit physical meanings provide insight into the face analysis that originally runs in a black box, such as the importance of shape and texture for face recognition. Experiments on CelebA, BP4D, and Multi-PIE demonstrate the characteristics of our IGC-Net.
https://arxiv.org/abs/2303.10896
Neural network-based image classifiers are powerful tools for computer vision tasks, but they inadvertently reveal sensitive attribute information about their classes, raising concerns about their privacy. To investigate this privacy leakage, we introduce the first Class Attribute Inference Attack (Caia), which leverages recent advances in text-to-image synthesis to infer sensitive attributes of individual classes in a black-box setting, while remaining competitive with related white-box attacks. Our extensive experiments in the face recognition domain show that Caia can accurately infer undisclosed sensitive attributes, such as an individual's hair color, gender and racial appearance, which are not part of the training labels. Interestingly, we demonstrate that adversarial robust models are even more vulnerable to such privacy leakage than standard models, indicating that a trade-off between robustness and privacy exists.
https://arxiv.org/abs/2303.09289
Low-resolution face recognition (LRFR) has become a challenging problem for modern deep face recognition systems. Existing methods mainly leverage prior information from high-resolution (HR) images by either reconstructing facial details with super-resolution techniques or learning a unified feature space. To address this issue, this paper proposes a novel approach which enforces the network to focus on the discriminative information stored in the low-frequency components of a low-resolution (LR) image. A cross-resolution knowledge distillation paradigm is first employed as the learning framework. An identity-preserving network, WaveResNet, and a wavelet similarity loss are then designed to capture low-frequency details and boost performance. Finally, an image degradation model is conceived to simulate more realistic LR training data. Consequently, extensive experimental results show that the proposed method consistently outperforms the baseline model and other state-of-the-art methods across a variety of image resolutions.
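The abstract does not spell out the wavelet similarity loss; as a hedged illustration, the low-frequency (approximation) band of a single-level Haar transform is just a 2x2 block average, and a similarity loss can compare those bands between images (function names here are illustrative, not the authors' API):

```python
import numpy as np

def haar_lowfreq(img):
    """Single-level Haar approximation band: 2x2 block averages."""
    h, w = img.shape
    return img[:h//2*2, :w//2*2].reshape(h//2, 2, w//2, 2).mean(axis=(1, 3))

def wavelet_similarity_loss(hr_img, lr_like_img):
    """Toy loss: 1 - cosine similarity between the low-frequency bands,
    encouraging the network to match the discriminative low-frequency
    content rather than high-frequency detail lost at low resolution."""
    a = haar_lowfreq(hr_img).ravel()
    b = haar_lowfreq(lr_like_img).ravel()
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return 1.0 - cos
```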
https://arxiv.org/abs/2303.08665
Facial Expression Recognition (FER) is an important task in computer vision and has wide applications in human-computer interaction, intelligent security, emotion analysis, and other fields. However, the limited size of FER datasets limits the generalization ability of expression recognition models, resulting in ineffective model performance. To address this problem, we propose a semi-supervised learning framework that utilizes unlabeled face data to train expression recognition models effectively. Our method uses a dynamic threshold module (\textbf{DTM}) that can adaptively adjust the confidence threshold to fully utilize the face recognition (FR) data to generate pseudo-labels, thus improving the model's ability to model facial expressions. In the ABAW5 EXPR task, our method achieved excellent results on the official validation set.
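The abstract does not specify DTM's update rule; a generic confidence-thresholded pseudo-labelling step with per-class thresholds (which a dynamic module could adjust adaptively during training) might be sketched as follows, with all names illustrative:

```python
import numpy as np

def pseudo_label(probs, thresholds):
    """Assign pseudo-labels to unlabeled samples whose top-class
    confidence clears a per-class threshold; -1 marks discarded samples.

    probs: (n_samples, n_classes) softmax outputs
    thresholds: (n_classes,) confidence thresholds
    """
    preds = probs.argmax(axis=1)
    conf = probs.max(axis=1)
    keep = conf >= thresholds[preds]
    return np.where(keep, preds, -1)
```

A dynamic module would lower a class's threshold when that class produces few pseudo-labels, trading label purity for coverage.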
https://arxiv.org/abs/2303.08617
Machine learning and deep learning have been used extensively to classify physical surfaces through images and time-series contact data. However, these methods rely on human expertise and entail the time-consuming processes of data and parameter tuning. To overcome these challenges, we propose an easily implemented framework that can directly handle heterogeneous data sources for classification tasks. Our data-versus-data approach automatically quantifies distinctive differences in distributions in a high-dimensional space via kernel two-sample testing between two sets extracted from multimodal data (e.g., images, sounds, haptic signals). We demonstrate the effectiveness of our technique by benchmarking against expertly engineered classifiers for visual-audio-haptic surface recognition due to the industrial relevance, difficulty, and competitive baselines of this application; ablation studies confirm the utility of key components of our pipeline. As shown in our open-source code, we achieve 97.2% accuracy on a standard multi-user dataset with 108 surface classes, outperforming the state-of-the-art machine-learning algorithm by 6% on a more difficult version of the task. The fact that our classifier obtains this performance with minimal data processing in the standard algorithm setting reinforces the powerful nature of kernel methods for learning to recognize complex patterns.
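Kernel two-sample testing typically rests on the maximum mean discrepancy (MMD); a biased-estimator sketch with a Gaussian kernel (bandwidth fixed by hand here, and not the authors' pipeline) could be:

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """Gram matrix of the Gaussian (RBF) kernel between row sets x and y."""
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def mmd2_biased(x, y, sigma=1.0):
    """Biased estimate of squared MMD: large when the two sample sets
    come from visibly different distributions, near zero otherwise."""
    kxx = gaussian_kernel(x, x, sigma).mean()
    kyy = gaussian_kernel(y, y, sigma).mean()
    kxy = gaussian_kernel(x, y, sigma).mean()
    return kxx + kyy - 2.0 * kxy
```

Classifying a query set then amounts to picking the class whose stored sample set minimizes this discrepancy.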
https://arxiv.org/abs/2303.04930
In this study, we introduce a feature knowledge distillation framework to improve low-resolution (LR) face recognition performance using knowledge obtained from high-resolution (HR) images. The proposed framework transfers informative features from an HR-trained network to an LR-trained network by reducing the distance between them. A cosine similarity measure was employed as a distance metric to effectively align the HR and LR features. This approach differs from conventional knowledge distillation frameworks, which use the L_p distance metrics and offer the advantage of converging well when reducing the distance between features of different resolutions. Our framework achieved a 3% improvement over the previous state-of-the-art method on the AgeDB-30 benchmark without bells and whistles, while maintaining a strong performance on HR images. The effectiveness of cosine similarity as a distance metric was validated through statistical analysis, making our approach a promising solution for real-world applications in which LR images are frequently encountered. The code and pretrained models will be publicly available on GitHub.
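Per sample, the cosine-based distillation objective reduces to maximizing the cosine between HR-teacher and LR-student features; a minimal numpy sketch (not the authors' code) is:

```python
import numpy as np

def cosine_distill_loss(teacher_feats, student_feats, eps=1e-12):
    """Mean (1 - cosine similarity) between paired feature rows.
    Unlike an L_p distance, this is scale-invariant: a student feature
    pointing the right way incurs no penalty for differing magnitude."""
    t = teacher_feats / (np.linalg.norm(teacher_feats, axis=1, keepdims=True) + eps)
    s = student_feats / (np.linalg.norm(student_feats, axis=1, keepdims=True) + eps)
    return float(np.mean(1.0 - np.sum(t * s, axis=1)))
```

That scale invariance is one plausible reading of why the abstract reports better convergence than L_p metrics when HR and LR feature magnitudes differ.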
https://arxiv.org/abs/2303.04681
Recently, significant progress has been made in face presentation attack detection (PAD), which aims to secure face recognition systems against presentation attacks, owing to the availability of several face PAD datasets. However, all available datasets are based on privacy and legally-sensitive authentic biometric data with a limited number of subjects. To target these legal and technical challenges, this work presents the first synthetic-based face PAD dataset, named SynthASpoof, as a large-scale PAD development dataset. The bona fide samples in SynthASpoof are synthetically generated and the attack samples are collected by presenting such synthetic data to capture systems in a real attack scenario. The experimental results demonstrate the feasibility of using SynthASpoof for the development of face PAD. Moreover, we boost the performance of such a solution by incorporating the domain generalization tool MixStyle into the PAD solutions. Additionally, we showed the viability of using synthetic data as a supplement to enrich the diversity of limited authentic training data and consistently enhance PAD performances. The SynthASpoof dataset, containing 25,000 bona fide and 78,800 attack samples, the implementation, and the pre-trained weights are made publicly available.
https://arxiv.org/abs/2303.02660
Evaluating the quality of facial images is essential for operating face recognition systems with sufficient accuracy. The recent advances in face quality standardisation (ISO/IEC WD 29794-5) recommend the usage of component quality measures for breaking down face quality into its individual factors, hence providing valuable feedback for operators to re-capture low-quality images. In light of recent advances in 3D-aware generative adversarial networks, we propose a novel dataset, "Syn-YawPitch", comprising 1,000 identities with varying yaw-pitch angle combinations. Utilizing this dataset, we demonstrate that pitch angles beyond 30 degrees have a significant impact on the biometric performance of current face recognition systems. Furthermore, we propose a lightweight and efficient pose quality predictor that adheres to the standards of ISO/IEC WD 29794-5 and is freely available for use at this https URL.
https://arxiv.org/abs/2303.00491
Recent deep-learning-based compression methods have achieved superior performance compared with traditional approaches. However, deep learning models have proven to be vulnerable to backdoor attacks, where some specific trigger patterns added to the input can lead to malicious behavior of the models. In this paper, we present a novel backdoor attack with multiple triggers against learned image compression models. Motivated by the widely used discrete cosine transform (DCT) in existing compression systems and standards, we propose a frequency-based trigger injection model that adds triggers in the DCT domain. In particular, we design several attack objectives for various attacking scenarios, including: 1) attacking compression quality in terms of bit-rate and reconstruction quality; 2) attacking task-driven measures, such as down-stream face recognition and semantic segmentation. Moreover, a novel simple dynamic loss is designed to balance the influence of different loss terms adaptively, which helps achieve more efficient training. Extensive experiments show that with our trained trigger injection models and simple modification of encoder parameters (of the compression model), the proposed attack can successfully inject several backdoors with corresponding triggers in a single image compression model.
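A frequency-domain trigger of the kind described can be illustrated by shifting a chosen DCT coefficient of an image block; the sketch below builds an orthonormal DCT-II matrix by hand, and the coefficient index and amplitude are arbitrary choices, not values from the paper:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    m[0, :] = np.sqrt(1.0 / n)
    return m

def add_dct_trigger(block, coeff=(2, 3), amplitude=5.0):
    """Embed a trigger by shifting one mid-frequency DCT coefficient
    of a square image block, then transforming back to pixels."""
    d = dct_matrix(block.shape[0])
    spectrum = d @ block @ d.T     # forward 2D DCT
    spectrum[coeff] += amplitude   # inject the frequency-domain trigger
    return d.T @ spectrum @ d      # inverse 2D DCT
```

A mid-frequency shift spreads over the whole block in pixel space, which is what makes such triggers visually subtle.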
https://arxiv.org/abs/2302.14677
Lossy face image compression can degrade the image quality and the utility for the purpose of face recognition. This work investigates the effect of lossy image compression on a state-of-the-art face recognition model, and on multiple face image quality assessment models. The analysis is conducted over a range of specific image target sizes. Four compression types are considered, namely JPEG, JPEG 2000, downscaled PNG, and notably the new JPEG XL format. Frontal color images from the ColorFERET database were used in a Region Of Interest (ROI) variant and a portrait variant. We primarily conclude that JPEG XL allows for superior mean and worst case face recognition performance especially at lower target sizes, below approximately 5kB for the ROI variant, while there appears to be no critical advantage among the compression types at higher target sizes. Quality assessments from modern models correlate well overall with the compression effect on face recognition performance.
https://arxiv.org/abs/2302.12593
The overall objective of the main project is to propose and develop a facial authentication system for unlocking phones, or applications on phones, using facial recognition. The system will include four separate architectures: face detection, face recognition, face spoofing, and classification of closed eyes. Among these, we consider face recognition to be the most important: determining the true identity of the person standing in front of the screen with absolute accuracy is what facial recognition systems need to achieve. Alongside the development of the face recognition problem, the problem of detecting fake faces is also gradually becoming popular and equally important. Our goal is to propose and develop two loss functions, LMCot and Double Loss, and then apply them to the face authentication process.
https://arxiv.org/abs/2302.11427
Face attribute research has so far used only simple binary attributes for facial hair; e.g., beard / no beard. We have created a new, more descriptive facial hair annotation scheme and applied it to create a new facial hair attribute dataset, FH37K. Face attribute research also so far has not dealt with logical consistency and completeness. For example, in prior research, an image might be classified as both having no beard and also having a goatee (a type of beard). We show that the test accuracy of previous classification methods on facial hair attribute classification drops significantly if logical consistency of classifications is enforced. We propose a logically consistent prediction loss, LCPLoss, to aid learning of logical consistency across attributes, and also a label compensation training strategy to eliminate the problem of no positive prediction across a set of related attributes. Using an attribute classifier trained on FH37K, we investigate how facial hair affects face recognition accuracy, including variation across demographics. Results show that similarity and difference in facial hairstyle have important effects on the impostor and genuine score distributions in face recognition.
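The consistency problem can be made concrete with a tiny rule checker over binary attribute predictions, using an implication such as "goatee implies beard" and a mutual exclusion such as "beard vs. no-beard"; the rules below are illustrative examples, not the FH37K annotation scheme:

```python
def consistent(labels, implications, exclusions):
    """Check one binary-attribute prediction against logical rules.

    labels: dict mapping attribute name -> bool prediction
    implications: list of (a, b) pairs meaning a=True requires b=True
    exclusions: list of (a, b) pairs that may not both be True
    """
    for a, b in implications:
        if labels.get(a) and not labels.get(b):
            return False
    for a, b in exclusions:
        if labels.get(a) and labels.get(b):
            return False
    return True
```

Enforcing such rules at test time is what the abstract reports as significantly lowering the accuracy of prior classifiers.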
https://arxiv.org/abs/2302.11102
With the rise of handy smartphones in recent years, a trend of capturing selfie images has been observed. Hence, efficient approaches need to be developed for recognising faces in selfie images. Due to the short distance between the camera and the face in selfie images, and the different visual effects offered by selfie apps, face recognition becomes more challenging for existing approaches. A dataset is therefore needed to encourage the study of face recognition in selfie images. To alleviate this problem and to facilitate research on selfie face images, we develop a challenging Wild Selfie Dataset (WSD), whose images are captured with the selfie cameras of different smartphones, unlike existing datasets where most images are captured in controlled environments. The WSD dataset contains 45,424 images from 42 individuals (i.e., 24 female and 18 male subjects), divided into 40,862 training and 4,562 test images. The average number of images per subject is 1,082, with the minimum and maximum for any subject being 518 and 2,634, respectively. The proposed dataset presents several challenges, including but not limited to augmented reality filtering, mirrored images, occlusion, illumination, scale, expressions, view-point, aspect ratio, blur, partial faces, rotation, and alignment. We compare the proposed dataset with existing benchmark datasets in terms of different characteristics. The complexity of the WSD dataset is also observed experimentally: the performance of existing state-of-the-art face recognition methods is poor on the WSD dataset compared to existing datasets. Hence, the proposed WSD dataset opens up new challenges in the area of face recognition and can be beneficial to the community for studying the specific challenges related to selfie images and developing improved methods for face recognition in selfie images.
https://arxiv.org/abs/2302.07245
Typical diffusion models are trained to accept a particular form of conditioning, most commonly text, and cannot be conditioned on other modalities without retraining. In this work, we propose a universal guidance algorithm that enables diffusion models to be controlled by arbitrary guidance modalities without the need to retrain any use-specific components. We show that our algorithm successfully generates quality images with guidance functions including segmentation, face recognition, object detection, and classifier signals. Code is available at this https URL.
https://arxiv.org/abs/2302.07121
Deep neural networks (DNNs) and, in particular, convolutional neural networks (CNNs) have brought significant advances in a wide range of modern computer application problems. However, the increasing availability of large amounts of data as well as the increasing computational power of modern computers lead to a steady growth in the complexity and size of DNN and CNN models, and thus, to longer training times. Hence, various methods and attempts have been developed to accelerate and parallelize the training of complex network architectures. In this work, a novel CNN-DNN architecture is proposed that naturally supports a model-parallel training strategy and that is loosely inspired by two-level domain decomposition methods (DDM). First, local CNN models, that is, subnetworks, are defined that operate on overlapping or nonoverlapping parts of the input data, for example, sub-images. The subnetworks can be trained completely in parallel. Each subnetwork outputs a local decision for the given machine learning problem, based exclusively on the respective local input data. Subsequently, an additional DNN model is trained which evaluates the local decisions of the local subnetworks and generates a final, global decision. With respect to the analogy to DDM, the DNN can be interpreted as a coarse problem and hence, the new approach can be interpreted as a two-level domain decomposition. In this paper, solely image classification problems using CNNs are considered. Experimental results are provided for different 2D image classification problems, a face recognition problem, and a classification problem for 3D computed tomography (CT) scans. The results show that the proposed approach can significantly accelerate the required training time compared to the global model and, additionally, can also help to improve the accuracy of the underlying classification problem.
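The two-level idea can be sketched independently of any deep-learning framework: local models vote on their own sub-image, and a small coarse model combines the local decisions. The models below are stand-in callables, not trained CNNs, and all names are illustrative:

```python
import numpy as np

def split_into_patches(img, k):
    """Nonoverlapping k x k grid of sub-images (assumes divisible sizes)."""
    h, w = img.shape
    ph, pw = h // k, w // k
    return [img[r*ph:(r+1)*ph, c*pw:(c+1)*pw]
            for r in range(k) for c in range(k)]

def two_level_decision(img, local_models, global_model, k=2):
    """Level 1: each local model scores its patch (trainable in parallel).
    Level 2: the 'coarse' model maps concatenated local scores to a class."""
    patches = split_into_patches(img, k)
    local_scores = np.concatenate([m(p) for m, p in zip(local_models, patches)])
    return int(np.argmax(global_model(local_scores)))
```

Because each level-1 model only ever sees its own patch, the expensive part of training parallelizes with no gradient exchange, which is the source of the reported speed-up.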
https://arxiv.org/abs/2302.06564
In unconstrained scenarios, face recognition and person re-identification are subject to distortions such as motion blur, atmospheric turbulence, or upsampling artifacts. To improve robustness in these scenarios, we propose a methodology called Distortion-Adaptive Learned Invariance for Identification (DaliID) models. We contend that distortion augmentations, which degrade image quality, can be successfully leveraged to a greater degree than has been shown in the literature. Aided by an adaptive weighting schedule, a novel distortion augmentation is applied at severe levels during training. This training strategy increases feature-level invariance to distortions and decreases domain shift to unconstrained scenarios. At inference, we use a magnitude-weighted fusion of features from parallel models to retain robustness across the range of images. DaliID models achieve state-of-the-art (SOTA) for both face recognition and person re-identification on seven benchmark datasets, including IJB-S, TinyFace, DeepChange, and MSMT17. Additionally, we provide recaptured evaluation data at a distance of 750+ meters and further validate on real long-distance face imagery.
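One plausible reading of the magnitude-weighted fusion, sketched here as an assumption rather than the authors' exact formulation, is that each parallel model's feature norm acts as its confidence weight, so summing raw features before re-normalizing down-weights low-magnitude branches:

```python
import numpy as np

def magnitude_weighted_fusion(features):
    """Fuse embeddings from parallel models: summing unnormalized
    features lets each branch's magnitude act as its weight; the
    result is re-normalized to unit length for matching."""
    fused = np.sum(np.asarray(features, dtype=np.float64), axis=0)
    return fused / np.linalg.norm(fused)
```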
https://arxiv.org/abs/2302.05753