Contemporary face detection algorithms must cope with many challenges, such as variations in pose, illumination, and scale. A subclass of the face detection problem that has recently gained increasing attention is occluded face detection, or more specifically, the detection of masked faces. Three years after the onset of the COVID-19 pandemic, there is still a complete lack of evidence regarding how well existing face detection algorithms perform on masked faces. This article first offers a brief review of state-of-the-art face detectors and of detectors designed for the masked face problem, along with a review of existing masked face datasets. We then evaluate and compare the performance of a representative set of face detectors on masked face detection and conclude with a discussion of the factors that may contribute to their performance.
CNN-based face detection methods have achieved significant progress in recent years. In addition to the strong representation ability of CNNs, post-processing methods are also very important for face detection performance. In general, a face detection method predicts several candidate bounding boxes for one face, and non-maximum suppression (NMS) is used to filter out inaccurate candidates and keep the most accurate box. The principle of NMS is to select the box with the highest score as the base box and then delete every box that overlaps it heavily but has a lower score. However, the current NMS method and its improved versions do not perform well when face image quality is poor or faces are clustered together. In these situations, even after NMS filtering, a single face often still corresponds to multiple predicted boxes. To reduce this kind of error, we propose a new NMS method that operates in the reverse order of other NMS methods. Our method performs well on low-quality and tiny face samples. Experiments demonstrate that it is effective as a post-processor for different face detection methods.
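The conventional NMS principle described above can be illustrated with a minimal sketch. This is the standard greedy baseline, not the reverse-order method the abstract proposes. The function names, the `(x1, y1, x2, y2)` box format, and the 0.5 overlap threshold are assumptions made for illustration.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of the kept boxes, highest score first."""
    # Visit candidates from the highest score to the lowest.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)  # the current base box
        keep.append(best)
        # Drop lower-scored boxes that overlap the base box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


# Two heavily overlapping candidates for one face, plus a separate face.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores))  # [0, 2]
```

With a fixed threshold, two boxes on the same face whose overlap falls just below the threshold both survive, which is the failure mode the abstract attributes to low-quality and clustered faces.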
The extensive use of biometric authentication systems has prompted attackers and impostors to forge user identities with morphed images. In this attack, a synthetic image is produced and merged with a genuine one, and the resulting image is then used for authentication. Numerous deep convolutional neural architectures have been proposed in the literature for face Morphing Attack Detection (MAD) to prevent such attacks and lessen the associated risks. Although deep learning models achieve strong performance, these networks are difficult to understand and analyse because they are black boxes, and incorrect judgments may be made as a consequence. There is, however, a dearth of literature explaining the decision-making of black-box deep learning models for biometric Presentation Attack Detection (PAD) or MAD, explanations that could help the biometric community trust deep learning-based biometric systems for identification and authentication in security applications such as border control and criminal database establishment. In this work, we present a novel visual explanation approach named Ensemble XAI, which integrates saliency maps, Class Activation Maps (CAM), and Gradient-CAM (Grad-CAM) to provide a more comprehensive visual explanation for the deep learning model (EfficientNet-B1) that we employ to predict whether the input presented to a biometric authentication system is morphed or genuine. The experiments were performed on three publicly available datasets: Face Research Lab London Set, Wide Multi-Channel Presentation Attack (WMCA), and Makeup Induced Face Spoofing (MIFS). The experimental evaluations confirm that the resulting visual explanations highlight more fine-grained details of the image features and areas on which EfficientNet-B1 focuses to reach its decisions, along with appropriate reasoning.
Adversarial attacks aim to disturb the functionality of a target system by adding specific noise to the input samples, bringing potential threats to security and robustness when applied to facial recognition systems. Although existing defense techniques achieve high accuracy in detecting some specific adversarial faces (adv-faces), new attack methods, especially GAN-based attacks with completely different noise patterns, circumvent them and reach a higher attack success rate. Even worse, existing techniques require attack data before implementing the defense, making it impractical to defend against newly emerging attacks that are unseen by defenders. In this paper, we investigate the intrinsic generality of adv-faces and propose to generate pseudo adv-faces by perturbing real faces with three heuristically designed noise patterns. We are the first to train an adv-face detector using only real faces and their self-perturbations, agnostic to victim facial recognition systems and agnostic to unseen attacks. By regarding adv-faces as out-of-distribution data, we then naturally introduce a novel cascaded system for adv-face detection, which consists of training data self-perturbations, decision boundary regularization, and a max-pooling-based binary classifier focusing on abnormal local color aberrations. Experiments conducted on the LFW and CelebA-HQ datasets with eight gradient-based and two GAN-based attacks validate that our method generalizes to a variety of unseen adversarial attacks.
In this work, we investigate the potential threat of adversarial examples to the security of face recognition systems. Although previous research has explored the adversarial risk to individual components of FRSs, our study presents an initial exploration of an adversary simultaneously fooling multiple components: the face detector and feature extractor in an FRS pipeline. We propose three multi-objective attacks on FRSs and demonstrate their effectiveness through a preliminary experimental analysis on a target system. Our attacks achieved up to 100% Attack Success Rates against both the face detector and feature extractor and were able to manipulate the face detection probability by up to 50% depending on the adversarial objective. This research identifies and examines novel attack vectors against FRSs and suggests possible ways to augment the robustness by leveraging the attack vector's knowledge during training of an FRS's components.
Real-time eyeblink detection in the wild can serve a wide range of applications, including fatigue detection, face anti-spoofing, and emotion analysis. Existing research generally focuses on single-person cases in trimmed video. However, the multi-person scenario in untrimmed videos is also important for practical applications and has not yet received adequate attention. To address this, we shed light on this research field for the first time, with essential contributions on dataset, theory, and practice. In particular, we propose a large-scale dataset termed MPEblink, which contains 686 untrimmed videos with 8748 eyeblink events under multi-person conditions. The samples are captured from unconstrained films to reveal "in the wild" characteristics. We also propose a real-time multi-person eyeblink detection method. Unlike existing counterparts, it runs in a one-stage spatio-temporal way with end-to-end learning capacity. Specifically, it simultaneously addresses the sub-tasks of face detection, face tracking, and human instance-level eyeblink detection. This paradigm has two main advantages: (1) eyeblink features can benefit from the face's global context (e.g., head pose and illumination condition) through joint optimization and interaction, and (2) addressing these sub-tasks in parallel rather than sequentially saves considerable time and helps meet the real-time running requirement. Experiments on MPEblink verify the essential challenges of real-time multi-person eyeblink detection in the wild for untrimmed video. Our method also outperforms existing approaches by large margins while running at a high inference speed.
The performance of convolutional neural networks has continued to improve over the last decade. At the same time, as model complexity grows, it becomes increasingly difficult to explain model decisions. Such explanations may be of critical importance for the reliable operation of human-machine pairing setups, or for model selection when the "best" model among many equally accurate models must be established. Saliency maps represent one popular way of explaining model decisions by highlighting the image regions that models deem important when making a prediction. However, examining saliency maps at scale is not practical. In this paper, we propose five novel methods of leveraging model salience to explain model behavior at scale. These methods ask: (a) what is the average entropy of a model's salience maps, (b) how does model salience change when fed out-of-set samples, (c) how closely does model salience follow geometrical transformations, (d) how stable is model salience across independent training runs, and (e) how does model salience react to salience-guided image degradations. To assess the proposed measures on a concrete and topical problem, we conducted a series of experiments on the task of synthetic face detection with two types of models: those trained traditionally with cross-entropy loss, and those guided by human salience during training to increase model generalizability. These two types of models are characterized by different, interpretable properties of their salience maps, which allows the correctness of the proposed measures to be evaluated. We offer source code for each measure along with this paper.
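Measure (a) above, the average entropy of a model's salience maps, can be sketched as follows. This is a minimal illustration rather than the authors' released code: it assumes each map is a non-negative 2D grid normalized into a probability distribution before the Shannon entropy is taken, and the function names are hypothetical.

```python
import math


def salience_entropy(salience_map):
    """Shannon entropy (in bits) of a non-negative 2D map normalized to sum to 1."""
    values = [v for row in salience_map for v in row]
    total = sum(values)
    if total <= 0:
        raise ValueError("salience map must contain positive mass")
    entropy = 0.0
    for v in values:
        if v > 0:  # zero-probability cells contribute nothing
            p = v / total
            entropy -= p * math.log2(p)
    return entropy


def average_salience_entropy(salience_maps):
    """Mean entropy over a collection of maps (assumed reading of measure (a))."""
    return sum(salience_entropy(m) for m in salience_maps) / len(salience_maps)


uniform = [[1.0, 1.0], [1.0, 1.0]]  # attention spread evenly over 4 cells
peaked = [[1.0, 0.0], [0.0, 0.0]]   # attention on a single cell
print(salience_entropy(uniform))                     # 2.0
print(salience_entropy(peaked))                      # 0.0
print(average_salience_entropy([uniform, peaked]))   # 1.0
```

Under this reading, a lower average entropy indicates salience that is concentrated on fewer image regions.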
Recently, conditional diffusion models have gained popularity in numerous applications due to their exceptional generation ability. However, many existing methods are training-required. They need to train a time-dependent classifier or a condition-dependent score estimator, which increases the cost of constructing conditional diffusion models and is inconvenient to transfer across different conditions. Some current works aim to overcome this limitation by proposing training-free solutions, but most can only be applied to a specific category of tasks and not to more general conditions. In this work, we propose a training-Free conditional Diffusion Model (FreeDoM) used for various conditions. Specifically, we leverage off-the-shelf pre-trained networks, such as a face detection model, to construct time-independent energy functions, which guide the generation process without requiring training. Furthermore, because the construction of the energy function is very flexible and adaptable to various conditions, our proposed FreeDoM has a broader range of applications than existing training-free methods. FreeDoM is advantageous in its simplicity, effectiveness, and low cost. Experiments demonstrate that FreeDoM is effective for various conditions and suitable for diffusion models of diverse data domains, including image and latent code domains.
With improvements in sensor technology and significant algorithmic advances, the accuracy of remote heart rate monitoring has improved considerably. Despite these advances, the performance of rPPG algorithms can degrade during long, high-intensity continuous work that takes place in the evening or in poorly lit environments. One of the main challenges is that lost facial detail and low contrast cause face detection and tracking to fail. Insufficient lighting during video capture also degrades the quality of the physiological signal. In this paper, we collect a large-scale dataset designed for remote heart rate estimation, recorded under various illumination conditions, to evaluate the performance of rPPG algorithms (Green, ICA, and POS). We also propose a low-light enhancement solution for remote heart rate estimation under low-light conditions. Using the collected dataset, we found that 1) the face detection algorithm cannot detect faces in video captured in low-light conditions; 2) a decrease in the amplitude of the pulsatile signal causes noise to dominate; and 3) the chrominance-based method suffers because its assumption about skin tone no longer holds, and the Green and ICA methods are less affected than POS in dark environments. The proposed solution for the rPPG pipeline improves detection as well as the signal-to-noise ratio and precision of the pulsatile signal.
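Finding 2 is easier to picture with a simplified version of a green-channel rPPG estimate. The sketch below assumes the per-frame mean green value of the face region has already been extracted (the step that fails when face detection breaks down in low light), and it picks the strongest frequency in a plausible heart-rate band with a direct Fourier scan. The function name, the 40 to 180 bpm band, and the synthetic trace are assumptions; a practical pipeline would add detrending and band-pass filtering.

```python
import math


def estimate_heart_rate_bpm(green_means, fps, min_bpm=40, max_bpm=180):
    """Return the integer bpm whose frequency carries the most spectral power."""
    n = len(green_means)
    if n == 0:
        raise ValueError("empty signal")
    mean = sum(green_means) / n
    centered = [g - mean for g in green_means]  # remove the constant skin-colour offset
    best_bpm, best_power = None, -1.0
    for bpm in range(min_bpm, max_bpm + 1):
        freq = bpm / 60.0  # candidate pulse frequency in Hz
        re = sum(x * math.cos(2 * math.pi * freq * k / fps) for k, x in enumerate(centered))
        im = sum(x * math.sin(2 * math.pi * freq * k / fps) for k, x in enumerate(centered))
        power = re * re + im * im
        if power > best_power:
            best_bpm, best_power = bpm, power
    return best_bpm


# Synthetic 10-second trace at 30 fps: a weak 1.2 Hz pulse (72 bpm) on a constant offset.
fps = 30
trace = [0.5 + 0.01 * math.sin(2 * math.pi * 1.2 * k / fps) for k in range(300)]
print(estimate_heart_rate_bpm(trace, fps))
```

When the pulsatile amplitude (0.01 in the synthetic trace) shrinks relative to sensor noise, the spectral peak no longer stands out, which is the effect the abstract reports for dark recordings.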
Because fatigue is normally revealed in the eyes and mouth, this paper constructs an XGBoost-based fatigue recognition model using two indicators, EAR (Eye Aspect Ratio) and MAR (Mouth Aspect Ratio). With an accuracy of 87.37% and a sensitivity of 89.14%, the model proved efficient and valid for further applications.
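The abstract does not spell out how EAR is computed. A minimal sketch under the commonly used six-landmark definition, EAR = (|p2 - p6| + |p3 - p5|) / (2 |p1 - p4|), is given below. The landmark ordering and the sample coordinates are assumptions; MAR is typically formed analogously from mouth landmarks.

```python
import math


def eye_aspect_ratio(eye):
    """EAR from six (x, y) landmarks p1..p6.

    Assumed ordering: p1 and p4 are the horizontal eye corners,
    p2 and p3 lie on the upper lid, p6 and p5 on the lower lid.
    """
    p1, p2, p3, p4, p5, p6 = eye
    vertical = math.dist(p2, p6) + math.dist(p3, p5)  # two lid-to-lid distances
    horizontal = math.dist(p1, p4)                    # corner-to-corner width
    return vertical / (2.0 * horizontal)


open_eye = [(0, 0), (1, 1), (2, 1), (3, 0), (2, -1), (1, -1)]
closed_eye = [(0, 0), (1, 0.1), (2, 0.1), (3, 0), (2, -0.1), (1, -0.1)]
print(round(eye_aspect_ratio(open_eye), 3))    # 0.667
print(round(eye_aspect_ratio(closed_eye), 3))  # 0.067
```

The ratio drops toward zero as the eyelids close, so per-frame EAR and MAR values can serve as input features for a classifier such as XGBoost.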
Forged faces now pose pressing security concerns related to fake news, fraud, impersonation, and similar threats. Despite the demonstrated success of intra-domain face forgery detection, existing detection methods lack generalization capability and tend to suffer dramatic performance drops when deployed to unforeseen domains. To mitigate this issue, this paper designs a more general fake face detection model based on the vision transformer (ViT) architecture. In the training phase, the pretrained ViT weights are frozen, and only the Low-Rank Adaptation (LoRA) modules are updated. Additionally, the Single Center Loss (SCL) is applied to supervise the training process, further improving the generalization capability of the model. The proposed method achieves state-of-the-art detection performance in both cross-manipulation and cross-dataset evaluations.
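The frozen-weights-plus-LoRA setup can be pictured with a small sketch of the usual LoRA parameterization, in which the effective weight is W + (alpha / r) B A and only the low-rank factors B and A are trained. The matrix sizes, the scaling convention, and the function names are assumptions for illustration, not details taken from the paper.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w_frozen, lora_b, lora_a, alpha, rank):
    """Effective weight W + (alpha / rank) * B @ A; W itself is never modified."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # low-rank update with the same shape as W
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(w_frozen, delta)]


w = [[1.0, 0.0], [0.0, 1.0]]   # frozen pretrained weight (2 x 2)
a = [[0.5, 0.5]]               # trainable factor A (rank 1 x 2)
b_init = [[0.0], [0.0]]        # B starts at zero, so the update starts at zero
b_trained = [[1.0], [0.0]]     # hypothetical value of B after some training

print(lora_effective_weight(w, b_init, a, alpha=2, rank=1))     # [[1.0, 0.0], [0.0, 1.0]]
print(lora_effective_weight(w, b_trained, a, alpha=2, rank=1))  # [[2.0, 1.0], [0.0, 1.0]]
```

Because only A and B receive gradient updates, the number of trainable parameters grows with the rank rather than with the full size of each weight matrix.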
Deep learning-based models generalize better to unknown data samples after being guided "where to look" by incorporating human perception into training strategies. We observed that the salience entropy of a model trained in this way is lower than the salience entropy computed for models trained without human perceptual intelligence. This raises a question: does further increasing the model's focus, by lowering the entropy of its class activation map, further improve performance? In this paper, we propose and evaluate several new entropy-based loss function components that control the model's focus, covering the full range of such control, from none to "aggressive" entropy minimization. We show, using the problem of synthetic face detection, that improving the model's focus by lowering entropy leads to models that perform better in an open-set scenario, in which the test samples are synthesized by unknown generative models. We also show that optimal performance is obtained when the model's loss function blends three aspects: regular classification, low entropy of the model's focus, and human-guided saliency.
In recent years, deep convolutional neural networks (CNNs) have significantly advanced face detection. In particular, lightweight CNN-based architectures have achieved great success because their low-complexity structure facilitates real-time detection. However, current lightweight CNN-based face detectors, which trade accuracy for efficiency, cope poorly with insufficient feature representation, faces with unbalanced aspect ratios, and occlusion. Consequently, their performance lags far behind that of deep, heavy detectors. To achieve efficient face detection without sacrificing accuracy, we design an efficient deep face detector termed EfficientFace, which contains three modules for feature enhancement. First, we design a novel cross-scale feature fusion strategy to facilitate bottom-up information propagation, so that the fusion of low-level and high-level features is further strengthened. This also helps in estimating face locations and enhances the descriptive power of face features. Second, we introduce a Receptive Field Enhancement module to handle faces with various aspect ratios. Third, we add an Attention Mechanism module to improve the representational capability for occluded faces. We have evaluated EfficientFace on four public benchmarks, and the experimental results demonstrate the appealing performance of our method. In particular, our model achieves 95.1% (Easy), 94.0% (Medium), and 90.1% (Hard) on the validation set of the WIDER Face dataset, which is competitive with heavyweight models at only 1/15 of the computational cost of the state-of-the-art MogFace detector.
The overall objective of the main project is to propose and develop a facial authentication system for unlocking phones or phone applications using facial recognition. The system will include four separate architectures: face detection, face recognition, face spoofing, and classification of closed eyes. Among these, we consider face recognition the most important problem: determining the true identity of the person standing in front of the screen with absolute accuracy is what facial recognition systems need to achieve. Along with the development of face recognition, the problem of detecting fake faces is also becoming popular and equally important. Our goal is to propose and develop two loss functions, LMCot and Double Loss, and then apply them to the face authentication process.
To protect vulnerable road users (VRUs), such as pedestrians or cyclists, it is essential that intelligent transportation systems (ITS) accurately identify them. Therefore, datasets used to train the perception models of ITS must contain a significant number of vulnerable road users. However, data protection regulations require that individuals be anonymized in such datasets. In this work, we introduce a novel deep learning-based pipeline for face anonymization in the context of ITS. In contrast to related methods, we do not use generative adversarial networks (GANs) but build upon recent advances in diffusion models. We propose a two-stage method, which consists of a face detection model followed by a latent diffusion model that generates realistic face in-paintings. To demonstrate the versatility of anonymized images, we train segmentation methods on anonymized data and evaluate them on non-anonymized data. Our experiments reveal that our pipeline is better suited to anonymizing data for segmentation than naive methods and performs comparably with recent GAN-based methods. Moreover, face detectors achieve higher mAP scores for faces anonymized by our method than for faces anonymized by naive or recent GAN-based methods.
This paper explores automated face and facial landmark detection of neonates, which is an important first step in many video-based neonatal health applications, such as vital sign estimation, pain assessment, sleep-wake classification, and jaundice detection. Utilising three publicly available datasets of neonates in the clinical environment, 366 images (258 subjects) and 89 images (66 subjects) were annotated for training and testing, respectively. Transfer learning was applied to two YOLO-based models, with input training images augmented with random horizontal flipping, photometric colour distortion, translation, and scaling during each training epoch. Additionally, the re-orientation of input images and the fusion of trained deep learning models were explored. Our proposed model based on YOLOv7Face outperformed existing methods with a mean average precision of 84.8% for face detection and a normalised mean error of 0.072 for facial landmark detection. Overall, this will assist in the development of fully automated neonatal health assessment algorithms.
Many outdoor autonomous mobile platforms require more identity-anonymized human data to power their data-driven algorithms. The anonymization should be robust so that less manual intervention is needed, which remains a challenge for current face detection and anonymization systems. In this paper, we propose to use the skeleton generated by a state-of-the-art human pose estimation model to help localize human heads. We develop criteria to evaluate the performance and compare it with the face detection approach. We demonstrate that the proposed algorithm can reduce the number of missed faces and thus better protect the identity information of pedestrians. We also develop a confidence-based fusion method to further improve the performance.
Face manipulation detection has been receiving a lot of attention because of its importance to the reliability and security of face images. Recent studies focus on using auxiliary information or prior knowledge to capture robust manipulation traces, an approach that has been shown to be promising. The face depth map is an important face feature that has been shown to be effective in other areas such as face recognition and face detection, yet it has received little attention in the literature on detecting manipulated face images. In this paper, we explore the possibility of incorporating the face depth map as auxiliary information to tackle the problem of face manipulation detection in real-world applications. To this end, we first propose a Face Depth Map Transformer (FDMT) to estimate the face depth map patch by patch from an RGB face image, which is able to capture the local depth anomalies created by manipulation. The estimated face depth map is then treated as auxiliary information and integrated with the backbone features using a newly designed Multi-head Depth Attention (MDA) mechanism. Various experiments demonstrate the advantage of our proposed method for face manipulation detection.
As the quality of optical sensors improves, there is a need for processing large-scale images. In particular, the ability of devices to capture ultra-high definition (UHD) images and video places new demands on the image processing pipeline. In this paper, we consider the task of low-light image enhancement (LLIE) and introduce a large-scale database consisting of images at 4K and 8K resolution. We conduct systematic benchmarking studies and provide a comparison of current LLIE algorithms. As a second contribution, we introduce LLFormer, a transformer-based low-light enhancement method. The core components of LLFormer are the axis-based multi-head self-attention and the cross-layer attention fusion block, which significantly reduce the linear complexity. Extensive experiments on the new dataset and existing public datasets show that LLFormer outperforms state-of-the-art methods. We also show that employing existing LLIE methods trained on our benchmark as a pre-processing step significantly improves the performance of downstream tasks, e.g., face detection in low-light conditions. The source code and pre-trained models are available at this https URL.
The management of cattle over a huge area is still a challenging problem in the farming sector. As technology evolves, unmanned aerial vehicles (UAVs) with consumer-level digital cameras are becoming a popular alternative to manual animal censuses for livestock estimation, since they are less risky and less expensive. This paper evaluates and compares cutting-edge object detection algorithms: YOLOv7, RetinaNet with a ResNet50 backbone, RetinaNet with EfficientNet, and Mask R-CNN. It aims to address the occlusion problem, that is, detecting hidden cattle in a large dataset captured by drones, using deep learning algorithms for accurate cattle detection. Experimental results showed that YOLOv7 was superior, with a precision of 0.612, when compared to the other two algorithms. The proposed method proved superior to the usual competing algorithms for cow face detection, especially in very difficult cases.