As fatigue is normally revealed in the eyes and mouth of a person's face, this paper tried to construct an XGBoost-based fatigue recognition model using two indicators, EAR (Eye Aspect Ratio) and MAR (Mouth Aspect Ratio). With an accuracy of 87.37% and a sensitivity of 89.14%, the model proved efficient and valid for further applications.
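The two indicators can be computed directly from facial landmarks. A minimal sketch, assuming the common 6-point eye and a 4-point mouth landmark convention (the abstract does not specify which landmark scheme the paper uses):

```python
from math import dist  # Euclidean distance, Python 3.8+

def eye_aspect_ratio(eye):
    # eye: six (x, y) landmarks p1..p6 around one eye, with p1/p4 the
    # horizontal corners and (p2, p6), (p3, p5) vertical pairs -- an
    # assumed convention; EAR drops toward 0 as the eye closes.
    p1, p2, p3, p4, p5, p6 = eye
    return (dist(p2, p6) + dist(p3, p5)) / (2.0 * dist(p1, p4))

def mouth_aspect_ratio(mouth):
    # mouth: (left corner, right corner, top mid-lip, bottom mid-lip);
    # MAR rises when the mouth opens, e.g. during a yawn.
    left, right, top, bottom = mouth
    return dist(top, bottom) / dist(left, right)

# Per-frame EAR/MAR values (or statistics over a sliding window) would
# then serve as features for an XGBoost classifier, e.g.
# xgboost.XGBClassifier, as in the model the paper describes.
```

A wide-open eye gives an EAR around 0.3-0.5, while a closing eye drives it toward zero, which is what makes the ratio a usable fatigue cue.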
https://arxiv.org/abs/2303.12727
Nowadays, forged faces pose pressing security concerns over fake news, fraud, impersonation, etc. Despite the demonstrated success in intra-domain face forgery detection, existing detection methods lack generalization capability and tend to suffer dramatic performance drops when deployed to unforeseen domains. To mitigate this issue, this paper designs a more general fake face detection model based on the Vision Transformer (ViT) architecture. In the training phase, the pretrained ViT weights are frozen, and only the Low-Rank Adaptation (LoRA) modules are updated. Additionally, the Single Center Loss (SCL) is applied to supervise the training process, further improving the generalization capability of the model. The proposed method achieves state-of-the-art detection performance in both cross-manipulation and cross-dataset evaluations.
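The LoRA idea (freeze the pretrained weight, learn only a low-rank update) can be sketched in a few lines of NumPy; the rank, scaling, and initialization below are illustrative defaults, not the paper's settings:

```python
import numpy as np

class LoRALinear:
    # Frozen weight W of shape (d_out, d_in) plus a trainable low-rank
    # update delta_W = B @ A, with A (r, d_in) Gaussian-initialized and
    # B (d_out, r) zero-initialized, so training starts exactly from the
    # pretrained behaviour. In fine-tuning, only A and B receive gradients.
    def __init__(self, W, r=4, alpha=4.0, seed=0):
        rng = np.random.default_rng(seed)
        self.W = W                                        # frozen
        self.A = 0.01 * rng.standard_normal((r, W.shape[1]))
        self.B = np.zeros((W.shape[0], r))
        self.scale = alpha / r

    def __call__(self, x):
        # Effective weight: W + (alpha / r) * B @ A
        return x @ (self.W + self.scale * (self.B @ self.A)).T
```

Because B starts at zero, the adapted layer initially reproduces the pretrained ViT layer exactly, which is what makes this a safe way to fine-tune with most parameters frozen.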
https://arxiv.org/abs/2303.00917
Deep learning-based models generalize better to unknown data samples after being guided "where to look" by incorporating human perception into training strategies. We observed that the entropy of the salience of a model trained in that way is lower than the salience entropy computed for models trained without human perceptual intelligence. This raises the question: does further increasing the model's focus, by lowering the entropy of its class activation map, further increase performance? In this paper we propose and evaluate several new entropy-based loss-function components that control the model's focus, covering the full range of such control, from none to its "aggressive" minimization. Using the problem of synthetic face detection, we show that improving the model's focus by lowering entropy leads to models that perform better in an open-set scenario, in which the test samples are synthesized by unknown generative models. We also show that optimal performance is obtained when the model's loss function blends three aspects: regular classification, low entropy of the model's focus, and human-guided saliency.
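A hedged sketch of what an entropy-based focus penalty could look like: normalize the class activation map into a distribution over pixels and penalize its Shannon entropy. Details such as the normalization and the blend weights are illustrative, not the paper's exact formulation:

```python
import numpy as np

def focus_entropy_loss(cam, eps=1e-12):
    # cam: non-negative class activation map of shape (H, W).
    # Normalize it to a probability distribution over pixels, then
    # return its Shannon entropy; minimizing this term concentrates
    # the model's salience on fewer pixels (a "tighter" focus).
    p = cam.ravel() / (cam.sum() + eps)
    return float(-(p * np.log(p + eps)).sum())

# A blended objective along the lines the abstract describes (weights
# lam1, lam2 are hypothetical):
#   total = cls_loss + lam1 * focus_entropy_loss(cam) + lam2 * saliency_loss
```

A uniform map has maximal entropy (log of the pixel count), while a single-peak map has entropy near zero, so the penalty directly orders maps by how spread out the focus is.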
https://arxiv.org/abs/2303.00818
In recent years, deep convolutional neural networks (CNNs) have significantly advanced face detection. In particular, lightweight CNN-based architectures have achieved great success thanks to their low-complexity structure, which facilitates real-time detection tasks. However, current lightweight CNN-based face detectors, which trade accuracy for efficiency, have inadequate capability in handling insufficient feature representation, faces with unbalanced aspect ratios, and occlusion. Consequently, their performance deteriorates and lags far behind that of deep, heavyweight detectors. To achieve efficient face detection without sacrificing accuracy, we design an efficient deep face detector termed EfficientFace in this study, which contains three modules for feature enhancement. To begin with, we design a novel cross-scale feature fusion strategy to facilitate bottom-up information propagation, so that the fusion of low-level and high-level features is further strengthened; this is also conducive to estimating the locations of faces and enhancing the descriptive power of face features. Secondly, we introduce a Receptive Field Enhancement module to handle faces with various aspect ratios. Thirdly, we add an Attention Mechanism module to improve the representational capability of occluded faces. We have evaluated EfficientFace on four public benchmarks, and the experimental results demonstrate the appealing performance of our method. In particular, our model achieves 95.1% (Easy), 94.0% (Medium) and 90.1% (Hard) on the validation set of the WIDER Face dataset, which is competitive with heavyweight models at only 1/15 of the computational cost of the state-of-the-art MogFace detector.
https://arxiv.org/abs/2302.11816
The overall objective of the main project is to propose and develop a facial-authentication system for unlocking phones, or applications on phones, using facial recognition. The system will include four separate architectures: face detection, face recognition, face anti-spoofing, and classification of closed eyes. Among these, we consider face recognition to be the most important: determining the true identity of the person standing in front of the screen with absolute accuracy is what facial recognition systems need to achieve. Alongside the development of the face recognition problem, the problem of detecting fake faces is also gradually becoming popular and equally important. Our goal is to propose and develop two loss functions, LMCot and Double Loss, and then apply them to the face authentication process.
https://arxiv.org/abs/2302.11427
In order to protect vulnerable road users (VRUs), such as pedestrians or cyclists, it is essential that intelligent transportation systems (ITS) accurately identify them. Therefore, datasets used to train perception models of ITS must contain a significant number of vulnerable road users. However, data protection regulations require that individuals in such datasets be anonymized. In this work, we introduce a novel deep learning-based pipeline for face anonymization in the context of ITS. In contrast to related methods, we do not use generative adversarial networks (GANs) but build upon recent advances in diffusion models. We propose a two-stage method consisting of a face detection model followed by a latent diffusion model that generates realistic face in-paintings. To demonstrate the versatility of the anonymized images, we train segmentation methods on anonymized data and evaluate them on non-anonymized data. Our experiments reveal that our pipeline is better suited to anonymizing data for segmentation than naive methods and performs comparably with recent GAN-based methods. Moreover, face detectors achieve higher mAP scores for faces anonymized by our method than for those anonymized by naive or recent GAN-based methods.
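The mAP comparison rests on box overlap between predicted and ground-truth faces. A minimal intersection-over-union helper, the standard ingredient of mAP scoring (not code from the paper):

```python
def iou(box_a, box_b):
    # Boxes as (x1, y1, x2, y2); returns intersection-over-union in [0, 1].
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)
```

A detection typically counts as a true positive when its IoU with a ground-truth face exceeds a threshold such as 0.5; mAP then averages precision over recall levels (and, in some protocols, over thresholds).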
https://arxiv.org/abs/2302.08931
This paper explores automated face and facial landmark detection of neonates, which is an important first step in many video-based neonatal health applications, such as vital sign estimation, pain assessment, sleep-wake classification, and jaundice detection. Utilising three publicly available datasets of neonates in the clinical environment, 366 images (258 subjects) and 89 images (66 subjects) were annotated for training and testing, respectively. Transfer learning was applied to two YOLO-based models, with input training images augmented with random horizontal flipping, photometric colour distortion, translation and scaling during each training epoch. Additionally, the re-orientation of input images and the fusion of trained deep learning models were explored. Our proposed model based on YOLOv7Face outperformed existing methods with a mean average precision of 84.8% for face detection, and a normalised mean error of 0.072 for facial landmark detection. Overall, this will assist in the development of fully automated neonatal health assessment algorithms.
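The landmark metric can be made concrete. One common definition of the normalised mean error is the mean point-to-point distance divided by a normalising length; the abstract does not state which normalisation the paper uses, so the inter-ocular distance mentioned below is an assumption:

```python
from math import dist  # Python 3.8+

def normalised_mean_error(pred, gt, norm_dist):
    # pred, gt: matched lists of (x, y) landmarks; norm_dist: a
    # normalising length, e.g. the inter-ocular distance (an assumed
    # choice -- face bounding-box size is another common option).
    errs = [dist(p, g) for p, g in zip(pred, gt)]
    return sum(errs) / (len(errs) * norm_dist)
```

Under this definition, the reported NME of 0.072 means the average landmark error is about 7.2% of the normalising face measure.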
https://arxiv.org/abs/2302.04341
Many outdoor autonomous mobile platforms require large amounts of identity-anonymized human data to power their data-driven algorithms. Human identity anonymization should be robust, so that little manual intervention is needed, which remains a challenge for current face detection and anonymization systems. In this paper, we propose to use the skeleton generated by a state-of-the-art human pose estimation model to help localize human heads. We develop criteria to evaluate the performance and compare it with the face detection approach. We demonstrate that the proposed algorithm can reduce missed faces and thus better protect the identity information of pedestrians. We also develop a confidence-based fusion method to further improve the performance.
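One simple form a confidence-based fusion step could take, when the face detector and the skeleton-derived head localizer both propose a box for the same person, is a confidence-weighted average; this is an illustrative sketch, not the paper's exact criteria:

```python
def fuse_boxes(box_a, conf_a, box_b, conf_b):
    # box_a / box_b: (x1, y1, x2, y2) candidates assumed to cover the
    # same person -- e.g. one from a face detector, one derived from a
    # pose-estimation skeleton. Each coordinate is averaged, weighted
    # by the detector confidences.
    total = conf_a + conf_b
    return tuple((a * conf_a + b * conf_b) / total
                 for a, b in zip(box_a, box_b))
```

With equal confidences this yields the midpoint box, and as one confidence goes to zero the fused box collapses onto the other detector's output.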
https://arxiv.org/abs/2301.04243
Face manipulation detection has been receiving a lot of attention because of the reliability and security of face images. Recent studies focus on using auxiliary information or prior knowledge to capture robust manipulation traces, which has been shown to be promising. As one of the important face features, the face depth map, which has been shown to be effective in other areas such as face recognition and face detection, has unfortunately received little attention in the literature on detecting manipulated face images. In this paper, we explore the possibility of incorporating the face depth map as auxiliary information to tackle the problem of face manipulation detection in real-world applications. To this end, we first propose a Face Depth Map Transformer (FDMT) to estimate the face depth map patch by patch from an RGB face image, which is able to capture the local depth anomalies created by manipulation. The estimated face depth map is then used as auxiliary information, integrated with the backbone features using a newly designed Multi-head Depth Attention (MDA) mechanism. Various experiments demonstrate the advantage of our proposed method for face manipulation detection.
https://arxiv.org/abs/2212.14230
As the quality of optical sensors improves, there is a need for processing large-scale images. In particular, the ability of devices to capture ultra-high definition (UHD) images and video places new demands on the image processing pipeline. In this paper, we consider the task of low-light image enhancement (LLIE) and introduce a large-scale database consisting of images at 4K and 8K resolution. We conduct systematic benchmarking studies and provide a comparison of current LLIE algorithms. As a second contribution, we introduce LLFormer, a transformer-based low-light enhancement method. The core components of LLFormer are the axis-based multi-head self-attention and the cross-layer attention fusion block, which significantly reduce the computational complexity to linear. Extensive experiments on the new dataset and existing public datasets show that LLFormer outperforms state-of-the-art methods. We also show that employing existing LLIE methods trained on our benchmark as a pre-processing step significantly improves the performance of downstream tasks, e.g., face detection in low-light conditions. The source code and pre-trained models are available at this https URL.
https://arxiv.org/abs/2212.11548
The management of cattle over a huge area is still a challenging problem in the farming sector. With the evolution of technology, unmanned aerial vehicles (UAVs) with consumer-level digital cameras are becoming a popular alternative to manual animal censuses for livestock estimation, since they are less risky and less expensive. This paper evaluated and compared cutting-edge object detection algorithms: YOLOv7, RetinaNet with a ResNet50 backbone, RetinaNet with EfficientNet, and Mask RCNN. It aims to address the occlusion problem, that is, to detect hidden cattle in a huge dataset captured by drones, using deep learning algorithms for accurate cattle detection. Experimental results showed that YOLOv7 was superior, with a precision of 0.612, compared to the other algorithms. The proposed method proved superior to the usual competing algorithms for cow face detection, especially in very difficult cases.
https://arxiv.org/abs/2212.11418
Several face de-identification methods have been proposed to preserve users' privacy by obscuring their faces. These methods, however, can degrade the quality of photos, and they usually do not preserve the utility of faces, e.g., their age, gender, pose, and facial expression. Recently, advanced generative adversarial network models, such as StyleGAN, have been proposed, which generate realistic, high-quality imaginary faces. In this paper, we investigate the use of StyleGAN to generate de-identified faces through style mixing, where the styles or features of the target face and an auxiliary face are mixed to generate a de-identified face that carries the utilities of the target face. We examined this de-identification method with respect to preserving utility and privacy by implementing several face detection, verification, and identification attacks. Through extensive experiments, including comparisons with two state-of-the-art face de-identification methods, we show that StyleGAN preserves the quality and utility of the faces much better than the other approaches and, when the style-mixing levels are chosen correctly, also preserves the privacy of the faces much better than the other methods.
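Style mixing itself is a simple operation on per-layer latent codes: up to some crossover level, styles come from one face, and from the auxiliary face after it. A schematic sketch (the crossover point, and which face contributes which levels, are exactly the choices the paper studies; the list-of-styles representation stands in for a StyleGAN W+ code):

```python
def style_mix(w_target, w_aux, crossover):
    # w_target, w_aux: per-layer style vectors for the target and
    # auxiliary faces (lists with one entry per synthesis layer,
    # e.g. 14 or 18 entries of a StyleGAN W+ code).
    # Layers [0, crossover) keep the target's styles; layers from
    # `crossover` onward take the auxiliary face's styles.
    return w_target[:crossover] + w_aux[crossover:]
```

Moving the crossover trades off how much of the target's utility (pose, expression) is retained against how much of its identity is replaced, which is why choosing the mixing levels correctly matters for privacy.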
https://arxiv.org/abs/2212.02611
Facial analysis systems have been deployed by large companies and critiqued by scholars and activists for the past decade. Many existing algorithmic audits examine the performance of these systems on later-stage elements of facial analysis systems like facial recognition and age, emotion, or perceived gender prediction; however, a core component of these systems has been vastly understudied from a fairness perspective: face detection, sometimes called face localization. Since face detection is a prerequisite step in facial analysis systems, the bias we observe in face detection will flow downstream to the other components like facial recognition and emotion prediction. Additionally, no prior work has focused on the robustness of these systems under various perturbations and corruptions, which leaves open the question of how various people are impacted by these phenomena. We present a first-of-its-kind detailed benchmark of face detection systems, specifically examining the robustness to noise of commercial and academic models. We use both standard and recently released academic facial datasets to quantitatively analyze trends in face detection robustness. Across all the datasets and systems, we generally find that photos of individuals who are $\textit{masculine presenting}$, $\textit{older}$, of $\textit{darker skin type}$, or have $\textit{dim lighting}$ are more susceptible to errors than their counterparts in other identities.
https://arxiv.org/abs/2211.15937
The presence of bias in deep models leads to unfair outcomes for certain demographic subgroups. Research in bias focuses primarily on facial recognition and attribute prediction with scarce emphasis on face detection. Existing studies consider face detection as binary classification into 'face' and 'non-face' classes. In this work, we investigate possible bias in the domain of face detection through facial region localization which is currently unexplored. Since facial region localization is an essential task for all face recognition pipelines, it is imperative to analyze the presence of such bias in popular deep models. Most existing face detection datasets lack suitable annotation for such analysis. Therefore, we web-curate the Fair Face Localization with Attributes (F2LA) dataset and manually annotate more than 10 attributes per face, including facial localization information. Utilizing the extensive annotations from F2LA, an experimental setup is designed to study the performance of four pre-trained face detectors. We observe (i) a high disparity in detection accuracies across gender and skin-tone, and (ii) interplay of confounding factors beyond demography. The F2LA data and associated annotations can be accessed at this http URL.
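The disparity analysis described here reduces to group-wise detection rates and the gaps between them. A minimal sketch of the kind of computation involved (the metric and grouping are illustrative, not the F2LA protocol):

```python
from collections import defaultdict

def detection_rate_by_group(records):
    # records: iterable of (group_label, detected) pairs, where
    # `detected` is True if the annotated face was found by the
    # detector. Returns per-group detection rates.
    hits, totals = defaultdict(int), defaultdict(int)
    for group, detected in records:
        totals[group] += 1
        hits[group] += bool(detected)
    return {g: hits[g] / totals[g] for g in totals}

def max_disparity(rates):
    # Largest gap in detection rate between any two groups; a fair
    # detector would drive this toward zero.
    return max(rates.values()) - min(rates.values())
```

With attribute annotations such as those in F2LA, the same computation can be repeated per attribute (gender, skin tone, lighting, etc.) to surface which factors drive the disparity.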
https://arxiv.org/abs/2211.03588
Facial recognition is fundamental for a wide variety of security systems operating in real-time applications. In video-surveillance-based face recognition, face images are typically captured over multiple frames in uncontrolled conditions, where head pose, illumination, shadowing, motion blur and focus change over the sequence. Facial recognition tasks generally involve three fundamental operations: face detection, face alignment and face recognition. This study presents comparative benchmark tables for state-of-the-art face recognition methods, testing them with the same backbone architecture in order to focus on the face recognition solution itself rather than the network architecture. For this purpose, we constructed a video surveillance dataset of face IDs with high age variance and intra-class variance (face make-up, beard, etc.), containing native surveillance facial imagery for evaluation. In addition, this work identifies the best recognition methods for different conditions, such as non-masked faces, masked faces, and faces with glasses.
https://arxiv.org/abs/2211.02952
With the demand for standardized large-scale livestock farming and the development of artificial intelligence technology, much research on animal face recognition has been carried out on pigs, cattle, sheep and other livestock. Face recognition consists of three sub-tasks: face detection, face normalization and face identification. Most animal face recognition studies focus on face detection and face identification. Animals are often uncooperative when being photographed, so collected animal face images are often oriented in arbitrary directions. The use of non-standard images may significantly reduce the performance of a face recognition system. However, there has been no study on normalizing animal face images with arbitrary orientations. In this study, we developed a light-weight angle detection and region-based convolutional network (LAD-RCNN) containing a new rotation angle coding method that can detect the rotation angle and the location of an animal face in one stage. LAD-RCNN runs at 72.74 FPS (including all steps) on a single GeForce RTX 2080 Ti GPU. LAD-RCNN has been evaluated on multiple datasets, including a goat dataset and goat infrared images. Evaluation results show that the AP of face detection was more than 95% and that the deviation between the detected rotation angle and the ground-truth rotation angle was less than 0.036 (i.e. 6.48°) on all test datasets. This shows that LAD-RCNN performs excellently on livestock face and direction detection, and it is therefore very suitable for livestock face detection and normalization. Code is available at this https URL
https://arxiv.org/abs/2210.17146
Recent developments in machine learning have shown that successful models rely not only on huge amounts of data but on the right kind of data. In this paper we show how this data-centric approach can be facilitated in a decentralized manner to enable efficient data collection for algorithms. Face detectors are a class of models that suffer heavily from bias issues, as they have to work on a large variety of data. We also propose a face detection and anonymization approach using a hybrid Multi-Task Cascaded CNN with FaceNet embeddings, benchmarking multiple datasets to describe and evaluate the bias in the models towards different ethnicities, genders, and age groups, along with ways to improve fairness through a decentralized system of data labeling, correction, and verification by users, creating a robust pipeline for model retraining.
https://arxiv.org/abs/2210.16024
Facial expressions vary from person to person, and the brightness, contrast, and resolution of any given image differ. This is why recognizing facial expressions is very difficult. This article proposes an efficient system for facial emotion recognition covering the seven basic human emotions (angry, disgust, fear, happy, sad, surprise, and neutral), using a convolutional neural network (CNN) that predicts and assigns probabilities to each emotion. Since deep learning models learn from data, our proposed system processes each image with various pre-processing steps for better prediction. Every image was first passed through a face detection algorithm before inclusion in the training dataset. As CNNs require a large amount of data, we augmented our data by applying various filters to each image. Pre-processed images of size 80×100 are passed as input to the first layer of the CNN. Three convolutional layers were used, followed by a pooling layer and three dense layers. The dropout rate for the dense layers was 20%. The model was trained on the combination of two publicly available datasets, JAFFE and KDEF. 90% of the data was used for training and 10% for testing. We achieved a maximum accuracy of 78.1% on the combined dataset. Moreover, we designed an application of the proposed system with a graphical user interface that classifies emotions in real time.
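The final dense layer's raw scores are turned into the per-emotion probabilities the abstract mentions via a softmax; a small self-contained sketch over the seven classes (the label order is an assumption):

```python
from math import exp

EMOTIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]

def emotion_probabilities(logits):
    # logits: seven raw scores from the last dense layer.
    m = max(logits)                        # subtract max for numerical stability
    exps = [exp(z - m) for z in logits]
    total = sum(exps)
    return {e: v / total for e, v in zip(EMOTIONS, exps)}
```

The predicted emotion is simply the class with the highest probability, while the full distribution lets the GUI display confidence for each of the seven emotions.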
https://arxiv.org/abs/2001.01456
3D flash LIDAR is an alternative to the traditional scanning LIDAR systems, promising precise depth imaging in a compact form factor, and free of moving parts, for applications such as self-driving cars, robotics and augmented reality (AR). Typically implemented using single-photon, direct time-of-flight (dToF) receivers in image sensor format, the operation of the devices can be hindered by the large number of photon events needing to be processed and compressed in outdoor scenarios, limiting frame rates and scalability to larger arrays. We here present a 64x32 pixel (256x128 SPAD) dToF imager that overcomes these limitations by using pixels with embedded histogramming, which lock onto and track the return signal. This reduces the size of output data frames considerably, enabling maximum frame rates in the 10 kFPS range or 100 kFPS for direct depth readings. The sensor offers selective readout of pixels detecting surfaces, or those sensing motion, leading to reduced power consumption and off-chip processing requirements. We demonstrate the application of the sensor in mid-range LIDAR.
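The embedded-histogramming idea can be illustrated off-chip: bin photon timestamps into a time-of-flight histogram, take the peak bin as the return signal, and convert time of flight to distance. A simplified sketch (bin width, units, and the off-chip formulation are illustrative; the sensor does this per pixel in hardware and tracks the peak rather than rebuilding the histogram):

```python
C = 299_792_458.0  # speed of light, m/s

def depth_from_timestamps(timestamps_ns, bin_width_ns=1.0):
    # Build a time-of-flight histogram from per-photon timestamps and
    # locate its peak bin: signal photons cluster at the true return
    # time, while background photons spread over all bins. Reporting
    # only the peak is what shrinks the output data so drastically.
    n_bins = int(max(timestamps_ns) / bin_width_ns) + 1
    hist = [0] * n_bins
    for t in timestamps_ns:
        hist[int(t / bin_width_ns)] += 1
    peak = max(range(n_bins), key=hist.__getitem__)
    tof_s = (peak + 0.5) * bin_width_ns * 1e-9   # bin centre, in seconds
    return C * tof_s / 2.0                        # halve: round-trip path
```

For a target at 10 m, the round-trip time is about 66.7 ns, so signal photons pile up in that bin even amid background events.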
https://arxiv.org/abs/2209.11772
Designing Deep Neural Networks (DNNs) running on edge hardware remains a challenge. Standard designs have been adopted by the community to facilitate the deployment of Neural Network models. However, not much emphasis is put on adapting the network topology to fit hardware constraints. In this paper, we adapt one of the most widely used architectures for mobile hardware platforms, MobileNetV2, and study the impact of changing its topology and applying post-training quantization. We discuss the impact of the adaptations and the deployment of the model on an embedded hardware platform for face detection.
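Post-training quantization in its simplest form maps float weights to int8 with a scale factor. A minimal symmetric per-tensor sketch (real deployments typically use per-channel scales and activation calibration, which this omits):

```python
def quantize_int8(weights):
    # Symmetric per-tensor quantization: map [-max_abs, max_abs] onto
    # the int8 range [-127, 127]. Returns (int values, scale); the
    # scale is kept so activations can be dequantized at inference.
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    # Recover approximate float weights; the error per weight is at
    # most about half a quantization step (scale / 2).
    return [v * scale for v in q]
```

This shrinks each weight from 4 bytes to 1 and enables integer arithmetic on the embedded target, at the cost of the bounded rounding error the paper's accuracy study quantifies.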
https://arxiv.org/abs/2208.11011