Face_Detection

A direct time-of-flight image sensor with in-pixel surface detection and dynamic vision

2022-09-23 14:38:00

Istvan Gyongy, Ahmet T. Erdogan, Neale A.W. Dutton, Germán Mora Martín, Alistair Gorman, Hanning Mai, Francesco Mattioli Della Rocca, Robert K. Henderson

arXiv_CV

arXiv_CV Detection Face Face_Detection 3D Robot
Abstract

3D flash LIDAR is an alternative to the traditional scanning LIDAR systems, promising precise depth imaging in a compact form factor, and free of moving parts, for applications such as self-driving cars, robotics and augmented reality (AR). Typically implemented using single-photon, direct time-of-flight (dToF) receivers in image sensor format, the operation of the devices can be hindered by the large number of photon events needing to be processed and compressed in outdoor scenarios, limiting frame rates and scalability to larger arrays. We here present a 64x32 pixel (256x128 SPAD) dToF imager that overcomes these limitations by using pixels with embedded histogramming, which lock onto and track the return signal. This reduces the size of output data frames considerably, enabling maximum frame rates in the 10 kFPS range or 100 kFPS for direct depth readings. The sensor offers selective readout of pixels detecting surfaces, or those sensing motion, leading to reduced power consumption and off-chip processing requirements. We demonstrate the application of the sensor in mid-range LIDAR.
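As a rough illustration of the direct time-of-flight principle behind the abstract above, the sketch below histograms photon arrival times, takes the fullest bin as the return signal, and converts its round-trip time to a distance with d = c·t/2. It is not the sensor's in-pixel logic (which locks onto and tracks the peak); the function name, the 1 ns bin width, and the sample timestamps are assumptions made for the example.

```python
from collections import Counter

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def depth_from_timestamps(timestamps_ns, bin_width_ns=1.0):
    """Estimate distance in metres from photon arrival times (hypothetical helper).

    timestamps_ns: round-trip arrival times in nanoseconds, signal plus background.
    bin_width_ns:  histogram bin width in nanoseconds (an assumed value).
    """
    if not timestamps_ns:
        raise ValueError("at least one photon timestamp is required")
    # Build the histogram: one counter per time bin.
    histogram = Counter(int(t // bin_width_ns) for t in timestamps_ns)
    # The bin with the most photon events is taken as the surface return.
    peak_bin, _count = max(histogram.items(), key=lambda item: item[1])
    # Use the bin centre as the round-trip time, converted to seconds.
    round_trip_s = (peak_bin + 0.5) * bin_width_ns * 1e-9
    # Light travels to the surface and back, hence the division by two.
    return SPEED_OF_LIGHT_M_PER_S * round_trip_s / 2.0


# Four events cluster near 20 ns; the others stand in for background photons.
events = [3.2, 20.1, 20.4, 20.9, 47.5, 20.6, 71.0]
print(round(depth_from_timestamps(events), 3))
```

Reading out only the peak position instead of every photon event is what shrinks the output data frame in this kind of design.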

URL

https://arxiv.org/abs/2209.11772

PDF

https://arxiv.org/pdf/2209.11772.pdf
Adaptation of MobileNetV2 for Face Detection on Ultra-Low Power Platform

2022-08-23 14:47:06

Simon Narduzzi, Engin Türetken, Jean-Philippe Thiran, L. Andrea Dunbar

arXiv_CV

arXiv_CV Detection Face Face_Detection Quantization
Abstract

Designing Deep Neural Networks (DNNs) running on edge hardware remains a challenge. Standard designs have been adopted by the community to facilitate the deployment of Neural Network models. However, not much emphasis is put on adapting the network topology to fit hardware constraints. In this paper, we adapt one of the most widely used architectures for mobile hardware platforms, MobileNetV2, and study the impact of changing its topology and applying post-training quantization. We discuss the impact of the adaptations and the deployment of the model on an embedded hardware platform for face detection.

URL

https://arxiv.org/abs/2208.11011

PDF

https://arxiv.org/pdf/2208.11011.pdf
The Value of AI Guidance in Human Examination of Synthetically-Generated Faces

2022-08-22 18:45:53

Aidan Boyd, Patrick Tinsley, Kevin Bowyer, Adam Czajka

arXiv_AI

arXiv_AI GAN Detection Object_Detection Adversarial Face Face_Detection Salient
Abstract

Face image synthesis has progressed beyond the point at which humans can effectively distinguish authentic faces from synthetically generated ones. Recently developed synthetic face image detectors boast "better-than-human" discriminative ability, especially those guided by human perceptual intelligence during the model's training process. In this paper, we investigate whether these human-guided synthetic face detectors can assist non-expert human operators in the task of synthetic image detection when compared to models trained without human-guidance. We conducted a large-scale experiment with more than 1,560 subjects classifying whether an image shows an authentic or synthetically-generated face, and annotating regions that supported their decisions. In total, 56,015 annotations across 3,780 unique face images were collected. All subjects first examined samples without any AI support, followed by samples given (a) the AI's decision ("synthetic" or "authentic"), (b) class activation maps illustrating which regions the model deems salient for its decision, or (c) both the AI's decision and the AI's saliency map. Synthetic faces were generated with six modern Generative Adversarial Networks. Interesting observations from this experiment include: (1) models trained with human-guidance offer better support to human examination of face images when compared to models trained traditionally using cross-entropy loss, (2) binary decisions presented to humans offer better support than saliency maps, (3) understanding the AI's accuracy helps humans to increase trust in a given model and thus increase their overall accuracy. This work demonstrates that although humans supported by machines achieve better-than-random accuracy of synthetic face detection, the ways of supplying humans with AI support and of building trust are key factors determining high effectiveness of the human-AI tandem.

URL

https://arxiv.org/abs/2208.10544

PDF

https://arxiv.org/pdf/2208.10544.pdf
Modeling Biological Face Recognition with Deep Convolutional Neural Networks

2022-08-13 16:45:30

Leonard E. van Dyck, Walter R. Gruber

arXiv_CV

arXiv_CV GAN CNN Recognition Detection Review Face Face_Detection Face_Recognition Pose
Abstract

Deep Convolutional Neural Networks (DCNNs) have become the state-of-the-art computational models of biological object recognition. Their remarkable success has helped vision science break new ground. Consequently, recent efforts have started to transfer this achievement to the domain of biological face recognition. In this regard, face detection can be investigated through comparisons of face-selective biological areas and neurons to artificial layers and units. Similarly, face identification can be examined through comparisons of in vivo and in silico face space representations. In this mini-review, we summarize the first studies with this aim. We argue that DCNNs are useful models, which follow the general hierarchical organization of biological face recognition. In two spotlights, we emphasize unique scientific contributions of these models. Firstly, studies on face detection in DCNNs propose that elementary face-selectivity emerges automatically through feedforward processes. Secondly, studies on face identification in DCNNs suggest that experience and additional generative mechanisms are required for this challenge. Taken together, as this novel computational approach enables close control of predisposition (i.e., architecture) and experience (i.e., training data), this could also inform longstanding debates on the substrates of biological face recognition.

URL

https://arxiv.org/abs/2208.06681

PDF

https://arxiv.org/pdf/2208.06681.pdf
Visual Heart Rate Estimation from RGB Facial Video using Spectral Reflectance

2022-08-09 04:34:04

Bharath Ramakrishnan, Ruijia Deng, Hassan Ali

arXiv_CV

arXiv_CV Detection Deep_Learning Face Face_Detection Pose Medical
Abstract

Estimation of heart rate from facial video has a number of applications in the medical and fitness industries. Additionally, it has become useful in the field of gaming as well. Several approaches have been proposed to seamlessly obtain the heart rate from facial video, but these approaches have had issues in dealing with motion and illumination artifacts. In this work, we propose a reliable HR estimation framework using the spectral reflectance of the user, which makes it robust to motion and illumination disturbances. We employ deep learning-based frameworks such as Faster RCNNs to perform face detection as opposed to the Viola-Jones algorithm employed by previous approaches. We evaluate our method on the MAHNOB HCI dataset and find that the proposed method is able to outperform previous approaches.

URL

https://arxiv.org/abs/2208.04947

PDF

https://arxiv.org/pdf/2208.04947.pdf
The Importance of the Instantaneous Phase in Detecting Faces with Convolutional Neural Networks

2022-08-03 17:10:54

Luis Sanchez Tapia

arXiv_CV

arXiv_CV CNN Detection Face Face_Detection Transfer_Learning Action
Abstract

Convolutional Neural Networks (CNNs) have provided new and accurate methods for processing digital images and videos. Yet, training CNNs is extremely demanding in terms of computational resources. Also, for specific applications, the standard use of transfer learning tends to require far more resources than may be needed. Furthermore, the final systems tend to operate as black boxes that are difficult to interpret. The current thesis considers the problem of detecting faces from the AOLME video dataset. The AOLME dataset consists of a large video collection of group interactions that are recorded in unconstrained classroom environments. For the thesis, still image frames were extracted every minute from 18 videos, each 24 minutes long. Then, each video frame was divided into 9x5 blocks of 50x50 pixels each. For each of the 19440 blocks, the percentage of face pixels was set as ground truth. Face detection was then defined as a regression problem for determining the face pixel percentage for each block. For testing different methods, 12 videos were used for training and validation. The remaining 6 videos were used for testing. The thesis examines the impact of using the instantaneous phase for the AOLME block-based face detection application. It compares the use of the Frequency Modulation (FM) image based on the instantaneous phase, the use of the instantaneous amplitude (AM), and the original grayscale image. To generate the FM and AM inputs, the thesis uses dominant component analysis that aims to decrease the training overhead while maintaining interpretability.
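The block-level ground truth described in the abstract above can be sketched as follows: given a binary face mask for one frame, compute the percentage of face pixels in each 50x50 block. This is a minimal sketch and not the thesis code; the function name, the 0/1 mask representation, and the reading of "9x5" as 9 columns by 5 rows (a 450x250 pixel frame) are assumptions.

```python
def block_face_percentages(mask, block=50):
    """Return a grid of face-pixel percentages, one value per block.

    mask:  list of rows, each a list of 0/1 values (1 = face pixel); hypothetical input.
    block: side length of a square block in pixels.
    """
    height, width = len(mask), len(mask[0])
    if height % block or width % block:
        raise ValueError("frame size must be a multiple of the block size")
    grid = []
    for top in range(0, height, block):
        row = []
        for left in range(0, width, block):
            # Count face pixels inside the current block.
            face_pixels = sum(
                sum(mask[y][left:left + block]) for y in range(top, top + block)
            )
            row.append(100.0 * face_pixels / (block * block))
        grid.append(row)
    return grid


# A 250x450 frame (5 rows x 9 columns of blocks) with a face region
# covering the left half of the top-left block.
frame = [[0] * 450 for _ in range(250)]
for y in range(50):
    for x in range(25):
        frame[y][x] = 1

targets = block_face_percentages(frame)
print(len(targets), len(targets[0]), targets[0][0])  # 5 9 50.0
```

The block count in the abstract is consistent with this layout: 18 videos x 24 frames x 45 blocks per frame = 19440 regression targets.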

URL

https://arxiv.org/abs/2208.01638

PDF

https://arxiv.org/pdf/2208.01638.pdf
YOLO-FaceV2: A Scale and Occlusion Aware Face Detector

2022-08-03 12:40:00

Ziping Yu, Hongbo Huang, Weijun Chen, Yongxin Su, Yahui Liu, Xiuying Wang

arXiv_CV

arXiv_CV Detection Object_Detection Deep_Learning Face Face_Detection Attention Pose Enhancement
Abstract

In recent years, face detection algorithms based on deep learning have made great progress. These algorithms can generally be divided into two categories: two-stage detectors such as Faster R-CNN and one-stage detectors such as YOLO. Because of their better balance between accuracy and speed, one-stage detectors have been widely used in many applications. In this paper, we propose a real-time face detector based on the one-stage detector YOLOv5, named YOLO-FaceV2. We design a Receptive Field Enhancement module called RFE to enhance the receptive field for small faces, and use NWD Loss to compensate for the sensitivity of IoU to the location deviation of tiny objects. For face occlusion, we present an attention module named SEAM and introduce Repulsion Loss. Moreover, we use a weight function Slide to address the imbalance between easy and hard samples, and use the information of the effective receptive field to design the anchors. The experimental results on the WiderFace dataset show that our face detector outperforms YOLO and its variants on all of the easy, medium and hard subsets. Source code in this https URL
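The sensitivity of IoU to small location errors on tiny objects, mentioned in the abstract above, is easy to see numerically. The sketch below computes plain IoU for axis-aligned boxes and applies the same 2-pixel shift to a 6x6 box and a 60x60 box. It illustrates the motivation only and does not implement the NWD loss; the `(x1, y1, x2, y2)` box format and the example sizes are assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clamped at zero when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0


# The same 2-pixel horizontal shift applied to a tiny and a large box.
tiny = iou((0, 0, 6, 6), (2, 0, 8, 6))
large = iou((0, 0, 60, 60), (2, 0, 62, 60))
print(f"tiny face IoU:  {tiny:.3f}")   # 0.500
print(f"large face IoU: {large:.3f}")  # 0.935
```

A shift that barely changes the score of the large box halves the score of the tiny one, which is why an IoU-based loss can treat a nearly correct small-face prediction as a poor match.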

URL

https://arxiv.org/abs/2208.02019

PDF

https://arxiv.org/pdf/2208.02019.pdf
Low-complexity Approximate Convolutional Neural Networks

2022-07-29 21:59:29

R. J. Cintra, S. Duffner, C. Garcia, A. Leite

arXiv_CV

arXiv_CV CNN Detection Classification Face Face_Detection
Abstract

In this paper, we present an approach for minimizing the computational complexity of trained Convolutional Neural Networks (ConvNet). The idea is to approximate all elements of a given ConvNet and replace the original convolutional filters and parameters (pooling and bias coefficients; and activation function) with efficient approximations capable of extreme reductions in computational complexity. Low-complexity convolution filters are obtained through a binary (zero-one) linear programming scheme based on the Frobenius norm over sets of dyadic rationals. The resulting matrices allow for multiplication-free computations requiring only addition and bit-shifting operations. Such low-complexity structures pave the way for low-power, efficient hardware designs. We applied our approach on three use cases of different complexity: (i) a "light" but efficient ConvNet for face detection (with around 1000 parameters); (ii) another one for hand-written digit classification (with more than 180000 parameters); and (iii) a significantly larger ConvNet: AlexNet with approximately 1.2 million matrices. We evaluated the overall performance on the respective tasks for different levels of approximations. In all considered applications, very low-complexity approximations have been derived maintaining an almost equal classification performance.
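To make the "addition and bit-shifting only" claim in the abstract above concrete, the sketch below approximates a weight by a dyadic rational k / 2^bits and then multiplies an integer input by it using only shifts and adds. It uses simple nearest rounding to pick k, which is a stand-in and not the paper's binary linear programming scheme; both function names and the example values are assumptions.

```python
def nearest_dyadic(weight, bits):
    """Return integer k such that k / 2**bits is the dyadic rational nearest to weight."""
    return round(weight * (1 << bits))


def shift_add_multiply(x, k, bits):
    """Compute floor(x * k / 2**bits) for integer x without a multiplication.

    Each set bit of |k| contributes one shifted copy of x to the sum.
    """
    negative = k < 0
    k = abs(k)
    acc = 0
    shift = 0
    while k:
        if k & 1:
            acc += x << shift  # add x * 2**shift
        k >>= 1
        shift += 1
    if negative:
        acc = -acc
    # Right shift divides by 2**bits (flooring for negative values).
    return acc >> bits


k = nearest_dyadic(0.6, bits=3)        # 5, i.e. 0.6 is approximated by 5/8
print(k, shift_add_multiply(64, k, 3)) # 5 40
```

Fewer fractional bits and fewer set bits in k mean fewer additions per weight, at the cost of a larger approximation error, which is the trade-off the abstract evaluates at different approximation levels.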

URL

https://arxiv.org/abs/2208.00087

PDF

https://arxiv.org/pdf/2208.00087.pdf
LPYOLO: Low Precision YOLO for Face Detection on FPGA

2022-07-21 13:54:52

Bestami Günay, Sefa Burak Okcu, Hasan Şakir Bilge

arXiv_CV

arXiv_CV CNN Detection Object_Detection Face Face_Detection Surveillance
Abstract

In recent years, the number of edge computing devices and of artificial intelligence applications running on them has grown rapidly. In edge computing, decision-making processes and computations are moved from servers to edge devices. Hence, cheap and low-power devices are required. FPGAs consume very little power, lend themselves to parallel operations, and are well suited to running Convolutional Neural Networks (CNNs), which are the fundamental unit of an artificial intelligence application. Face detection on surveillance systems is the most anticipated application in the security market. In this work, the TinyYolov3 architecture is redesigned and deployed for face detection. It is a CNN-based object detection method developed for embedded systems. The PYNQ-Z2 is selected as the target board, which has a low-end Xilinx Zynq 7020 System-on-Chip (SoC) on it. The redesigned TinyYolov3 model is defined at several bit-width precisions with the Brevitas library, which provides fundamental CNN layers and activations in integer-quantized form. Then, the model is trained in quantized form on the WiderFace dataset. To decrease latency and power consumption, the on-chip memory of the FPGA is configured to store all network parameters, and the last activation function is changed from Sigmoid to a rescaled HardTanh. Also, a high degree of parallelism is applied to the logic resources of the FPGA. The model is converted to an HLS-based application using the FINN framework and the FINN-HLS library, which includes the layer definitions in C++. The model is then synthesized and deployed. The CPU of the SoC runs a multithreading mechanism and is responsible for preprocessing, postprocessing and TCP/IP streaming operations. As a result, 2.4 W total board power consumption, 18 frames-per-second (FPS) throughput and 0.757 mAP on the Easy category of WiderFace are achieved with the 4-bit precision model.

URL

https://arxiv.org/abs/2207.10482

PDF

https://arxiv.org/pdf/2207.10482.pdf
An Efficient Method for Face Quality Assessment on the Edge

2022-07-19 18:29:43

Sefa Burak Okcu, Burak Oğuz Özkalaycı, Cevahir Çığla

arXiv_CV

arXiv_CV Recognition Detection Tracking Face Face_Detection Face_Recognition Pose Action Surveillance
Abstract

Face recognition applications in practice are composed of two main steps: face detection and feature extraction. In a purely vision-based solution, the first step generates multiple detections for a single identity by ingesting a camera stream. A practical approach on edge devices should prioritize these detections according to their suitability for recognition. From this perspective, we propose a face quality score regression obtained by appending just a single layer to a face landmark detection network. With almost no additional cost, face quality scores are obtained by training this single layer to regress recognition scores with surveillance-like augmentations. We implemented the proposed approach on edge GPUs with all face detection pipeline steps, including detection, tracking, and alignment. Comprehensive experiments show the proposed approach's efficiency through comparison with SOTA face quality regression models on different data sets and real-life scenarios.

URL

https://arxiv.org/abs/2207.09505

PDF

https://arxiv.org/pdf/2207.09505.pdf
Convolutional Neural Network Based Partial Face Detection

2022-06-29 01:26:40

Md. Towfiqul Islam, Tanzim Ahmed, A.B.M. Raihanur Rashid, Taminul Islam, Md. Sadekur Rahman, Md. Tarek Habib

arXiv_CV

arXiv_CV CNN Recognition Detection Face Face_Detection
Abstract

Due to the rapid growth of artificial intelligence, machine learning technology is being used in various areas of our day-to-day life. There are many scenarios in which a simple crime could be prevented before it happens, or the person responsible for it could be found. The face is one distinctive feature that we have and that can be differentiated easily among many other species. Beyond distinguishing species, it also plays a significant role in identifying individuals within our own species, humans. Regarding this critical feature, one problem occurs most often nowadays: when the camera is pointed at a person, it cannot detect the person's face, and the result is a poor image. Likewise, when a robbery takes place where a security camera is installed, the robber's identity is almost indistinguishable due to the low-quality camera. Making a good algorithm that detects faces reliably reduces the cost of hardware, and focusing on that area does not cost much. Facial recognition, widget control, and similar tasks can be done by detecting the face correctly. This study aims to create and enhance a machine learning model that correctly recognizes faces. A total of 627 data samples were collected from different Bangladeshi people's faces at four angles. In this work, five machine learning approaches (CNN, Haar Cascade, Cascaded CNN, Deep CNN, and MTCNN) are implemented to obtain the best accuracy on our dataset. After creating and running the models, the Multi-Task Convolutional Neural Network (MTCNN) achieved the best accuracy, 96.2% on the training data, compared with the other machine learning models.

URL

https://arxiv.org/abs/2206.14350

PDF

https://arxiv.org/pdf/2206.14350.pdf
iExam: A Novel Online Exam Monitoring and Analysis System Based on Face Detection and Recognition

2022-06-27 15:03:25

Xu Yang, Daoyuan Wu, Xiao Yi, Jimmy H. M. Lee, Tan Lee

arXiv_CV

arXiv_CV Recognition Detection Face Face_Detection Face_Recognition OCR Optimization Pose Optical_Character
Abstract

Online exams via video conference software like Zoom have been adopted in many schools due to COVID-19. While this is convenient, it is challenging for teachers to supervise online exams from simultaneously displayed student Zoom windows. In this paper, we propose iExam, an intelligent online exam monitoring and analysis system that can not only use face detection to assist invigilators in real-time student identification, but also detect common abnormal behaviors (including face disappearing, rotating faces, and replacing with a different person during the exams) via a face recognition-based post-exam video analysis. To build such a system, the first of its kind, we overcome three challenges. First, we discover a lightweight approach to capturing exam video streams and analyzing them in real time. Second, we utilize the left-corner names that are displayed on each student's Zoom window and propose an improved OCR (optical character recognition) technique to automatically gather the ground truth for the student faces with dynamic positions. Third, we perform several experimental comparisons and optimizations to efficiently shorten the training and testing time required on teachers' PCs. Our evaluation shows that iExam achieves high accuracy, 90.4% for real-time face detection and 98.4% for post-exam face recognition, while maintaining acceptable runtime performance. We have made iExam's source code available at this https URL.

URL

https://arxiv.org/abs/2206.13356

PDF

https://arxiv.org/pdf/2206.13356.pdf
Depth-aware Glass Surface Detection with Cross-modal Context Mining

2022-06-22 17:56:09

Jiaying Lin, Yuen Hei Yeung, Rynson W.H. Lau

arXiv_CV

arXiv_CV Detection Drone Face Face_Detection Attention Pose Autonomous 3D Robot
Abstract

Glass surfaces are becoming increasingly ubiquitous as modern buildings tend to use a lot of glass panels. This however poses substantial challenges on the operations of autonomous systems such as robots, self-driving cars and drones, as the glass panels can become transparent obstacles to the navigation. Existing works attempt to exploit various cues, including glass boundary context or reflections, as a prior. However, they are all based on input RGB images. We observe that the transmission of 3D depth sensor light through glass surfaces often produces blank regions in the depth maps, which can offer additional insights to complement the RGB image features for glass surface detection. In this paper, we propose a novel framework for glass surface detection by incorporating RGB-D information, with two novel modules: (1) a cross-modal context mining (CCM) module to adaptively learn individual and mutual context features from RGB and depth information, and (2) a depth-missing aware attention (DAA) module to explicitly exploit spatial locations where missing depths occur to help detect the presence of glass surfaces. In addition, we propose a large-scale RGB-D glass surface detection dataset, called RGB-D GSD, for RGB-D glass surface detection. Our dataset comprises 3,009 real-world RGB-D glass surface images with precise annotations. Extensive experimental results show that our proposed model outperforms state-of-the-art methods.

URL

https://arxiv.org/abs/2206.11250

PDF

https://arxiv.org/pdf/2206.11250.pdf
Guiding Visual Attention in Deep Convolutional Neural Networks Based on Human Eye Movements

2022-06-21 17:59:23

Leonard E. van Dyck, Sebastian J. Denzler, Walter R. Gruber

arXiv_CV

arXiv_CV CNN Recognition Detection Deep_Learning Tracking Face Face_Detection Attention Salient Pose
Abstract

Deep Convolutional Neural Networks (DCNNs) were originally inspired by principles of biological vision, have evolved into the best current computational models of object recognition, and consequently indicate strong architectural and functional parallelism with the ventral visual pathway in comparisons with neuroimaging and neural time series data. As recent advances in deep learning seem to decrease this similarity, computational neuroscience is challenged to reverse-engineer the biological plausibility to obtain useful models. While previous studies have shown that biologically inspired architectures are able to amplify the human-likeness of the models, in this study, we investigate a purely data-driven approach. We use human eye tracking data to directly modify training examples and thereby guide the models' visual attention during object recognition in natural images either towards or away from the focus of human fixations. We compare and validate different manipulation types (i.e., standard, human-like, and non-human-like attention) through GradCAM saliency maps against human participant eye tracking data. Our results demonstrate that the proposed guided focus manipulation works as intended in the negative direction and non-human-like models focus on significantly dissimilar image parts compared to humans. The observed effects were highly category-specific, enhanced by animacy and face presence, developed only after feedforward processing was completed, and indicated a strong influence on face detection. With this approach, however, no significantly increased human-likeness was found. Possible applications of overt visual attention in DCNNs and further implications for theories of face detection are discussed.

URL

https://arxiv.org/abs/2206.10587

PDF

https://arxiv.org/pdf/2206.10587.pdf
Efficiency Comparison of AI classification algorithms for Image Detection and Recognition in Real-time

2022-06-12 21:31:40

Musarrat Saberin Nipun, Rejwan Bin Sulaiman, Amer Kareem

arXiv_AI

arXiv_AI Recognition Detection Classification Face Face_Detection
Abstract

Face detection and identification is the most difficult and often used task in Artificial Intelligence systems. The goal of this study is to present and compare the results of several face detection and recognition algorithms used in the system. This system begins with a training image of a human, then continues on to the test image, identifying the face, comparing it to the trained face, and finally classifying it using OpenCV classifiers. This research will discuss the most effective and successful tactics used in the system, which are implemented using Python, OpenCV, and Matplotlib. It may also be used in locations with CCTV, such as public spaces, shopping malls, and ATM booths.

URL

https://arxiv.org/abs/2206.05842

PDF

https://arxiv.org/pdf/2206.05842.pdf
Blind Surveillance Image Quality Assessment via Deep Neural Network Combined with the Visual Saliency

2022-06-09 07:30:32

Wei Lu, Wei Sun, Wenhan Zhu, Xiongkuo Min, Zicheng Zhang, Tao Wang, Guangtao Zhai

arXiv_CV

arXiv_CV CNN Recognition Detection Face Face_Detection QA Salient Pose Surveillance
Abstract

The intelligent video surveillance system (IVSS) can automatically analyze the content of the surveillance image (SI) and reduce the burden of the manual labour. However, the SIs may suffer quality degradations in the procedure of acquisition, compression, and transmission, which makes IVSS hard to understand the content of SIs. In this paper, we first conduct an example experiment (i.e. the face detection task) to demonstrate that the quality of the SIs has a crucial impact on the performance of the IVSS, and then propose a saliency-based deep neural network for the blind quality assessment of the SIs, which helps IVSS to filter the low-quality SIs and improve the detection and recognition performance. Specifically, we first compute the saliency map of the SI to select the most salient local region since the salient regions usually contain rich semantic information for machine vision and thus have a great impact on the overall quality of the SIs. Next, the convolutional neural network (CNN) is adopted to extract quality-aware features for the whole image and local region, which are then mapped into the global and local quality scores through the fully connected (FC) network respectively. Finally, the overall quality score is computed as the weighted sum of the global and local quality scores. Experimental results on the SI quality database (SIQD) show that the proposed method outperforms all compared state-of-the-art BIQA methods.
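The final step of the abstract above, combining global and local quality scores and then filtering low-quality images, might look like the sketch below. The abstract states only that the overall score is a weighted sum; the weight value, the assumption that the two weights sum to one, the threshold, and all names here are hypothetical.

```python
def overall_quality(global_score, local_score, global_weight=0.5):
    """Weighted sum of the whole-image score and the salient-region score.

    global_weight is an assumed parameter; the local weight is its complement.
    """
    if not 0.0 <= global_weight <= 1.0:
        raise ValueError("global_weight must lie in [0, 1]")
    return global_weight * global_score + (1.0 - global_weight) * local_score


def keep_high_quality(scored_images, threshold, global_weight=0.5):
    """Return the names of images whose overall score reaches the threshold.

    scored_images: iterable of (name, global_score, local_score) tuples.
    """
    return [
        name
        for name, g, l in scored_images
        if overall_quality(g, l, global_weight) >= threshold
    ]


frames = [("cam1_001", 80.0, 60.0), ("cam1_002", 30.0, 20.0)]
print(keep_high_quality(frames, threshold=50.0, global_weight=0.75))  # ['cam1_001']
```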

URL

https://arxiv.org/abs/2206.04318

PDF

https://arxiv.org/pdf/2206.04318.pdf
Analysis of face detection, face landmarking, and face recognition performance with masked face images

2022-06-03 15:16:58

Ožbej Golob

arXiv_CV

arXiv_CV Recognition Detection Object_Detection Face Face_Detection Face_Recognition
Abstract

Face recognition has become an essential task in our lives. However, the current COVID-19 pandemic has led to the widespread use of face masks. The effect of wearing face masks is currently an understudied issue. The aim of this paper is to analyze face detection, face landmarking, and face recognition performance with masked face images. HOG and CNN face detectors are used for face detection in combination with 5-point and 68-point face landmark predictors, and the VGG16 face recognition model is used for face recognition on masked and unmasked images. We found that the performance of face detection, face landmarking, and face recognition is negatively impacted by face masks.

URL

https://arxiv.org/abs/2207.06478

PDF

https://arxiv.org/pdf/2207.06478.pdf
Merkel Podcast Corpus: A Multimodal Dataset Compiled from 16 Years of Angela Merkel's Weekly Video Podcasts

2022-05-24 16:48:07

Debjoy Saha, Shravan Nayak, Timo Baumann

arXiv_CL

arXiv_CL Recognition Detection Face Face_Detection Knowledge Pose Speech
Abstract

We introduce the Merkel Podcast Corpus, an audio-visual-text corpus in German collected from 16 years of (almost) weekly Internet podcasts of former German chancellor Angela Merkel. To the best of our knowledge, this is the first single speaker corpus in the German language consisting of audio, visual and text modalities of comparable size and temporal extent. We describe the methods with which we collected and edited the data, which involve downloading the videos, transcripts and other metadata, forced alignment, and performing active speaker recognition and face detection, to finally curate the single speaker dataset consisting of utterances spoken by Angela Merkel. The proposed pipeline is general and can be used to curate other datasets of similar nature, such as talk show contents. Through various statistical analyses and applications of the dataset in talking face generation and TTS, we show the utility of the dataset. We argue that it is a valuable contribution to the research community, in particular, due to its realistic and challenging material at the boundary between prepared and spontaneous speech.

URL

https://arxiv.org/abs/2205.12194

PDF

https://arxiv.org/pdf/2205.12194.pdf
A Framework for Event-based Computer Vision on a Mobile Device

2022-05-13 18:06:20

Gregor Lenz, Serge Picaud, Sio-Hoi Ieng

arXiv_CV

arXiv_CV Recognition Detection Tracking Face Face_Detection Gesture Embedding Optical_Flow Reconstruction
Abstract

We present the first publicly available Android framework to stream data from an event camera directly to a mobile phone. Today's mobile devices handle a wider range of workloads than ever before and they incorporate a growing gamut of sensors that make devices smarter, more user friendly and secure. Conventional cameras in particular play a central role in such tasks, but they cannot record continuously, as the amount of redundant information recorded is costly to process. Bio-inspired event cameras on the other hand only record changes in a visual scene and have shown promising low-power applications that specifically suit mobile tasks such as face detection, gesture recognition or gaze tracking. Our prototype device is the first step towards embedding such an event camera into a battery-powered handheld device. The mobile framework allows us to stream events in real-time and opens up the possibilities for always-on and on-demand sensing on mobile phones. To liaise the asynchronous event camera output with synchronous von Neumann hardware, we look at how buffering events and processing them in batches can benefit mobile applications. We evaluate our framework in terms of latency and throughput and show examples of computer vision tasks that involve both event-by-event and pre-trained neural network methods for gesture recognition, aperture robust optical flow and grey-level image reconstruction from events. The code is available at this https URL

URL

https://arxiv.org/abs/2205.06836

PDF

https://arxiv.org/pdf/2205.06836.pdf
Open-Eye: An Open Platform to Study Human Performance on Identifying AI-Synthesized Faces

2022-05-13 14:30:59

Hui Guo, Shu Hu, Xin Wang, Ming-Ching Chang, Siwei Lyu

arXiv_CV

arXiv_CV Detection Face Face_Detection
Abstract

AI-synthesized faces are visually challenging to discern from real ones. They have been used as profile images for fake social media accounts, which leads to high negative social impacts. Although progress has been made in developing automatic methods to detect AI-synthesized faces, there is no open platform to study human performance in AI-synthesized face detection. In this work, we develop an online platform called Open-eye to study human performance in AI-synthesized face detection. We describe the design and workflow of Open-eye in this paper.

URL

https://arxiv.org/abs/2205.06680

PDF

https://arxiv.org/pdf/2205.06680.pdf