Paper Reading AI Learner

Facial Landmark Detection Evaluation on MOBIO Database


Abstract

MOBIO is a bi-modal (audio and video) database captured almost exclusively on mobile phones. It aims to improve research into deploying biometric techniques on mobile devices, and research has shown that face and speaker recognition can be performed in a mobile environment. Facial landmark localization aims to find the coordinates of a set of predefined key points in 2D face images. A facial landmark usually has a specific semantic meaning, e.g. the nose tip or an eye centre, and landmarks provide rich geometric information for other face analysis tasks such as face recognition, emotion estimation and 3D face reconstruction. Most facial landmark detection methods are evaluated on still-image face databases such as 300W, AFW, AFLW or COFW, but seldom on mobile data. Our work is the first to evaluate facial landmark detection on mobile still data, i.e., face images from the MOBIO database. About 20,600 face images have been extracted from this audio-visual database and manually labeled with 22 landmarks as the ground truth. Several state-of-the-art facial landmark detection methods are evaluated on these data. The results show that the MOBIO data are quite challenging, so this database could serve as a new, challenging benchmark for facial landmark detection.
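The abstract does not state which error metric the evaluation uses. A common way to compare landmark detectors against manually labeled ground truth is the normalized mean error (NME): the mean Euclidean distance between predicted and ground-truth landmarks, divided by a normalizing distance such as the inter-ocular distance. The sketch below is illustrative only and is not the paper's protocol. It assumes landmarks are given as (x, y) pairs. The eye-centre indices and the failure threshold are hypothetical parameters, because the abstract does not describe the layout of the 22-point annotation scheme or any threshold.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def normalized_mean_error(pred: Sequence[Point], gt: Sequence[Point],
                          left_eye_idx: int, right_eye_idx: int) -> float:
    """Mean point-to-point error divided by the ground-truth inter-ocular distance.

    left_eye_idx / right_eye_idx are hypothetical: which of the 22 MOBIO
    landmarks are the eye centres is an assumption, not taken from the paper.
    """
    if not gt or len(pred) != len(gt):
        raise ValueError("pred and gt must be non-empty and of equal length")
    # Normalize by the distance between the two ground-truth eye landmarks.
    norm = math.dist(gt[left_eye_idx], gt[right_eye_idx])
    if norm == 0:
        raise ValueError("inter-ocular distance is zero")
    # Sum the Euclidean error of every landmark, then average and normalize.
    total = sum(math.dist(p, g) for p, g in zip(pred, gt))
    return total / (len(gt) * norm)


def failure_rate(errors: Sequence[float], threshold: float = 0.1) -> float:
    """Fraction of images whose NME exceeds threshold (0.1 is an assumed value)."""
    if not errors:
        raise ValueError("errors must be non-empty")
    return sum(e > threshold for e in errors) / len(errors)
```

For a toy 3-point face with eye landmarks at indices 0 and 1, `normalized_mean_error([(1, 0), (10, 0), (5, 5)], [(0, 0), (10, 0), (5, 5)], 0, 1)` gives 1 / (3 × 10) ≈ 0.033. Normalizing by a face-size distance makes errors comparable across images of different resolutions, which matters when faces are cropped from video frames at varying scales.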

URL

https://arxiv.org/abs/2307.03329

PDF

https://arxiv.org/pdf/2307.03329.pdf

