Paper Reading AI Learner

On the Capacity of Face Representation

2019-04-11 19:45:51
Sixue Gong, Vishnu Naresh Boddeti, Anil K. Jain

Abstract

In this paper we address the following question: given a face representation, how many identities can it resolve? In other words, what is the capacity of the face representation? A scientific basis for estimating the capacity of a given face representation will not only benefit the evaluation and comparison of different representation methods, but will also establish an upper bound on the scalability of an automatic face recognition system. We cast the face capacity problem in terms of packing bounds on a low-dimensional manifold embedded within a deep representation space. By explicitly accounting for the manifold structure of the representation as well as two different sources of representational noise, epistemic (model) uncertainty and aleatoric (data) variability, our approach is able to estimate the capacity of a given face representation. To demonstrate the efficacy of our approach, we estimate the capacity of two deep neural network based face representations, namely 128-dimensional FaceNet and 512-dimensional SphereFace. Numerical experiments on unconstrained faces (IJB-C) provide a capacity upper bound of $2.7\times10^4$ for FaceNet and $8.4\times10^4$ for SphereFace at a false acceptance rate (FAR) of 1%. As expected, capacity drops drastically at lower FARs. At FARs of 0.1% and 0.001%, the capacity is $2.2\times10^3$ and $1.6\times10^{1}$, respectively, for FaceNet, and $3.6\times10^3$ and $6.0\times10^0$, respectively, for SphereFace.
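To make the packing-bound idea concrete, here is a minimal sketch of the generic volume-ratio argument, not the authors' estimator: it assumes the population and each identity occupy Gaussian hyper-ellipsoids of the same Mahalanobis radius, so the number of non-overlapping identity ellipsoids that fit inside the population ellipsoid is bounded by the ratio of their volumes, $\sqrt{\det\Sigma_{\text{pop}} / \det\Sigma_{\text{class}}}$. The function names `estimate_covariances` and `packing_capacity_bound` are hypothetical, and the sketch omits the manifold projection, the epistemic/aleatoric noise modeling, and the FAR-dependent ellipsoid sizing described above.

```python
import numpy as np


def estimate_covariances(embeddings, labels):
    """Hypothetical helper: population covariance and pooled within-class
    covariance of face embeddings (rows = samples, columns = dimensions)."""
    pop_cov = np.cov(embeddings, rowvar=False)
    classes = np.unique(labels)
    # Center each identity's embeddings on its own mean, then pool them.
    centered = np.vstack(
        [embeddings[labels == c] - embeddings[labels == c].mean(axis=0) for c in classes]
    )
    class_cov = centered.T @ centered / (len(embeddings) - len(classes))
    return pop_cov, class_cov


def packing_capacity_bound(pop_cov, class_cov):
    """Volume-ratio packing bound sqrt(det(pop_cov) / det(class_cov)),
    assuming both ellipsoids share the same Mahalanobis radius."""
    sign_p, logdet_p = np.linalg.slogdet(pop_cov)
    sign_c, logdet_c = np.linalg.slogdet(class_cov)
    if sign_p <= 0 or sign_c <= 0:
        raise ValueError("covariances must be positive definite")
    # Log-determinants avoid overflow/underflow in high dimensions (e.g. 512-D).
    return float(np.exp(0.5 * (logdet_p - logdet_c)))
```

For example, `packing_capacity_bound(np.diag([4.0, 4.0]), np.eye(2))` evaluates to 4.0: a disc of radius 2 has four times the area of a disc of radius 1. Working in log-determinants keeps the ratio finite even though individual determinants of 128- or 512-dimensional covariances can under- or overflow.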


URL

https://arxiv.org/abs/1709.10433

PDF

https://arxiv.org/pdf/1709.10433.pdf

