Paper Reading AI Learner

An Efficient Training Approach for Very Large Scale Face Recognition

2021-05-21 14:34:00
Kai Wang, Shuo Wang, Zhipeng Zhou, Xiaobo Wang, Xiaojiang Peng, Baigui Sun, Hao Li, Yang You

Abstract

Face recognition has achieved significant progress in the deep-learning era thanks to ultra-large-scale, well-labeled datasets. However, training on such datasets is time-consuming and consumes substantial hardware resources, so designing an appropriate training approach is crucial. The computational and hardware cost of training on ultra-large-scale datasets is concentrated in the Fully-Connected (FC) layer rather than the convolutional layers. To this end, we propose a novel training approach for ultra-large-scale face datasets, termed Faster Face Classification (F$^2$C). In F$^2$C, we first define a Gallery Net and a Probe Net, which generate identity centers and extract face features for recognition, respectively. Gallery Net has the same structure as Probe Net and inherits its parameters with a moving-average paradigm. Then, to reduce the training time and hardware footprint of the FC layer, we propose the Dynamic Class Pool, which stores features from Gallery Net and computes the inner products (logits) with positive samples (whose identities appear in the Dynamic Class Pool) in each mini-batch. The Dynamic Class Pool can be regarded as a substitute for the FC layer, and its size is much smaller than the FC layer's, which is why it largely reduces time and resource costs. For negative samples (whose identities do not appear in the Dynamic Class Pool), we minimize the cosine similarities between the negative samples and the Dynamic Class Pool. Finally, to improve the update efficiency and speed of the Dynamic Class Pool's parameters, we design Dual Loaders, comprising an Identity-based Loader and an Instance-based Loader, which load images from the given dataset by identity and by instance to generate training batches.
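The two mechanisms the abstract describes, the moving-average parameter transfer from Probe Net to Gallery Net and the logit/cosine-similarity computations against the Dynamic Class Pool, can be sketched as follows. This is a minimal, hypothetical illustration in plain Python: the function names, the momentum value, and the list-based representation of parameters and features are all assumptions for clarity, not the paper's actual implementation.

```python
import math

def ema_update(gallery_params, probe_params, momentum=0.999):
    """Gallery Net inherits Probe Net parameters via a moving average:
    g <- m * g + (1 - m) * p, applied element-wise (momentum assumed)."""
    return [momentum * g + (1.0 - momentum) * p
            for g, p in zip(gallery_params, probe_params)]

def dcp_logits(feature, dcp):
    """Inner products (logits) between one probe feature and every
    class center currently stored in the Dynamic Class Pool."""
    return [sum(a * b for a, b in zip(feature, center)) for center in dcp]

def cosine_similarity(u, v):
    """Cosine similarity, minimized between negative samples and the
    Dynamic Class Pool entries in the training objective."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)
```

Because the Dynamic Class Pool holds far fewer centers than the full identity count, `dcp_logits` touches a pool-sized matrix rather than a full FC weight matrix, which is where the claimed time and memory savings come from.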

URL

https://arxiv.org/abs/2105.10375

PDF

https://arxiv.org/pdf/2105.10375.pdf
