MHLR: Moving Haar Learning Rate Scheduler for Large-scale Face Recognition Training with One GPU

2024-04-17 07:06:22
Xueyuan Gong, Yain-whar Si, Zheng Zhang, Xiaochen Yuan, Ke Wang, Xinyuan Zhang, Cong Lin, Xiaoxiang Liu

Abstract

Face recognition (FR) has advanced significantly through the use of large-scale datasets. Training deep FR models on large-scale datasets with multiple GPUs is now common practice. Computing power has become a foundational and indispensable resource in deep learning, and it is nearly impossible to train a deep FR model without adequate hardware resources. Recognizing this challenge, some FR approaches have started exploring ways to reduce the time complexity of the fully-connected layer in FR models. Unlike those approaches, this paper introduces a simple yet highly effective approach, the Moving Haar Learning Rate (MHLR) scheduler, which schedules the learning rate promptly and accurately during training. MHLR supports large-scale FR training with only one GPU and reduces training time to 1/4 of the original without sacrificing more than 1% accuracy. More specifically, MHLR needs only 30 hours to train a ResNet100 model on the WebFace12M dataset, which contains more than 12M face images with 0.6M identities. Extensive experiments validate the efficiency and effectiveness of MHLR.

URL

https://arxiv.org/abs/2404.11118

PDF

https://arxiv.org/pdf/2404.11118.pdf

