Abstract
Cosine-based softmax losses significantly improve the performance of deep face recognition networks. However, these losses always include sensitive hyper-parameters that can make the training process unstable, and it is tricky to set suitable values for a specific dataset. This paper addresses this challenge by directly designing the gradients for adaptively training deep neural networks. We first investigate and unify previous cosine softmax losses by analyzing their gradients. This unified view inspires us to propose a novel gradient called P2SGrad (Probability-to-Similarity Gradient), which uses cosine similarity, the metric applied at test time, instead of classification probability to directly determine the gradients for updating neural network parameters. P2SGrad is adaptive and hyper-parameter-free, which makes the training process more efficient and faster. We evaluate P2SGrad on three face recognition benchmarks: LFW, MegaFace, and IJB-C. The results show that P2SGrad is stable in training, robust to noise, and achieves state-of-the-art performance on all three benchmarks.
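To make the core idea concrete, the sketch below illustrates the gradient substitution the abstract describes: in standard softmax cross-entropy, the gradient with respect to the logit of class j is (probability_j - 1{j == y}); P2SGrad instead uses the cosine similarity in place of the probability. This is a minimal NumPy sketch based only on the abstract's description; the function name `p2sgrad` and the exact formulation are assumptions, not the paper's reference implementation.

```python
import numpy as np

def p2sgrad(features, weights, labels):
    """Hypothetical sketch of a P2SGrad-style gradient.

    features: (N, D) raw feature vectors
    weights:  (C, D) class weight vectors
    labels:   (N,) integer class labels
    Returns a (N, C) gradient with respect to the cosine similarities.
    """
    # Normalize features and class weights to unit length,
    # so their dot products are cosine similarities.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = f @ w.T  # cosine similarities, shape (N, C), values in [-1, 1]

    onehot = np.eye(w.shape[0])[labels]  # (N, C) one-hot targets
    # The softmax probability in the usual cross-entropy gradient
    # (prob - onehot) is replaced by the cosine similarity itself,
    # leaving no margin or scale hyper-parameters to tune.
    return cos - onehot
```

Note the adaptive behavior: when a feature aligns perfectly with its class weight (cosine of 1), the gradient for that class vanishes, and no scale or margin hyper-parameter is involved.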
URL
https://arxiv.org/abs/1905.02479