Considerations on the Evaluation of Biometric Quality Assessment Algorithms

Abstract

Quality assessment algorithms can be used to estimate the utility of a biometric sample for the purpose of biometric recognition. "Error versus Discard Characteristic" (EDC) plots, and "partial Area Under Curve" (pAUC) values of curves therein, are generally used by researchers to evaluate the predictive performance of such quality assessment algorithms. An EDC curve depends on an error type such as the "False Non Match Rate" (FNMR), a quality assessment algorithm, a biometric recognition system, a set of comparisons each corresponding to a biometric sample pair, and a comparison score threshold corresponding to a starting error. To compute an EDC curve, comparisons are progressively discarded based on the associated samples' lowest quality scores, and the error is computed for the remaining comparisons. Additionally, a discard fraction limit or range must be selected to compute pAUC values, which can then be used to quantitatively rank quality assessment algorithms. This paper discusses and analyses various details for this kind of quality assessment algorithm evaluation, including general EDC properties, interpretability improvements for pAUC values based on a hard lower error limit and a soft upper error limit, the use of relative instead of discrete rankings, stepwise vs. linear curve interpolation, and normalisation of quality scores to a [0, 100] integer range. We also analyse the stability of quantitative quality assessment algorithm rankings based on pAUC values across varying pAUC discard fraction limits and starting errors, concluding that higher pAUC discard fraction limits should be preferred. The analyses are conducted both with synthetic data and with real data for a face image quality assessment scenario, with a focus on general modality-independent conclusions for EDC evaluations.
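
The EDC procedure described above can be illustrated with a minimal sketch. It assumes FNMR as the error type, mated comparisons only, similarity scores where a value below the threshold counts as a false non-match, and stepwise interpolation for the pAUC. The function names `edc_curve` and `pauc_stepwise` and all parameter names are hypothetical and chosen for illustration; this is not the authors' implementation.

```python
from typing import List, Sequence, Tuple


def edc_curve(
    similarity: Sequence[float],
    quality_a: Sequence[float],
    quality_b: Sequence[float],
    threshold: float,
) -> List[Tuple[float, float]]:
    """Return (discard fraction, FNMR) points for a set of mated comparisons.

    Assumption: a comparison is a false non-match when its similarity
    score is below `threshold`.
    """
    # Pairwise quality: the lowest quality score of the two samples.
    pair_quality = [min(a, b) for a, b in zip(quality_a, quality_b)]
    # Indices of comparisons sorted by ascending pairwise quality.
    order = sorted(range(len(similarity)), key=lambda i: pair_quality[i])
    total = len(order)

    errors_remaining = sum(1 for s in similarity if s < threshold)
    # Starting error: nothing discarded yet.
    points = [(0.0, errors_remaining / total)]

    i = 0
    while i < total:
        # Discard all comparisons that share the current lowest quality value.
        current = pair_quality[order[i]]
        while i < total and pair_quality[order[i]] == current:
            if similarity[order[i]] < threshold:
                errors_remaining -= 1
            i += 1
        remaining = total - i
        if remaining > 0:
            points.append((i / total, errors_remaining / remaining))
    return points


def pauc_stepwise(points: List[Tuple[float, float]], discard_limit: float) -> float:
    """Area under a stepwise EDC curve from discard fraction 0 to `discard_limit`."""
    area = 0.0
    # Each step holds its error value until the next point (or 1.0 at the end).
    for (x0, y0), (x1, _) in zip(points, points[1:] + [(1.0, 0.0)]):
        if x0 >= discard_limit:
            break
        area += (min(x1, discard_limit) - x0) * y0
    return area


# Toy data (made up for illustration only).
similarity = [0.9, 0.2, 0.8, 0.3, 0.7]
quality_a = [90, 10, 80, 30, 70]
quality_b = [95, 40, 60, 20, 75]

curve = edc_curve(similarity, quality_a, quality_b, threshold=0.5)
print(curve)
print(pauc_stepwise(curve, discard_limit=0.4))
```

In this toy data the two false non-matches also have the lowest pairwise quality, so the error drops to zero once 40% of the comparisons are discarded. A lower pAUC over a given discard range indicates a quality assessment algorithm that removes erroneous comparisons earlier.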

URL

https://arxiv.org/abs/2303.13294

PDF

https://arxiv.org/pdf/2303.13294.pdf