Abstract
Incorporating human perceptual intelligence into model training has been shown to increase the generalization capability of models in several difficult biometric tasks, such as presentation attack detection (PAD) and detection of synthetic samples. After the initial collection phase, human visual saliency (e.g., eye-tracking data or handwritten annotations) can be integrated into model training through attention mechanisms, augmented training samples, or human perception-related components of loss functions. Despite these successes, a vital but seemingly neglected aspect of any saliency-based training is the level of saliency granularity (e.g., bounding boxes, single saliency maps, or saliency aggregated from multiple subjects) needed to balance reaping the full benefits of human saliency against the cost of collecting it. In this paper, we explore several levels of saliency granularity and demonstrate that simple yet effective saliency post-processing techniques improve the generalization of PAD and synthetic face detection across several different CNNs.
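The granularity levels named above can be illustrated with a small sketch. It assumes each subject's annotation is an H×W grid of values in [0, 1]. The helpers `aggregate_saliency` (a pixel-wise mean across subjects, rescaled so the peak is 1.0) and `to_bounding_box` (coarsening a map to one filled box) are hypothetical and for illustration only. They are not the post-processing techniques evaluated in the paper.

```python
from typing import List

# Hypothetical representation: a saliency map is an H x W grid of floats in [0, 1].
Map = List[List[float]]


def aggregate_saliency(maps: List[Map]) -> Map:
    """Pixel-wise mean of several subjects' maps, rescaled so the peak is 1.0.

    Mean aggregation is an assumption made for illustration only.
    """
    h, w = len(maps[0]), len(maps[0][0])
    mean = [[sum(m[r][c] for m in maps) / len(maps) for c in range(w)]
            for r in range(h)]
    peak = max(max(row) for row in mean)
    if peak == 0:
        return mean  # no subject marked anything; leave the all-zero map as-is
    return [[v / peak for v in row] for row in mean]


def to_bounding_box(saliency: Map, threshold: float = 0.5) -> Map:
    """Coarsen a map to a filled box enclosing every pixel >= threshold."""
    h, w = len(saliency), len(saliency[0])
    box = [[0.0] * w for _ in range(h)]
    hits = [(r, c) for r, row in enumerate(saliency)
            for c, v in enumerate(row) if v >= threshold]
    if not hits:
        return box
    r0, r1 = min(r for r, _ in hits), max(r for r, _ in hits)
    c0, c1 = min(c for _, c in hits), max(c for _, c in hits)
    for r in range(r0, r1 + 1):
        for c in range(c0, c1 + 1):
            box[r][c] = 1.0
    return box


def blank(h: int = 4, w: int = 4) -> Map:
    return [[0.0] * w for _ in range(h)]


# Usage: three hypothetical subjects annotate a 4 x 4 image.
a = blank(); a[1][1] = 1.0
b = blank(); b[1][1] = 1.0; b[1][2] = 1.0
c = blank(); c[2][2] = 1.0

agg = aggregate_saliency([a, b, c])      # multi-subject granularity
box = to_bounding_box(agg, threshold=0.4)  # coarse bounding-box granularity
```

Each level trades detail for annotation cost: a single subject's map is the cheapest to collect, the aggregated map smooths out individual disagreement, and the bounding box discards almost all spatial detail.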
Abstract (translated)
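Incorporating human perceptual intelligence into model training has increased the generalization capability of models in several difficult biometric tasks, such as presentation attack detection (PAD) and synthetic sample detection. After the initial collection phase, human visual saliency (e.g., eye-tracking data or handwritten annotations) can be integrated through attention mechanisms, augmented training samples, or human perception-related components of the loss function. Despite these successes, a seemingly neglected but important aspect of any saliency-based training is the required level of saliency granularity (e.g., bounding boxes, single saliency maps, or saliency aggregated from multiple subjects). In this paper, we explore several different levels of saliency granularity and show that simple yet effective saliency post-processing techniques can improve the generalization of PAD and synthetic sample detection across several different convolutional neural networks.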
URL
https://arxiv.org/abs/2405.00650