Paper Reading AI Learner

Teaching AI to Teach: Leveraging Limited Human Salience Data Into Unlimited Saliency-Based Training

2023-06-08 19:55:44
Colton R. Crum, Aidan Boyd, Kevin Bowyer, Adam Czajka

Abstract

Machine learning models have shown increased accuracy in classification tasks when the training process incorporates human perceptual information. However, a challenge in training human-guided models is the cost associated with collecting image annotations for human salience. Collecting annotation data for all images in a large training set can be prohibitively expensive. In this work, we utilize "teacher" models (trained on a small amount of human-annotated data) to annotate additional data by means of teacher models' saliency maps. Then, "student" models are trained using the larger amount of annotated training data. This approach makes it possible to supplement a limited number of human-supplied annotations with an arbitrarily large number of model-generated image annotations. We compare the accuracy achieved by our teacher-student training paradigm with (1) training using all available human salience annotations, and (2) using all available training data without human salience annotations. We use synthetic face detection and fake iris detection as example challenging problems, and report results across four model architectures (DenseNet, ResNet, Xception, and Inception), and two saliency estimation methods (CAM and RISE). Results show that our teacher-student training paradigm results in models that significantly exceed the performance of both baselines, demonstrating that our approach can usefully leverage a small amount of human annotations to generate salience maps for an arbitrary amount of additional training data.
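To make the teacher's annotation step concrete, the sketch below shows how a CAM-style saliency map (one of the two estimation methods named in the abstract) could be computed from a classifier's final convolutional feature maps. This is a minimal illustration, not the paper's implementation: the feature maps and classifier weights here are random placeholders standing in for a trained teacher model's activations.

```python
import numpy as np

def class_activation_map(feature_maps, fc_weights, class_idx):
    """Classic CAM: weight the final conv feature maps by the
    fully-connected classifier weights of the target class, then
    rectify and normalize to [0, 1]."""
    # feature_maps: (C, H, W); fc_weights: (num_classes, C)
    cam = np.tensordot(fc_weights[class_idx], feature_maps, axes=1)  # (H, W)
    cam = np.maximum(cam, 0)          # keep only positive evidence
    if cam.max() > 0:
        cam = cam / cam.max()         # normalize for use as a salience target
    return cam

# Hypothetical teacher activations: 8 feature maps of size 7x7, 2 classes
rng = np.random.default_rng(0)
fmaps = rng.random((8, 7, 7))
weights = rng.random((2, 8))
saliency = class_activation_map(fmaps, weights, class_idx=1)
print(saliency.shape)  # low-resolution map, upsampled to image size in practice
```

In the teacher-student paradigm described above, maps like `saliency` would be generated for every unannotated training image and then used as pseudo-human salience targets when training the student models.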

URL

https://arxiv.org/abs/2306.05527

PDF

https://arxiv.org/pdf/2306.05527.pdf
