Paper Reading AI Learner

Teaching AI to Teach: Leveraging Limited Human Salience Data Into Unlimited Saliency-Based Training

2023-06-08 19:55:44
Colton R. Crum, Aidan Boyd, Kevin Bowyer, Adam Czajka

Abstract

Machine learning models have shown increased accuracy in classification tasks when the training process incorporates human perceptual information. However, a challenge in training human-guided models is the cost associated with collecting image annotations for human salience. Collecting annotation data for all images in a large training set can be prohibitively expensive. In this work, we utilize "teacher" models (trained on a small amount of human-annotated data) to annotate additional data by means of teacher models' saliency maps. Then, "student" models are trained using the larger amount of annotated training data. This approach makes it possible to supplement a limited number of human-supplied annotations with an arbitrarily large number of model-generated image annotations. We compare the accuracy achieved by our teacher-student training paradigm with (1) training using all available human salience annotations, and (2) using all available training data without human salience annotations. We use synthetic face detection and fake iris detection as example challenging problems, and report results across four model architectures (DenseNet, ResNet, Xception, and Inception), and two saliency estimation methods (CAM and RISE). Results show that our teacher-student training paradigm results in models that significantly exceed the performance of both baselines, demonstrating that our approach can usefully leverage a small amount of human annotations to generate salience maps for an arbitrary amount of additional training data.
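To make the teacher's annotation step concrete, the sketch below shows how a CAM-style saliency map (one of the two estimation methods named in the abstract) could be computed from a classifier's final convolutional feature maps. This is a minimal illustration, not the paper's implementation: the feature maps and classifier weights here are random placeholders standing in for a trained teacher model's activations.

```python
import numpy as np

def class_activation_map(feature_maps, fc_weights, class_idx):
    """Classic CAM: weight the final conv feature maps by the
    fully-connected classifier weights of the target class, then
    rectify and normalize to [0, 1]."""
    # feature_maps: (C, H, W); fc_weights: (num_classes, C)
    cam = np.tensordot(fc_weights[class_idx], feature_maps, axes=1)  # (H, W)
    cam = np.maximum(cam, 0)          # keep only positive evidence
    if cam.max() > 0:
        cam = cam / cam.max()         # normalize for use as a salience target
    return cam

# Hypothetical teacher activations: 8 feature maps of size 7x7, 2 classes
rng = np.random.default_rng(0)
fmaps = rng.random((8, 7, 7))
weights = rng.random((2, 8))
saliency = class_activation_map(fmaps, weights, class_idx=1)
print(saliency.shape)  # low-resolution map, upsampled to image size in practice
```

In the teacher-student paradigm described above, maps like `saliency` would be generated for every unannotated training image and then used as pseudo-human salience targets when training the student models.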

URL

https://arxiv.org/abs/2306.05527

PDF

https://arxiv.org/pdf/2306.05527.pdf
