Paper Reading AI Learner

CYBORG: Blending Human Saliency Into the Loss Improves Deep Learning

2021-12-01 18:04:15
Aidan Boyd, Patrick Tinsley, Kevin Bowyer, Adam Czajka

Abstract

Can deep learning models achieve greater generalization if their training is guided by reference to human perceptual abilities? And how can we implement this in a practical manner? This paper proposes a first-ever training strategy to ConveY Brain Oversight to Raise Generalization (CYBORG). This new training approach incorporates human-annotated saliency maps into a CYBORG loss function that guides the model towards learning features from image regions that humans find salient when solving a given visual task. The Class Activation Mapping (CAM) mechanism is used to probe the model's current saliency in each training batch, juxtapose model saliency with human saliency, and penalize the model for large differences. Results on the task of synthetic face detection show that the CYBORG loss leads to significant improvement in performance on unseen samples consisting of face images generated from six Generative Adversarial Networks (GANs) across multiple classification network architectures. We also show that scaling to even seven times as much training data with standard loss cannot beat the accuracy of CYBORG loss. As a side effect, we observed that the addition of explicit region annotation to the task of synthetic face detection increased human classification performance. This work opens a new area of research on how to incorporate human visual saliency into loss functions. All data, code and pre-trained models used in this work are offered with this paper.

Abstract (translated)

URL

https://arxiv.org/abs/2112.00686

PDF

https://arxiv.org/pdf/2112.00686.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot