Paper Reading AI Learner

Effective Guidance for Model Attention with Simple Yes-no Annotations

2024-10-29 17:53:33
Seongmin Lee, Ali Payani, Duen Horng (Polo) Chau

Abstract

Modern deep learning models often make predictions by focusing on irrelevant areas, leading to biased performance and limited generalization. Existing methods aimed at rectifying model attention require explicit labels for irrelevant areas or complex pixel-wise ground truth attention maps. We present CRAYON (Correcting Reasoning with Annotations of Yes Or No), offering effective, scalable, and practical solutions to rectify model attention using simple yes-no annotations. CRAYON empowers classical and modern model interpretation techniques to identify and guide model reasoning: CRAYON-ATTENTION directs classic interpretations based on saliency maps to focus on relevant image regions, while CRAYON-PRUNING removes irrelevant neurons identified by modern concept-based methods to mitigate their influence. Through extensive experiments with both quantitative and human evaluation, we showcase CRAYON's effectiveness, scalability, and practicality in refining model attention. CRAYON achieves state-of-the-art performance, outperforming 12 methods across 3 benchmark datasets, surpassing approaches that require more complex annotations.
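To make the pruning idea concrete, here is a minimal sketch of how yes-no neuron annotations could be used to mask irrelevant neurons before a classifier head. This is an illustrative assumption, not the paper's actual implementation: the function name `prune_irrelevant_neurons` and the mask-then-matmul formulation are hypothetical, standing in for CRAYON-PRUNING's removal of neurons whose concepts annotators marked "no".

```python
import numpy as np

def prune_irrelevant_neurons(activations, weights, relevant_mask):
    """Zero out irrelevant neurons, then apply the classifier head.

    activations   : (batch, n_neurons) penultimate-layer features
    weights       : (n_neurons, n_classes) classifier weights
    relevant_mask : (n_neurons,) bool array from yes/no annotations
    """
    pruned = activations * relevant_mask  # mask broadcasts over the batch
    return pruned @ weights               # logits from relevant neurons only

# Toy example: 2 samples, 4 neurons, 3 classes.
acts = np.array([[1.0, 2.0, 3.0, 4.0],
                 [0.5, 0.5, 0.5, 0.5]])
W = np.ones((4, 3))
mask = np.array([True, False, True, False])  # neurons 1 and 3 annotated "no"

logits = prune_irrelevant_neurons(acts, W, mask)
print(logits)  # only neurons 0 and 2 contribute to each logit
```

The appeal of this style of correction is that each neuron needs only a single yes-no judgment of its concept visualization, rather than a pixel-wise ground-truth attention map.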

URL

https://arxiv.org/abs/2410.22312

PDF

https://arxiv.org/pdf/2410.22312.pdf

