Paper Reading AI Learner

A HVS-inspired Attention Map to Improve CNN-based Perceptual Losses for Image Restoration

2019-03-30 12:14:50
Taimoor Tariq, Juan Luis Gonzalez, Munchurl Kim

Abstract

Deep Convolutional Neural Network (CNN) features have been shown to be effective perceptual quality features. The perceptual loss, based on feature maps of pre-trained CNNs, has proven remarkably effective for CNN-based perceptual image restoration problems. In this work, taking inspiration from the Human Visual System (HVS) and visual perception, we propose a spatial attention mechanism based on the dependence of human contrast sensitivity on spatial frequency. Based on their underlying spatial frequency, we identify regions in input images where the visual system is likely to be most sensitive to distortions. Using this prior, we design an attention map that is applied to the feature maps in the perceptual loss, helping it identify regions of greater perceptual importance. The results demonstrate that the proposed technique helps improve the correlation of the perceptual loss with human subjective assessments of perceptual quality, and also yields a loss that delivers a better perception-distortion trade-off than the widely used perceptual loss in CNN-based image restoration problems.
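The abstract does not give the exact formulation, so the following is only a minimal NumPy sketch of the general idea, not the authors' method. Several pieces are assumptions chosen for illustration: the Mannos-Sakrison contrast sensitivity function (CSF) as the model of frequency-dependent sensitivity, a spectrum-weighted mean radial frequency per 8x8 patch as the "underlying spatial frequency", a viewing condition of 32 pixels per degree, nearest-neighbour resizing of the map to the feature grid, and feature maps supplied by the caller (no pre-trained network is included). All function names (`mannos_sakrison_csf`, `csf_attention_map`, `weighted_perceptual_loss`, etc.) are hypothetical.

```python
import numpy as np

def mannos_sakrison_csf(f_cpd):
    """Mannos-Sakrison contrast sensitivity (assumed CSF model); f_cpd in cycles/degree."""
    f = np.asarray(f_cpd, dtype=float)
    return 2.6 * (0.0192 + 0.114 * f) * np.exp(-(0.114 * f) ** 1.1)

def local_mean_frequency(patch):
    """Spectrum-weighted mean radial frequency of a patch, in cycles/pixel (assumed estimator)."""
    p = patch - patch.mean()                  # remove the DC component
    mag = np.abs(np.fft.fft2(p))
    fy = np.fft.fftfreq(p.shape[0])[:, None]  # vertical frequencies, cycles/pixel
    fx = np.fft.fftfreq(p.shape[1])[None, :]  # horizontal frequencies, cycles/pixel
    radius = np.sqrt(fx ** 2 + fy ** 2)
    total = mag.sum()
    if total == 0:                            # flat patch: no spatial structure
        return 0.0
    return float((mag * radius).sum() / total)

def csf_attention_map(img, patch=8, ppd=32.0):
    """Per-patch attention in [0, 1] from the CSF of each patch's local frequency.
    img: 2-D grayscale array whose sides are divisible by `patch`.
    ppd: assumed pixels per degree of visual angle (hypothetical viewing condition)."""
    h, w = img.shape
    rows, cols = h // patch, w // patch
    attn = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            block = img[i * patch:(i + 1) * patch, j * patch:(j + 1) * patch]
            f_cpd = local_mean_frequency(block) * ppd  # cycles/pixel -> cycles/degree
            attn[i, j] = mannos_sakrison_csf(f_cpd)
    peak = attn.max()
    return attn / peak if peak > 0 else attn  # normalise so the most sensitive patch is 1

def resize_nearest(attn, out_h, out_w):
    """Nearest-neighbour resize of the attention map to a feature-map grid."""
    ri = np.arange(out_h) * attn.shape[0] // out_h
    ci = np.arange(out_w) * attn.shape[1] // out_w
    return attn[np.ix_(ri, ci)]

def weighted_perceptual_loss(feat_x, feat_y, attn):
    """Mean squared feature difference, spatially weighted by the attention map.
    feat_x, feat_y: (C, H, W) feature maps from some pre-trained CNN layer."""
    w = resize_nearest(attn, feat_x.shape[1], feat_x.shape[2])
    return float(np.mean(w[None, :, :] * (feat_x - feat_y) ** 2))
```

In use, the attention map would be computed once from the reference image and the same map applied to the feature differences between restored and reference images. Because the assumed CSF peaks at mid frequencies (around 8 cycles/degree), patches with moderate texture receive the largest weights, while flat patches and very fine detail are down-weighted.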

URL

https://arxiv.org/abs/1904.00205

PDF

https://arxiv.org/pdf/1904.00205.pdf
