Paper Reading AI Learner

Enhance Visual Recognition under Adverse Conditions via Deep Networks

2019-04-02 23:46:02
Ding Liu, Bowen Cheng, Zhangyang Wang, Haichao Zhang, Thomas S. Huang

Abstract

Visual recognition under adverse conditions is a very important and challenging problem of high practical value, due to the ubiquitous quality distortions introduced during image acquisition, transmission, or storage. While deep neural networks have been extensively exploited for low-quality image restoration and high-quality image recognition, respectively, few studies have addressed the important problem of recognition from very low-quality images. This paper proposes a deep learning based framework for improving the performance of image and video recognition models under adverse conditions, using robust adverse pre-training or its aggressive variant. The robust adverse pre-training algorithms leverage the power of pre-training and generalize conventional unsupervised pre-training and data augmentation methods. We further develop a transfer learning approach to cope with real-world datasets of unknown adverse conditions. The proposed framework is comprehensively evaluated on a number of image and video recognition benchmarks, and obtains significant performance improvements under various single or mixed adverse conditions. Our visualization and analysis further add to the explainability of the results.
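
The core idea summarized in the abstract, pre-training a recognition network on quality-distorted inputs and then fine-tuning it on the target data, can be illustrated with a short, self-contained PyTorch sketch. Everything below (the downsample-plus-Gaussian-noise degradation, the SmallCNN backbone, the synthetic data, and the two-phase training loop) is an illustrative assumption inferred from the abstract, not the authors' actual distortion set, architecture, or hyper-parameters.

# Minimal sketch of "adverse pre-training": pre-train on degraded inputs,
# then fine-tune on the original-quality data. All components here are
# assumptions for illustration, not the configuration used in the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F


def degrade(images: torch.Tensor, noise_std: float = 0.1) -> torch.Tensor:
    """Simulate an adverse condition: resolution loss plus Gaussian noise
    (an assumed distortion model, standing in for blur/noise/compression)."""
    low = F.interpolate(images, scale_factor=0.5, mode="bilinear", align_corners=False)
    restored = F.interpolate(low, size=images.shape[-2:], mode="bilinear", align_corners=False)
    return restored + noise_std * torch.randn_like(images)


class SmallCNN(nn.Module):
    """Toy classifier standing in for the recognition backbone."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))


def run_step(model, images, labels, optimizer, distort: bool) -> float:
    """One training step; distort=True is the adverse pre-training phase
    on degraded inputs, distort=False the fine-tuning phase."""
    inputs = degrade(images) if distort else images
    loss = F.cross_entropy(model(inputs), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    # Synthetic stand-in data; a real experiment would use benchmark loaders.
    images = torch.rand(64, 3, 32, 32)
    labels = torch.randint(0, 10, (64,))
    model = SmallCNN()
    opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

    for _ in range(3):   # phase 1: pre-train on degraded images
        print("pre-train loss:", run_step(model, images, labels, opt, distort=True))
    for _ in range(3):   # phase 2: fine-tune on original-quality images
        print("fine-tune loss:", run_step(model, images, labels, opt, distort=False))

In practice the pre-training phase would iterate over the full distorted training set, and handling unknown or mixed real-world distortions is what the paper's transfer learning component addresses; the sketch only captures the pre-train-then-fine-tune structure.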

URL

https://arxiv.org/abs/1712.07732

PDF

https://arxiv.org/pdf/1712.07732.pdf

