Paper Reading AI Learner

Switchable Whitening for Deep Representation Learning

2019-04-22 06:22:55
Xingang Pan, Xiaohang Zhan, Jianping Shi, Xiaoou Tang, Ping Luo

Abstract

Normalization methods are essential components in convolutional neural networks (CNNs). They either standardize or whiten data using statistics estimated in predefined sets of pixels. Unlike existing works that design normalization techniques for specific tasks, we propose Switchable Whitening (SW), which provides a general form unifying different whitening methods as well as standardization methods. SW learns to switch among these operations in an end-to-end manner. It has several advantages. First, SW adaptively selects appropriate whitening or standardization statistics for different tasks (see Fig.1), making it well suited for a wide range of tasks without manual design. Second, by integrating benefits of different normalizers, SW shows consistent improvements over its counterparts in various challenging benchmarks. Third, SW serves as a useful tool for understanding the characteristics of whitening and standardization techniques. We show that SW outperforms other alternatives on image classification (CIFAR-10/100, ImageNet), semantic segmentation (ADE20K, Cityscapes), domain adaptation (GTA5, Cityscapes), and image style transfer (COCO). For example, without bells and whistles, we achieve state-of-the-art performance with 45.33% mIoU on the ADE20K dataset. Code and models will be released.

Abstract (translated)

归一化方法是卷积神经网络的重要组成部分。它们要么标准化数据,要么使用预先定义的像素集估计的统计数据来增白数据。与现有的为特定任务设计标准化技术的工作不同,我们提出了可切换美白(sw),它提供了统一不同美白方法和标准化方法的通用形式。软件学习以端到端的方式在这些操作之间切换。它有几个优点。首先,软件自适应地为不同的任务选择适当的增白或标准化统计数据(见图1),使其非常适合无需手动设计的各种任务。第二,通过整合不同规格化器的优点,软件在各种具有挑战性的基准测试中显示出与对应标准相比的一致性改进。第三,软件是了解美白和标准化技术特点的有用工具。我们发现,sw在图像分类(cifar-10/100,imagenet)、语义分割(ade20k,cityscapes)、域适应(gta5,cityscapes)和图像样式转换(coco)方面优于其他替代方案。例如,没有铃声和口哨,我们在ADE20K数据集上使用45.33%的MIOU实现了最先进的性能。将发布代码和模型。

URL

https://arxiv.org/abs/1904.09739

PDF

https://arxiv.org/pdf/1904.09739.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot