MixStyle Neural Networks for Domain Generalization and Adaptation

2021-07-05 14:29:19
Kaiyang Zhou, Yongxin Yang, Yu Qiao, Tao Xiang

Abstract

Convolutional neural networks (CNNs) often have poor generalization performance under domain shift. One way to improve domain generalization is to collect diverse source data from multiple relevant domains so that a CNN model can learn more domain-invariant, and hence generalizable, representations. In this work, we address domain generalization with MixStyle, a plug-and-play, parameter-free module that is simply inserted into shallow CNN layers and requires no modification to training objectives. Specifically, MixStyle probabilistically mixes feature statistics between instances. This idea is inspired by the observation that visual domains can often be characterized by image styles, which are in turn encapsulated within instance-level feature statistics in shallow CNN layers. Inserting MixStyle modules therefore synthesizes novel domains, albeit implicitly. MixStyle is not only simple and flexible but also versatile: it can be used for problems where unlabeled images are available, such as semi-supervised domain generalization and unsupervised domain adaptation, with a simple extension that mixes feature statistics between labeled and pseudo-labeled instances. We demonstrate through extensive experiments that MixStyle can significantly boost out-of-distribution generalization performance across a wide range of tasks, including object recognition, instance retrieval, and reinforcement learning.
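
The mixing of instance-level feature statistics described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say how instances are paired or how the mixing weight is drawn, so the random pairing within a batch, the Beta-distributed weight, and the default values of `p` and `alpha` below are all assumptions. Feature maps are represented as plain nested Python lists (instance, channel, flattened spatial positions) to keep the example self-contained; a real model would apply this to tensors inside a deep-learning framework.

```python
import math
import random


def channel_stats(channel, eps=1e-6):
    """Return (mean, std) of one channel's spatial activations."""
    mean = sum(channel) / len(channel)
    var = sum((v - mean) ** 2 for v in channel) / len(channel)
    return mean, math.sqrt(var + eps)


def mix_style(batch, p=0.5, alpha=0.1, rng=None):
    """Mix per-channel mean/std between randomly paired instances.

    batch: list of instances; each instance is a list of channels;
           each channel is a flat list of spatial activations.
    p:     probability of applying the mixing (assumed default).
    alpha: shape of the Beta distribution for the weight (assumed default).
    """
    rng = rng or random.Random()
    if rng.random() >= p:
        return batch  # leave the features untouched on this call

    # partners[i] is the instance whose style is mixed into instance i.
    partners = list(range(len(batch)))
    rng.shuffle(partners)

    mixed = []
    for i, instance in enumerate(batch):
        lam = rng.betavariate(alpha, alpha)  # per-instance mixing weight in [0, 1]
        new_instance = []
        for c, channel in enumerate(instance):
            mu, sigma = channel_stats(channel)
            mu2, sigma2 = channel_stats(batch[partners[i]][c])
            # Interpolate the two sets of statistics.
            mu_mix = lam * mu + (1 - lam) * mu2
            sigma_mix = lam * sigma + (1 - lam) * sigma2
            # Normalize with the instance's own statistics, then re-style.
            new_instance.append(
                [sigma_mix * (v - mu) / sigma + mu_mix for v in channel]
            )
        mixed.append(new_instance)
    return mixed
```

Calling `mix_style(features, p=1.0)` on a batch always applies the mixing, while `p=0.0` returns the input unchanged. Because only the means and standard deviations are interpolated, the normalized spatial pattern of each instance is preserved and only its "style" statistics change, which fits the abstract's description of synthesizing novel domains implicitly.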

URL

https://arxiv.org/abs/2107.02053

PDF

https://arxiv.org/pdf/2107.02053.pdf

