Paper Reading AI Learner

Texture Learning Domain Randomization for Domain Generalized Segmentation

2023-03-21 02:23:26
Sunghwan Kim, Dae-hwan Kim, Hoseong Kim


Deep Neural Network (DNN)-based semantic segmentation models trained on a source domain often struggle to generalize to unseen target domains, i.e., the domain gap problem. Texture often contributes to the domain gap, making DNNs vulnerable to domain shift because they tend to be texture-biased. Existing Domain Generalized Semantic Segmentation (DGSS) methods have alleviated the domain gap problem by guiding models to prioritize shape over texture. On the other hand, shape and texture are two prominent and complementary cues in semantic segmentation. This paper argues that leveraging texture is crucial for improving performance in DGSS. Specifically, we propose a novel framework, coined Texture Learning Domain Randomization (TLDR). TLDR includes two novel losses to effectively enhance texture learning in DGSS: (1) a texture regularization loss that prevents overfitting to source domain textures by using texture features from an ImageNet pre-trained model and (2) a texture generalization loss that utilizes random style images to learn diverse texture representations in a self-supervised manner. Extensive experimental results demonstrate the superiority of the proposed TLDR; e.g., TLDR achieves 46.5 mIoU on GTA-to-Cityscapes using ResNet-50, which improves on the prior state-of-the-art method by 1.9 mIoU.
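The abstract does not give the exact formulation of the two losses, but both are texture-matching objectives, and Gram matrices of feature maps are the standard texture descriptor in this line of work. The sketch below is a minimal PyTorch illustration under that assumption; the function names, the MSE distance, and the use of Gram statistics are guesses for exposition, not the paper's confirmed implementation.

```python
import torch
import torch.nn.functional as F

def gram_matrix(feat: torch.Tensor) -> torch.Tensor:
    # feat: (B, C, H, W) feature map. The normalized Gram matrix (B, C, C)
    # captures channel co-activation statistics, a common texture descriptor.
    b, c, h, w = feat.shape
    f = feat.view(b, c, h * w)
    return torch.bmm(f, f.transpose(1, 2)) / (c * h * w)

def texture_regularization_loss(model_feat: torch.Tensor,
                                imagenet_feat: torch.Tensor) -> torch.Tensor:
    # Assumed form: pull the segmentation model's texture statistics toward
    # those of a frozen ImageNet pre-trained encoder on the same input,
    # discouraging overfitting to source-domain textures.
    return F.mse_loss(gram_matrix(model_feat),
                      gram_matrix(imagenet_feat.detach()))

def texture_generalization_loss(stylized_feat: torch.Tensor,
                                style_feat: torch.Tensor) -> torch.Tensor:
    # Assumed self-supervised form: features of a source image stylized with a
    # random style image should match the texture statistics of that style image.
    return F.mse_loss(gram_matrix(stylized_feat),
                      gram_matrix(style_feat.detach()))
```

In a training loop, both terms would be added (with weighting coefficients) to the usual segmentation cross-entropy loss on the source labels.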



