Reducing Texture Bias of Deep Neural Networks via Edge Enhancing Diffusion

2024-02-14 19:10:40
Edgar Heinert, Matthias Rottmann, Kira Maag, Karsten Kahl

Abstract

Convolutional neural networks (CNNs) for image processing tend to focus on localized texture patterns, commonly referred to as texture bias. While most of the previous works in the literature focus on the task of image classification, we go beyond this and study the texture bias of CNNs in semantic segmentation. In this work, we propose to train CNNs on pre-processed images with less texture to reduce the texture bias. Therein, the challenge is to suppress image texture while preserving shape information. To this end, we utilize edge enhancing diffusion (EED), an anisotropic image diffusion method initially introduced for image compression, to create texture-reduced duplicates of existing datasets. Extensive numerical studies are performed with both CNNs and vision transformer models trained on original data and EED-processed data from the Cityscapes dataset and the CARLA driving simulator. We observe strong texture-dependence of CNNs and moderate texture-dependence of transformers. Training CNNs on EED-processed images enables the models to become completely ignorant with respect to texture, demonstrating resilience with respect to texture re-introduction to any degree. Additionally, we analyze the performance reduction in depth on a level of connected components in the semantic segmentation and study the influence of EED pre-processing on domain generalization as well as adversarial robustness.
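
The EED pre-processing described here evolves each image under an anisotropic diffusion PDE, $\partial_t u = \mathrm{div}\,(D(\nabla u_\sigma)\,\nabla u)$, whose diffusion tensor D permits full smoothing along edges but little smoothing across them, so fine texture is averaged away while object contours survive. As a rough illustration of this idea (not the authors' actual pipeline; the step count, tau, sigma, lam, and the Weickert-type diffusivity constants below are illustrative assumptions), a minimal NumPy/SciPy sketch of an explicit EED scheme might look like:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def eed_step(u, tau=0.2, sigma=1.5, lam=3.0):
    """One explicit time step of edge enhancing diffusion on a 2D image.

    tau   : step size (tau <= 0.25 keeps this explicit scheme stable)
    sigma : Gaussian pre-smoothing scale for the edge estimate
    lam   : contrast parameter separating "edge" from "texture" gradients
    """
    # Edge direction from the gradient of a Gaussian-smoothed copy.
    us = gaussian_filter(u, sigma)
    gx, gy = np.gradient(us)
    mag2 = gx ** 2 + gy ** 2
    flat = mag2 < 1e-12

    # Weickert-type diffusivity: ~1 in flat/textured areas (strong smoothing),
    # ~0 across pronounced edges (edges are preserved).
    g = np.ones_like(u)
    g[~flat] = 1.0 - np.exp(-3.31488 / (mag2[~flat] / lam ** 2) ** 4)

    # Unit vector v1 across the edge; direction is arbitrary where flat (g = 1 there).
    norm = np.sqrt(np.maximum(mag2, 1e-12))
    v1x = np.where(flat, 1.0, gx / norm)
    v1y = np.where(flat, 0.0, gy / norm)

    # Diffusion tensor D = g * v1 v1^T + 1 * v2 v2^T with v2 = (-v1y, v1x),
    # i.e. reduced diffusion across edges, full diffusion along them.
    d11 = g * v1x ** 2 + v1y ** 2
    d12 = (g - 1.0) * v1x * v1y
    d22 = g * v1y ** 2 + v1x ** 2

    # Explicit update: u <- u + tau * div(D grad u).
    ux, uy = np.gradient(u)
    j1 = d11 * ux + d12 * uy
    j2 = d12 * ux + d22 * uy
    return u + tau * (np.gradient(j1, axis=0) + np.gradient(j2, axis=1))

def eed(u, steps=200, **kw):
    """Iterate the diffusion to obtain a texture-reduced duplicate of u."""
    u = u.astype(np.float64)
    for _ in range(steps):
        u = eed_step(u, **kw)
    return u
```

Applying eed per channel for a few hundred steps produces the kind of texture-reduced dataset duplicate the abstract refers to; the parameter choices that the paper actually uses may differ.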

URL

https://arxiv.org/abs/2402.09530

PDF

https://arxiv.org/pdf/2402.09530.pdf

