ChangeViT: Unleashing Plain Vision Transformers for Change Detection

2024-06-18 17:59:08
Duowang Zhu, Xiaohu Huang, Haiyan Huang, Zhenfeng Shao, Qimin Cheng

Abstract

Change detection in remote sensing images is essential for tracking environmental changes on the Earth's surface. Despite the success of vision transformers (ViTs) as backbones in numerous computer vision applications, they remain underutilized in change detection, where convolutional neural networks (CNNs) continue to dominate due to their powerful feature extraction capabilities. In this paper, our study uncovers ViTs' unique advantage in discerning large-scale changes, a capability where CNNs fall short. Capitalizing on this insight, we introduce ChangeViT, a framework that adopts a plain ViT backbone to improve the detection of large-scale changes. This framework is supplemented by a detail-capture module that generates detailed spatial features and a feature injector that efficiently integrates fine-grained spatial information into high-level semantic learning. The feature integration ensures that ChangeViT excels in both detecting large-scale changes and capturing fine-grained details, providing comprehensive change detection across diverse scales. Without bells and whistles, ChangeViT achieves state-of-the-art performance on three popular high-resolution datasets (i.e., LEVIR-CD, WHU-CD, and CLCD) and one low-resolution dataset (i.e., OSCD), which underscores the unleashed potential of plain ViTs for change detection. Furthermore, thorough quantitative and qualitative analyses validate the efficacy of the introduced modules. The source code is publicly available.

URL

https://arxiv.org/abs/2406.12847

PDF

https://arxiv.org/pdf/2406.12847.pdf

