Paper Reading AI Learner

CSHNet: A Novel Information Asymmetric Image Translation Method

2025-01-17 13:44:54
Xi Yang, Haoyuan Shi, Zihan Wang, Nannan Wang, Xinbo Gao

Abstract

Despite advancements in cross-domain image translation, challenges persist in asymmetric tasks such as SAR-to-Optical and Sketch-to-Instance conversions, which involve transforming data from a less detailed domain into one with richer content. Traditional CNN-based methods are effective at capturing fine details but struggle with global structure, leading to unwanted merging of image regions. To address this, we propose the CNN-Swin Hybrid Network (CSHNet), which combines two key modules: Swin Embedded CNN (SEC) and CNN Embedded Swin (CES), forming the SEC-CES-Bottleneck (SCB). SEC leverages CNN's detailed feature extraction while integrating the Swin Transformer's structural bias. CES, in turn, preserves the Swin Transformer's global integrity, compensating for CNN's lack of focus on structure. Additionally, CSHNet includes two components designed to enhance cross-domain information retention: the Interactive Guided Connection (IGC), which enables dynamic information exchange between SEC and CES, and Adaptive Edge Perception Loss (AEPL), which maintains structural boundaries during translation. Experimental results show that CSHNet outperforms existing methods in both visual quality and performance metrics across scene-level and instance-level datasets. Our code is available at: this https URL.
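The abstract does not give the formula for the Adaptive Edge Perception Loss (AEPL), but its stated goal is to maintain structural boundaries during translation. As a rough illustration of that idea, the sketch below compares Sobel edge maps of a translated image and its target with an L1 distance; the Sobel filter and L1 choice are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

# Illustrative sketch of an edge-perception-style loss, assuming AEPL
# penalizes differences between edge maps of the translated and target
# images. Sobel filtering and L1 distance are placeholder choices.

SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = SOBEL_X.T


def sobel_edges(img):
    """Gradient magnitude of a 2D grayscale image via Sobel filtering."""
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")  # replicate borders
    gx = np.zeros((h, w), dtype=np.float64)
    gy = np.zeros((h, w), dtype=np.float64)
    for i in range(h):
        for j in range(w):
            patch = padded[i:i + 3, j:j + 3]
            gx[i, j] = np.sum(patch * SOBEL_X)
            gy[i, j] = np.sum(patch * SOBEL_Y)
    return np.sqrt(gx ** 2 + gy ** 2)


def edge_perception_loss(pred, target):
    """Mean absolute difference between the two images' edge maps."""
    return float(np.mean(np.abs(sobel_edges(pred) - sobel_edges(target))))
```

For example, a translated image whose edges match the target's exactly incurs zero loss, while blurred or merged region boundaries (the failure mode the abstract attributes to CNN-only methods) raise it.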


URL

https://arxiv.org/abs/2501.10197

PDF

https://arxiv.org/pdf/2501.10197.pdf

