Abstract
Despite advances in cross-domain image translation, challenges persist in asymmetric tasks such as SAR-to-Optical and Sketch-to-Instance conversion, which transform data from a less detailed domain into one with richer content. Traditional CNN-based methods capture fine details effectively but struggle with global structure, leading to unwanted merging of image regions. To address this, we propose the CNN-Swin Hybrid Network (CSHNet), which combines two key modules: Swin Embedded CNN (SEC) and CNN Embedded Swin (CES), forming the SEC-CES-Bottleneck (SCB). SEC leverages the CNN's detailed feature extraction while integrating the Swin Transformer's structural bias. CES, in turn, preserves the Swin Transformer's global structural integrity, compensating for the CNN's weaker attention to structure. Additionally, CSHNet includes two components designed to enhance cross-domain information retention: the Interactive Guided Connection (IGC), which enables dynamic information exchange between SEC and CES, and the Adaptive Edge Perception Loss (AEPL), which preserves structural boundaries during translation. Experimental results show that CSHNet outperforms existing methods in both visual quality and quantitative metrics on scene-level and instance-level datasets. Our code is available at: this https URL.
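The abstract only names AEPL without giving its formulation. To make the edge-preservation idea concrete, the following is a minimal sketch in PyTorch; the fixed Sobel edge operator, the names sobel_edges and edge_perception_loss, and the alpha weighting are illustrative assumptions, not the paper's actual (presumably adaptive) loss.

import torch
import torch.nn.functional as F

def sobel_edges(x: torch.Tensor) -> torch.Tensor:
    # Per-channel Sobel gradient magnitude for a (B, C, H, W) float image.
    kx = torch.tensor([[-1., 0., 1.],
                       [-2., 0., 2.],
                       [-1., 0., 1.]], device=x.device).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)  # y-gradient kernel is the transpose of the x kernel
    c = x.shape[1]
    # Depthwise convolution: one Sobel filter per input channel.
    gx = F.conv2d(x, kx.repeat(c, 1, 1, 1), padding=1, groups=c)
    gy = F.conv2d(x, ky.repeat(c, 1, 1, 1), padding=1, groups=c)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)

def edge_perception_loss(pred: torch.Tensor, target: torch.Tensor,
                         alpha: float = 1.0) -> torch.Tensor:
    # L1 distance between edge maps, up-weighted where the target has strong
    # boundaries, so the generator is penalized most for blurring structure.
    # (The weighting scheme here is a hypothetical stand-in for "adaptive".)
    e_pred, e_tgt = sobel_edges(pred), sobel_edges(target.detach())
    weight = 1.0 + alpha * e_tgt
    return (weight * (e_pred - e_tgt).abs()).mean()

In a translation setup, such a term would be added to the main reconstruction or adversarial objective, e.g. loss = l1_loss + lambda_edge * edge_perception_loss(fake, real), with lambda_edge a tunable weight.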
URL
https://arxiv.org/abs/2501.10197