HDDGAN: A Heterogeneous Dual-Discriminator Generative Adversarial Network for Infrared and Visible Image Fusion

2024-04-24 17:06:52
Guosheng Lu, Zile Fang, Chunming He, Zhigang Zhao

Abstract

Infrared and visible image fusion (IVIF) aims to preserve thermal radiation information from infrared images while integrating texture details from visible images, capturing important features and hidden details of subjects in complex scenes and disturbed environments. IVIF therefore offers distinct advantages in practical applications such as video surveillance, night navigation, and target recognition. However, because infrared and visible images have disparate characteristics, prevailing methods often struggle to capture thermal region features and detailed information simultaneously. As a result, fusion outcomes frequently compromise between thermal target area information and texture details. In this study, we introduce a novel heterogeneous dual-discriminator generative adversarial network (HDDGAN) to address this issue. Specifically, the generator is a multi-scale skip-connected network that facilitates the extraction of essential features from the different source images. To enhance the information representation ability of the fusion result, an attention mechanism is employed to construct the information fusion layer within the generator, leveraging the disparities between the source images. Moreover, recognizing that infrared and visible images impose distinct learning requirements, we design two discriminators with differing structures. This design guides the model to learn salient information from infrared images while simultaneously capturing detailed information from visible images. Extensive experiments on various public datasets demonstrate the superiority of the proposed HDDGAN over other state-of-the-art (SOTA) algorithms, highlighting its potential for practical applications.
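The dual-discriminator idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give HDDGAN's loss functions, so the code below assumes the standard binary cross-entropy GAN objective applied to two discriminators, one judging the fused image against the infrared source and one against the visible source. The function names, the weights `w_ir` and `w_vis`, and the example probabilities are all hypothetical and not taken from the paper.

```python
import math


def bce(prob, target, eps=1e-12):
    """Binary cross-entropy for a single probability, clamped away from 0 and 1."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))


def discriminator_loss(p_source, p_fused):
    """Loss for ONE discriminator (assumed form, not the paper's):
    rate its source image as real (1) and the fused image as fake (0)."""
    return bce(p_source, 1.0) + bce(p_fused, 0.0)


def generator_adversarial_loss(p_fused_ir, p_fused_vis, w_ir=1.0, w_vis=1.0):
    """Adversarial part of the generator loss (assumed form):
    the fused image should be rated as real by BOTH discriminators.
    w_ir and w_vis are hypothetical balancing weights."""
    return w_ir * bce(p_fused_ir, 1.0) + w_vis * bce(p_fused_vis, 1.0)


# Hypothetical discriminator outputs for one fused image.
loss_fooling_both = generator_adversarial_loss(0.9, 0.9)
loss_fooling_ir_only = generator_adversarial_loss(0.9, 0.1)

print(loss_fooling_both < loss_fooling_ir_only)
```

Under this assumed objective, a fused image that satisfies only the infrared discriminator is penalized by the visible one, which is one way to read the abstract's aim of learning salient infrared information and visible detail at the same time. In a heterogeneous setup the two discriminators would differ in architecture; that difference sits inside the networks that produce `p_fused_ir` and `p_fused_vis` and is not modeled here.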


URL

https://arxiv.org/abs/2404.15992

PDF

https://arxiv.org/pdf/2404.15992.pdf

