Paper Reading AI Learner

A Task-guided, Implicitly-searched and Meta-initialized Deep Model for Image Fusion

2023-05-25 08:54:08
Risheng Liu, Zhu Liu, Jinyuan Liu, Xin Fan, Zhongxuan Luo

Abstract

Image fusion plays a key role in a variety of multi-sensor vision systems, especially for enhancing visual quality and/or extracting aggregated features for perception. However, most existing methods treat image fusion as a standalone task, ignoring its underlying relationship with downstream vision problems. Furthermore, designing proper fusion architectures often requires substantial engineering effort, and current fusion approaches lack mechanisms to improve their flexibility and generalization ability. To mitigate these issues, we establish a Task-guided, Implicitly-searched and Meta-initialized (TIM) deep model to address the image fusion problem in challenging real-world scenarios. Specifically, we first propose a constrained strategy that incorporates information from downstream tasks to guide the unsupervised learning process of image fusion. Within this framework, we then design an implicit search scheme to automatically and efficiently discover compact architectures for our fusion model. In addition, a pretext meta-initialization technique is introduced to leverage diverse fusion data and support fast adaptation to different kinds of image fusion tasks. Qualitative and quantitative experimental results on different categories of image fusion problems and related downstream tasks (e.g., visual enhancement and semantic understanding) substantiate the flexibility and effectiveness of TIM. The source code will be available at this https URL.
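The abstract does not say how the meta initialization is computed. As a hedged illustration of the general idea (learning one shared starting point that adapts quickly to several fusion tasks), the sketch below substitutes Reptile, a first-order meta-learning rule, on a toy one-parameter fusion rule F = w*A + (1-w)*B. The per-task preference `a`, the fidelity loss, the pixel values, and all function names are hypothetical and are not taken from TIM.

```python
"""Toy sketch: Reptile-style meta-initialization of a one-parameter fusion rule.

Assumptions (not from the paper): fusion is F = w*A + (1-w)*B, each hypothetical
"fusion task" is a preference a in [0, 1] over fidelity to the two sources, and
Reptile stands in for the paper's unspecified meta-initialization procedure.
"""
import random


def fusion_loss_grad(w, a, A, B):
    """Gradient of L(w) = a*mean((F-A)^2) + (1-a)*mean((F-B)^2), F = w*A + (1-w)*B."""
    # Since F - A = (w-1)(A-B) and F - B = w(A-B), L = m*(a*(w-1)^2 + (1-a)*w^2).
    m = sum((x - y) ** 2 for x, y in zip(A, B)) / len(A)  # mean squared source difference
    return 2.0 * m * (w - a)  # closed form; the task optimum is w = a


def adapt(w0, a, A, B, steps=5, lr=0.1):
    """Inner loop: a few plain gradient steps on one fusion task, starting from w0."""
    w = w0
    for _ in range(steps):
        w -= lr * fusion_loss_grad(w, a, A, B)
    return w


def reptile_meta_init(tasks, A, B, w0=0.0, outer_steps=200, meta_lr=0.1, seed=0):
    """Outer loop (Reptile): move the shared initialization toward each task's adapted weight."""
    rng = random.Random(seed)
    for _ in range(outer_steps):
        a = rng.choice(tasks)          # sample one hypothetical fusion task
        w_task = adapt(w0, a, A, B)    # fast adaptation from the current initialization
        w0 += meta_lr * (w_task - w0)  # Reptile update on the initialization
    return w0


if __name__ == "__main__":
    A = [0.9, 0.2, 0.7, 0.1]  # hypothetical "infrared" pixel intensities
    B = [0.1, 0.6, 0.3, 0.5]  # hypothetical "visible" pixel intensities
    tasks = [0.2, 0.5, 0.8]   # hypothetical per-task source preferences
    w0 = reptile_meta_init(tasks, A, B)
    print(f"meta-initialized weight: {w0:.2f}")
```

Because every toy task loss is quadratic with its optimum at `w = a`, the learned initialization settles near the middle of the task optima, so a few inner steps reach any single task. This only illustrates the "fast adaptation" motivation; TIM's actual networks, losses, and meta-learning procedure are described in the paper.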

URL

https://arxiv.org/abs/2305.15862

PDF

https://arxiv.org/pdf/2305.15862.pdf
