Abstract
Image fusion plays a key role in a variety of multi-sensor-based vision systems, especially for enhancing visual quality and/or extracting aggregated features for perception. However, most existing methods treat image fusion as an isolated task and thus ignore its underlying relationship with downstream vision problems. Furthermore, designing a proper fusion architecture often requires enormous engineering effort, and current fusion approaches lack mechanisms for improving their flexibility and generalization ability. To mitigate these issues, we establish a Task-guided, Implicit-searched and Meta-initialized (TIM) deep model to address the image fusion problem in challenging real-world scenarios. Specifically, we first propose a constrained strategy that incorporates information from downstream tasks to guide the unsupervised learning process of image fusion. Within this framework, we then design an implicit search scheme to automatically discover compact architectures for our fusion model with high efficiency. In addition, a pretext meta-initialization technique is introduced to leverage divergent fusion data to support fast adaptation across different kinds of image fusion tasks. Qualitative and quantitative experimental results on different categories of image fusion problems and related downstream tasks (e.g., visual enhancement and semantic understanding) substantiate the flexibility and effectiveness of our TIM. The source code will be available at this https URL.
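To make the task-guided idea concrete, below is a minimal PyTorch-style sketch of an objective that couples an unsupervised fusion loss with a downstream task loss. All names here (`FusionNet`, `task_net`, the weight `lam`, and the specific loss terms) are hypothetical illustrations of the general scheme, not the authors' actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionNet(nn.Module):
    """Toy fusion network: maps two single-channel source images to one fused image."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, ir, vis):
        return self.body(torch.cat([ir, vis], dim=1))

def fusion_loss(fused, ir, vis):
    # Unsupervised fidelity term: keep the fused image close to both
    # sources (a stand-in for the intensity/gradient losses common in
    # fusion work; the paper's exact formulation may differ).
    return (fused - ir).abs().mean() + (fused - vis).abs().mean()

def task_guided_loss(fused, ir, vis, task_net, labels, lam=0.1):
    # Task-guided term: a downstream network (e.g. a classifier or
    # segmenter) back-propagates task information into the fusion model,
    # constraining the otherwise unsupervised fusion objective.
    task_term = F.cross_entropy(task_net(fused), labels)
    return fusion_loss(fused, ir, vis) + lam * task_term

# Toy usage: a hypothetical 10-class downstream head on the fused image.
fusion = FusionNet()
task_net = nn.Sequential(
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10),
)
ir, vis = torch.rand(4, 1, 64, 64), torch.rand(4, 1, 64, 64)
labels = torch.randint(0, 10, (4,))
loss = task_guided_loss(fusion(ir, vis), ir, vis, task_net, labels)
loss.backward()
```

In this reading, `lam` trades off source fidelity against downstream utility; setting it to zero recovers a purely unsupervised fusion model, which is the baseline the abstract argues against.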
URL
https://arxiv.org/abs/2305.15862