Abstract
In the field of image fusion, promising progress has been made by modeling data from different modalities as linear subspaces. In practice, however, the source images often lie in a non-Euclidean space, whose intrinsic topological structure Euclidean methods usually fail to capture. In particular, the inner product computed in Euclidean space measures algebraic rather than semantic similarity, which yields undesired attention outputs and degrades fusion performance. Moreover, the infrared and visible image fusion task must balance low-level details against high-level semantics. To address these issues, we propose GrFormer, a novel attention mechanism based on the Grassmann manifold for infrared and visible image fusion. Specifically, our method constructs a low-rank subspace mapping through projection constraints on the Grassmann manifold, compressing attention features into subspaces of varying rank. This forces the features to decouple into high-frequency details (local low-rank) and low-frequency semantics (global low-rank), thereby achieving multi-scale semantic fusion. In addition, to integrate salient information effectively, we develop a cross-modal fusion strategy (CMS) based on a covariance mask, which maximises the complementary properties between the two modalities and suppresses highly correlated, and therefore redundant, features. Experimental results demonstrate that our network outperforms SOTA methods both qualitatively and quantitatively on multiple image fusion benchmarks. The code is available at this https URL.
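The abstract does not give the paper's exact formulation, but the core idea of replacing the Euclidean inner product with subspace similarity on the Grassmann manifold can be sketched as follows. A rank-k feature subspace is represented by an orthonormal basis (a point on Gr(k, d)), and similarity between two subspaces is measured with the projection metric, i.e. the sum of squared cosines of their principal angles. All function names here are illustrative assumptions, not the authors' API.

```python
import numpy as np

def grassmann_basis(X, k):
    # Orthonormal basis of the rank-k subspace spanned by the columns of the
    # feature matrix X (a point on the Grassmann manifold Gr(k, d)),
    # obtained via truncated SVD. Illustrative sketch only.
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k]

def projection_similarity(Ua, Ub):
    # Projection-metric similarity between two subspaces:
    # ||Ua^T Ub||_F^2 = sum of squared cosines of the principal angles,
    # lying in [0, k]. This replaces the Euclidean inner product used
    # in standard attention scores.
    return np.linalg.norm(Ua.T @ Ub) ** 2
```

Identical subspaces score k, orthogonal subspaces score 0, so the measure reflects geometric (and, by the paper's argument, more semantic) alignment rather than raw algebraic magnitude.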
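The covariance-mask idea behind the CMS can likewise be sketched: channels whose infrared and visible features are strongly correlated carry redundant information and are suppressed, while weakly correlated (complementary) channels are kept. The threshold `tau` and the per-channel-correlation formulation are assumptions for illustration, not the paper's exact design.

```python
import numpy as np

def covariance_mask(F_ir, F_vis, tau=0.5):
    # F_ir, F_vis: (C, N) flattened per-channel features from the infrared
    # and visible modalities. Channels whose cross-modal correlation
    # magnitude exceeds tau are treated as redundant (mask 0); complementary,
    # low-correlation channels pass through (mask 1). Hypothetical sketch.
    ir = F_ir - F_ir.mean(axis=1, keepdims=True)
    vis = F_vis - F_vis.mean(axis=1, keepdims=True)
    num = (ir * vis).sum(axis=1)
    den = np.sqrt((ir ** 2).sum(axis=1) * (vis ** 2).sum(axis=1)) + 1e-8
    corr = np.abs(num / den)                 # per-channel correlation in [0, 1]
    return (corr < tau).astype(F_ir.dtype)   # 1 keeps a complementary channel
```

Applying the mask elementwise to the fused features then maximises complementarity between the modalities, in the spirit of the CMS described above.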
URL
https://arxiv.org/abs/2506.14384