Abstract
Dense image correspondence is central to many applications, such as visual odometry, 3D reconstruction, object association, and re-identification. Historically, dense correspondence has been tackled separately for wide-baseline scenarios and optical flow estimation, despite the common goal of matching content between two images. In this paper, we develop a Unified Flow & Matching model (UFM), trained on unified data to estimate correspondence for pixels that are co-visible in both the source and target images. UFM uses a simple, generic transformer architecture that directly regresses the (u, v) flow. It is easier to train and more accurate for large flows than the coarse-to-fine cost-volume approaches typical of prior work. UFM is 28% more accurate than state-of-the-art flow methods (Unimatch), while also achieving 62% lower error and running 6.7x faster than dense wide-baseline matchers (RoMa). UFM is the first to demonstrate that unified training can outperform specialized approaches in both domains. This result enables fast, general-purpose correspondence and opens new directions for multi-modal, long-range, and real-time correspondence tasks.
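To make the architectural claim concrete, here is a minimal PyTorch sketch of direct (u, v) flow regression with a generic transformer attending over the joint token set of both images, in the spirit described above. Everything in it is an assumption for illustration: the class name, layer sizes, the simple linear head, and the extra co-visibility channel are hypothetical and do not reproduce the authors' actual implementation.

```python
# Hypothetical sketch of direct (u, v) flow regression with a generic
# transformer (no cost volumes). Names and sizes are illustrative only,
# not the UFM authors' implementation.
import torch
import torch.nn as nn

class FlowRegressionSketch(nn.Module):
    def __init__(self, img_size=224, patch=14, dim=384, depth=4, heads=6):
        super().__init__()
        self.grid = img_size // patch
        # Shared patch embedding applied to source and target images.
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.pos = nn.Parameter(torch.zeros(1, 2 * self.grid**2, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        # Head regresses a (u, v) flow vector per source patch, plus a
        # co-visibility logit (a hypothetical third channel).
        self.head = nn.Linear(dim, 3)

    def forward(self, src, tgt):
        # Tokenize both images, then self-attend over the joint token set.
        ts = self.embed(src).flatten(2).transpose(1, 2)  # (B, N, dim)
        tt = self.embed(tgt).flatten(2).transpose(1, 2)  # (B, N, dim)
        x = torch.cat([ts, tt], dim=1) + self.pos
        x = self.encoder(x)
        # Regress flow only at source tokens; upsampling to per-pixel
        # resolution is omitted for brevity.
        out = self.head(x[:, : ts.shape[1]])             # (B, N, 3)
        flow = out[..., :2].reshape(-1, self.grid, self.grid, 2)
        covis = out[..., 2]                              # co-visibility logit
        return flow, covis

# Usage: random tensors stand in for a real source/target image pair.
model = FlowRegressionSketch()
src, tgt = torch.randn(1, 3, 224, 224), torch.randn(1, 3, 224, 224)
flow, covis = model(src, tgt)
print(flow.shape, covis.shape)  # (1, 16, 16, 2), (1, 256)
```

The point of the sketch is the contrast drawn in the abstract: the (u, v) flow comes straight out of a regression head over attended tokens, with no explicit correlation volume or coarse-to-fine refinement stage.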
URL
https://arxiv.org/abs/2506.09278