Paper Reading AI Learner

PATS: Patch Area Transportation with Subdivision for Local Feature Matching

2023-03-14 08:28:36
Junjie Ni, Yijin Li, Zhaoyang Huang, Hongsheng Li, Hujun Bao, Zhaopeng Cui, Guofeng Zhang

Abstract

Local feature matching aims to establish sparse correspondences between a pair of images. Recently, detector-free methods have generally performed better, but they remain unsatisfactory on image pairs with large scale differences. In this paper, we propose Patch Area Transportation with Subdivision (PATS) to tackle this issue. Instead of building an expensive image pyramid, we start by splitting the original image pair into equal-sized patches and then gradually resize and subdivide them into smaller patches with the same scale. However, estimating scale differences between these patches is non-trivial, since the scale differences are determined by both relative camera poses and scene structures and thus vary spatially over image pairs. Moreover, it is hard to obtain the ground truth for real scenes. To this end, we propose patch area transportation, which enables learning scale differences in a self-supervised manner. In contrast to bipartite graph matching, which only handles one-to-one matching, our patch area transportation can deal with many-to-many relationships. PATS improves both matching accuracy and coverage, and shows superior performance in downstream tasks such as relative pose estimation, visual localization, and optical flow estimation. The source code will be released to benefit the community.
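To make the contrast between one-to-one matching and many-to-many transportation concrete, here is a minimal sketch using generic entropic optimal transport (Sinkhorn iterations). It is not the authors' implementation: the function name `sinkhorn_transport`, the toy cost matrix, the patch areas, and the parameters `eps` and `n_iters` are all assumptions chosen for illustration. Each source patch carries its area as mass, and the plan is free to spread that mass over several target patches, which a one-to-one assignment cannot do.

```python
import numpy as np

def sinkhorn_transport(cost, src_mass, dst_mass, eps=0.1, n_iters=200):
    """Entropic optimal transport (illustrative, not the PATS code).

    Returns a plan P whose rows sum to src_mass and whose columns
    sum to dst_mass, preferring low-cost (similar) patch pairs.
    """
    K = np.exp(-cost / eps)            # Gibbs kernel from matching cost
    u = np.ones_like(src_mass)
    v = np.ones_like(dst_mass)
    for _ in range(n_iters):
        u = src_mass / (K @ v)         # rescale rows to match source mass
        v = dst_mass / (K.T @ u)       # rescale columns to match target mass
    return u[:, None] * K * v[None, :]

# Hypothetical toy setup: 2 source patches, 4 target patches.
# The target view is zoomed in, so each source patch covers two
# target patches; target masses are scaled so the totals agree.
cost = np.array([[0.0, 0.0, 1.0, 1.0],
                 [1.0, 1.0, 0.0, 0.0]])
src_area = np.array([1.0, 1.0])
dst_area = np.array([0.5, 0.5, 0.5, 0.5])

plan = sinkhorn_transport(cost, src_area, dst_area)
print(np.round(plan, 3))
```

In this toy case each row of `plan` places roughly half of its mass on two different target patches, so one source patch corresponds to several target patches. The amount of area a patch sends or receives is the kind of signal from which a relative scale could be read, although how PATS actually derives and supervises scale is described in the paper, not in this sketch.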

Abstract (translated)

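Local feature matching aims to establish sparse correspondences between two images. In recent years, detector-free methods have generally performed better, but they are unsatisfactory when the image pair exhibits large scale differences. In this paper, we propose Patch Area Transportation with Subdivision (PATS) to address this problem. Rather than building an expensive image pyramid, we split the original image pair into equal-sized patches and progressively resize and subdivide them into smaller patches of the same scale. Estimating the scale differences between these patches is difficult, however, because they are determined by both the relative camera pose and the scene structure and therefore vary spatially across the image pair. In addition, ground truth is hard to obtain for real scenes. We therefore propose patch area transportation, which learns scale differences in a self-supervised manner. Unlike bipartite graph matching, which handles only one-to-one matches, patch area transportation can handle many-to-many relationships. PATS improves matching accuracy and coverage and performs better in downstream tasks such as relative pose estimation, visual localization, and optical flow estimation. The source code will be released to benefit the community.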

URL

https://arxiv.org/abs/2303.07700

PDF

https://arxiv.org/pdf/2303.07700.pdf

