XFeat: Accelerated Features for Lightweight Image Matching

2024-04-30 00:37:55
Guilherme Potje, Felipe Cadar, Andre Araujo, Renato Martins, Erickson R. Nascimento

Abstract

We introduce a lightweight and accurate architecture for resource-efficient visual correspondence. Our method, dubbed XFeat (Accelerated Features), revisits fundamental design choices in convolutional neural networks for detecting, extracting, and matching local features. Our new model satisfies a critical need for fast and robust algorithms suitable for resource-limited devices. In particular, accurate image matching requires sufficiently large image resolutions; for this reason, we keep the resolution as large as possible while limiting the number of channels in the network. In addition, our model is designed to offer the choice of matching at the sparse or semi-dense levels, each of which may be more suitable for different downstream applications, such as visual navigation and augmented reality. Our model is the first to offer semi-dense matching efficiently, leveraging a novel match refinement module that relies on coarse local descriptors. XFeat is versatile and hardware-independent, surpassing current deep learning-based local features in speed (up to 5x faster) with comparable or better accuracy, proven in pose estimation and visual localization. We showcase it running in real-time on an inexpensive laptop CPU without specialized hardware optimizations. Code and weights are available at this http URL.
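The trade-off between resolution and channel count mentioned in the abstract can be illustrated with the standard cost formula for a convolution layer. The sketch below is only a back-of-the-envelope calculation: the function name `conv2d_macs`, the 480x640 input, and the 64/32 channel counts are hypothetical values chosen for illustration, not XFeat's actual layer sizes.

```python
def conv2d_macs(height, width, c_in, c_out, kernel=3):
    """Multiply-accumulate count of one stride-1, same-padded k x k convolution.

    Each of the height * width output positions computes c_out values,
    and each value sums over c_in * kernel * kernel inputs.
    """
    return height * width * c_in * c_out * kernel * kernel


# Hypothetical layer sizes, chosen only to show the scaling behaviour.
baseline = conv2d_macs(480, 640, c_in=64, c_out=64)

# Halving both channel counts keeps the full resolution.
half_channels = conv2d_macs(480, 640, c_in=32, c_out=32)

# Halving both spatial dimensions keeps the full channel width.
half_resolution = conv2d_macs(240, 320, c_in=64, c_out=64)

print(baseline // half_channels, baseline // half_resolution)
```

Under these assumptions both reductions cut the cost of the layer by the same factor of four, but only the channel reduction preserves the spatial detail that the abstract identifies as important for accurate matching.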

URL

https://arxiv.org/abs/2404.19174

PDF

https://arxiv.org/pdf/2404.19174.pdf

