Hybrid Scene Compression for Visual Localization

2018-07-19 16:04:58
Federico Camposeco, Andrea Cohen, Marc Pollefeys, Torsten Sattler

Abstract

Localizing an image with respect to a large-scale 3D scene is a core task for many computer vision applications. The increasing size of available 3D scenes makes visual localization prohibitively slow for real-time applications because of the large amount of data that the system needs to analyze and store. Compression therefore becomes a necessary step for managing large scenes. In this work, we introduce a new hybrid compression algorithm that selects two subsets of points from the original 3D model: a small set of points with full appearance information, and an additional, larger set of points with compressed information. Our algorithm takes into account both spatial coverage and appearance uniqueness during compression. Quantization techniques are applied at compression time, which reduces run-time relative to previous compression methods. A RANSAC variant tailored to our specific compression output is also introduced. Experiments on six large-scale datasets show that our method performs better than previous compression techniques in terms of memory, run-time and accuracy. Furthermore, the localization rates and pose accuracy obtained are comparable to state-of-the-art feature-based methods, while using a small fraction of the memory.
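
The two-subset selection described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: it is a generic greedy weighted-coverage heuristic, assuming each 3D point comes with the set of database images that observe it and a scalar appearance-uniqueness score. The function name `greedy_hybrid_selection`, the `visibility` and `uniqueness` inputs, and the gain formula are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Set, Tuple


def greedy_hybrid_selection(
    visibility: Dict[int, Set[int]],   # point id -> ids of images observing it (assumed input)
    uniqueness: Dict[int, float],      # point id -> appearance-uniqueness score (assumed input)
    n_full: int,                       # number of points kept with full descriptors
    n_compressed: int,                 # number of points kept with compressed descriptors
) -> Tuple[List[int], List[int]]:
    """Greedily rank points by coverage gain times uniqueness, then split the ranking."""
    times_covered: Dict[int, int] = {}
    remaining = set(visibility)
    ranking: List[int] = []

    def gain(point: int) -> float:
        # An image that is already covered many times contributes less,
        # which pushes the selection towards even spatial coverage.
        coverage = sum(1.0 / (1 + times_covered.get(img, 0)) for img in visibility[point])
        return coverage * uniqueness[point]

    while remaining and len(ranking) < n_full + n_compressed:
        # Iterating in sorted order makes tie-breaking deterministic (lowest id wins).
        best = max(sorted(remaining), key=gain)
        ranking.append(best)
        remaining.remove(best)
        for img in visibility[best]:
            times_covered[img] = times_covered.get(img, 0) + 1

    # Highest-ranked points keep full appearance; the next ones would be stored compressed.
    return ranking[:n_full], ranking[n_full:]
```

In a full pipeline, the descriptors of the second subset would then be replaced by something compact such as a quantized codeword index; that step is omitted here because the abstract does not specify it.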

URL

https://arxiv.org/abs/1807.07512

PDF

https://arxiv.org/pdf/1807.07512.pdf

