
Uni-Fusion: Universal Continuous Mapping

2023-03-22 16:21:44
Yijun Yuan, Andreas Nuechter

Abstract

We introduce Uni-Fusion, a universal continuous mapping framework for surfaces, surface properties (color, infrared, etc.) and more (latent features in CLIP embedding space, etc.). We propose the first Universal Implicit Encoding model that supports encoding of both geometry and various types of properties (RGB, infrared, features, etc.) without the need for any training. Based on this model, our framework divides the point cloud into regular grid voxels and produces a latent feature in each voxel to form a Latent Implicit Map (LIM) for geometry and arbitrary properties. Then, by fusing the Local LIM of each new frame into the Global LIM, incremental reconstruction is achieved. Encoded with the corresponding types of data, our Latent Implicit Map can generate continuous surfaces, surface property fields, surface feature fields and any other possible options. To demonstrate the capabilities of our model, we implement three applications: (1) incremental reconstruction of surfaces and color, (2) 2D-to-3D transfer of fabricated properties, and (3) open-vocabulary scene understanding by producing a text CLIP feature field on surfaces. We evaluate Uni-Fusion through comparisons in each of the corresponding applications, where it shows high flexibility across applications while performing best or competitively. The project page of Uni-Fusion is available at this https URL.
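The voxel-partition-and-fuse pipeline described in the abstract can be illustrated with a minimal sketch. It rests on loudly stated assumptions: the per-voxel "latent" below is a simple placeholder (the mean of in-voxel local coordinates and property values), not Uni-Fusion's Universal Implicit Encoding, and fusing by a point-count-weighted running average is an assumed rule, not one stated in the abstract. The names `VOXEL_SIZE`, `encode_local_lim` and `fuse` are hypothetical. Decoding latents back into surfaces or property fields is not shown.

```python
from collections import defaultdict

import numpy as np

VOXEL_SIZE = 0.1  # hypothetical voxel edge length (e.g. metres)


def voxel_keys(points: np.ndarray, voxel_size: float = VOXEL_SIZE) -> np.ndarray:
    """Map each 3D point to the integer index of its regular-grid voxel."""
    return np.floor(points / voxel_size).astype(np.int64)


def encode_local_lim(points: np.ndarray, values: np.ndarray,
                     voxel_size: float = VOXEL_SIZE) -> dict:
    """Build a hypothetical Local LIM: one (latent, weight) pair per occupied voxel.

    Placeholder latent: mean of [local xyz in [0, 1), property values] in the voxel.
    This is NOT the paper's Universal Implicit Encoding.
    """
    keys = voxel_keys(points, voxel_size)
    local = points / voxel_size - keys          # position inside each voxel
    feats = np.concatenate([local, values], axis=1)
    groups = defaultdict(list)
    for row, f in zip(keys, feats):
        groups[tuple(int(i) for i in row)].append(f)
    # Weight = number of points that contributed to the voxel's latent.
    return {k: (np.mean(fs, axis=0), len(fs)) for k, fs in groups.items()}


def fuse(global_lim: dict, local_lim: dict) -> dict:
    """Fuse a Local LIM into the Global LIM (assumed count-weighted average)."""
    for key, (latent, w) in local_lim.items():
        if key in global_lim:
            g_latent, g_w = global_lim[key]
            total = g_w + w
            global_lim[key] = ((g_latent * g_w + latent * w) / total, total)
        else:
            global_lim[key] = (latent, w)
    return global_lim


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    global_lim: dict = {}
    for _ in range(3):  # three incoming "frames" of colored points
        pts = rng.uniform(0.0, 0.3, size=(200, 3))
        rgb = rng.uniform(0.0, 1.0, size=(200, 3))
        fuse(global_lim, encode_local_lim(pts, rgb))
    print(len(global_lim), "voxels in the global map")
```

Keying voxels by integer grid index turns fusion into a dictionary merge, so each new frame only touches the voxels it actually observes; that locality is what makes this kind of map easy to update incrementally.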

URL

https://arxiv.org/abs/2303.12678

PDF

https://arxiv.org/pdf/2303.12678.pdf
