Paper Reading AI Learner

T2TD: Text-3D Generation Model based on Prior Knowledge Guidance

2023-05-25 06:05:52
Weizhi Nie, Ruidong Chen, Weijie Wang, Bruno Lepri, Nicu Sebe

Abstract

In recent years, 3D models have been used in many applications, such as autonomous driving, 3D reconstruction, VR, and AR. However, 3D model data remains scarce relative to this practical demand. Generating high-quality 3D models efficiently from textual descriptions is therefore a promising but challenging way to address this gap. In this paper, inspired by the human ability to fill in visual details from ambiguous descriptions based on prior experience, we propose a novel text-3D generation model (T2TD), which introduces related shapes or textual information as prior knowledge to improve the performance of the 3D generation model. In this process, we first introduce a text-3D knowledge graph that stores the relationships between 3D models and textual semantic information, and can provide related shapes to guide the generation of the target 3D model. Second, we integrate an effective causal inference model to select useful features from these related shapes, removing unrelated shape information and retaining only features strongly relevant to the textual description. Meanwhile, to effectively integrate this multi-modal prior knowledge with the textual information, we adopt a novel multi-layer transformer structure that progressively fuses related shape and textual information, compensating for the lack of structural information in the text and enhancing the final performance of the 3D generation model. Experimental results demonstrate that our approach significantly improves 3D model generation quality and outperforms SOTA methods on the Text2Shape dataset.
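To make the pipeline described above more concrete, here is a minimal, hypothetical PyTorch sketch of the prior-knowledge fusion step: a text feature queries the features of retrieved related shapes, a learned gate stands in for the causal selection of text-relevant shape information, and a small stack of cross-attention layers stands in for the progressive multi-layer transformer fusion. The class name PriorFusionSketch, the gating mechanism, and all dimensions are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch (not the authors' code): fuse a text embedding with
# features of retrieved prior shapes, then output a latent for 3D generation.
import torch
import torch.nn as nn

class PriorFusionSketch(nn.Module):
    def __init__(self, dim=512, n_heads=8, n_layers=3):
        super().__init__()
        # Stand-in for the causal selection step: a learned gate that
        # down-weights shape features weakly related to the text query.
        self.gate = nn.Sequential(nn.Linear(dim * 2, dim), nn.Sigmoid())
        # Stand-in for progressive fusion: cross-attention layers in which
        # the text token repeatedly attends to the gated prior-shape tokens.
        self.cross_attn = nn.ModuleList(
            [nn.MultiheadAttention(dim, n_heads, batch_first=True) for _ in range(n_layers)]
        )
        self.norms = nn.ModuleList([nn.LayerNorm(dim) for _ in range(n_layers)])
        self.to_latent = nn.Linear(dim, dim)  # latent fed to a 3D decoder (not shown)

    def forward(self, text_feat, shape_feats):
        # text_feat: (B, dim); shape_feats: (B, K, dim) for K retrieved shapes
        q = text_feat.unsqueeze(1)                               # (B, 1, dim)
        gate = self.gate(torch.cat(
            [shape_feats, q.expand_as(shape_feats)], dim=-1))    # (B, K, dim)
        kv = shape_feats * gate                                   # keep text-relevant parts
        for attn, norm in zip(self.cross_attn, self.norms):
            fused, _ = attn(q, kv, kv)                            # text queries shape priors
            q = norm(q + fused)                                   # residual, layer-by-layer fusion
        return self.to_latent(q.squeeze(1))                       # (B, dim) shape latent

# Toy usage with random features standing in for text/shape encoder outputs.
model = PriorFusionSketch()
latent = model(torch.randn(2, 512), torch.randn(2, 4, 512))
print(latent.shape)  # torch.Size([2, 512])
```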


URL

https://arxiv.org/abs/2305.15753

PDF

https://arxiv.org/pdf/2305.15753.pdf

