FullFormer: Generating Shapes Inside Shapes

2023-03-20 16:19:23
Tejaswini Medi, Jawad Tayyub, Muhammad Sarmad, Frank Lindseth, Margret Keuper

Abstract

Implicit generative models have been widely employed to model 3D data and have recently proven to be successful in encoding and generating high-quality 3D shapes. This work builds upon these models and alleviates current limitations by presenting the first implicit generative model that facilitates the generation of complex 3D shapes with rich internal geometric details. To achieve this, our model uses unsigned distance fields to represent nested 3D surfaces, allowing learning from non-watertight mesh data. We propose a transformer-based autoregressive model for 3D shape generation that leverages context-rich tokens from vector quantized shape embeddings. The generated tokens are decoded into an unsigned distance field which is rendered into a novel 3D shape exhibiting a rich internal structure. We demonstrate that our model achieves state-of-the-art point cloud generation results on popular classes of 'Cars', 'Planes', and 'Chairs' of the ShapeNet dataset. Additionally, we curate a dataset that exclusively comprises shapes with realistic internal details from the 'Cars' class of ShapeNet and demonstrate our method's efficacy in generating these shapes with internal geometry.
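
As a rough illustration of why an unsigned distance field can represent nested surfaces, the sketch below evaluates one for a toy shape made of two concentric spheres. The shape, the radii, and the function name are assumptions chosen for illustration only and are not taken from the paper. Both the outer and the inner surface lie on the zero level set, and no inside/outside sign is needed, which is the property that permits learning from non-watertight meshes.

```python
import math

def unsigned_distance_nested_spheres(point, radii, center=(0.0, 0.0, 0.0)):
    """Unsigned distance from `point` to the nearest of several concentric spheres.

    Illustrative helper (hypothetical, not from the paper): each sphere stands
    in for one surface of a shape that contains other surfaces inside it.
    """
    r = math.dist(point, center)  # distance from the query point to the common center
    # Distance to a sphere surface is |r - radius|; take the closest surface.
    return min(abs(r - radius) for radius in radii)

# Hypothetical toy shape: an outer shell of radius 1.0 enclosing an inner shell of radius 0.4.
RADII = (1.0, 0.4)

on_outer = unsigned_distance_nested_spheres((1.0, 0.0, 0.0), RADII)  # point on the outer surface
on_inner = unsigned_distance_nested_spheres((0.0, 0.4, 0.0), RADII)  # point on the inner surface
between = unsigned_distance_nested_spheres((0.7, 0.0, 0.0), RADII)   # point between the two shells

print(on_outer, on_inner, between)
```

Points on either shell evaluate to (approximately) zero, while the point between the shells gets a strictly positive value, so both surfaces can be recovered from the same field.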

URL

https://arxiv.org/abs/2303.11235

PDF

https://arxiv.org/pdf/2303.11235.pdf

