SALAD: Part-Level Latent Diffusion for 3D Shape Generation and Manipulation

2023-03-21 23:43:58
Juil Koo, Seungwoo Yoo, Minh Hieu Nguyen, Minhyuk Sung

Abstract

We present a cascaded diffusion model based on a part-level implicit 3D representation. Our model achieves state-of-the-art generation quality and also enables part-level shape editing and manipulation without any additional training in a conditional setup. Diffusion models have demonstrated impressive capabilities in data generation as well as in zero-shot completion and editing via a guided reverse process. Recent research on 3D diffusion models has focused on improving their generation capabilities with various data representations, while the absence of structural information has limited their capability in completion and editing tasks. We thus propose a novel diffusion model using a part-level implicit representation. To effectively learn diffusion with high-dimensional embedding vectors of parts, we propose a cascaded framework that learns diffusion first on a low-dimensional subspace encoding the extrinsic parameters of parts and then on the other, high-dimensional subspace encoding their intrinsic attributes. In the experiments, we demonstrate that our method outperforms previous ones in both generation and part-level completion and manipulation tasks.
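The two-stage sampling order described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the noise predictors below are untrained placeholders, the dimensions and step count are arbitrary, and conditioning the second stage on the first stage's output is an assumption based on how cascaded models are usually set up. All function names are hypothetical.

```python
import numpy as np


def make_schedule(num_steps, beta_start=1e-4, beta_end=2e-2):
    """Linear DDPM noise schedule (illustrative values, not from the paper)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def ddpm_sample(eps_model, shape, schedule, rng, cond=None):
    """Standard DDPM ancestral sampling with a given noise predictor."""
    betas, alphas, alpha_bars = schedule
    x = rng.standard_normal(shape)
    for t in reversed(range(len(betas))):
        eps = eps_model(x, t, cond)
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        # No noise is added at the final step (t == 0).
        noise = rng.standard_normal(shape) if t > 0 else np.zeros(shape)
        x = mean + np.sqrt(betas[t]) * noise
    return x


def toy_extrinsic_eps(x, t, cond):
    """Hypothetical stand-in for a trained network over extrinsic parameters."""
    return 0.1 * x


def toy_intrinsic_eps(x, t, cond):
    """Hypothetical stand-in for a trained network over intrinsic attributes.

    It reads `cond` (the sampled extrinsics) only to make the assumed
    dependence of stage two on stage one visible.
    """
    bias = cond.mean(axis=1, keepdims=True)  # one scalar per part
    return 0.1 * x + 0.01 * bias


def cascaded_sample(num_parts=4, extrinsic_dim=8, intrinsic_dim=64,
                    num_steps=50, seed=0):
    """Sample low-dimensional extrinsics first, then high-dimensional intrinsics."""
    rng = np.random.default_rng(seed)
    schedule = make_schedule(num_steps)
    # Stage 1: diffusion over the low-dimensional extrinsic subspace.
    extrinsics = ddpm_sample(toy_extrinsic_eps, (num_parts, extrinsic_dim),
                             schedule, rng)
    # Stage 2: diffusion over the high-dimensional intrinsic subspace,
    # assumed here to be conditioned on the stage-1 result.
    intrinsics = ddpm_sample(toy_intrinsic_eps, (num_parts, intrinsic_dim),
                             schedule, rng, cond=extrinsics)
    return extrinsics, intrinsics


if __name__ == "__main__":
    ext, intr = cascaded_sample()
    print(ext.shape, intr.shape)
```

Splitting the sampler this way keeps each stage's state small: the first stage only has to model a few parameters per part, and the second stage can treat them as fixed context. With trained networks in place of the toy predictors, the same loop structure would apply.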

URL

https://arxiv.org/abs/2303.12236

PDF

https://arxiv.org/pdf/2303.12236.pdf

