Paper Reading AI Learner

NAP: Neural 3D Articulation Prior

2023-05-25 17:59:35
Jiahui Lei, Congyue Deng, Bokui Shen, Leonidas Guibas, Kostas Daniilidis

Abstract

We propose Neural 3D Articulation Prior (NAP), the first 3D deep generative model to synthesize 3D articulated object models. Despite the extensive research on generating 3D objects, compositions, or scenes, there remains a lack of focus on capturing the distribution of articulated objects, a common object category for human and robot interaction. To generate articulated objects, we first design a novel articulation tree/graph parameterization and then apply a diffusion-denoising probabilistic model over this representation where articulated objects can be generated via denoising from random complete graphs. In order to capture both the geometry and the motion structure whose distribution will affect each other, we design a graph-attention denoising network for learning the reverse diffusion process. We propose a novel distance that adapts widely used 3D generation metrics to our novel task to evaluate generation quality, and experiments demonstrate our high performance in articulated object generation. We also demonstrate several conditioned generation applications, including Part2Motion, PartNet-Imagination, Motion2Part, and GAPart2Object.

Abstract (translated)

我们提出了神经网络3D关节构造前奏(NAP),这是合成3D关节对象模型的第一种3D深度生成模型。尽管研究了生成3D物体、组合或场景的广泛研究,但仍缺乏关注捕捉关节对象分布的重点,这是人类和机器人交互的常见对象类别。生成关节对象,我们首先设计了一个 novel 关节树/图参数化,然后应用一个扩散除噪的probabilistic模型,在这个表示上,可以从随机完整图生成关节对象。为了捕捉 both the geometry 和运动结构,Whose distribution will affect each other,我们设计了图注意力除噪网络,以学习逆扩散过程。我们提出了一种新的距离,该距离适应广泛使用的3D生成度量任务,以评估生成质量,并实验表明我们在关节对象生成方面表现出高性能。我们还展示了多个条件生成应用,包括Part2Motion、PartNet-Imagination、Motion2Part和 GAPart2Object。

URL

https://arxiv.org/abs/2305.16315

PDF

https://arxiv.org/pdf/2305.16315.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot