Paper Reading AI Learner

Towards Affordance-Aware Articulation Synthesis for Rigged Objects

2025-01-21 18:59:59
Yu-Chu Yu, Chieh Hubert Lin, Hsin-Ying Lee, Chaoyang Wang, Yu-Chiang Frank Wang, Ming-Hsuan Yang

Abstract

Rigged objects are commonly used in artist pipelines, as they can flexibly adapt to different scenes and postures. However, articulating rigs into realistic, affordance-aware postures (e.g., following the context, respecting the physics, and reflecting the personality of the object) remains time-consuming and relies heavily on labor from experienced artists. In this paper, we tackle this novel problem and design A3Syn. Given a context, such as an environment mesh and a text prompt describing the desired posture, A3Syn synthesizes articulation parameters for arbitrary, open-domain rigged objects obtained from the Internet. The task is highly challenging due to the lack of training data, and we make no topological assumptions about the open-domain rigs. We propose using a 2D inpainting diffusion model and several control techniques to synthesize in-context affordance information. We then develop an efficient bone correspondence alignment that combines differentiable rendering and semantic correspondence. A3Syn converges stably, completes in minutes, and synthesizes plausible affordances for different combinations of in-the-wild object rigs and scenes.
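At its core, the bone correspondence alignment described above is a gradient-based fit of articulation parameters to 2D affordance cues. Below is a minimal, self-contained sketch of that idea in PyTorch: a toy two-bone kinematic chain whose joint angles are optimized so that its joint positions match hypothetical 2D correspondence targets (in A3Syn, such targets would come from the inpainted affordance image via semantic matching, with differentiable rendering providing the gradients). The chain, the targets, the loss, and all names here are illustrative assumptions, not the paper's implementation.

import torch

def forward_kinematics(angles, bone_lengths):
    # Toy 2D kinematic chain: each bone rotates by a cumulative angle;
    # returns the 2D position of each joint along the chain.
    pts = [torch.zeros(2)]
    total = torch.zeros(())
    for theta, length in zip(angles, bone_lengths):
        total = total + theta
        direction = torch.stack([torch.cos(total), torch.sin(total)])
        pts.append(pts[-1] + length * direction)
    return torch.stack(pts[1:])  # shape: (num_bones, 2)

bone_lengths = torch.tensor([1.0, 0.8])
# Hypothetical 2D correspondence targets (stand-ins for affordance keypoints).
target = torch.tensor([[0.80, 0.60], [1.44, 1.08]])
# Articulation parameters to optimize (per-bone rotation angles).
angles = torch.zeros(2, requires_grad=True)

opt = torch.optim.Adam([angles], lr=0.05)
for step in range(300):
    opt.zero_grad()
    joints = forward_kinematics(angles, bone_lengths)
    loss = ((joints - target) ** 2).sum()  # correspondence alignment loss
    loss.backward()
    opt.step()

print("final loss:", loss.item(), "angles:", angles.detach())

Because the loss is a smooth function of the articulation parameters, a first-order optimizer converges in a few hundred steps on this toy chain, which mirrors the stable, minutes-scale convergence the abstract claims for the full system.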


URL

https://arxiv.org/abs/2501.12393

PDF

https://arxiv.org/pdf/2501.12393.pdf

