Paper Reading AI Learner

S2O: Static to Openable Enhancement for Articulated 3D Objects

2024-09-27 16:34:13
Denys Iliash, Hanxiao Jiang, Yiming Zhang, Manolis Savva, Angel X. Chang

Abstract

Despite much progress in large 3D datasets there are currently few interactive 3D object datasets, and their scale is limited due to the manual effort required in their construction. We introduce the static to openable (S2O) task which creates interactive articulated 3D objects from static counterparts through openable part detection, motion prediction, and interior geometry completion. We formulate a unified framework to tackle this task, and curate a challenging dataset of openable 3D objects that serves as a test bed for systematic evaluation. Our experiments benchmark methods from prior work and simple yet effective heuristics for the S2O task. We find that turning static 3D objects into interactively openable counterparts is possible but that all methods struggle to generalize to realistic settings of the task, and we highlight promising future work directions.

Abstract (translated)

尽管在大型3D数据集方面已经取得了很大进展,但目前可用的互动3D物体数据集数量仍然很少,而且它们的规模受到手动构建所需的努力的限制。我们引入了静态到开合(S2O)任务,该任务通过开合部件检测、运动预测和内部几何完成,从静止的对应物中创建交互式可缩放3D对象。我们提出了一个统一框架来解决这个任务,并创建了一个具有挑战性的开合3D物体数据集,作为系统评估的测试平台。我们的实验基准了之前的工作方法和简单而有效的策略,用于S2O任务。我们发现,将静态3D对象转换为可交互的对应物是可能的,但所有方法都难以将它们应用到任务的现实设置中,并我们强调了有前景的未来研究方向。

URL

https://arxiv.org/abs/2409.18896

PDF

https://arxiv.org/pdf/2409.18896.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model LLM Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Robot Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Time_Series Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot