GAPartNet: Cross-Category Domain-Generalizable Object Perception and Manipulation via Generalizable and Actionable Parts

2022-11-10 00:30:22

Haoran Geng, Helin Xu, Chengyang Zhao, Chao Xu, Li Yi, Siyuan Huang, He Wang

arXiv_CV

arXiv_CV Segmentation Adversarial Pose_Estimation Pose Action 3D

Abstract
Abstract (translated)
URL
PDF

Abstract

Perceiving and manipulating objects in a generalizable way has been actively studied by the computer vision and robotics communities, where cross-category generalizable manipulation skills are highly desired yet underexplored. In this work, we propose to learn such generalizable perception and manipulation via Generalizable and Actionable Parts (GAParts). By identifying and defining 9 GAPart classes (e.g. buttons, handles, etc), we show that our part-centric approach allows our method to learn object perception and manipulation skills from seen object categories and directly generalize to unseen categories. Following the GAPart definition, we construct a large-scale part-centric interactive dataset, GAPartNet, where rich, part-level annotations (semantics, poses) are provided for 1166 objects and 8489 part instances. Based on GAPartNet, we investigate three cross-category tasks: part segmentation, part pose estimation, and part-based object manipulation. Given the large domain gaps between seen and unseen object categories, we propose a strong 3D segmentation method from the perspective of domain generalization by integrating adversarial learning techniques. Our method outperforms all existing methods by a large margin, no matter on seen or unseen categories. Furthermore, with part segmentation and pose estimation results, we leverage the GAPart pose definition to design part-based manipulation heuristics that can generalize well to unseen object categories in both simulation and real world. The dataset and code will be released.

Abstract (translated)

URL

https://arxiv.org/abs/2211.05272

PDF

https://arxiv.org/pdf/2211.05272.pdf