Abstract
In this work, we explore a novel task: generating human grasps from single-view scene point clouds, which more accurately mirrors the typical real-world situation of observing objects from a single viewpoint. Because the object point cloud is incomplete and numerous scene points are present, the generated hand is prone to penetrating the invisible parts of the object, and the model is easily distracted by scene points. We therefore introduce S2HGrasp, a framework composed of two key modules: a Global Perception module that globally perceives partial object point clouds, and a DiffuGrasp module designed to generate high-quality human grasps from complex inputs that include scene points. Additionally, we introduce the S2HGD dataset, which comprises approximately 99,000 single-object, single-view scene point clouds of 1,668 unique objects, each annotated with one human grasp. Our extensive experiments demonstrate that S2HGrasp not only generates natural human grasps regardless of scene points, but also effectively prevents penetration between the hand and invisible parts of the object. Moreover, our model shows strong generalization to unseen objects. Our code and dataset are available at this https URL.
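The abstract emphasizes preventing penetration between the hand and the invisible parts of the object. A common way to quantify such hand-object penetration is to project each hand point onto its nearest object surface point and check whether it lies on the inner side of the estimated surface normal. The sketch below is illustrative only and assumes the object point cloud comes with outward-pointing normals; `penetration_depth` and its arguments are hypothetical names, not the paper's actual loss or API.

```python
import numpy as np

def penetration_depth(hand_pts, obj_pts, obj_normals):
    """Maximum depth (in the same units as the points) by which any
    hand point sinks below the object surface; 0.0 if no penetration.

    hand_pts:    (H, 3) array of hand surface points
    obj_pts:     (O, 3) array of (possibly partial) object points
    obj_normals: (O, 3) outward unit normals, one per object point
    """
    # Brute-force nearest object point for each hand point.
    d = np.linalg.norm(hand_pts[:, None, :] - obj_pts[None, :, :], axis=-1)
    idx = d.argmin(axis=1)

    # Signed offset along the local outward normal: negative means the
    # hand point is on the inner side of the surface (penetrating).
    offset = hand_pts - obj_pts[idx]
    signed = np.einsum('ij,ij->i', offset, obj_normals[idx])

    depths = np.where(signed < 0.0, -signed, 0.0)
    return float(depths.max()) if len(depths) else 0.0
```

With a single-view (partial) cloud, normals on the invisible side are unavailable, which is exactly why naive metrics like this fail there and why the paper argues for globally perceiving the partial object before generating the grasp.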
URL
https://arxiv.org/abs/2404.15815