Abstract
Robotic grasping is a fundamental ability that lets a robot interact with its environment. Current methods focus on obtaining a stable and reliable grasp pose at the object level, while little work has addressed part-wise (shape-wise) grasping, which is related to fine-grained grasping and robotic affordance. Parts can be seen as the atomic elements that compose an object; they carry rich semantic knowledge and correlate strongly with affordance. However, the lack of a large part-wise 3D robotic dataset limits the development of part representation learning and its downstream applications. In this paper, we propose a new large Language-guided SHape grAsPing datasEt (named Lang-SHAPE) for learning 3D part-wise affordance and grasping ability. We design a novel two-stage fine-grained robotic grasping network (named PIONEER), consisting of a novel 3D part language grounding model and a part-aware grasp pose detection model. To evaluate its effectiveness, we perform part language grounding and grasping experiments at multiple difficulty levels and deploy the proposed model on a real robot. Results show that our method achieves satisfactory performance and efficiency in reference identification, affordance inference, and 3D part-aware grasping. Our dataset and code are available on our project website.
URL
https://arxiv.org/abs/2301.11564