Learning 6-DoF Fine-grained Grasp Detection Based on Part Affordance Grounding

2023-01-27 07:00:54
Yaoxian Song, Penglei Sun, Yi Ren, Yu Zheng, Yue Zhang

Abstract

Robotic grasping is a fundamental ability for a robot to interact with the environment. Current methods focus on obtaining a stable and reliable grasping pose at the object level, while little work has studied part (shape)-wise grasping, which is related to fine-grained grasping and robotic affordance. Parts can be seen as the atomic elements that compose an object; they contain rich semantic knowledge and correlate strongly with affordance. However, the lack of a large part-wise 3D robotic dataset limits the development of part representation learning and downstream applications. In this paper, we propose a new large Language-guided SHape grAsPing datasEt (named Lang-SHAPE) to learn 3D part-wise affordance and grasping ability. We design a novel two-stage fine-grained robotic grasping network (named PIONEER), comprising a 3D part language grounding model and a part-aware grasp pose detection model. To evaluate its effectiveness, we perform part language grounding grasping experiments at multiple difficulty levels and deploy our proposed model on a real robot. Results show that our method achieves satisfactory performance and efficiency in reference identification, affordance inference, and 3D part-aware grasping. Our dataset and code are available on our project website.

URL

https://arxiv.org/abs/2301.11564

PDF

https://arxiv.org/pdf/2301.11564.pdf

