Abstract
In recent years, advances in deep learning and large-scale datasets have led to impressive progress in 3D instance segmentation, grasp pose estimation, and robotics. These advances enable accurate detection directly in 3D scenes, object- and environment-aware grasp prediction, and robust, repeatable robotic manipulation. This work integrates these recent methods into a comprehensive framework for robotic interaction and manipulation in human-centric environments. Specifically, we leverage 3D reconstructions from a commodity 3D scanner for open-vocabulary instance segmentation, alongside grasp pose estimation, to demonstrate dynamic picking of objects and opening of drawers. We evaluate the performance and robustness of our framework in two sets of real-world experiments, dynamic object retrieval and drawer opening, reporting success rates of 51% and 82%, respectively. Code for our framework, as well as videos, is available at: this https URL.
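The abstract describes a pipeline of reconstruction, open-vocabulary segmentation, grasp prediction, and execution; the sketch below illustrates how these stages might compose. Every class and function here (Scene, Grasp, reconstruct_scene, segment_open_vocab, estimate_grasps, retrieve_object) is a hypothetical placeholder, not the authors' API; the actual implementation is in the linked repository.

```python
"""Minimal, self-contained sketch of the pipeline the abstract describes:
3D reconstruction -> open-vocabulary instance segmentation -> grasp pose
estimation -> execution. All components are placeholder stubs."""

from dataclasses import dataclass
import numpy as np


@dataclass
class Scene:
    points: np.ndarray  # (N, 3) reconstructed point cloud


@dataclass
class Grasp:
    pose: np.ndarray    # 4x4 gripper pose in the scene frame
    score: float        # predicted grasp quality


def reconstruct_scene(scan_path: str) -> Scene:
    """Placeholder: load a 3D reconstruction from a commodity scanner."""
    return Scene(points=np.random.rand(1000, 3))


def segment_open_vocab(scene: Scene, prompt: str) -> np.ndarray:
    """Placeholder: boolean mask of points matching a free-text query."""
    mask = np.zeros(len(scene.points), dtype=bool)
    mask[:50] = True  # pretend the first 50 points match the prompt
    return mask


def estimate_grasps(points: np.ndarray) -> list[Grasp]:
    """Placeholder: predict ranked 6-DoF grasp poses on the object points."""
    return [Grasp(pose=np.eye(4), score=0.9)]


def retrieve_object(scan_path: str, query: str) -> bool:
    """Locate an object by open-vocabulary query and attempt a grasp."""
    scene = reconstruct_scene(scan_path)
    mask = segment_open_vocab(scene, prompt=query)
    if not mask.any():
        return False  # no instance matched the text query
    grasps = sorted(estimate_grasps(scene.points[mask]),
                    key=lambda g: -g.score)
    for grasp in grasps:
        # A real system would check kinematic feasibility and
        # command the robot here; we just report the best candidate.
        return True
    return False


if __name__ == "__main__":
    print(retrieve_object("scan.ply", "coffee mug"))
```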
URL
https://arxiv.org/abs/2404.12440