Inverse Kinematics for Neuro-Robotic Grasping with Humanoid Embodied Agents

2024-04-12 21:42:34
Jan-Gerrit Habekost, Connor Gäde, Philipp Allgeuer, Stefan Wermter

Abstract

This paper introduces a novel zero-shot motion planning method that allows users to quickly design smooth robot motions in Cartesian space. A Bézier curve-based Cartesian plan is transformed into a joint space trajectory by our neuro-inspired inverse kinematics (IK) method CycleIK, for which we enable platform independence by scaling it to arbitrary robot designs. The motion planner is evaluated on the physical hardware of the two humanoid robots NICO and NICOL in a human-in-the-loop grasping scenario. Our method is deployed with an embodied agent that has a large language model (LLM) at its core. We generalize the embodied agent, which was introduced for NICOL, so that it can also be embodied by NICO. The agent can execute a discrete set of physical actions and allows the user to verbally instruct different robots. We contribute a grasping primitive to its action space that allows for precise manipulation of household objects. The new CycleIK method is compared to popular numerical IK solvers and state-of-the-art neural IK methods in simulation and is shown to be competitive with or to outperform all evaluated methods when the algorithm runtime is very short. The grasping primitive is evaluated on both NICOL and NICO robots with a reported grasp success of 72% to 82% for each robot, respectively.
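
To make the Cartesian planning step concrete, the following is a minimal sketch of how a Bézier curve can be sampled into end-effector waypoints. It is not the authors' implementation: the function names `bezier_point` and `sample_cartesian_path`, the control points, and the waypoint count are all hypothetical, and the IK step (CycleIK in the paper) that would map each waypoint to joint angles is not reproduced here.

```python
from math import comb


def bezier_point(control_points, t):
    """Evaluate a Bézier curve of arbitrary degree at parameter t in [0, 1].

    control_points: sequence of equal-length coordinate tuples (e.g. x, y, z).
    Uses the Bernstein polynomial form of the curve.
    """
    n = len(control_points) - 1          # curve degree
    dim = len(control_points[0])
    point = [0.0] * dim
    for i, cp in enumerate(control_points):
        # Bernstein basis polynomial B_{i,n}(t)
        weight = comb(n, i) * (1 - t) ** (n - i) * t ** i
        for d in range(dim):
            point[d] += weight * cp[d]
    return tuple(point)


def sample_cartesian_path(control_points, num_waypoints):
    """Sample evenly spaced (in t) waypoints along the curve, endpoints included."""
    if num_waypoints < 2:
        raise ValueError("need at least two waypoints")
    return [
        bezier_point(control_points, k / (num_waypoints - 1))
        for k in range(num_waypoints)
    ]


# Hypothetical cubic curve: start pose, two shaping points, target pose (metres).
control_points = [
    (0.0, 0.0, 0.0),
    (0.0, 0.0, 1.0),
    (1.0, 0.0, 1.0),
    (1.0, 0.0, 0.0),
]
path = sample_cartesian_path(control_points, 5)
```

Each tuple in `path` would then be passed to an IK solver to obtain the corresponding joint configuration. The curve always starts at the first control point and ends at the last, while the intermediate control points only shape the motion in between.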

URL

https://arxiv.org/abs/2404.08825

PDF

https://arxiv.org/pdf/2404.08825.pdf

