Abstract
In this work, we propose a novel discriminative framework for dexterous grasp generation, named Dexterous Grasp TRansformer (DGTR), capable of predicting a diverse set of feasible grasp poses by processing the object point cloud in only one forward pass. We formulate dexterous grasp generation as a set prediction task and design a transformer-based grasping model for it. However, we find that this set prediction paradigm encounters several optimization challenges in the field of dexterous grasping, which restrict its performance. To address these issues, we propose progressive strategies for both the training and testing phases. First, the dynamic-static matching training (DSMT) strategy is presented to enhance optimization stability during the training phase. Second, we introduce adversarial-balanced test-time adaptation (AB-TTA) with a pair of adversarial losses to improve grasping quality during the testing phase. Experimental results on the DexGraspNet dataset demonstrate the capability of DGTR to predict dexterous grasp poses with both high quality and diversity. Notably, while maintaining high quality, the diversity of grasp poses predicted by DGTR significantly outperforms previous works on multiple metrics without any data pre-processing. Code is available at this https URL.
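To give intuition for the adversarial-balanced idea behind AB-TTA, the sketch below refines a predicted pose at test time with two opposing losses: a penetration penalty that pushes hand points out of the object and an attraction term that pulls them onto its surface. The 1-D toy geometry, the function names, and the shared-offset parameterization are all illustrative assumptions for exposition; they are not the paper's actual hand model or objective.

```python
def signed_distance(x, radius=1.0):
    # Signed distance to a sphere of the given radius (1-D slice):
    # negative inside the object, positive outside.
    return abs(x) - radius

def ab_tta(points, steps=2000, lr=0.001):
    # Subgradient descent on a shared offset t applied to all hand
    # points, balancing two opposing losses:
    #   penetration loss relu(-d): pushes points out of the object,
    #   attraction  loss relu(d):  pulls points onto the surface.
    t = 0.0
    for _ in range(steps):
        grad = 0.0
        for x in points:
            d = signed_distance(x + t)
            if d < 0:
                grad -= 1.0   # subgradient of relu(-d): push outward
            elif d > 0:
                grad += 1.0   # subgradient of relu(d): pull inward
        t -= lr * grad
    return t

# Hypothetical fingertip coordinates predicted slightly off the surface;
# adaptation settles where the two losses balance each other out.
tips = [1.5, 1.7]
t = ab_tta(tips)
```

The two losses cannot both reach zero for every point, so the optimization stops where their subgradients cancel, which is the "balance" in adversarial-balanced: the hand neither penetrates deeply nor hovers far from the surface.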
URL
https://arxiv.org/abs/2404.18135