Abstract
Bimanual manipulation, which requires coordinated collaboration between two arms, is essential yet challenging for robots executing complex tasks. However, existing methods for bimanual manipulation often rely on costly data collection and training, and struggle to generalize efficiently to unseen objects in novel categories. In this paper, we present Bi-Adapt, a novel framework designed for efficient generalization of bimanual manipulation via semantic correspondence. Bi-Adapt achieves cross-category affordance mapping by leveraging the strong capabilities of vision foundation models. After fine-tuning with limited data on novel categories, Bi-Adapt exhibits notable generalization to out-of-category objects in a zero-shot manner. Extensive experiments conducted in both simulation and real-world environments validate the effectiveness of our approach and demonstrate its high efficiency, achieving a high success rate on different benchmark tasks across novel categories with limited data. Project website: this https URL
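To make "cross-category affordance mapping via semantic correspondence" concrete, the sketch below shows one common way such a mapping can be realized with a vision foundation model: dense patch features from a DINOv2-style backbone are matched by cosine similarity to transfer an annotated affordance point from a known object to a novel one. This is a minimal illustrative sketch under assumed choices, not Bi-Adapt's actual pipeline; the backbone, the `patch_features`/`transfer_point` helpers, and the image names are hypothetical.

```python
# Illustrative sketch: affordance-point transfer by semantic correspondence,
# assuming a DINOv2-style ViT (patch size 14) as the vision foundation model.
import torch
import torch.nn.functional as F

def patch_features(model, img, patch=14):
    """Return L2-normalized dense patch features of shape (H/p, W/p, C)."""
    with torch.no_grad():
        out = model.forward_features(img.unsqueeze(0))   # DINOv2 forward_features
        feats = out["x_norm_patchtokens"][0]             # (num_patches, C)
    h, w = img.shape[1] // patch, img.shape[2] // patch
    return F.normalize(feats, dim=-1).reshape(h, w, -1)

def transfer_point(model, src_img, tgt_img, src_xy, patch=14):
    """Map an annotated affordance pixel (x, y) on the source object to the
    most semantically similar location on a novel target object."""
    f_src = patch_features(model, src_img, patch)
    f_tgt = patch_features(model, tgt_img, patch)
    sx, sy = src_xy[0] // patch, src_xy[1] // patch
    query = f_src[sy, sx]                                 # (C,) feature at the annotation
    sim = torch.einsum("hwc,c->hw", f_tgt, query)         # cosine similarity map
    ty, tx = divmod(int(sim.argmax()), sim.shape[1])      # best-matching patch
    return (tx * patch + patch // 2, ty * patch + patch // 2)

# Usage (images as normalized 3xHxW tensors with H, W divisible by 14):
# model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14").eval()
# left_grasp = transfer_point(model, cabinet_img, novel_drawer_img, (210, 98))
```

The snippet covers only the correspondence step; a framework such as Bi-Adapt would additionally fine-tune on limited in-category data and coordinate the two arms on top of such mapped affordances.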
Abstract (translated)
Bimanual manipulation, which requires coordinated collaboration between two arms, is necessary yet challenging for robots executing complex tasks. However, existing bimanual manipulation methods typically rely on costly data collection and training, and struggle to generalize efficiently to unseen objects in novel categories. This paper introduces a novel framework named Bi-Adapt, designed to achieve efficient generalization of bimanual manipulation via semantic correspondence. Bi-Adapt leverages the strong capabilities of vision foundation models to perform cross-category affordance mapping, and after fine-tuning with limited data on novel categories, it generalizes to unseen objects in a zero-shot manner. Extensive experiments in both simulated and real-world environments validate the effectiveness of our approach and demonstrate its high efficiency, achieving high success rates on different benchmark tasks with limited data. Project website: [see the original paper for the specific link]
URL
https://arxiv.org/abs/2602.08425